Rooting Out Racial Bias in Health Care AI, Part 1

Artificial intelligence could revolutionize health care. It could also perpetuate and exacerbate generations of racial inequities. In the first of a two-part series, we explore the challenge of diagnosing bias in AI and what one health system is trying to do about it.

Tradeoffs coverage on Diagnostic Excellence is supported in part by the Gordon and Betty Moore Foundation.

Episode Transcript and Resources

Episode Transcript

Dan Gorenstein: Lots of people are talking about artificial intelligence — or AI — in health care these days.

Clip: An algorithm that is able to tell who might be at risk for lung cancer.
Clip: The hope? That AI will soon be able to provide real-time health care recommendations.
Clip: But will this new technology replace the need for humans in the hospital altogether?

DG: Hospitals are using these AI tools to help clinicians diagnose breast cancer, read X-rays and predict which patients need more care. There’s growing excitement that these tools can make health care better, but there’s also a risk that these powerful new tools will perpetuate long-standing inequities.

Mark Sendak: If you mess this up, you can really, really harm people by entrenching systemic racism further into the health system.

DG: Bias in AI is complex and important, so we’re devoting back-to-back episodes to the issue. Today, in Part 1, the challenge of diagnosing racial bias in AI and what one health system is trying to do about it.

From the studio at the Leonard Davis Institute at the University of Pennsylvania, I’m Dan Gorenstein. This is Tradeoffs.

****

DG: Before we get started, a definition: When we talk about artificial intelligence, or AI, we’re talking about a computer program trained to perform tasks normally done by humans. And a growing number of hospitals are using AI to improve care for their patients and make life easier for their staff — people like Emily Sterrett.

Emily Sterrett: I am a pediatric emergency medicine physician at Duke University Hospital.

There are days that I never sit down, constantly moving from patient to patient to patient and my phone is ringing and ambulances are coming in.

DG: Actually, Emily loves it.

ES: I enjoy going fast. That is me. (laugh)

DG: Sure, she craves the rush of making quick decisions, but she’s obsessed with finding ways to deliver better care, especially trying to prevent her young patients from getting sepsis.

ES: I have a constant worry for 8 hours nonstop when I’m on shift that I’m missing sepsis.

DG: Sepsis happens when the body overreacts to an infection and starts attacking its own organs. At least 1.7 million people in the U.S. develop sepsis every year, including about 75,000 kids. Data show about 7,000 of them die. Sepsis can be treated effectively with antibiotics, but speed is critical. Studies in adults show that delays of just hours can significantly increase the risk of death. The challenge for Emily and every other emergency room doc is diagnosing who has sepsis. Common early symptoms include high fever, high heart rate and high white blood cell count.

ES: Some children will have this response with a common cold and they need some Tylenol and a popsicle and chicken soup. However, for other children, they actually have a life-threatening process going on, and it is very hard to know the difference between the two.

DG: Emily knows it’s a cliche, but she says it’s like looking for a needle in a haystack — a constant vigilance that wears on her and her colleagues day after day. Even worse are the days when she misses the needle — or spots it too late — and a child dies.

ES: The most common question I have been asked by parents whose child has just died of sepsis is, what could I have done differently? How is it possible that my child is dead? They were fine yesterday. I need them in that moment to know that they were loving parents who were doing the best they knew how to do for their child. All that said, I still carry the guilt with me about not acting fast enough, not recognizing fast enough, not fixing a child fast enough.

DG: These conversations haunt Emily, who worries that she is sending kids home with a popsicle and chicken soup, totally unaware of the sepsis attacking their young bodies. Emily loves the idea of an algorithm constantly taking in data, searching for those haystack needles, instead of her.

ES: When it’s a child’s life on the line, having a backup system that AI could offer to bolster some of that human fallibility is really, really important.

DG: Still, Emily thought that kind of AI surveillance system was a pipe dream, a unicorn at the end of the rainbow. She doubted a computer could do something as hard and delicate as diagnosing sepsis in kids. But more and more, data scientists are convinced they’ve found that unicorn. It’s a subset of artificial intelligence known as “machine learning.” In machine learning, scientists “train” algorithms by feeding a bunch of information — test results and diagnoses, for example — into a computer program that uses that data to predict future outcomes.

MS: Machine learning can take large amounts of data and can identify patterns and trends that would not be intuitive to any human looking at the data.

DG: Mark Sendak works at Duke Health where he’s one of the head data scientists for the Duke Institute for Health Innovation. Mark says Duke leans on machine learning algorithms to streamline scheduling, diagnose kidney disease and identify cancerous tumors. His team’s developed more than 20 AI tools for Duke. In the fall of 2019, Mark and his team started working with Emily to develop a machine learning algorithm to do what Emily had always thought was impossible: help clinicians get a jump on the kids most likely to become septic.

ES: I was skeptical to start. Because I knew it would be really hard.

DG: Mark said the algorithm — sometimes called a “model” — would work like this:

MS: We train models that ingest all of the relevant clinical signs and indicators.

DG: Vital signs, blood work, organ failure from kids previously treated for sepsis at Duke. The algorithm uses those clinical signs to figure out what kid is most likely to be septic.

MS: And then every 15 minutes the algorithm asks the question for every patient: Does this kid have sepsis or not?

DG: Mark and Emily agreed if they could pull this off, it could revolutionize the treatment of childhood sepsis and potentially save thousands of lives a year.

MS: What we’re trying to do is provide the best, most accurate, most timely diagnosis for pediatric sepsis to get kids treatment. And that’s what we should be aspiring for.

DG: It’s just one example of why enthusiasm for machine learning is exploding in health care. But AI in the industry is still in its early days. It’s not clear how many hospitals are using these tools in patient care, and while interest is certainly high, several experts told us they believe adoption is still limited. That might be a good thing because along with the potential power of AI to improve health care, there’s also a risk that it could replicate racial biases — and even make them worse.

How Duke is trying to protect its sepsis algorithm from bias, after the break.

MIDROLL

DG: Welcome back.

When Mark Sendak and Emily Sterrett first met at a coffee house in Durham, North Carolina in the fall of 2019 to talk about a machine learning algorithm for childhood sepsis, Mark knew they needed to be careful. He had just read a groundbreaking paper that demonstrated how easy it was for bias to creep into computer algorithms like the one they were dreaming up for sepsis.

MS: It kind of made you hold your breath of like, shit. If you mess this up or don’t pay attention to the right things, you can really, really harm people.

DG: The study that shook Mark up was published in the journal Science. Researchers found that an algorithm used to predict health needs for more than 100 million people was biased against Black patients. The algorithm relied on how much people spent on health care to predict future health care needs, but because Black patients historically had less access to care, they often spent less. Under this algorithm, Black patients had to be much sicker to be recommended for extra care.
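
The mechanism described here, using past spending as a stand-in for health need, can be illustrated with a toy calculation. The sketch below is not the algorithm from the Science study; the severity scale, the 0.7 access factor, the dollar amounts and the threshold are all made-up assumptions chosen only to show how a spending cutoff can demand that one group be sicker before it is flagged.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    group: str
    illness: int  # hypothetical severity score: 1 (mild) to 10 (severe)


# Assumption for illustration only: at the same level of illness, group "B"
# generates 70% of the spending of group "A" because of reduced access to care.
ACCESS_FACTOR = {"A": 1.0, "B": 0.7}


def past_cost(patient: Patient) -> float:
    """Toy spending model: cost scales with illness and with access to care."""
    return patient.illness * 1000 * ACCESS_FACTOR[patient.group]


def flagged_for_extra_care(patients, cost_threshold):
    """Flag patients by spending, the proxy the toy 'algorithm' relies on."""
    return [p for p in patients if past_cost(p) >= cost_threshold]


# Both groups have the identical spread of illness severity.
patients = [Patient(g, s) for g in ("A", "B") for s in range(1, 11)]
selected = flagged_for_extra_care(patients, cost_threshold=6000)

# The least-sick flagged patient in each group shows how sick you must be to qualify.
min_illness = {
    g: min(p.illness for p in selected if p.group == g) for g in ACCESS_FACTOR
}
print(min_illness)  # {'A': 6, 'B': 9}
```

In this toy setup, a patient in group B has to reach severity 9 to be flagged, while a patient in group A qualifies at severity 6, even though the cutoff itself never mentions group membership.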

The study drove home a simple but profound point to Mark: Data may seem neutral, even fair, but it can be biased. People of color, for example, are often underrepresented in datasets that train AI. Research shows clinicians often provide different care to white and non-white patients. Those differences have been cemented into the data.

MS: When you learn from the past, you replicate the past. You further entrench the past. Because you take existing inequities and you treat them as the aspiration for how health care should be delivered.

DG: Mark and other experts we spoke with said the paper in Science is one of the few documented examples of AI harming patients. Mark says it takes a lot of time and money to uncover bias like this, and there are few incentives for developers or health systems to look for it. But there’s a consensus among health care researchers and data scientists who study AI that the potential for bias to get baked into AI is real, and it is dangerous.

Mark likes to use a warning often attributed to Mark Twain:

MS: “History never repeats itself, but it often rhymes.” The problem with AI is that history does repeat itself. Like you’re essentially walking where there’s land mines, and your stuff’s going to blow up and it’s going to hurt people.

DG: So when Mark and his team started building the childhood sepsis algorithm, they walked very carefully. They spent a month working with Emily to teach the algorithm to identify sepsis based on clinical tools — vital signs and lab tests — instead of using easily accessible but often incomplete billing data.

MS: It’s like almost 50 to 100 times more complex to define sepsis using vitals and labs.

DG: After every tweak they made, the team tested the program to see if it found sepsis equally well in patients of different races and ethnicities — another time-consuming landmine avoided. In June 2022, the team unveiled the first draft to Emily.
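
One common way to run the kind of check described here is to compare a model's sensitivity (the share of true sepsis cases it flags) across groups. The transcript does not say which metrics the Duke team used, so the sketch below is only an assumption about what such a test could look like; the function names, group labels and records are hypothetical.

```python
from collections import defaultdict


def sensitivity_by_group(records):
    """Return the share of true sepsis cases the model flagged, per group.

    records: iterable of (group, has_sepsis, flagged) tuples.
    Groups with no true sepsis cases are left out of the result.
    """
    cases = defaultdict(int)
    hits = defaultdict(int)
    for group, has_sepsis, flagged in records:
        if has_sepsis:
            cases[group] += 1
            if flagged:
                hits[group] += 1
    return {group: hits[group] / cases[group] for group in cases}


def max_gap(rates):
    """Largest difference in sensitivity between any two groups."""
    return max(rates.values()) - min(rates.values())


# Hypothetical toy data: (group, child truly had sepsis, model flagged the child)
toy_records = [
    ("X", True, True), ("X", True, True), ("X", True, True), ("X", True, False),
    ("Y", True, True), ("Y", True, True), ("Y", True, False), ("Y", True, False),
    ("X", False, False), ("Y", False, True),
]

rates = sensitivity_by_group(toy_records)
print(rates)           # {'X': 0.75, 'Y': 0.5}
print(max_gap(rates))  # 0.25
```

A team could rerun a check like this after each change to the model and investigate whenever the gap grows. Sensitivity is only one lens; false alarms and timing can differ across groups too.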

ES: Mark’s team was able to show me a dashboard of every child in the hospital who had sepsis. It was so much more sophisticated and complex than I could almost wrap my brain around.

DG: The algorithm wasn’t yet predicting which kids would get sepsis, but it was identifying which patients were septic in real time. Emily could now picture a world where she could worry about sepsis a little less. She could see how this algorithm could add to her own capacity and improve care for her patients.

ES: When we started putting names and dates and locations to where sepsis was happening, that was a breakthrough moment.

DG: The team felt like they were within spitting distance of finalizing an algorithm that could save kids’ lives. And then, an unexpected meeting that fall threw all their painstaking efforts to elude bias into doubt. Duke researcher Ganga Moorthy had found that doctors took much longer to order critical blood tests for Hispanic kids eventually diagnosed with sepsis than white kids — the kind of delay that could be deadly.

Ganga Moorthy: One of my major hypotheses was that physicians were taking illnesses in white children perhaps more seriously than those of Hispanic children. Or perhaps there were delays in time to interpreters.

DG: Ganga logged onto the meeting hoping that Mark’s algorithm could identify the source behind the delays and hopefully eliminate them. That, after all, is part of the dream with AI, that computers will catch what humans miss and protect against clinicians’ inevitable biases and shortcomings.

GM: I came into it being like, Cool. We have this computer algorithm that is probably smarter than us, that really looks at mostly objective data, that puts everyone on a little bit more of a level playing field of saying that if your heart rate is elevated, it doesn’t matter if you need an interpreter, it doesn’t matter what your race, your ethnicity is. You could be at risk for sepsis and should be given attention.

DG: As Ganga worked through her slide listing the possible sources of delay — time to interpreter, provider bias — Mark’s eyes got wider and wider.

MS: I probably had my hands on my face within like seconds of seeing that slide just like, oh my God. We totally missed all of these subtle things that if any one of these was consistently true could introduce bias into the algorithm.

DG: As Ganga continued, Mark’s mind was racing. Imagine, he thought: Two patients arrive in the ER at exactly the same time, with exactly the same symptoms. Both septic. One family speaks English, the other Spanish. Although the two kids are medically identical, Ganga was telling him it might take doctors 60 to 120 minutes longer to order tests, diagnose sepsis and begin treatment for the second child.

MS: That would mean that the kid whose care was delayed would look like they have sepsis 2 hours later than the kid whose care wasn’t delayed.

DG: In that instant, Mark realized there was a chance the algorithm would think it takes Hispanic kids longer to develop sepsis. In other words, the delay Ganga found could be hard-coded into data that Mark thought his team had scrubbed clean of bias.
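
Mark's two-patient thought experiment can be put into numbers. The sketch below is an illustration under stated assumptions, not Duke's actual labeling logic: it assumes training labels are cut into the 15-minute windows mentioned earlier, that a window counts as "septic" only once test results put sepsis on the record, and that both children truly became septic at minute 0. The function names and the 180-minute horizon are hypothetical.

```python
WINDOW_MIN = 15  # scoring interval mentioned in the transcript


def training_labels(true_onset_min, test_delay_min, horizon_min=180):
    """Label each 15-minute window as septic only after sepsis is on the record.

    Assumption: the data can only show sepsis once tests have been ordered,
    so the recorded onset is the true onset plus the ordering delay.
    """
    recorded_onset = true_onset_min + test_delay_min
    return [(t, t >= recorded_onset) for t in range(0, horizon_min, WINDOW_MIN)]


def mislabeled_windows(true_onset_min, test_delay_min, horizon_min=180):
    """Count windows where the child was septic but the label says otherwise."""
    labels = training_labels(true_onset_min, test_delay_min, horizon_min)
    return sum(1 for t, septic in labels if t >= true_onset_min and not septic)


# Two medically identical children, both truly septic from minute 0.
no_delay = mislabeled_windows(true_onset_min=0, test_delay_min=0)
two_hour_delay = mislabeled_windows(true_onset_min=0, test_delay_min=120)
print(no_delay, two_hour_delay)  # 0 8
```

Under these assumptions, the child whose tests were delayed by two hours contributes eight windows in which a septic patient is labeled "not septic." A model trained on those labels could learn that identical symptoms mean sepsis later for that child's group.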

MS: I was angry with myself because I know that it is easy to put something out there that can cause massive harm. How could we not see this?

DG: Bias in AI is obviously a problem bigger than Duke. Researcher and soon-to-be University of Minnesota assistant professor Paige Nong says as interest in health care AI accelerates, the focus on diagnosing bias is falling behind. Paige interviewed data officials last year at 13 academic medical centers. Of the 13, only four told Paige they consider racial equity when developing or vetting machine learning algorithms just like Duke’s.

Paige Nong: What I was hearing in my interviews was that if a particular leader at a hospital or a health system happened to be personally concerned about racial inequity, then that would inform how they thought about AI in their health care system. But there was nothing structural, there was nothing at the regulatory or policy level that was requiring them to think or act that way.

DG: Paige’s research has its limitations: It hasn’t been peer reviewed and she only talked with a handful of hospitals. Yet, it’s one of the first studies of its kind, and it matches what we heard through our own reporting: A lot of hospitals are psyched about AI, but few think deeply about how to identify potential bias. Leaders in the space like Duke and the Mayo Clinic have formed national industry coalitions to develop and share best practices. And researchers have developed “playbooks” for combating bias. Paige considers this a start, but wildly insufficient.

PN: What we need is regulation. We need policies that address this problem and make racial equity a priority for all the hospitals in the United States, not just the ones that have individual leaders invested in racial equity.

DG: The good news, Paige says, is racial bias in AI is beginning to get attention from federal health officials. They see how even the most well-resourced, well-intentioned health systems like Duke are struggling to diagnose bias in their algorithms. And they recognize there’s more attention needed to keep AI from doing more harm than good.

Next week, in Part 2 of our story, we’ll hear what three federal agencies are doing to keep racial bias from getting to the bedside. And how Mark Sendak and Emily Sterrett tried to keep their childhood sepsis algorithm from succumbing to this newfound bias.

I’m Dan Gorenstein, this is Tradeoffs.

Episode Resources

Selected Reporting on AI and Racial Bias:

A research team airs the messy truth about AI in medicine — and gives hospitals a guide to fix it (Casey Ross, STAT News, 4/27/2023)
How Doctors Use AI to Help Diagnose Patients (Sumathi Reddy, Wall Street Journal, 2/28/2023)
How Hospitals Are Using AI to Save Lives (Laura Landro, Wall Street Journal, 4/10/2022)
From a small town in North Carolina to big-city hospitals, how software infuses racism into U.S. health care (Casey Ross, STAT News, 10/13/2020)
Widely used algorithm for follow-up care in hospitals is racially biased, study finds (Shraddha Chakradhar, STAT News, 10/24/2019)

Selected Research and Analysis on AI and Racial Bias:

Predictive Accuracy of Stroke Risk Prediction Models Across Black and White Race, Sex, and Age Groups (Chuan Hong, Michael J. Pencina, Daniel M. Wojdyla, Jennifer L. Hall, Suzanne E. Judd, Michael Cary, Matthew M. Engelhard, Samuel Berchuck, Ying Xian, Ralph D’Agostino Sr, George Howard, Brett Kissela and Ricardo Henao; JAMA; 1/24/2023)
Responsible AI: Fighting Bias in Healthcare (Chris Hemphill, Actium Health, 4/18/2022)
Underdiagnosis bias of artificial intelligence algorithms applied to chest radiographs in under-served patient populations (Laleh Seyyed-Kalantari, Haoran Zhang, Matthew B. A. McDermott, Irene Y. Chen and Marzyeh Ghassemi; Nature Medicine; 12/10/2021)
Racial Bias in Health Care Artificial Intelligence (NIHCM, 9/30/2021)
Artificial Intelligence in Healthcare: The Hope, The Hype, The Promise, The Peril (National Academy of Medicine, 2019)
Dissecting racial bias in an algorithm used to manage the health of populations (Ziad Obermeyer, Brian Powers, Christine Vogeli and Sendhil Mullainathan; Science; 10/25/2019)

Selected Best Practices to Mitigate Racial Bias in AI:

Organizational Governance of Emerging Technologies: AI Adoption in Healthcare (Health AI Partnership, 5/10/2023)
Blueprint for Trustworthy AI: Implementation Guidance and Assurance for Healthcare (Coalition for Health AI, 4/4/2023)
Preventing Bias and Inequities in AI-Enabled Health Tools (Trevan Locke, Valerie J. Parker, Andrea Thoumi, Benjamin A. Goldstein and Christina Silcox; Duke Margolis Center for Health Policy; 7/6/2022)
Algorithmic Bias Playbook (Ziad Obermeyer, Rebecca Nissan, Michael Stern, Stephanie Eaneff, Emily Joy Bembeneck and Sendhil Mullainathan; Chicago Booth Center for Applied Artificial Intelligence; June 2021)
Ensuring Fairness in Machine Learning to Advance Health Equity (Alvin Rajkomar, Michaela Hardt, Michael D. Howell, Greg Corrado and Marshall H. Chin; Annals of Internal Medicine; 12/18/2018)

Episode Credits

Guests:

Emily Sterrett, MD, Associate Professor of Pediatrics, Director of Improvement Science, Duke University School of Medicine Department of Pediatrics
Mark Sendak, MD, MPP, Population Health & Data Science Lead, Duke Institute for Health Innovation
Ganga Moorthy, MD, Global Health Fellow, Duke Pediatric Infectious Disease Program
Paige Nong, PhD Candidate, University of Michigan School of Public Health

The Tradeoffs theme song was composed by Ty Citerman, with additional music this episode from Blue Dot Sessions and Epidemic Sound.

This episode was reported by Ryan Levi, edited by Dan Gorenstein and Cate Cahan, and mixed by Andrew Parrella and Cedric Wilson.

Special thanks to: Suresh Balu, Jeff Smith and Jordan Everson.

Additional thanks to: Julia Adler-Milstein, Brett Beaulieu-Jones, Bennett Borden, David Dorr, Malika Fair, Sara Gerke, Marzyeh Ghassemi, Maia Hightower, John Halamka, Chris Hemphill, John Jackson, Jen King, Elaine Nsoesie, Ziad Obermeyer, Michael Pencina, Yolande Pengetnze, Deb Raji, Juan Rojas, Keo Shaw, Mona Siddiqui, Yusuf Talha Tamer, Ritwik Tewari, Danny Tobey, Alexandra Valladares, David Vidal, Steven Waldren, Anna Zink, James Zou, the Tradeoffs Advisory Board and our stellar staff!
