'Rooting Out Racial Bias in Health Care AI, Part 2' Transcript
June 1, 2023
Note: This transcript has been created with a combination of machine ears and human eyes. There may be small differences between this document and the audio version, which is one of many reasons we encourage you to listen to the episode!
Dan Gorenstein: Doctors, data scientists and hospital executives believe artificial intelligence may help solve what until now have been intractable problems.
Emily Sterrett: When it’s a child’s life on the line, having a backup system that AI could offer is really, really important.
DG: But with these new possibilities come new risks, namely the threat of perpetuating racial bias.
Mark Sendak: When you learn from the past, you replicate the past. You further entrench the past.
DG: In Part 1 of our special two-part series on racial bias in AI, we met a team at Duke Health that built a computer program to help diagnose kids with sepsis, but was blindsided when they realized all that hard work could delay care for Hispanic children.
MS: I was angry with myself because I know that it is easy to put something out there that can cause massive harm.
DG: Today, in Part 2, Duke’s efforts to scrub bias from their program, and what the Biden administration is doing to push the health care world to diagnose and reduce racial bias in AI.
From the studio at the Leonard Davis Institute at the University of Pennsylvania, I’m Dan Gorenstein. This is Tradeoffs.
******
DG: If you missed Part 1, go back and listen to that first. It should be right below this episode in your feed.
Today, I’m joined by Tradeoffs producer Ryan Levi to talk about the federal government’s response to racial bias in health care AI. Ryan, how you doing, man?
Ryan Levi: Doing well, Dan.
DG: So, look, before we get going, can you just spell out for us what artificial intelligence actually is?
RL: Sure, so AI broadly means a computer program that’s trained to perform tasks normally done by humans. And we’re mostly going to be talking about a subset of AI called “machine learning” where those computer programs are trained to find patterns in huge amounts of data like test results and patients’ bills.
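For listeners who want something more concrete, here is a toy sketch, in Python, of what "training a program to find patterns in data" looks like in practice. This is not a real clinical model; every number and column name below is made up purely to make the idea tangible.

```python
# Toy, hypothetical sketch of "machine learning" as described above:
# a model learns patterns from historical patient data to predict an outcome.
# All numbers and column names are invented for illustration.
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Pretend historical records: a few vitals and labs, plus whether sepsis occurred.
records = pd.DataFrame({
    "heart_rate":  [110, 92, 130, 88, 125, 95, 140, 85],
    "temperature": [39.1, 37.0, 39.8, 36.8, 38.9, 37.2, 40.1, 36.9],
    "lactate":     [3.2, 1.1, 4.0, 0.9, 3.5, 1.3, 4.4, 1.0],
    "had_sepsis":  [1, 0, 1, 0, 1, 0, 1, 0],  # the outcome the model learns to predict
})

features = records[["heart_rate", "temperature", "lactate"]]
outcome = records["had_sepsis"]

# "Training" means fitting the model to the historical data.
model = LogisticRegression().fit(features, outcome)

# Once trained, the model estimates risk for a new (hypothetical) patient.
new_patient = pd.DataFrame({"heart_rate": [120], "temperature": [39.0], "lactate": [3.0]})
print("Estimated sepsis risk:", round(model.predict_proba(new_patient)[0][1], 2))
```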
DG: Got it, great. So you’ve spent the last several months talking with federal officials, legal experts, data scientists and hospital folks to get a handle on how they’re all trying to keep racial bias from the bedside. What have you learned?
RL: Well, there’s this telling phrase that keeps coming up, Dan.
Carmel Shachar: Wild west.
Mark Sendak: The wild, wild west.
Paige Nong: It’s the wild west.
RL: I know you hate a cliche.
DG: True, true.
RL: But this one feels pretty on the mark. Using artificial intelligence to do everything from streamlining billing to helping diagnose cancer is a new frontier. And like the wild west, there’s this lawless feeling to it right now. There are few rules especially when it comes to preventing developers from baking racial inequity into their algorithms.
DG: But you’ve been talking with federal regulators who are hoping to change that. I’d like to know what these agencies are doing to make it easier to diagnose racial bias in health care algorithms. So, one of my favorite questions as you know: Who is doing what?
RL: Well, there are three agencies tackling different pieces of this puzzle — three sheriffs, if you will, trying to impose some order in this AI wild west.
DG: Dear lord, Ryan. Okay, fine, I will go along with it. So there are three sheriffs.
RL: That’s right, and each is responsible for tackling a different part of the AI bias problem. We’ve got the Food and Drug Administration or FDA. They regulate the developers who actually make these algorithms — big companies like GE and Medtronic. Then there’s the Office of the National Coordinator for Health Information Technology.
DG: A brutal name for a super important agency.
RL: Absolutely. They go by ONC for short. Their jurisdiction is the electronic health record, which is really integral to how these AI tools operate a lot of times. And finally, we’ve got the Department of Health and Human Services’ Office for Civil Rights, known as OCR. Their job is to make sure that clinicians, health systems and insurers aren’t discriminating against patients.
DG: Right, three regulators, excuse me, sheriffs: FDA, ONC and the Office for Civil Rights. Who should we start with?
RL: I’m going to say FDA. They’re the old-timers of this group, or at least as much of an old-timer as you can have in AI. They approved their first AI-powered device back in 1995.
DG: Wow. I had no idea AI had been kicking around in health care for that long, Ryan. I thought it was just more recent tech from this ChatGPT era.
RL: Yeah, in its capacity to review and approve medical devices, the FDA has actually greenlit more than 500 AI devices over the past 25-plus years, the vast majority of them in radiology, Dan. And it’s the FDA’s job to make sure the algorithms it reviews are safe and effective, just like the agency does for new prescription drugs and other medical devices.
DG: Okay, so what does that look like, Ryan, when it comes to racial bias?
RL: The agency told me that when an algorithm comes to them for review, they take a close look at the data used to build it. And this is where bias can come in. Remember, machine learning algorithms are fed historical data, like test results and diagnoses, and that helps the algorithms predict future outcomes. Because people of color are often underrepresented in those data sets, that can cause problems. So the FDA looks to make sure the data underpinning an algorithm is diverse enough, and they also ask developers to explain the steps they took to mitigate bias. The agency also says it’s working with academic researchers to come up with better ways to deal with this.
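The FDA didn’t share its internal review checklists, so here is only a rough, hypothetical illustration of the kind of question Ryan describes: is every group represented in the training data in reasonable proportion to the patients the tool will actually see? The categories and numbers below are invented.

```python
# Hypothetical sketch: compare the demographic mix of a model's training data
# against the population the tool is meant to serve. Numbers are invented.
import pandas as pd

training_counts = pd.Series(
    {"White": 7200, "Black": 900, "Hispanic": 600, "Asian": 250, "Other": 50}
)
population_share = pd.Series(
    {"White": 0.58, "Black": 0.18, "Hispanic": 0.16, "Asian": 0.05, "Other": 0.03},
    name="population_share",
)

training_share = (training_counts / training_counts.sum()).rename("training_share")
comparison = pd.concat([training_share, population_share], axis=1)

# A ratio well below 1.0 flags a group that is underrepresented in the training data.
comparison["representation_ratio"] = (
    comparison["training_share"] / comparison["population_share"]
)
print(comparison.round(2))
```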
DG: I hate to ruin your metaphor, Ryan, but that doesn’t sound very wild west-y. It sounds like the FDA is pretty focused on this.
RL: Fair point, fair point. But when you talk to the researchers and clinicians working on health care AI, they say that the FDA has a lot more work to do.
Minerva Tantoco: Regulations have always lagged behind technology advances.
RL: Minerva Tantoco is the Chief AI Officer at the New York University McSilver Institute for Poverty Policy and Research. She, along with developers and academics who study this space, say they’d like the FDA to establish public guidelines that spell out what developers must do to prove their AI tools are unbiased.
MT: By setting the standard, the FDA can signal that this needs to be part of your design. The folks who are developing these tools, they’re going as quickly as possible. They’re taking the data they can access. But unless the FDA specifically says we are going to test for racial and ethnic bias, they won’t necessarily account for that.
RL: Minerva and other experts I spoke to told me they’re glad — really glad — the FDA is stepping up its efforts on bias, but they feel like the FDA is missing stuff right now. And setting this kind of public standard — becoming a stricter sheriff if you will — would help crack down on the lawless feeling out there.
DG: Is there any proof, Ryan, that the FDA is actually missing bias in these tools?
RL: Well, several sources told me hospitals have algorithms up and running right now, today, that did not go through any FDA review, much less one for bias. And as for the tools that are reviewed by the FDA, an investigation by STAT News and a study by researchers at Stanford, both in 2021, found that the FDA’s approach to looking at race in AI devices was really uneven — sometimes asking for information, but often not. And that really captured the concerns I heard, Dan, around the FDA not having these public requirements for what developers have to do or have to show to prove their algorithms are unbiased.
DG: So while the FDA is taking direct steps, it sounds like there are still blind spots and a hunger for more regulation, more safeguards to beef up trust in these AI tools?
RL: Exactly.
DG: So that’s FDA. Let’s move onto our second sheriff in the AI bias wild west, the Office of the National Coordinator for Health IT or ONC. What part of the AI bias world are they focused on?
RL: ONC also regulates developers, like the FDA does, but only when it comes to the electronic health record, which is where a lot of these clinical algorithms end up.
DG: And since just about every hospital and doctor’s office in the country uses electronic health records at this point, ONC has jurisdiction over pretty much the whole health care system.
RL: Absolutely. And their strategy comes down to this:
Kathryn Marchesini: Transparency
Jordan Everson: Transparency
Jeff Smith: Transparency
RL: That was Kathryn Marchesini, Jordan Everson and Jeff Smith, who are all top officials at ONC and helped draft some new regulations the agency just put out. And ONC is betting transparency is a first step to rooting out bias. You know, these algorithms are often a black box, Dan. They’re technically complex and developers fiercely guard their intellectual property, which makes it really hard for hospitals to know whether the AI they’re buying is biased. So you know how the FDA, at least sometimes, evaluates the data developers use to build their algorithms?
DG: I’m with you.
RL: Well, ONC wants to make sure the end-users — hospitals, doctors, nurses — also know what that data looks like. In April, ONC proposed a rule that would require developers to let clinicians see two things: One, what kind of data was used to build an algorithm. And two, what steps the developer took to ensure the final product was unbiased. Kathryn Marchesini at ONC says they think of it like a nutrition label.
KM: What are the ingredients used to make the algorithm? Who was represented in the data that was being used? So the demographic type of information. Where did that data come from? What type of data set?
RL: ONC hopes making developers share an ingredient list will help hospitals and clinicians decide if an algorithm is unbiased enough — really, safe enough — to use.
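ONC’s proposal is written in regulatory language, not code, but the “nutrition label” idea maps naturally onto a small structured record. Here is one hypothetical sketch of what such a label might hold; the field names are ours, not ONC’s actual required elements.

```python
# Hypothetical sketch of an algorithm "nutrition label" as a structured record.
# Field names are illustrative only; they are not ONC's actual requirements.
from dataclasses import dataclass, field

@dataclass
class AlgorithmNutritionLabel:
    name: str
    intended_use: str
    data_sources: list[str]                      # where the training data came from
    demographics_represented: dict[str, float]   # share of each group in that data
    bias_mitigation_steps: list[str]             # what the developer did to reduce bias
    known_limitations: list[str] = field(default_factory=list)

label = AlgorithmNutritionLabel(
    name="Example early-warning model",
    intended_use="Flag possible sepsis in pediatric emergency patients",
    data_sources=["A single academic medical center's EHR, 2015-2021 (hypothetical)"],
    demographics_represented={"White": 0.62, "Black": 0.21, "Hispanic": 0.12, "Other": 0.05},
    bias_mitigation_steps=["Compared alert timing across race and ethnicity groups"],
    known_limitations=["Few encounters with Spanish-speaking families in the data"],
)
print(label.demographics_represented)
```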
DG: I love that nutrition label metaphor, Ryan. It helps make ONC’s approach really clear. At the same time, it makes me wonder how effective it’s going to be. How many people in a hospital or doctor’s office know enough about all this stuff to evaluate whether an algorithm is biased or not?
RL: So funny you should say that. Carmel Shachar asked the same thing when I talked with her about this new rule. Carmel studies the regulation of digital health tools at the Petrie-Flom Center at Harvard Law School.
CS: Transparency is really good and important, but we can’t rely solely on transparency because clinicians might look at this and say, okay, that’s kind of cool and interesting. But again, I’m not a data scientist, so what am I supposed to do here?
RL: Carmel says what would be really helpful is having this nutrition label, this ingredient list out there publicly so everybody could kick the tires on these algorithms.
DG: Everybody like who?
RL: Anyone, Dan. Data scientists, academics, independent researchers. Carmel says they should all be able to evaluate algorithms and root out hidden bias.
CS: The more people who are keeping an eye out, the more likely it is if there’s something problematic, somebody’s going to notice something and say something.
RL: ONC told me, Dan, that the agency is open to future regulation that would require developers to share this information publicly.
DG: Thanks, Ryan. When we come back, we hear about our last sheriff and learn how Duke tried to keep a bias they caught at the last minute out of their new childhood sepsis algorithm.
MIDROLL
DG: Welcome back. I’m here with Tradeoffs producer Ryan Levi, who is walking us through how three different federal agencies are trying to make it easier to diagnose and eliminate racial bias in clinical AI tools. Alright, Ryan, the sun’s going down here on your wild west which means it’s time for our last sheriff to make an appearance: the Office for Civil Rights at HHS.
RL: Right, OCR. They are legitimately a law enforcement agency, by the way, so the whole sheriff thing is a little less far-fetched with them.
DG: Good, good to hear.
RL: Anyway, OCR enforces federal health care anti-discrimination laws, and OCR Director Melanie Fontes Rainer told me those laws already cover discrimination by algorithms.
Melanie Fontes Rainer: We have open investigations in this space right now looking at use of predictive analytics and possible violation of federal civil rights laws and discrimination.
RL: While these laws, Dan, broadly prohibit health care providers from discriminating, there was nothing that explicitly said, “it’s illegal to use an algorithm to discriminate.” Period. So in August 2022, Melanie says OCR proposed a rule that said hospitals, physician practices and insurers…
MFR: must not discriminate on the basis of race, color, national origin, sex, age or disability through the use of clinical algorithms and decision making.
DG: So that’s interesting, Ryan. OCR proposed a regulation to ban something that is already against the law? Why get more specific if they already have this broader authority?
RL: Melanie says the point is to send a clear and obvious signal to hospitals and doctors that if they use an algorithm that is biased against their patients, they are breaking the law and they are responsible.
MFR: To make sure that they’re aware that this isn’t just, you know, buy a product off the shelf, close your eyes and use it. It’s another tool we have to regulate to make sure that they’re not using it in some way that furthers disparities.
RL: The other sheriffs we’ve talked about, Dan — FDA and ONC — they’re focused on making sure developers have done everything they can to rid an algorithm of bias by the time it gets to the hospital. What OCR is saying is providers also have a role to play. Hospitals and clinicians have to be prepared to spend the time and money to make sure biased algorithms aren’t getting to the bedside, or they could face fines and be forced to make changes.
DG: So what will hospitals actually have to do if this rule is finalized?
RL: It’s a good question, and we don’t quite know yet. Melanie made it very clear that even without this rule, hospitals cannot discriminate using algorithms. That’s against the law. But just like with the FDA, Dan, there’s little guidance here on what exactly hospitals have to do to stay on the right side of the law. And given the immense investment and expertise required to make sure algorithms are unbiased, Harvard’s Carmel Shachar is worried the OCR rule could have unintended consequences.
CS: What we don’t want is for the rule to be so scary that physicians say, okay, I just won’t use any AI in my practice. I just don’t want to run the risk.
RL: And, Dan, Carmel worries in particular about smaller, less resourced hospitals that don’t have a team of data scientists like they have at Duke to do this work. Melanie at the Office for Civil Rights downplayed this concern. She says her office isn’t knocking on every hospital door looking for biased algorithms. They’re really zeroed in on providers with a pattern of discrimination. And she says OCR’s goal is to help a hospital stop discriminating without having to issue fines or other penalties.
DG: Thanks for summing up how these three different regulatory sheriffs — FDA, ONC and OCR — are trying to bring some more order to this world, Ryan. I’m curious, at the end of the day, did the developers, clinicians and researchers you talked with think these efforts will help get rid of bias in health care algorithms?
RL: It’s a little hard to say. On one hand, pretty much everyone I talked with including the hospitals want more regulation. They’re excited to see the sheriffs in town paying attention to this. But they also say they need more guidance — clear guidance — on how to keep bias out of clinical AI, and importantly, they say they need more money to actually do that. I thought Mark Sendak, who helps develop AI tools for Duke Health and who we heard from in Part 1 of the series, articulated this concern really well.
MS: We’re hearing very loud and clear that we have to eliminate bias from algorithms. But I think what we’re not hearing is a regulator say, we understand the resources that it takes to identify these things, to monitor for these things, and we’re going to make investments to make sure that we address this problem.
RL: What Mark’s talking about there, Dan, are federal investments. The last really big tech change in health care was the implementation of electronic health records, and the federal government spent $35 billion to entice and help providers adopt EHRs. Not one of the regulatory proposals I’ve seen around algorithms and bias mentions anything about incentives or assistance like that.
DG: Tradeoffs producer Ryan Levi, thanks, as always, sir, for your reporting.
RL: Anytime, partner.
DG: Oh my, okay. Off into the sunset he goes.
At the end of Part 1, we left Mark Sendak and the team at Duke a bit stunned. It was late 2022, and one of their colleagues had just told them that it often takes an hour or two longer for Hispanic kids at Duke to get checked out for sepsis compared to white kids — delays potentially caused by language barriers or provider bias. Mark realized they had unknowingly baked this delay into the data he and his team had so scrupulously tried to scrub clean.
MS: That would mean that the kid whose care was delayed would look like they have sepsis two hours later than the kid whose care wasn’t delayed.
DG: That kind of delay can be deadly with sepsis. Mark was disappointed with himself for missing something that now seemed so obvious, like non-English-speaking patients getting slower care. But Mark was also grateful for the chance to do something about it. He and his team dug in. They spent another two months trying to make sure their algorithm caught sepsis in Hispanic kids as quickly as all other kids.
MS: We did a ton of analysis and we looked at, okay, how would each of these items be represented in the data? How would we test what the impact of each of these is in the potential delays in care?
DG: Ultimately, the team determined that the algorithm was not predicting sepsis later for Hispanic patients. Mark says that’s likely because Duke treats a relatively small number of kids with sepsis, so there wasn’t enough data to tip the algorithm. Mark was relieved, and it cemented in his mind just how easy it is for bias to slip into AI.
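Duke hasn’t published the exact code from that analysis, so the sketch below is only a rough, hypothetical illustration of the kind of check Mark describes: compare how long after arrival the model first flags sepsis, broken out by ethnicity, and look for a consistent gap. The data is synthetic.

```python
# Rough, hypothetical sketch of the audit described above: does the model flag
# sepsis later for one group of patients than another? All data is synthetic.
import pandas as pd

alerts = pd.DataFrame({
    "ethnicity": ["Hispanic", "Hispanic", "Hispanic",
                  "Non-Hispanic", "Non-Hispanic", "Non-Hispanic"],
    # Minutes from emergency department arrival until the model's first sepsis alert.
    "minutes_to_alert": [95, 150, 120, 60, 85, 70],
})

mean_by_group = alerts.groupby("ethnicity")["minutes_to_alert"].mean()
print(mean_by_group)

# A large, consistent gap would be a red flag that delays in care have been
# learned by the model and need to be investigated before deployment.
gap = mean_by_group["Hispanic"] - mean_by_group["Non-Hispanic"]
print(f"Mean gap (Hispanic minus Non-Hispanic): {gap:.0f} minutes")
```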
MS: I don’t find it comforting that in one specific rare case, we didn’t have to intervene to prevent bias. Every time you become aware of a potential flaw there’s that responsibility of, where else is this happening?
DG: Research shows Hispanic patients often wait longer for care in the emergency department. Mark says that means algorithms in use right now are diagnosing health conditions in Hispanic patients more slowly than in other patients.
MS: And I can’t tell you what it is, but I’m sure it’s already causing harm to Spanish-speaking families.
DG: Duke’s pediatric sepsis algorithm is going through its final tests and hospital approvals, on track to switch on this summer. After that, Mark and physician Emily Sterrett hope to share it with other hospitals.
ES: The impact of us being able to figure this out and share it with other larger institutions is huge. If this works, it would truly revolutionize the job of a pediatric emergency medicine physician.
DG: That potential — to better diagnose this tricky condition and save kids’ lives — that’s why Emily and Mark and so many in health care are excited about AI. But to do more good than harm, Mark says, is going to take what often seems in short supply: devotion to addressing this racial bias in algorithms and the broader health care system.
MS: You have to look in the mirror. It’s very introspective, and it requires you to ask hard questions of yourself, of the people that you’re working with, of the organizations you’re a part of. Because if you’re actually looking for bias in algorithms, the root cause of a lot of the bias is inequities in care.
DG: Mark believes developers must work closely with front-line clinicians like Emily. And he sees now, he needs to build a more diverse team: anthropologists and sociologists, patients and community members — enough different folks to think through how bias could sneak into this groundbreaking work. It’s a lot, more than any of the proposed regulations would require. But necessary, Mark says, to help make sure these tools leave our biases behind. I’m Dan Gorenstein, this is Tradeoffs.
Tradeoffs coverage on Diagnostic Excellence is supported in part by the Gordon and Betty Moore Foundation.
Episode Resources
Selected Reporting, Research and Analysis on Regulation of AI and Racial Bias:
AI leaders issue a plea to Congress: Regulate us, and quickly (Casey Ross, STAT News, 5/16/2023)
Congress wants to regulate AI, but it has a lot of catching up to do (Claudia Grisales, NPR, 5/15/2023)
‘Nutrition Facts Labels’ for Artificial Intelligence/Machine Learning-Based Medical Devices—The Urgent Need for Labeling Standards (Sara Gerke, George Washington Law Review, 4/18/2023)
Artificial Intelligence and Machine Learning Blog Series (Kathryn Marchesini, ONC, 4/13/2023)
New ONC rule aims to raise trust in clinical decision support algorithms (Rebecca Pifer, Healthcare Dive, 4/12/2023)
FDA proposes a new plan to streamline updates to medical devices that use AI (Casey Ross, STAT News, 3/30/2023)
Prevention of Bias and Discrimination in Clinical Practice Algorithms (Carmel Shachar and Sara Gerke; JAMA; 1/5/2023)
HHS’s proposed rule prohibiting discrimination via algorithm needs strengthening (Ashley Beecy, Steve Miff and Karandeep Singh; STAT News; 11/3/2022)
In new guidance, FDA says AI tools to warn of sepsis should be regulated as devices (Casey Ross, STAT News, 9/27/2022)
FDA Review Can Limit Bias Risks in Medical Devices Using Artificial Intelligence (Liz Richardson, Pew Charitable Trusts, 10/7/2021)
Ethics and governance of artificial intelligence for health (World Health Organization, 6/28/2021)
How medical AI devices are evaluated: limitations and recommendations from an analysis of FDA approvals (Eric Wu, Kevin Wu, Roxana Daneshjou, David Ouyang, Daniel E. Ho and James Zou; Nature Medicine; 4/5/2021)
As the FDA clears a flood of AI tools, missing data raise troubling questions on safety and fairness (Casey Ross, STAT News, 2/3/2021)
Selected Reporting on AI and Racial Bias:
A research team airs the messy truth about AI in medicine — and gives hospitals a guide to fix it (Casey Ross, STAT News, 4/27/2023)
How Doctors Use AI to Help Diagnose Patients (Sumathi Reddy, Wall Street Journal, 2/28/2023)
How Hospitals Are Using AI to Save Lives (Laura Landro, Wall Street Journal, 4/10/2022)
From a small town in North Carolina to big-city hospitals, how software infuses racism into U.S. health care (Casey Ross, STAT News, 10/13/2020)
Widely used algorithm for follow-up care in hospitals is racially biased, study finds (Shraddha Chakradhar, STAT News, 10/24/2019)
Selected Research and Analysis on AI and Racial Bias:
Predictive Accuracy of Stroke Risk Prediction Models Across Black and White Race, Sex, and Age Groups (Chuan Hong, Michael J. Pencina, Daniel M. Wojdyla, Jennifer L. Hall, Suzanne E. Judd, Michael Cary, Matthew M. Engelhard, Samuel Berchuck, Ying Xian, Ralph D’Agostino Sr, George Howard, Brett Kissela and Ricardo Henao; JAMA; 1/24/2023)
Responsible AI: Fighting Bias in Healthcare (Chris Hemphill, Actium Health, 4/18/2022)
Underdiagnosis bias of artificial intelligence algorithms applied to chest radiographs in under-served patient populations (Laleh Seyyed-Kalantari, Haoran Zhang, Matthew B. A. McDermott, Irene Y. Chen and Marzyeh Ghassemi; Nature Medicine; 12/10/2021)
Racial Bias in Health Care Artificial Intelligence (NIHCM, 9/30/2021)
Artificial Intelligence in Healthcare: The Hope, The Hype, The Promise, The Peril (National Academy of Medicine, 2019)
Dissecting racial bias in an algorithm used to manage the health of populations (Ziad Obermeyer, Brian Powers, Christine Vogeli and Sendhil Mullainathan; Science; 10/25/2019)
Selected Best Practices to Mitigate Racial Bias in AI:
Organizational Governance of Emerging Technologies: AI Adoption in Healthcare (Health AI Partnership, 5/10/2023)
Blueprint for Trustworthy AI: Implementation Guidance and Assurance for Healthcare (Coalition for Health AI, 4/4/2023)
Preventing Bias and Inequities in AI-Enabled Health Tools (Trevan Locke, Valerie J. Parker, Andrea Thoumi, Benjamin A. Goldstein and Christina Silcox; Duke Margolis Center for Health Policy; 7/6/2022)
Algorithmic Bias Playbook (Ziad Obermeyer, Rebecca Nissan, Michael Stern, Stephanie Eaneff, Emily Joy Bembeneck and Sendhil Mullainathan; Chicago Booth Center for Applied Artificial Intelligence; June 2021)
Ensuring Fairness in Machine Learning to Advance Health Equity (Alvin Rajkomar, Michaela Hardt, Michael D. Howell, Greg Corrado and Marshall H. Chin; Annals of Internal Medicine; 12/18/2018)
Episode Credits
Guests:
Emily Sterrett, MD, Associate Professor of Pediatrics, Director of Improvement Science, Duke University School of Medicine Department of Pediatrics
Mark Sendak, MD, MPP, Population Health & Data Science Lead, Duke Institute for Health Innovation
Minerva Tantoco, Chief AI Officer, New York University McSilver Institute for Poverty Policy and Research
Carmel Shachar, JD, MPH, Executive Director, Petrie-Flom Center for Health Law Policy, Biotechnology, and Bioethics at Harvard Law School
Kathryn Marchesini, JD, Chief Privacy Officer, Office of the National Coordinator for Health Information Technology
Melanie Fontes Rainer, JD, Director, HHS Office for Civil Rights
Ryan Levi, Reporter/Producer, Tradeoffs
The Tradeoffs theme song was composed by Ty Citerman, with additional music this episode from Blue Dot Sessions and Epidemic Sound.
This episode was reported by Ryan Levi, edited by Dan Gorenstein and Cate Cahan, and mixed by Andrew Parrella and Cedric Wilson.
Special thanks to: Suresh Balu, Jordan Everson, Sara Gerke and Jeff Smith.
Additional thanks to: Julia Adler-Milstein, Brett Beaulieu-Jones, Bennett Borden, David Dorr, Malika Fair, Marzyeh Ghassemi, Maia Hightower, John Halamka, Chris Hemphill, John Jackson, Jen King, Elaine Nsoesie, Ziad Obermeyer, Michael Pencina, Yolande Pengetnze, Deb Raji, Juan Rojas, Keo Shaw, Mona Siddiqui, Yusuf Talha Tamer, Ritwik Tewari, Danny Tobey, Alexandra Valladares, David Vidal, Steven Waldren, Anna Zink, James Zou, the Tradeoffs Advisory Board and our stellar staff!