New research sheds light on how many hospitals are using artificial intelligence, what they’re using AI for, and what it means for patients and policymakers.
Artificial intelligence has dominated health care headlines for several years, but one seemingly simple question has been difficult to answer: How many hospitals actually use AI?
University of Minnesota researcher Paige Nong and her colleagues published a first-of-its-kind study in the journal Health Affairs earlier this year that found about two-thirds of the roughly 6,000 hospitals in the U.S. use AI and predictive models. Some use algorithms for clinical purposes — to identify abnormalities in X-rays and CT scans, for example, or to predict which patients are likely to fall or get a serious infection. Others rely on such models to help facilitate scheduling and billing. But of those hospitals that use AI of any kind, only 61% said they tested those tools for accuracy, and just 44% tested them for bias.
“The big picture is that these predictive models and AI in hospitals are widespread. But evaluation for bias isn’t,” Nong told us in a wide-ranging conversation about her recent research and the broader state of AI in health care. “This to me is like a blinking red light warning us that we have work to do here.”
Here are a few other highlights from our conversation:
- Paige’s research found that the most common way hospitals are using AI is to predict which patients are at highest risk of medical complications like sepsis or some other kind of infection, followed by identifying high-risk outpatients for follow-up care and by operational uses like scheduling and billing.
- Many guardrails are in place to reduce the likelihood that biased algorithms harm patient care, according to Paige. She’s more worried about the potential harms of biased AI in scheduling or billing. “If we’re using AI to refer patients to collections agencies faster, that’s not going to result in missing a sepsis diagnosis, but it can result in really devastating financial consequences,” she said.
- Paige believes transparency regulations put in place under the Biden administration are encouraging more hospitals and vendors to evaluate their AI for bias. But she would like to see policymakers do more to help hospitals that have fewer resources acquire and evaluate AI tools. “What we want is to make sure [all hospitals] have the resources to conduct appropriate evaluations, and to use [AI] tools when they can provide them with value or when they can improve patient care,” she said.
Listen to the full episode above or read the transcript below to learn more about the risks Paige fears patients could face if research on bias in AI slows or stops.
Episode Transcript and Resources
Episode Transcript
Dan Gorenstein (DG): Given how much we hear about artificial intelligence and health care…
Clip: AI has been called medicine’s biggest moment since antibiotics…
Clip: Programs are learning to answer patient medical questions and diagnose illnesses…
DG: …you’d think that AI is being used in every hospital and doctor’s office in America.
Clip: Can it really replace your psychiatrist or doctor?
Clip: More than 350 gigabytes of information per patient goes into a central computer.
DG: The truth: AI is moving so fast that we’re just beginning to get good data on how many hospitals use it, what they’re using it for, and how much of a threat AI is to patients.
What we’re learning has one researcher concerned.
Paige Nong (PN): This to me is like a blinking red light warning us that we have work to do here.
DG: Today, new research on AI in hospitals and what it means for patients and policymakers.
From the studio at the Leonard Davis Institute at the University of Pennsylvania, I’m Dan Gorenstein. This is Tradeoffs.
*****
PN: My name is Paige Nong. I’m an assistant professor in health policy and management at the University of Minnesota’s School of Public Health.
DG: A groundbreaking 2019 paper inspired Paige to study AI back when she was in grad school. Ziad Obermeyer and other researchers looked at an algorithm a health insurance company used to predict the health needs of more than 100 million people.
PN: The idea was that the insurance company wanted to identify patients that were really sick, who needed support in managing chronic conditions, like diabetes or hypertension. The problem was that the model used cost as the predictor, not the patients’ actual diagnoses.
DG: That was a problem because Black patients often spend less on care, in part because they have less access to it. Given how the algorithm was configured, Black patients had to be much sicker than white patients to get the same recommended follow-up care.
PN: And that paper has become famous because it really gave us a clear look at the risks of these algorithms and what it means when we don’t think very carefully about the data that we’re using.
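To make that mechanism concrete, here is a minimal, hypothetical sketch of how training a risk score on cost instead of actual illness can disadvantage a group with less access to care. This is not the Obermeyer data or model; every number, variable name, and threshold below is invented purely for illustration.

```python
import numpy as np

# Hypothetical illustration only: synthetic patients, not real data.
rng = np.random.default_rng(0)
n = 100_000

# True health need, e.g. number of active chronic conditions.
conditions = rng.poisson(lam=2.0, size=n)

# Assume half the population faces access barriers and, at the same level
# of illness, generates roughly 30% less spending (an invented gap).
barrier = rng.random(n) < 0.5
access = np.where(barrier, 0.7, 1.0)

# Observed cost rises with true need but is scaled down by access barriers.
cost = conditions * access * rng.lognormal(mean=0.0, sigma=0.3, size=n)

def flagged(score, frac=0.03):
    """Flag the top `frac` of patients for extra care under a given score."""
    return score >= np.quantile(score, 1 - frac)

by_cost = flagged(cost)        # proxy target: observed spending
by_need = flagged(conditions)  # what we actually care about: illness burden

for name, flag in [("cost-based score", by_cost), ("need-based score", by_need)]:
    share_barrier = barrier[flag].mean()
    sickness_gap = (conditions[flag & barrier].mean()
                    - conditions[flag & ~barrier].mean())
    print(f"{name}: {share_barrier:.0%} of flagged patients face access "
          f"barriers; flagged barrier-group patients average {sickness_gap:+.2f} "
          f"more conditions than other flagged patients")
```

Run on this toy data, the cost-based score flags fewer patients from the barrier group, and the ones it does flag are sicker than their flagged counterparts, which mirrors the pattern Dan describes above.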
DG: But Paige’s concerns went beyond the results of this one paper.
PN: We only knew about this predictive model discriminating because Obermeyer and his colleagues got access to this data and happened to do this kind of analysis. But the fact is that we don’t often have that kind of access to data and information about the inner workings of an insurance company’s algorithm. So for me, the big concern was that there wasn’t a system in place to prevent this kind of thing from happening. So my work over the last five years has been focused on how we design that system. How can we understand what’s happening in our health care system and use that information to build good guardrails or guidelines for AI that will protect patients and improve care?
DG: So you see this paper as a student and it both scares you, but it seems like it inspires you. Paige, was there an overarching question that this made you want to, like, begin to, like, be in hot pursuit of?
PN: Yeah, it made me want to understand: Did health systems design good governance approaches that would prevent them from using bad AI? Did they have the capacity to do this? Did we need a policy intervention? There’s a lot of hype around AI and a lot of assumptions about how it’s transforming health care, but I wanted to know what was actually happening. I really wanted that information. So I started with some interviews to try to get a sense of what was going on.
DG: You interviewed, I think it was like, 13 data leaders or something at big health systems around the country to better get a baseline. And if I remember correctly, what you learned was kind of scary.
PN: Yes. So I did interviews with 13 academic medical centers across the country. Their governance approaches ranged from a standing AI committee that reviewed every single tool involving any kind of machine learning or AI.
DG: Sort of a gold standard.
PN: Yes. What we would want to see. A consistent, sort of coherent, transparent approach to analyzing the AI tools that they were either considering using or actively using. But then on the other end of the spectrum, there were some academic medical centers that basically had a single individual making decisions about what tools to use or not use. And of these 13 systems, only four described any kind of equity consideration in their governance processes. Bias evaluations were not a consistent requirement. So back to that Obermeyer paper. And my question of, “Can we do this consistently?” The answer through these qualitative interviews was no.
DG: You’re saying that based on these interviews that you did, Paige, it was really clear that bias could very easily make its way into these algorithms.
PN: Yes. And it raised really important questions for me about how we could do this better.
DG: It’s always seemed to me that one of the most pernicious parts of this topic is that in order to sort of tackle bias, it requires a kind of self-awareness that somehow we are bringing our own bias to the table.
PN: Yes, that’s absolutely true. I did hear some fantastic examples, though, from some of these academic medical centers about really thorough and critical conversations about if race or ethnicity is used as a variable in a model, why and how and what is it contributing here, and how do we make sure that it’s not making things worse? There are some incredibly thoughtful, careful people who are involved in this work doing a really great job. I think our challenge is to scale that kind of effort and that is really hard.
DG: And so speaking to the scaling, these interviews confirm some of your worst fears. But it’s still only a handful of hospitals. So you present your findings at a big conference in New Orleans in November of 2023. What happens next?
PN: I was part of a panel on governing AI. After the panel wrapped up, my colleague Dr. Jordan Everson and I were talking about the findings from the interviews. Jordan is at the Office of the Assistant Secretary for Technology Policy, which we used to call ONC. He’s a federal official, and he and his colleagues had recently gotten national data on AI use and evaluation as part of this annual survey of hospitals that’s done by the American Hospital Association. So he basically had the data that I had been wanting all along. Jordan and I then led the effort to understand what this data meant and what was happening nationally in hospitals.
DG: And I mean, this seems like a huge deal Paige, kind of like Christmas comes early in New Orleans, him giving you this chance to go from knowing what 13 health care systems are doing with AI and bias to being able to basically survey hospitals around the country. How do you feel after that conversation? I mean, were you just like jacked?
PN: I was. I was so excited. Qualitative work can be so rich and give us really deep insight into what’s going on. But being able to kind of pair that with a quantitative, national understanding of what was happening just made me really hopeful for how we could move forward in thinking about policy and setting up those kind of systematic ways to think about using AI safely and effectively.
DG: When we come back, the results of Paige’s first-of-its-kind research on who in health care is using AI and how … and the message she wants policymakers to take away.
MIDROLL
DG: Welcome back. We’re talking with University of Minnesota researcher Paige Nong about her new study, which for the first time gives us a national look at how hospitals are using and evaluating AI.
So, Paige, you just published this paper in Health Affairs, and you found that about two-thirds of U.S. hospitals are using some kind of predictive algorithm, about 60% are testing those algorithms for accuracy, and less than half are testing their algorithms for bias. What do you make of those results?
PN: Yeah. The big picture is that these predictive models and AI in hospitals are widespread. But evaluation for bias isn’t. We have a lot of work to do to close that gap and to make sure that these tools, not only that they work for the patient population where the tools are being deployed, but that they are free of bias. We have a lot of work to do there.
DG: When I was reading your paper, I was getting a very clear picture of who’s doing testing, who’s not doing testing. What was less clear to me was, what are the implications of your findings for the people who are walking in and out of the hospital? Who could this really hurt?
PN: I don’t think that the findings of this paper indicate that there are major clinical risks to patients. There are quality improvement professionals. There are legal teams at hospitals who are really focused on making sure that clinical care is safe. And, to my mind, because we don’t have that same medical liability concern or regulatory infrastructure or oversight developed on the operational side, that’s where patients might want to pay attention: their access to appointments. Their appointments might be getting cut short, but they don’t know why. Or they might be having trouble with a bill and they can’t get somebody on the phone. They might not know why. And it is possible that these kinds of predictive AI tools are shaping their experience of the health care system outside of the direct clinical care, or outside of their direct relationship with their doctor.
DG: And this gets at one of the other big findings from your paper — how hospitals are using AI. Because there are lots of ways to use AI in health care. What did you learn there?
PN: Yes. The most common way that hospitals are using predictive models and AI is to predict health trajectories or risks for their inpatients. So this would be like predicting fall risk for patients when they’re hospitalized or predicting something like sepsis. The second most common way that they’re using these tools is to identify high-risk outpatients to inform follow-up care. So if somebody leaves the hospital, they might be at risk for infection. They might be at risk for some kind of other complication. So hospitals really want to be able to predict that readmission risk, to make sure that they can intervene before the patient might be re-hospitalized. And then the third most common application of predictive models or AI was to facilitate scheduling. So on that operational side of things, we’re using a lot of models for those kinds of functions in the health care system.
DG: Right, so this gets back to what you were saying earlier about how you think, based on your research, patients should be less concerned about AI getting a diagnosis wrong or suggesting the wrong treatment and more worried about how these tools are being used for scheduling and billing. To me as an outsider, that sounds like a weedy thing that doesn’t really seem to be a big deal, but in your mind, it’s clearly a big deal. Why?
PN: Because I think we have a relatively good kind of regulatory and policy infrastructure for thinking about clinical risk. The FDA is our go-to. They have specific rules. They have guidelines.
But when we think about tools like predicting if patients are going to pay their bills or predicting if patients will attend their appointments, to my mind, there’s still a risk there that we’re going to build barriers to care by double booking patients who are predicted as likely to miss an appointment. Those patients then get only five minutes with their doctor and they might end up avoiding care. If we’re using AI to refer patients to collections agencies faster, that’s not going to result in missing a sepsis diagnosis, but it can result in really devastating financial consequences.
DG: And so what I’m hearing you say is that operations are actually a critical part of the patient consumer experience and must be considered sort of holistically.
PN: Yes, absolutely.
DG: This survey of hospitals that you teamed up with Jordan on went out in late 2023, and over the last year we’ve seen continued focus, of course, on this issue. Industry leaders are refining and sharing best practices for testing algorithms. The Biden administration finalized regulations that require more transparency from developers and accountability from providers. In your paper, you suggest that these new regulations will encourage hospitals to do more testing of these algorithms. Why?
PN: I think the rule that requires vendors to share information with their customers about the tools that they’re providing to them is a really important first step. If we want hospitals to be able to design effective governance approaches, they need to know how the tools that they’re thinking about were designed. They need to know if they were designed on a patient population that looks like theirs.
The other major contribution of that rule was that it started to kind of break down this distinction between specifically clinical models that are used to diagnose or treat a patient, and the administrative tools like predicting billing or using AI for scheduling. The rule refers to all of these together as predictive decision support tools. And to my mind that’s really important. It starts to kind of close that gap between the specifically clinical and the more operational or administrative models to give us a better chance at protecting patients from biased tools or just bad tools.
DG: Are there any additional policies, government carrots or sticks that you think are needed to get more hospitals testing for accuracy and for bias?
PN: Absolutely. One of the findings from our paper is that 79% of the hospitals that are using predictive models and AI get those models from their electronic health record vendor. So increasing transparency, providing information about evaluation, those are really positive steps that we can require on the vendor side.
Then on the hospital side, hospitals that have more resources are more likely to use AI and more likely to evaluate it. Independent hospitals, the rural hospitals, the ones with fewer resources, could especially benefit from tools to do evaluations and incentives to help them do them well.
DG: But I’m curious, given what we’ve been saying, Paige, about how few hospitals are testing their AI for accuracy and bias, could it be sort of a blessing in disguise for patients, at least, that these hospitals with fewer resources, which often care for more marginalized patients, are not using AI?
PN: It could be if the models are bad, right? We don’t want bad models deployed in any hospital. But I think this leaves under-resourced hospitals in really tough positions. They basically have two bad choices. One is to use tools that they can’t validate. And we don’t want that. Because those tools could be bad, they could be biased. Or they can completely avoid AI, but then they potentially miss out on the value that AI could provide to them.
What we want is to make sure they have the resources to conduct appropriate evaluations, and to use tools when they can provide them with value, or when they can improve patient care.
DG: And how do you think policymakers could or should maybe even tackle this resource issue?
PN: I think that we do have some muscle memory for expanding technical capacity. The regional extension center model, for example, was really effective at facilitating implementation of electronic health records, especially for under-resourced organizations. We were able to kind of infuse resources into these settings and facilitate their adoption of electronic health records. The expertise exists, right? We just need to make sure that it’s consistently available to all kinds of care delivery organizations, instead of just a few large health systems with a lot of resources.
DG: This entire conversation has been about data and health equity. And those are two things that have come under fire in the early days of the Trump presidency. The administration has pulled down federal health websites and datasets and moved aggressively against anything that looks like diversity, equity or inclusion work.
There’s still a lot of uncertainty about what the president will do when it comes specifically to regulating AI in health care. But, Paige, based on what’s happened so far, it seems like your work could be at risk. Obviously, you believe in your work, otherwise you wouldn’t do it. But I guess the question is what happens if this work, your work, is halted?
PN: It’s really important for us to know what AI use looks like to better protect patients from bad tools, to make sure that we’re using good ones, and to make sure that policy is responsive. If every hospital was already conducting robust local evaluation, we wouldn’t need to think about policy in that area. But the fact that only 44% of hospitals are doing this robust evaluation gives us this signal: We need to pay attention here. We’ve got some work to do. So without that kind of information, we’re just kind of moving without the knowledge that we need to make informed decisions.
DG: I know one of the reasons you got interested in this work in the first place was because of that Ziad Obermeyer paper that we talked about at the top of the show, and how it showed that Black patients were being harmed by AI. What is in your mind the risk if we see a real rollback in the number of researchers who are looking at equity questions when it comes to AI and health care?
PN: We can’t make good decisions. I think people sometimes get caught up on the word bias and maybe misinterpret what we mean when we talk about bias in AI. This isn’t about any kind of personal prejudice. Bias here means the tool doesn’t work for everyone.
So we don’t want a tool that discriminates against rural patients. We don’t want a tool that works really well if you’re under 55 but doesn’t work properly for people 56 and older, right? A biased tool is a bad tool. And that’s why local evaluation matters: it gives us a way to identify and prevent bias in these tools.
DG: And so a biased tool is a bad tool. And if this research isn’t being done looking for bias, we potentially are going to be using tools that can cause harm in one way or another.
PN: Yes, exactly. We don’t want to be in a situation where we look back ten years from now and say, oh, we deployed all these tools and we didn’t make sure that they were working properly. If we can be more proactive and we can be more thoughtful and be more critical AI consumers, we can prevent a lot of those harms. But it does take a lot of effort. It takes attention, and it takes these kinds of analyses to identify where those pain points might be, to identify the places where we can intervene and it can make a difference and improve safety and effectiveness.
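The local evaluation Paige describes is, at its core, a subgroup performance check: run the model on your own patients and compare its error rates across groups. Here is a minimal sketch of that idea, assuming a generic risk model; the column names, age groups, threshold, and synthetic data are illustrative inventions, not anything from the study.

```python
import numpy as np
import pandas as pd

def subgroup_report(df: pd.DataFrame, group_col: str,
                    score_col: str = "risk_score",
                    label_col: str = "had_event",
                    threshold: float = 0.5) -> pd.DataFrame:
    """Compare a risk model's error rates across patient subgroups."""
    rows = []
    for group, g in df.groupby(group_col):
        pred = g[score_col] >= threshold
        true = g[label_col].astype(bool)
        tp = int((pred & true).sum())
        fn = int((~pred & true).sum())
        fp = int((pred & ~true).sum())
        tn = int((~pred & ~true).sum())
        rows.append({
            group_col: group,
            "n": len(g),
            # Sensitivity: share of true events the model catches.
            "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
            # False positive rate: share of non-events incorrectly flagged.
            "false_positive_rate": fp / (fp + tn) if (fp + tn) else float("nan"),
        })
    return pd.DataFrame(rows)

# Toy synthetic data standing in for a hospital's own patients: the model is
# assumed (for the demo) to be noisier for older patients.
rng = np.random.default_rng(1)
n = 5_000
age_group = rng.choice(["under_55", "55_and_over"], size=n)
had_event = rng.random(n) < 0.10
noise = np.where(age_group == "under_55", 0.15, 0.35)
risk_score = np.clip(had_event + rng.normal(0, noise, size=n), 0, 1)

patients = pd.DataFrame({"age_group": age_group,
                         "had_event": had_event,
                         "risk_score": risk_score})
print(subgroup_report(patients, "age_group"))
```

A gap like the one this toy model produces, where sensitivity and false positive rates diverge between age groups, is the kind of signal a local bias evaluation is meant to surface before a tool is deployed.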
DG: What’s your next project that you’re most excited about, that builds off of this, Paige?
PN: I am going to be doing qualitative interviews with leadership from safety net hospitals to understand in their specific contexts, what are their biggest challenges? What are their hopes for AI and what kinds of tools would be particularly useful for them?
Part of what we found in this paper was that hospitals that have the most resources are best able to conduct robust evaluation. It makes sense. It takes resources to do this. So the question that I’m trying to answer now is, how do we meaningfully and effectively connect hospitals that don’t have those resources to the resources they need to do this work?
DG: How are you feeling going into that work in this current environment?
PN: I’m energized. I’m really excited to hear the perspectives of these leaders, and to think carefully about where and how these tools can actually help them provide care to their patients.
I think that the paper answered some questions, but opened the doors to all kinds of other questions that I’m really excited to pursue answers for. And I think it’s really important for our health care system.
DG: Paige, thanks for taking the time to talk to us on Tradeoffs.
PN: Thanks very much for having me.
DG: I’m Dan Gorenstein, this is Tradeoffs.
Episode Resources
Additional Reporting and Research on Artificial Intelligence in Health Care:
- Current Use And Evaluation Of Artificial Intelligence And Predictive Models In US Hospitals (Paige Nong, Julia Adler-Milstein, Nate C. Apathy, A. Jay Holmgren and Jordan Everson; Health Affairs; 1/2025)
- New president, new rules: Trump rescinds Biden’s AI order (Mario Aguilar, STAT News, 1/21/2025)
- Health Care AI, Intended To Save Money, Turns Out To Require a Lot of Expensive Humans (Darius Tahir, KFF Health News, 1/10/2025)
- GOP’s AI vibe shift (Ben Leonard, Daniel Payne and Erin Schumaker; Politico, 12/2/2024)
- OpenAI isn’t built for health care. So why is its tech already in hospitals, pharma, and cancer care? (Mohana Ravindranath, STAT News, 11/12/2024)
- Rooting Out Racial Bias in Health Care AI (Ryan Levi, Tradeoffs, 5/25/2023)
- Dissecting racial bias in an algorithm used to manage the health of populations (Ziad Obermeyer, Brian Powers, Christine Vogeli and Sendhil Mullainathan; Science; 10/25/2019)
Episode Credits
Guests:
- Paige Nong, Assistant Professor, University of Minnesota School of Public Health
The Tradeoffs theme song was composed by Ty Citerman. Additional music this episode from Blue Dot Sessions and Epidemic Sound.
This episode was produced by Ryan Levi, edited by Dan Gorenstein and mixed by Andrew Parrella.
Reporting for this episode was supported in part by the Gordon and Betty Moore Foundation.
