How Patient Privacy Could Hurt AI

March 7, 2024

Photo by Toowongsa Anurak/iStock

There are a lot of concerns about the dangers artificial intelligence could pose to people’s health data privacy. But one expert cautions that too much concern over patient privacy could make health care AI biased and less effective.

Scroll down to listen to the full episode, read the transcript and get more information.

If you want more deep dives into health policy research, check out our Research Corner and subscribe to our weekly newsletters.

Note: This transcript has been created with a combination of machine ears and human eyes. There may be small differences between this document and the audio version, which is one of many reasons we encourage you to listen to the episode! 

Dan Gorenstein (DG): There are a lot of concerns about the danger artificial intelligence could pose to people’s health data privacy. Members of Congress have grilled health companies about selling patient data and protecting that data from ransomware attacks.

Rep. Annie Kuster: What steps does your team take to ensure patient data being used to train AI tools is being protected from cyber criminals and just plain bad actors? 

DG: Legal scholar and AI expert Nicholson Price understands these anxieties. But he cautions that too much concern over privacy could actually make health care algorithms biased and less effective.

Nicholson Price (NP): My hope is that AI can slot in and be transformational. But for that to happen, it needs to be based on high-quality, representative data.

DG: Today, how AI weakens our health care data privacy, and how privacy laws make it harder to create high-quality health care AI. From the studio at the Leonard Davis Institute at the University of Pennsylvania, I’m Dan Gorenstein. This is Tradeoffs.

*******

DG: Nicholson Price is a nerd. 

NP: So the top is all healthy cell systems. So liver cell inside of the ear…

DG: His magenta office walls at the University of Michigan Law School are covered with framed photos of cells.

NP: And then the bottom is all diseases. So that’s white blood cell eating anthrax, bubonic plague in a rat’s intestine…

DG: He’s also a bit of a clairvoyant. Nicholson has been studying the intersection of health care and artificial intelligence since before many of us had even heard of AI.

NP: I really distinctly remember in 2013 saying, this is a thing that’s going to happen. We have these data, we’re getting better at making predictions, and there’s just the possibility for doing personalized medicine in a way that we really couldn’t do before. And this seems like an area of a huge amount of potential, but also an area where law has a real role in hopefully putting it on the right track.

DG: So, turns out that you were pretty good at making predictions too.

NP: I got lucky. Another thing I thought about writing about at the time was nanoscale robots that are going to help people and improve lots of health conditions. And I got lucky by not doing that either, because it’s really cool, really exciting, and definitely still pretty far off.

DG: And why do you care? Why do these questions interest you?

NP: So I think about AI, and I talk about it with colleagues who also think about AI, and they tend to come at it from a perspective of, man, this stuff’s going to break the world. And I think about this from a health and biology perspective. And I come in with a totally different prior, which is that the world is broken. The number of people who don’t get care because they can’t afford it, because they can’t take the time for it is absolutely unacceptable. And my hope is that AI can make something of a difference there. 

DG: Nicholson, one of the areas that you’ve written a bunch about is what AI means for health care data privacy. We know patient data drives health care AI. Without that data, there’s nothing for the AI to learn on, as you know. But a lot of people are concerned about the dangers that AI could pose to people’s data privacy. Here’s New Jersey Congressman Frank Pallone at a hearing last year.

Rep. Frank Pallone: I remain concerned that the expanded use of AI in health care has generated significant risks. It is critical that safeguards are in place to protect the privacy and security of the patient’s data.

DG: Nicholson, I’d like you to do a little table setting for us. How does my individual data, Dan Gorenstein’s data, go from my visit with my doctor to being used in an AI algorithm?

NP: So there’s a first step that happens before almost everything. Whoever has your data will de-identify it. De-identification means it doesn’t have your name on it. It also doesn’t have a bunch of other pieces of identifying information. And then there’s not much really governing what your doctor or your hospital or whoever can do with your data. They might use it on their own. They could sell or license it to a data broker. There are two other ways that data about your health get out into the world. One of them is you might share them. You can share them with a company that might say, hey, we’re using this app. We’ll analyze your health data and give you some useful insights. In exchange, you let us do stuff with them. But there’s a ton of other data that’s also health data. Things like, what does my iPhone say about me? What do my Google search habits say about my health? And so those sorts of data are really kind of freely out there. And if companies want to aggregate them or sell them or license them or combine them, that’s pretty much fine.
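Note: For readers who want a concrete picture of that first de-identification step, here is a minimal sketch in Python of Safe Harbor-style identifier removal. The patient table, column names and rules are invented for illustration; real pipelines cover many more identifier types.

```python
# Rough sketch of HIPAA Safe Harbor-style de-identification on a made-up
# patient table. Columns and thresholds here are simplified for illustration.
import pandas as pd

records = pd.DataFrame([
    {"name": "Dan G.", "ssn": "123-45-6789", "zip": "19104",
     "birth_date": "1971-06-02", "age": 52, "visit_date": "2024-01-15",
     "diagnosis": "type 2 diabetes"},
])

def deidentify(df: pd.DataFrame) -> pd.DataFrame:
    out = df.copy()
    # Drop direct identifiers entirely.
    out = out.drop(columns=["name", "ssn"])
    # Keep only the first three digits of the ZIP code.
    out["zip"] = out["zip"].str[:3] + "XX"
    # Reduce all dates to year only.
    out["birth_year"] = pd.to_datetime(out["birth_date"]).dt.year
    out["visit_year"] = pd.to_datetime(out["visit_date"]).dt.year
    out = out.drop(columns=["birth_date", "visit_date"])
    # Ages over 89 get lumped into a single "90+" bucket.
    out["age"] = out["age"].apply(lambda a: "90+" if a >= 90 else a)
    return out

print(deidentify(records))  # clinical details stay; direct identifiers are gone
```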

DG: Right, you, you write about, in one article, this so-called Target example, where an algorithm can ultimately determine whether someone is pregnant based on what they’re buying at Target.

NP: Yeah. Oh, yeah. Yeah, exactly. The story goes… 

Clip: Target’s advanced advertising system even knew about a teen-aged girl’s pregnancy before she could even break the news to her own father. He found out when the store sent her maternity deals in the mail.

NP: The wild thing about this, though, is this is a story from 2012. It’s gotten much better now. So there’s a wonderful, just brutal article by a professor named Anya Prince that appeared in the Atlantic. She is someone who studies health privacy. And she got pregnant and decided to say, I want to keep information of my pregnancy from companies. And so bought things only using cash and searched only using VPNs and incognito things in browsers and really tried hard as somebody who knows about this stuff to keep her pregnancy hidden. And definitely failed. Totally failed. So I think the takeaway is it’s really hard to keep your health information private from well-resourced companies that are interested in it.

DG: And Nicholson, what legally regulates how our data can be used? What protections are there for us?

NP: The funny thing is I would say the strongest privacy regime, or certainly one of the strongest privacy regimes we have in the U.S., is about health data. And so this is HIPAA, the Health Insurance Portability and Accountability Act. You will notice that nowhere in that title is the word privacy. HIPAA basically says, hey, here’s a set of entities — health providers, health insurance companies, health data clearinghouses. And HIPAA says, hey, you can use this for some stuff. You can use it for payment purposes or health care operations. But you can’t sell it without patient consent. And that sounds strong. And it is strong in some ways. But there are at least two big challenges. One of them is it only applies to identifiable information. Once you’ve removed that, not protected at all. 

DG: So you’re saying once there’s nothing to tie that data back to me, my name, my birthday, my social security number, but it still talks about when I got Covid or whatever, that is now fair game.

NP: Yes, with a little bit of a modification, which is you said there’s nothing to tie me to that data.

DG: Overstating. I apologize.

NP: Oh, that’s a huge overstatement because your name’s not associated with it, but if you use your credit card on a specific day to buy some things at a pharmacy that has a particular prescription and some other things, and your location data ties you to going from place A to place B or whatever, all of a sudden it’s actually not that hard to reconnect that information to you. The other thing is HIPAA covers providers, health insurance companies, their business associates. You notice who’s not on that list? Target. Walmart. Fitbit. Apple. Amazon. HIPAA was written at a time when the things that are showing up in your doctor’s office, in your hospital visit, that’s what we thought about as health data. And that’s not the world we live in now.
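Note: A toy sketch of the re-linkage Price describes, using entirely invented data: a nominally de-identified pharmacy record matched back to a name through a credit card trace.

```python
# Illustration of the re-identification risk: a "de-identified" pharmacy
# record linked back to a person by matching store and date against a named
# credit-card trace. All names, IDs and dates are made up.
import pandas as pd

deidentified_rx = pd.DataFrame([
    {"store_id": "PHARM-22", "date": "2024-02-03", "prescription": "metformin"},
])

card_transactions = pd.DataFrame([
    {"cardholder": "Dan Gorenstein", "store_id": "PHARM-22", "date": "2024-02-03"},
    {"cardholder": "Someone Else",   "store_id": "PHARM-22", "date": "2024-02-04"},
])

# A simple join on store + date is often enough when few people match.
relinked = deidentified_rx.merge(card_transactions, on=["store_id", "date"])
print(relinked[["cardholder", "prescription", "date"]])
# -> the nominally anonymous prescription is now tied to a name
```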

DG: When we come back, Nicholson explains how our health data privacy laws make it harder to build good health care AI.

MIDROLL

DG: Welcome back. We’re talking with University of Michigan law professor Nicholson Price about how AI impacts our health data privacy.

And so, Nicholson, you have this really interesting line in a paper from 2021 where you write, “AI weakens protections for health privacy, and health privacy weakens the AI used in health.” I’d like you to walk us through both parts of that sentence, starting with the first half: “AI weakens protections for health privacy.” What does that look like? What’s a nice concrete example?

NP: Yeah. One thing is if you have different sources of de-identified data — you have one dataset that’s all hospital discharges, and you have another dataset that’s all Medicare claims, and you have another dataset that’s a bunch of purchases from pharmacies — and you want to figure out how you can link those together and figure out what the patterns are and maybe also know who some of these people are, AI makes that a lot easier. The other way that AI weakens privacy protections is the ease of inferring health information from non-health data. If I were to say that a woman I know, at a dinner party we recently attended, avoided soft cheese and raw fish and didn’t drink wine, I might infer, oh, maybe she’s pregnant. I don’t need AI for that. If, on the other hand, I were to say, based on the rate of flicker and color changes of a single pixel in a smart TV in this particular home, I can tell that this person has shifted from watching this type of TV at this time to a separate type of TV at a different time. And that shift is highly correlated with people in the first trimester of pregnancy. It’s hard to make that analysis without AI. It’s not particularly hard to make that analysis with AI.
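Note: The second kind of inference Price describes is, mechanically, ordinary supervised learning once the behavioral data exist. Here is a hypothetical sketch on synthetic “viewing habit” features; the features, effect sizes and labels are all made up.

```python
# Hypothetical sketch: a model trained on non-health signals (fake
# viewing-habit features) learns to predict a health-related label.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 1000
pregnant = rng.integers(0, 2, n)  # synthetic ground-truth label
# Pretend two behavioral features shift slightly with pregnancy status.
daytime_tv_hours = rng.normal(2 + 1.5 * pregnant, 1.0, n)
genre_shift_score = rng.normal(0.1 + 0.6 * pregnant, 0.3, n)
X = np.column_stack([daytime_tv_hours, genre_shift_score])

model = LogisticRegression().fit(X, pregnant)
print("accuracy on synthetic data:", model.score(X, pregnant))
```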

DG: So let’s talk, Nicholson, about the second part of your sentence where you say, “health privacy weakens the AI used in health.” Tell us about that.

NP: So if you want to find data about a person and you are a big, well-resourced company, it’s not that hard to do. It’s harder for small nonprofit researchers to do, though. There’s a wonderful example of this which is called the MIMIC dataset. And it’s out of Beth Israel Deaconess in Boston. And it’s this terrific, high-quality dataset of the highly de-identified emergency room information from this one hospital. And it’s been used in hundreds of research projects to predict various things about patients’ health related to the E.R. Fantastic. But that’s one well-resourced hospital in Boston that has certain care patterns and sees certain patients. And so if the thing that we train health AI on is data from one hospital in Boston, we’re going to have a skewed vision of what the world looks like. This is going to create inaccuracies and biases in the health system, and that’s a problem for the resulting products and what they predict and how they can help us.

DG: So your concern is that only a fraction of hospitals and clinics have the resources to turn their patient data into AI algorithms under our privacy laws. And because of that, the data used in those algorithms are less likely to be representative, which means the AI is more likely to be biased and possibly harmful to patients.

NP: Yeah, that’s exactly right.

DG: You talk about Beth Israel Deaconess in Boston. I know there’s another study from 2020 that found that the overwhelming majority of patient data used to train clinical AI algorithms came from California, Massachusetts and New York. Nicholson, if that’s the case, what do we know about whose data is being included and whose data isn’t?

NP: So it’s hard to answer who, but we can say some things. It’s patients that are more likely to be seen by academic medical centers. So there’s likely to be a bias in terms of the resources available at the hospital. I think in New York, Black patients were less likely to be seen at academic medical centers than white patients were. There are also some differences in consent rates when patients are approached to include their data in research datasets. So this is actually a study that was led by Kayte Spector-Bagdady here at the University of Michigan that I was involved in, where we looked at the differences between patients who were approached for participation in a big research dataset and patients who agreed to be in the dataset. And what we saw was, the final dataset was whiter, older, more male, and a little wealthier. This is a technology which should be scalable and personalizable and able to help folks across all walks of life. And collecting data across a broad scheme of places is what we need to make that happen.

DG: And Nicholson, is that what you say when you talk to the people who are really concerned about data privacy being compromised or lost? Do you try to break this down from a health equity standpoint?

NP: So I think health equity is a good way of thinking about this and a really important goal. Another one is, if I’m a company and I want to know things about your political leanings or things about what you want to buy and how I can advertise to you, man, the law has very little to say about that. And it’s hard to think about really good things that come out of that. Health, on the other hand, man, there’s so much good that could come out of this. Yeah, there’s advertising. Sure, there’s manipulation. But it also has the potential to say, like, let’s just transform something that’s absolutely central to the lives of everyone, and that’s the one where we say, oh, no, it’s quite a bit harder to share data in that space. That’s the one we’re going to lock down. If I gave you the choice, either give me access to your medical records or hand me your unlocked phone, I think a lot of people would say, here are my medical records. Now, to be clear, I’m speaking from a place of privilege. I’m a stably employed, middle-aged, healthy white dude. And I get it. And that is not a tradeoff that everybody would make. But I think a lot of people would say, here’s my health records, stay away from my phone. And that’s the opposite of the tradeoff that we make at a national level.

DG: Do you get pushback on that?

NP: Uh, the biggest piece of pushback is that I’m speaking from a place of privilege and that doesn’t reflect the experience of some folks in the health system. And that’s a totally fair critique. On the question of which data you would give up, some people say neither one of those should be shared. Both of them should be super secret and really, really private, and we shouldn’t share data in any direction.

DG: So let’s talk solutions. You write that the “ongoing dysfunction of the status quo calls for a new bargain between patients and the health system about the uses of patient data.” What’s the new bargain look like?

NP: I wish I had a great answer on this. The principal lens that we think about control over individual information is, you have control over your data. You can decide whether or not to share them. I think a better way to think about this is through a communitarian or a justice perspective where we say, how can we shape data collection in such a way that it’s going to help lots of people? But there’s a catch, right? The catch is if you say, alright, we’re going to trade weaker control by the individual over their data for a stronger social model that’s going to make care better for lots of folks, you got to make care better for lots of folks. So the bargain is, yeah, we try to protect that and decrease poor uses of it. But you’re going to lose some control, and we’re going to use it to do good stuff. And you’re going to get the benefits of that good stuff.

DG: What kind of policy change would allow for more data sharing and better AI for all, while still protecting against the misuses of people’s data? The scary stories.

NP: I think maybe the most obvious thing we could imagine doing is putting a research exception into the HIPAA privacy rule. Another way potentially would be active government efforts to collect lots of data or some sort of centralized effort to collect lots of data on lots of patients without running it through a filter of individual choices about participating. So there’s an effort called All of Us, which has a goal of collecting data from a million Americans with a focus on making sure that it’s deeply representative across different racial and ethnic groups, gender representative, representative of urban and rural divides. And maybe part of the way that we try to restrict access by bad actors or bad uses is saying, here are restrictions on who can get access to these data. Here are restrictions on what you can do with it.

DG: You’ve also written, Nicholson, about some technical solutions people are working on, like hospitals building algorithms using their data and then sending them off to another hospital in a kind of daisy chain, where the algorithm keeps learning but the data stays separate. But all of this still requires patients to give up some privacy in exchange for better AI, like the “good stuff,” as you put it a few minutes ago. What would actually change about the care that’s being delivered? What would we get for making these privacy sacrifices?

NP: I think about how AI can potentially transform the health system for good. We can already see instances of AI predicting acute kidney injury many hours before the best physicians in the world could do it. That’s wild. That’s so cool. There’s an autonomous diagnostic system for diagnosing more than mild diabetic retinopathy. Basically, your eyes are going for diabetic reasons. And it takes a picture of your retina, uploads it to a system and can diagnose, no physician needed. When we talk about places that don’t have a lot of eye doctors, that’s amazing. That’s what I think about when I think about the good stuff.
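Note: The “daisy chain” approach Dan mentioned, in which the model travels between hospitals while each hospital’s data stays on site, can be sketched roughly as incremental training. Everything below is synthetic and simplified; real deployments layer on validation, governance and privacy auditing.

```python
# Rough sketch of daisy-chain training: the model visits each hospital in
# turn and keeps learning, while each hospital's patient data stays local.
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(1)

def local_data(n):
    """Stand-in for one hospital's records: 5 features, binary outcome."""
    X = rng.normal(size=(n, 5))
    y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=n) > 0).astype(int)
    return X, y

model = SGDClassifier(random_state=0)
classes = np.array([0, 1])

for hospital in ["Hospital A", "Hospital B", "Hospital C"]:
    X, y = local_data(500)                      # never leaves the hospital
    model.partial_fit(X, y, classes=classes)    # only the model moves on
    print(hospital, "done; current weights:", model.coef_.round(2))
```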

DG: You talked about how in 2012, 2013, you began to write about health care and AI. You sort of read the landscape and made a prediction that this was, in fact, coming at a time when few people saw it. What’s your sense now? You talk about being optimistic, but you also talk about the status quo and what we may end up getting if we have sort of mediocre or less than mediocre AI and policies in and around privacy and data collection. Where do you think we land? What’s your read of the landscape again?

NP: Right now, I’m not wildly optimistic about the idea that we’re going to have a big change in the way we think about health data and privacy. But there are a lot of smart, talented people working in this space and trying to make things better. So I think we got a long way to go, but I keep smiling.

DG: Nicholson, thanks so much for taking the time to talk to us on Tradeoffs. I really appreciate it.

NP: Of course. Absolutely. My pleasure. Anytime. This was a delight.

DG: I’m Dan Gorenstein. This is Tradeoffs.

Tradeoffs’ coverage of diagnostic excellence is funded in part by the Gordon and Betty Moore Foundation.

Want more Tradeoffs? Sign up for our weekly newsletter!

Episode Resources

Additional Resources and Reporting on Health Care AI and Privacy:

Lawmakers stress data privacy in health AI oversight (Emily Olsen, Healthcare Dive, 11/30/2023)

I Tried to Keep My Pregnancy Secret (Anya E. R. Prince, The Atlantic, 10/10/2022)

Problematic Interactions Between AI and Health Privacy (Nicholson Price, Utah Law Review, November 2021)

Privacy and artificial intelligence: challenges for protecting health information in a new era (Blake Murdoch, BMC Medical Ethics, 9/15/2021)

“My Research Is Their Business, but I’m Not Their Business”: Patient and Clinician Perspectives on Commercialization of Precision Oncology Data (Kayte Spector‐Bagdady, Chris D. Krenz, Collin Brummel, J. Chad Brenner, Carol R. Bradford and Andrew G. Shuman; Oncologist; 3/13/2020)

Episode Credits

Guests:

Nicholson Price, JD, PhD, Professor of Law, University of Michigan

The Tradeoffs theme song was composed by Ty Citerman. Additional music this episode from Blue Dot Sessions and Epidemic Sound.

This episode was produced by Ryan Levi, edited by Deborah Franklin and Dan Gorenstein, and mixed by Andrew Parrella and Cedric Wilson.

Additional thanks to: Michelle Mello, Carmel Shachar, the Tradeoffs Advisory Board and our stellar staff!
