Life and Death Decision-making through Algorithms, with Professor David Rehkopf and Derek Ouyang, Stanford University

Episode 140 February 02, 2025 00:53:55

Sustainability Now! on KSQD.org


Show Notes

Over the past few years, we’ve heard a lot about artificial intelligence and the algorithms that support public policy, decision making and resource allocations.  By processing reams of presumably neutral data, the algorithms are supposed to produce unbiased results.  But we’ve also heard concerns about the algorithms themselves: what unrecognized assumptions go into their construction and how they can produce different outcomes depending on programmer choices about the data that goes into them.

Join host Ronnie Lipschutz for a conversation with Professor David Rehkopf of the Department of Epidemiology & Population Health at the Stanford University School of Medicine, and Derek Ouyang, Executive Director of City Systems and Senior Research Manager in Stanford University's Regulation, Evaluation, and Governance Lab.  We'll be talking about what algorithms are, how they are used to promote environmental justice and guide public funding for disadvantaged communities, and why they can produce different results depending on what goes into them and what comes out.


Episode Transcript

[00:00:08] Speaker A: Good planets are hard to find now. Temperate zones and tropic climes, and run through currents and thriving seas, winds blowing through breathing trees, strong ozone, safe sunshine. Good planets are hard to find. Yeah.

[00:00:35] Speaker B: Hello, KSQD listeners. It's every other Sunday again and you're listening to Sustainability Now, a biweekly KSQD radio show focused on environment, sustainability and social justice in the Monterey Bay region, California and the world. I'm your host, Ronnie Lipschutz. Over the past few years, we've heard a lot about artificial intelligence and the algorithms that support public policy, decision making and resource allocations by processing reams of presumably neutral data. The algorithms are supposed to produce unbiased results, but we've also heard concerns about the algorithms themselves, what unrecognized assumptions go into their construction and how they can produce different outcomes depending on programmer choices about the data that goes into them. When applied to the allocation of public resources, differences in results can have real financial consequences for recipients and their communities. My guests today on Sustainability Now are Professor David Rehkopf of the Department of Epidemiology and Population Health at the Stanford University School of Medicine and Derek Ouyang, Executive Director of City Systems and Senior Research Manager in Stanford University's Regulation, Evaluation, and Governance Lab. They, with several colleagues, have audited a large-scale algorithm known as CalEnviroScreen, designed to promote environmental justice and guide public funding by identifying disadvantaged neighborhoods. As they write, we observe the model to be both highly sensitive to subjective model specifications and financially consequential and susceptible to manipulation, raising ethical concerns. David Rehkopf and Derek Ouyang, welcome to Sustainability Now.

[00:02:20] Speaker C: Thanks so much.

[00:02:22] Speaker D: Thank you for having us.

[00:02:23] Speaker B: So we're going to talk about your collectively authored article in Nature Machine Intelligence, Mitigating Allocative Tradeoffs and Harms in an Environmental Justice Data Tool, which is sort of a mouthful probably for our listeners, but I know the article has attracted attention in various quarters, and on today's show it would be helpful to address your work on CalEnviroScreen as one example of a larger phenomenon, the growing role and impacts of such algorithm-based programs and applications in our daily lives and what that means. So I want to talk more broadly about maybe algorithmic society and things that are basically based on algorithms. But to begin with, what is the California Communities Environmental Health Screening Tool? Who designed it and what's its ostensible purpose?

[00:03:14] Speaker D: Yeah, so you mentioned it already, also known as CalEnviroScreen, which is how a lot of people refer to it. So it was designed by the State of California Office of Environmental Health Hazard Assessment more than 10 years ago. So there's a website they have, oehha.ca.gov, where they have a lot of really user-friendly information about the tool. So they describe it first as a mapping tool to show where people are most affected by different types of pollution and where folks may be most susceptible to the effects of pollution. So it doesn't measure people, it measures places. This is an important distinction.
It's different from the ways a lot of other algorithms that we're familiar with work. It's based on census tracts. So that's about an average of around 4,000 people, much smaller than a city, but you know, typically more than just a few blocks around your house. It's a bit like a neighborhood, but sometimes maybe a bit larger than people would think of as their neighborhood, and a bit different. So then what it does is it ranks all of those places in California by 21 different factors, which are in four categories. The first category is pollutants, which they refer to as exposures, like PM2.5, lead, pesticides. Secondly, environmental effects, which are things like hazardous waste sites, groundwater threats. Then sensitive population indicators, which are some health outcomes, including asthma and cardiovascular disease. And then fourthly, socioeconomic factors like poverty and unemployment. So what the algorithm does is it combines those 21 factors together into a single score for each census tract in the state of California. I'll just add that importantly, it's also more than a mapping tool, which is what we really focused on in our paper. These rankings of places have been used for a lot of different purposes, including directing a lot of resources, upwards of $6 billion in California over the last 10 years, to different census tracts in the state, so that those funds help do important work for environmental justice, with the intention of directing resources towards communities in need based on that algorithm.

[00:05:30] Speaker B: I've actually used the program in writing grants. One of the requirements has been to show that the particular population that would be affected by the grant falls within some of the parameters of CalEnviroScreen. But of course I never really have understood what's in it, how it works. And I do recognize, and we might come back to this later, even though census tracts are regarded as uniform, they can often be quite different depending on where they're located. Can you give us a sense of how an algorithm operates? I mean, you can define what an algorithm is, perhaps, and then give us a little bit more detail. Because what you've said is, you know, you put in a lot of data, something happens, it's a black box, and then results come out. But that's still that black box. I should mention I have this Calvin and Hobbes cartoon, which I also sent to you. Some of our listeners might remember it, since it's from 1987, I think. Calvin and Hobbes are sitting in the kitchen, and Calvin says, want to see something weird? He's sitting in front of a toaster. Watch. You put bread in this slot and push down this lever. Then in a few minutes, toast pops up. And Hobbes says, wow, where does the bread go? And Calvin says, beats me. That's weird. So stuff goes in, stuff comes out. But, you know, what's the toasting process like?

[00:06:56] Speaker D: Yeah, no, those are exactly the key aspects. There's inputs and then there's outputs. So an algorithm itself can be very simple. It's simply a set of instructions for accomplishing something. So one of the analogies people use sometimes is like a recipe for cooking. So the algorithm will tell us how much of each ingredient: this much sugar, this much lemon, this much salt, and we get out lemon bars. So they're a way we can communicate instructions to each other. What's notable about an algorithm is it's very specific. It often involves quantitative inputs.
So another sort of example of an algorithm, I think a bit closer of an analogy to what we're talking about here, is an algorithm for calculating the cost of shipping a package. It'd be very difficult for postal system employees, and confusing for customers, if every time someone wanted to ship a package, they had to come in and make a decision and argue about how much the shipping was, the weight, you know, how far it would be shipped. So instead, the postal service has an algorithm that calculates the cost of shipping. The customer knows what to expect. They might not like how much it costs, but at least it's clear and there's a way to communicate, and the customer knows what to expect.

[00:08:14] Speaker B: Of course, in that case, there are very definite inputs. Right. I do remember when post offices did not have algorithms like this and you had to weigh the package and then look in a pamphlet to see how far it was going, which was. I don't know if it took that much more time, but I know now you can do it remotely. Of course, the two of you are in two different units at Stanford, and your colleagues on the paper come from, and here I'm just going to list them, the Departments of Environmental Health and Engineering and Biostatistics at the Johns Hopkins Bloomberg School of Public Health, that's two different entities, the Department of Information Science at Cornell, Stanford Law School, and the Stanford Department of Political Science. That's in addition to your positions in epidemiology and at this center, which I suppose you could tell us about as well. That's quite a hodgepodge of departments and disciplines. Why are there so many? What motivated, what prompted you to investigate CalEnviroScreen in particular?

[00:09:18] Speaker D: Yeah, that's an important point. I'm glad you noticed that. And this is a great chance to say as well that this paper was led by Ben Huynh, who was a graduate student at Stanford. I was his advisor, along with Liz Chin. And so yeah, they brought together a broad group of people to help think through this. And that's because there's really a wide range of experience needed to do this work. There's the math of it, there's creating the metric, there's knowledge about environmental science, knowledge about how it's going to be applied. There's the measurement, as I mentioned, you know, different health components, which is a whole other type of knowledge. You know, there were even more people involved with this than were authors. And that's a really important point. To create an algorithm and think about the impacts, you really need these different types of experience kind of coming together.

[00:10:12] Speaker C: This is Derek here, and I'm based out of the law school at Stanford, but originally came from the School of Engineering. And the diversity of the roster that you're mentioning, Ronnie, I think speaks a bit to the diversity of CalEnviroScreen's uses as well, because a score that's generated impacts funding that goes into affordable housing, agricultural subsidies, public transit, renewable energy, many, many more domains. So from a motivation perspective, CalEnviroScreen matters to a lot of different stakeholders. And I think the story we tell in the paper is mostly one of how do you think responsibly about fair allocation among diverse groups? So I think it's kind of fitting that our team and the many, many different community groups that we work with are a kind of reflection of that coming togetherness as well.
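To make the scoring structure David described earlier concrete, here is a minimal sketch of how a CalEnviroScreen-style composite might be computed: convert each indicator to a percentile rank across tracts, average the percentiles within groups, and combine the group averages into one score per tract. The indicator names, groupings, and the final multiplication step are simplifying assumptions for illustration, not the tool's exact formula.

```python
# Illustrative sketch of a CalEnviroScreen-style composite score.
# Indicator names, groupings, and the final combination step are
# assumptions for illustration only, not OEHHA's exact formula.
import pandas as pd

def percentile_rank(series: pd.Series) -> pd.Series:
    """Rank each tract's value as a 0-100 percentile across all tracts."""
    return series.rank(pct=True) * 100

# Toy data: one row per census tract, one column per indicator.
tracts = pd.DataFrame({
    "pm25": [8.1, 12.4, 10.2],              # exposures
    "hazardous_waste": [0.2, 1.5, 0.7],     # environmental effects
    "asthma_er_rate": [35.0, 80.0, 55.0],   # sensitive populations
    "poverty_pct": [12.0, 41.0, 27.0],      # socioeconomic factors
}, index=["tract_A", "tract_B", "tract_C"])

groups = {
    "exposures": ["pm25"],
    "environmental_effects": ["hazardous_waste"],
    "sensitive_populations": ["asthma_er_rate"],
    "socioeconomic_factors": ["poverty_pct"],
}

# 1) percentile-rank every indicator, 2) average within each group.
group_scores = pd.DataFrame({
    name: tracts[cols].apply(percentile_rank).mean(axis=1)
    for name, cols in groups.items()
})

# 3) combine groups into a single score; here a simple
#    pollution-burden x population-characteristics product on a 0-10 scale.
pollution_burden = group_scores[["exposures", "environmental_effects"]].mean(axis=1) / 10
population_chars = group_scores[["sensitive_populations", "socioeconomic_factors"]].mean(axis=1) / 10
composite = pollution_burden * population_chars

print(composite.sort_values(ascending=False))  # ranking of tracts by score
```

The real tool works with 21 indicators and roughly 8,000 California census tracts, but the shape of the computation sketched here, rank, average, combine, is the part whose design choices the rest of this conversation is about.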
[00:10:57] Speaker B: Well, I'd like to get back to some of those points later on. So this was a grad student project to start with, or was this an assignment to a grad student from an advisor?

[00:11:08] Speaker D: No, definitely not an assignment. This was all Ben and Liz really looking around and seeing this algorithm, seeing how it was being used, and then really being curious about the impact of it.

[00:11:23] Speaker B: Had anyone had experience in using it?

[00:11:25] Speaker C: I have, and I think it kind of matches yours a little bit, Ronnie. You mentioned it kind of from the perspective of nonprofits and community-based organizations applying for that funding, and I do some nonprofit work on the side in the Bay Area, work with plenty of those communities. So I've seen CalEnviroScreen for many, many years prior to this work, and some of the kind of frustration sides of it as well. So, for example, imagine a community-based group is trying to apply for something like wildfire resilience funding, a very timely topic right now. They would look at the grant information on a government website, a state website, and they'd get to a fine print moment at which there's an eligibility requirement. And it'll say something like, look up the census tracts that make up your community, and if they're eligible or not, they'll be designated as a, quote, unquote, disadvantaged community. And that designation is the heart of what the score is, determining this kind of ranking of census tracts across California. But as a community-based group, you're sort of beholden to what that score is. I mean, it gets updated occasionally, but it kind of is what it is. And I've seen and been alongside communities that are, say, trying to get funding for urban tree canopy or energy investments. And then they look at the map, maybe for the first time, they can use the map on the government website, they can download a spreadsheet, and then they say, whoa, wait a minute. Our neighborhood turns out to be made up of multiple census tracts, and some are eligible and others are not. But we definitely don't see those divisions in that way. We understand at the actual ground level, at the household level, the need, and we want to support our entire community. And that turns at times into some serious frustration in what they're actually able to apply for. So it's been very much a community-based concern for these types of groups for many years.

[00:13:03] Speaker B: Well, here's a statistical question. Does the program calculate based on averages or on means?

[00:13:11] Speaker C: I'd say it's even more complicated than that. First, I think you're describing a median. So, for example, a common kind of indicator is a median household income. And so whatever your median household income is for a given census tract, and this could be the kind of thing that goes into these types of algorithms and indices, let's say it's $75,000. Then that means 50% of households make less than that and 50% make more. But it doesn't really tell you anything about the wideness of that distribution. And maybe if you're really concerned about folks in poverty, median household income doesn't necessarily tell you how many households are under a different type of measure. It just tells you what happens to be right at that 50% mark for that community. Now that, as an index design, can have some of those fundamental concerns you're hinting at, which is whether that specific measure even makes sense or not.
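A quick worked illustration of Derek's point about medians, with invented numbers: two tracts can share exactly the same median household income while having very different shares of low-income households, so an index built on the median alone cannot tell them apart.

```python
# Invented numbers: two tracts with the same median household income
# but very different shares of low-income households.
import statistics

tract_a = [20_000, 30_000, 75_000, 90_000, 95_000]   # wide spread
tract_b = [70_000, 72_000, 75_000, 78_000, 80_000]   # narrow spread
threshold = 32_000  # hypothetical low-income cutoff

for name, incomes in [("tract_a", tract_a), ("tract_b", tract_b)]:
    median = statistics.median(incomes)
    share_low = sum(x < threshold for x in incomes) / len(incomes)
    print(f"{name}: median = ${median:,.0f}, share below cutoff = {share_low:.0%}")

# Both medians are $75,000, but 40% of tract_a households fall below the
# cutoff versus 0% in tract_b, a difference the median alone hides.
```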
But CalEnviroScreen is like 21 different separate measures in which you could have that conversation. And then after you've maybe had that conversation, then it just merges those 21 different measures together with even more math. So there are layers upon layers of the basic question at hand, which is, is this the right way to make use of these numerical inputs to get to some value judgment about whether a community deserves funding or not? And that's exactly the kind of messiness that we wade into in this paper.

[00:14:26] Speaker B: So again, to get back to CalEnviroScreen, I mean, I want to be as specific as I can, although I understand this is, you know, we're trying to generalize here. When it takes a particular input, let's just say one, asthma, what data point goes into the algorithm? Or is there a data point with some kind of standard deviation that goes in, or something like that?

[00:14:49] Speaker D: Yeah, in most cases it's like that mean or that average for the area. So not those distributional aspects. So I'll say for poverty there is, you know, percent poverty, which by itself then is a percentage of people below poverty. So that's a little bit more of a distributional measure. But the others are really sort of an average. And you know, for readers who are really interested, when you go to CalEnviroScreen, there's an indicators overview tab. Again, most people don't have time to dive into this, but they are transparent about it. But you have to dive a few layers down the website to get to it.

[00:15:28] Speaker B: I can say for certain that nobody who has to use it does diving.

[00:15:33] Speaker C: Exactly, exactly. To add some more color here, maybe I'll talk about two kinds of data collection practices that go into this and two kinds of problems. So some of those 21 indicators are census-based information. So that typically means the Census Bureau, a federal agency, does some amount of household interviewing, and it creates averages basically at the census tract level. So a lot of those go into CalEnviroScreen, and then others are more environmental measures. So there might be something like air pollution, and you'll have some number of environmental monitoring instruments throughout the state of California. And then you may not literally be monitoring every block, but they'll do statistical work to try to get to extrapolated or interpolated numbers across the different communities. So that's the kind of stuff going into the raw data. And then that can produce two big problems which you've alluded to. One is just the basic uncertainty in these measures, I'd say, particularly for those ones where you just have one air pollution station in a big county and you're trying to claim that there's this kind of difference, this variation across individual census tracts. There are a lot of uncertainties around those numbers, and maybe bigger uncertainties for smaller communities. And that's not actually propagated through the tool as it's currently designed. The other problem when it comes to averaging is whenever you define a boundary like a census tract and you say there's just one number, which is the average percent poverty or the average air quality, that sort of masks over the possibility of within census tract variation, smaller area variation. And in fact, there is a different kind of choice that could have been made in the algorithm.
We get to it in our paper, where you could have picked the next smaller unit of census block groups as opposed to census tracts. And so a census tract as a whole might not have actually met the definition of disadvantaged. That does not mean that there might not be smaller pockets within that community that actually are some of the most vulnerable parts of the state. And that came down to whether you chose to average at a bigger or smaller geographic unit. And that's again a part of the fundamental subjective choices that are made in these types of algorithms.

[00:17:35] Speaker B: You're listening to Sustainability Now. I'm your host, Ronnie Lipschutz, and my guests today are Professor David Rehkopf and Derek Ouyang, both of whom are from Stanford University, albeit from different units. And they're, I guess, members of a group of researchers who recently published an article about an algorithmic program, CalEnviroScreen. And since it's an open access and open source algorithm, that made it possible for them to actually do some in-depth analysis. Derek, maybe you can talk a little bit about the way in which the statistical analysis and the numbers that come out result from choices or decisions made by the people who are collecting the data and then the people who are analyzing the data, doing statistical analysis of the data. Because it seems to me that that subjective aspect which you mentioned is extremely important, and it applies much more broadly to algorithms, as I understand them.

[00:18:40] Speaker C: I mean, we talk about this so often at Stanford, in research work, in our community work, in our classes with students, where Stanford has this reputation for being a very technically oriented, kind of number crunching kind of group, especially engineering students. And it's very easy to fall into the trap of thinking of algorithms as these tools that can create some notion of objective truth. And that view kind of focuses on just the math, the literal math that's happening given some number of inputs. It doesn't do incorrect math right in between to produce a single output. But it's all about the choices that are made, to recognize that algorithms are tools that are designed by humans for particular purposes. And humans can have mental shortcuts of their own, we think heuristically. And when we have those mental shortcuts, a lot can get oversimplified and masked, and nuance can get lost. And in fact, by just focusing on these types of tools for decision making purposes, we can forget that there can be a richness of deliberation that actually should go into value judgments here and into why these tools were designed. So it's almost like when you over design a tool to be simple, you can forget that actually sometimes we need to audit and really scrutinize the choices that are made here. We go through a bunch of different examples, and it's only possible in this setting because this is a state public tool that is all about transparency. And even in those situations we can find all kinds of arbitrary choices that are made that would have led to a very different set of disadvantaged communities being designated. And we're not even necessarily talking about massive changes. These are just slight tweaks in the intricate workings of an algorithm like this.
So we wanted to tell that story because just imagine, if we can see this much in what's ostensibly a very transparent and accountable tool, just how much more of this kind of tweaking might be possible, and the sensitivity might be possible, in the design of even more black box algorithms that are used in other sectors, in particular in the private sector.

[00:20:43] Speaker B: Can you give us some examples of private sector algorithms? You know, and I know, I realize, again, these are black boxes and you're not allowed to look inside of them. But the most recent one, of course, we've been hearing about is TikTok's algorithm. And I have no idea what exactly that means. I'm not sure if anyone else does.

[00:21:03] Speaker C: Absolutely. There's a big battle to open up that black box. And slightly less recent, but still relatively recent, and just shows how much is happening in the news these days, is the college admissions kind of story about the inputs. And I think that's actually a pretty good parallel to this case, because we are not necessarily talking about, like the Calvin and Hobbes example, ingredients go in and some kind of delectable treat goes out. Here it's information goes in, information goes out, and that information is maybe about who deserves to eat the treat and who doesn't. It's a question of allocating resources, just like you would allocate kind of who gets to go to a very limited number of slots in a higher education institution. And when we're talking about allocating resources, we're fundamentally talking about questions of fairness, and perhaps all the more reason to be really scrutinizing the sensitivity of those inner workings and whether the chosen qualifications, the input information you're even looking at, make sense or not. So, yeah, we're constantly seeing examples of where this can matter a lot. And this story allows us to at least peek under the hood of one such case.

[00:22:11] Speaker B: How would you, you know, I mean, given that you're sort of, like, you know, peeking under the hood of a Volkswagen, right, what's it going to tell you about other algorithms?

[00:22:21] Speaker C: Well, I think, you know, what we showed here was surprising to us, where under the hood there was so much variation in the outcome. And we talk about the degree to which communities that are just shy of being designated as disadvantaged could, in so many alternate universes, slightly different ways of designing the algorithm, have been designated. And then we go on in the story to talk about the material consequences of that. Maybe David can speak to this, but just the amount of funding that could have gone to some communities instead went to others, and the implications on all the work we care about in terms of providing resources to build and rebuild communities. So I think about, in the kind of Volkswagen sense, so many problems that have a misallocation of resources that could be addressed if we thought about allocating resources in a different way. And it kind of speaks to just how important these allocation and these algorithm decisions are.

[00:23:22] Speaker B: Did you want to add anything, David?

[00:23:23] Speaker D: Well, just to come back and kind of reiterate that initial point that Derek made about these: there's not necessarily anything technically wrong. I mean, there have been problems with other algorithms that made mistakes, made errors.
That's another reason why you want to have the code available, have it transparent, so people can find those things. But here these would be things that smart people would all agree on, like, okay, the math is correct here, but all those same smart people would say, I would do it this way, I would do it that way, I would do it that other way, like a mean, a median, different ways of standardizing variables, different functional forms for combining them, and there's not this one objective gold standard. And so what we're talking about here is these reasonable differences that result in, what we showed in the paper, 16% of census tracts in California, which are like 6 million people, that could have been allocated in a different direction, to receive funding or not receive funding, based on good, smart people having differences of opinion on that algorithm.

[00:24:29] Speaker B: What about the quality of the input data? Part of it is from census data. And people lie. People don't always tell the truth. Right. And I'm just thinking now, you know, you said asthma is one of the factors that goes into CalEnviroScreen, and I presume that's based on visits to the ER. So lots of people who have asthma never go to the ER. So how can you justify, and I guess this is partly an ethical question, but I'm not sure it is, it's more a methodological question, how do you justify using particular sets of data whose flaws, you know, may be potentially quite serious? So this is the input question. We'll talk about the output question a little later.

[00:25:16] Speaker D: Yeah, you're spot on with that point. And asthma is a great example. So, you know, just reiterating that it's completely based on sort of who wants to engage with the healthcare system and who doesn't, who has the capacity and has fewer barriers to engaging with the healthcare system, which you'd think would be inversely correlated with a lot of things we're trying to capture in this algorithm in terms of socioeconomic characteristics of areas, for example, or amount of health care resources. So potentially very problematic. I think one of the ways, there's no magic to it, this is bringing experts together, thinking about the biases in these measures and thinking about, can we get other data? Can we get better data? Can we also use data from the California Health Interview Survey or other types of data sources from primary care that measure asthma, and kind of triangulate between these sources to try to get at something where the biases are sort of orthogonal to each other? The other thing I would say, though, is that's kind of one of the advantages of potentially having something that's a bit more complicated, with more measures, that you're not totally reliant on that single measure. So you hope that the direction of your biases or measurement error might be different across these 21 different things, and, you know, you'd be able to do a little bit better job by kind of averaging them out. Of course, if they're all biased, you know, that doesn't help. So it doesn't automatically solve the problem. But you're trying to hedge against particular errors for measures by having a measure that's a bit more comprehensive, or an index that's a bit more comprehensive.

[00:27:06] Speaker B: You know, Derek, you mentioned heuristics, and it was a term that I had in the set of questions I sent you, that idea that we all on a daily basis usually have heuristics by which we make decisions.
We don't sit around and do detailed calculations about these kinds of things. And of course, this is human history. I have no idea where to start the questions I want to ask, and I'm not quite sure how to frame it. Has anybody looked at the results of an algorithm as compared to other decision making methods about similar kinds of questions? And I'm thinking here, you have a panel of experts who are looking at certain data sets and also pulling in what you might call folk wisdom. Do you get a better set of results with the algorithm as opposed to having a group of people sort of more or less eyeballing things? That seems to me to be also an important sort of comparison.

[00:28:08] Speaker C: Yeah, I mean, that does get to the heart of whether to use an algorithm or not in these types of decisions. And I think a baseline alternative would be, you know, how would these kinds of decisions have been made in different governing institutions prior to the ability to do number crunching? And has number crunching actually brought us in a helpful direction? But I think there's so much nuance in what you articulated there, because the definition of better is the first question there. And I think that's timeless, that is somewhat agnostic to the tool that's been made. Is the question of is it better a case in which you're improving the most for everybody, or reducing the difference in outcomes across groups? And CalEnviroScreen very clearly is signaling that it's about trying to correct for the inequities that might exist as a result of different types of environmental harms that have been perpetrated, or have just occurred, across different communities in California. And it's attempting in a very transparent way to define a way to close that gap by ranking communities. I think with that kind of process, you can imagine an alternative in which it's about individual communities making maybe a more qualitative case for that, or different other forms of judgment that are made by representatives without a score being involved. That would be, I think, a really interesting scientific question. But we don't have a kind of real case of that at the scale we're talking about for CalEnviroScreen. But I think it's also helpful to maybe not set those as so different and so dichotomous, because I think a big part of our argument is that it might appear to be this kind of machine-based, computational, objective approach that is freed from the potential whims and biases of quote unquote folk wisdom. But you also had a set of human designers that have snuck those in, whether intended or not, into just all the choices made here. So I think you can't totally escape from the fact that there's subjectivity all the way down in these types of tools.

[00:30:10] Speaker B: Good answer. You're listening to Sustainability Now. I'm your host, Ronnie Lipschutz. My guests today are Professor David Rehkopf from the Stanford Epidemiology Department and Derek Ouyang, Senior Research Manager in Stanford's Regulation, Evaluation, and Governance Lab. Before we go back to pursue what we were talking about, what does that lab actually do?

[00:30:33] Speaker C: The RegLab, for short, works on a wide range of topics, but we tend to focus fundamentally on improving government, and that can be at federal, state, or local scales.
We think a lot about the brain drain problem of a lot of the technical talent of Stanford going into private industry, and government teams tackling really important problems, like the problems CalEnviroScreen is attempting to tackle, being under-resourced in our current setting. So we are looking for ways to bring evidence-based policymaking and tools, particularly to the public sector.

[00:31:07] Speaker B: Can I assume that the work that you're doing there is somewhat similar to this particular research project in terms of thinking about how algorithms can help improve governance? I mean, is that what you're up to? And are there other considerations then, besides the ones we've been talking about?

[00:31:27] Speaker C: Well, CalEnviroScreen is an example of the kind of tool that might be wielded by a set of policymakers. And, well, CalEnviroScreen is just quite ubiquitous in the California context, but we see examples of what I'd say more generally are algorithmic indices: these aspirations to take a lot of complex information, to distill it into some single numerical score that creates a clear ranking system, and then to allocate resources based on that need. And that happens in environment, in social services and health, all across our communities. And in our work, we tend to actually bring quite a bit of scrutiny to the otherwise kind of unfettered use of such tools, and maybe the assumption otherwise that they are going to be blameless and flawless ways to make decisions. And in our lab, because we can do these types of algorithmic audits, like we do in this project, or just want to spend that time really helping stakeholders get into the weeds, get under the hood there, we tend to leave that process having helped our community partners think about the use of such algorithms more responsibly. And I think that's the ultimate lesson, is to not get kind of lost in the sense that it's this clean solution that is magically taking some inputs and getting exactly the right objective output, and hopefully helping to build a kind of maturity around the idea that you should always be asking the kinds of questions we're asking here in this conversation.

[00:32:54] Speaker B: So you do consult with people who are sort of on the front lines making decisions.

[00:33:00] Speaker C: It's often government partners saying, hey, we would love Stanford's help on this fancy way to solve this problem. And we usually don't directly provide that service. We want to get into a kind of deeper, nuanced conversation about why they want to solve a problem in a particular way, what those implicit value judgments are. And so we tend to not just give directly what a consultant might give, we give a much more kind of grounded and holistic kind of dialogue and relationship with these partners.

[00:33:27] Speaker B: How do they respond to that, to the idea that you're going to give them a more nuanced understanding and relationship? I mean, that certainly must complicate their lives.

[00:33:36] Speaker C: Yeah, some walk away from that, honestly, because maybe they just don't have the time. Back to the core problem of being under-resourced. They're looking for answers, they're looking for solutions.
And in the best case scenarios, we can co-produce this kind of knowledge, co-publish with real decision makers, real stakeholders, and then research, academic research, can be a kind of megaphone for sounding the alarm at times, or just providing this useful insight so that that can become a kind of more grounded intuition for many, many more decision makers across the country.

[00:34:08] Speaker B: EPA has a similar tool, right? Yep.

[00:34:11] Speaker C: The Environmental Protection Agency at the federal level. I can't get the full provenance, but I'm pretty sure they have been looking from the sidelines at CalEnviroScreen for a while, because it really was ahead of its time. And what the EPA did in this last administration is put together a pretty similar kind of multi-measure index called EJScreen. And it's been built into a lot of the kind of resource allocation decisions for federal funds, just like we're talking about at the California level, but even.

[00:34:42] Speaker B: Nationwide. I imagine that's going to disappear at least for four years. It might be revived depending on, you know, who's president in 2028. Now, you know, I read, I would say, the popular press, and I read a lot. I see a lot of articles about artificial intelligence and AI designing algorithms. How does that work? If you try to do that, does it start with an existing algorithm? And AI, whatever that means, it might be necessary to clarify that. Like, if you took the CalEnviroScreen algorithm and you wanted to somehow improve it or make it, you know, more detailed or something like that, is this something that you could put in front of, you know, some sort of AI program and say, do this?

[00:35:44] Speaker C: You want to take that, Dave?

[00:35:46] Speaker B: Think of me as stupid.

[00:35:49] Speaker D: You certainly could do that. Yes.

[00:35:52] Speaker B: Well, when somebody talks about AI writing algorithms, what does that mean?

[00:36:00] Speaker D: Yeah, exactly. So I think you could do that. I mean, you could take any algorithm, put it into any number of sort of neural large language models, and kind of let the AI essentially bring in other sorts of algorithms, bring in other information. I think the real scary sort of part of that, or the concern, is the lack of transparency in what's happening, what's kind of coming into that, what the suggestions would be. I mean, I think at the heart of it, you're onto something good, which is, you know, I was thinking about this as Derek was responding to your last question, that, you know, so many people put so much work into just evaluating these at one point in time, or developing them at one point in time. We want an algorithm for this, we want to do this new thing, and then there's no sort of revisiting that, seeing how it works, evaluating who is affected. So, I mean, I think the thing that's exciting, the positive thing about what you're suggesting there, is developing a process for kind of continually evaluating over time, bringing in all sorts of tools, AI being one of them, along with sort of communities, along with broader crowdsourcing. I mean, you were talking earlier about sort of there's no objective truth. What is there? I mean, I think it would be amazing to have sort of polling data from the public, not just commissions or not just smaller community groups, but also as a source of data. What do you think about this area?
Do you think this area, you know, is a community that's been ignored, that's been subjected to more pollutants than others, you know, in addition to some of the quantitative information? So I think it could play a role. I think, you know, there are a lot of exciting things about the amount of information that could be brought together, but a lot of real cautions about the transparency of that and how that would be balanced by their interests.

[00:38:08] Speaker B: So maybe you can talk a little bit more about the variations in outcomes that you discovered. How did you go about doing that? How did you vary inputs, and what were the results that were produced? You mentioned 16% earlier. And so I'm asking you now about the sort of mechanics of this process of trying out different parameters.

[00:38:41] Speaker D: So there's this whole universe of possible models, and we wanted to explore things in the paper that would be reasonable choices that someone could make. So this is roughly divided into differences in the pre-processing model, like what we did with the data coming in, how the data were aggregated together, and then in particular the health metrics that were picked, because I don't want to say they were arbitrary, but there's like a whole landscape of other sorts of health metrics that could be picked. So across those three areas, we varied a few different things in the model, so for example percentile ranking versus z-score standardization, and then looked at what the results looked like with those different assumptions. And so over that full parameter space, then, we got these sort of confidence bounds of where a reasonable model might be predicting, and that's where we got that 16% from.

[00:39:36] Speaker B: And can you say some more about, you know, I mean, the significance of these sorts of variations in inputs? Because even there you may have different results depending on not so much the data, but how the research or the accumulation of the data has been done.

[00:40:01] Speaker D: Yeah. So we talked earlier about the health outcomes. And so rather than relying on ER data, we used survey data and survey indicators, thinking about chronic kidney disease, cancer, COPD instead, and looked at what those differences might be.

[00:40:21] Speaker B: I mean, are these surveys of individuals or are they surveys of doctors? I mean, who's the source of that information?

[00:40:32] Speaker C: This is the CDC Behavioral Risk Factor Surveillance System survey. So it is individuals, and it's akin to census household surveying. And it's just another potential reasonable choice if you're trying to get to anything like granular health information, but of course it comes with its own imperfections and its own limitations. So I think in this process we were looking at the choices made in the current, real CalEnviroScreen and trying to make just one-degree kinds of adjustments to what could have been a really reasonable alternative choice that was still trying to do the best with the data available. We couldn't necessarily compare to any deeper notion of ground truth of health outcomes. But our point is more just that subjectivity that's built into the reasonable choices that would have been made by experts, by these types of decision makers.
For the general audience, it might be helpful to imagine that in this paper we were looking at many of these alternate universes, in a Marvel sense, of all the possible ways that CalEnviroScreen could have been created. It's not itself truly infinite, but we try to chart a reasonable set of alternative realities. And then we show how, if you are imagining ranking all these census tracts, there's the one actual ranking, but here are the thousands of other rankings, and we show that that single marching order is really this cacophony of different potential ranking orders. And that's the core of the story we're telling, which is that you shouldn't hold the one current notion of rank order as this absolute or objective reality. And the fact that there's this much more variation should shift the conversation towards really scrutinizing those underlying choices, and not so much on just the downstream consequences and resoluteness of what is in fact in place.

[00:42:34] Speaker B: Have you seen that happening?

[00:42:36] Speaker C: Well, as we talked about before, I think it happens right now in this more decentralized way. It's individual community groups, maybe passive-aggressively, through the ways they're applying for funding, requesting variances, requesting exceptions. There is a notice and comment process, as you would expect from public processes like this, but only so many groups have the wherewithal, the interest, the capacity to issue a formal request. And so I think hopefully this paper provides some materials of substance for multiple stakeholders to maybe make a more concerted and targeted set of recommendations. We, of course, make some of these recommendations ourselves and are hopeful that CalEnviroScreen itself is updating on some kind of cadence. It's going through versions. And the more this story is told, the more we can kind of reveal a little bit of what's under the hood and put that in language that is communicable to different stakeholders, then hopefully that just helps bring more momentum to particular possible changes to the tool over time.

[00:43:44] Speaker B: This is Sustainability Now. I'm Ronnie Lipschutz, your host. My guests today are Professor David Rehkopf and Derek Ouyang from Stanford University. And we've been talking about a recent paper they co-published with a number of colleagues looking at CalEnviroScreen and analyzing the algorithm behind the program, and how you can get different results depending on what you put into the algorithm. And those different results can have serious impacts in terms of how resources are allocated. You write in the article that it is feasible for a politically motivated internal actor, whether subconsciously or intentionally, and I think the intentional part is more interesting, to prefer model specifications that designate tracts according to a specific demographic, such as political affiliation or race. Now, can you explain how that might happen, in particular with CalEnviroScreen? How could you do that? How could an internal actor intentionally manipulate the results?

[00:44:56] Speaker D: Yeah, so, I mean, we've been talking about all this, the Marvel universes, universes of different potential models, alternative universes. You could go through and pick the one that would designate funding to the places where your constituents were, for example, in a politically motivated way. So you could just, it would be very simple to go through and do that.
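The kind of sensitivity analysis, and the manipulation risk, being described here can be sketched in a few lines: enumerate a grid of reasonable specification choices (for instance, percentile ranking versus z-score standardization, and mean versus median aggregation), recompute the score under each, and count how often a tract crosses the designation cutoff. The specific choices, toy data, and the 75th-percentile cutoff below are illustrative assumptions, not the paper's actual code or CalEnviroScreen's real thresholds.

```python
# Illustrative specification sweep: how often does a toy tract's
# "disadvantaged" designation flip across reasonable modeling choices?
# The choices, random data, and cutoff are assumptions for illustration.
from itertools import product
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
tracts = pd.DataFrame(
    rng.lognormal(mean=0.0, sigma=1.0, size=(500, 6)),  # 500 toy tracts, 6 indicators
    columns=[f"indicator_{i}" for i in range(6)],
)

def standardize(df: pd.DataFrame, how: str) -> pd.DataFrame:
    if how == "percentile":
        return df.rank(pct=True)          # percentile ranking
    return (df - df.mean()) / df.std()    # z-score standardization

def aggregate(df: pd.DataFrame, how: str) -> pd.Series:
    return df.mean(axis=1) if how == "mean" else df.median(axis=1)

designations = {}
for std_how, agg_how in product(["percentile", "zscore"], ["mean", "median"]):
    score = aggregate(standardize(tracts, std_how), agg_how)
    cutoff = score.quantile(0.75)         # designate the top 25% of tracts
    designations[f"{std_how}_{agg_how}"] = score >= cutoff

flags = pd.DataFrame(designations)
# Tracts designated under some specifications but not others.
flipped = flags.any(axis=1) & ~flags.all(axis=1)
print(f"{flipped.mean():.0%} of toy tracts flip designation across specifications")
```

In the paper's actual analysis, varying pre-processing, aggregation, and the choice of health metrics in this spirit is what produced the finding David cites: roughly 16% of California census tracts could plausibly have been designated differently under reasonable alternative models.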
So the way to avoid this really is clear processes of transparency about the model and where the power is for creating the algorithm and picking the algorithm. So we tend to think that the best way to do this would be combining across results, across all the possible best models. But at the very least, you should kind of defend why those choices are being made and show what the alternatives are, to just sort of prove to the public that you're not cherry-picking the best model that you would just want. And so I think having these transparent public processes, having sort of a technical panel that sort of evaluates the best potential models and making this all clear, can really avoid these types of problems.

[00:46:14] Speaker B: So, who wrote this particular algorithm or this particular program? I mean, not individuals, but who was involved in doing it? Do you know?

[00:46:24] Speaker D: I mean, my understanding is people at the State of California Office of Environmental Health Hazard Assessment. Yeah. Is that right, Derek? Yeah.

[00:46:33] Speaker B: They actually had the capacity to do this kind of work.

[00:46:37] Speaker D: Yeah. And it's been. I want to make sure that comes out loud and clear, what an exceptional thing they have done, both in doing this for clear environmental justice objectives, making this transparent, putting the data out there so folks like us can kind of look at it. And this is really exactly how the process should be working. I mean, this is CalEnviroScreen 4.0. There's going to be, hopefully, a 5.0. And each time we kind of go through these things and get more input and try to make it better and better.

[00:47:09] Speaker B: Have either of you been involved in that process, or are you still just sort of external critics and reviewers?

[00:47:18] Speaker D: I can say, and I'm not sure if Derek's been doing more lately, we definitely have been chatting with them about it. Before the paper came out, Ben sent them a copy of it, giving them a heads up on what we were working on. We've been in communication with them. They've been super interested in what we're doing. You know, we learned a ton from them. Really just incredible, thoughtful folks. So yeah, it's definitely been a great conversation with them as part of this paper.

[00:47:45] Speaker B: Are you going to pursue any of this? I mean, work with algorithms and do any more of it? I mean, I'm not sure. You, David, probably have other things on your plate, right?

[00:48:01] Speaker D: Well, yeah, we both do. Derek's probably busier than I am. So the one area, first of all, I want to kind of keep working on CalEnviroScreen in particular, thinking about how we can be helpful in sort of advising 5.0, making a lot of public comments. I'm working on a lot of these same issues for social deprivation indices that are used for payment adjustment to physicians. So this is another type of algorithm that's coming up. So when folks have more complex medical needs or are living in underserved areas, should physicians get more reimbursement to give them more time to work with those patients? So there are a lot of these area deprivation indices out there, and we're doing some analyses of sort of the equity implications of the use of those indices.

[00:48:52] Speaker B: Can I ask, is this research self-motivated, or are you doing this for some sponsor or something like that?

[00:49:02] Speaker D: Yeah, so a little bit of both.
But the Centers for Medicare and Medicaid Services, CMS, has been interested in the work we're doing. So we've been in conversation with them to try to look at the measure they're using. What are the effects of that? Are there better measures or alternative measures that they could be using? So it's been in conversations with people mainly in the federal government.

[00:49:28] Speaker B: And how about you, Derek, doing some.

[00:49:31] Speaker C: Related work as well. We're trying to finish a paper right now that's looking at the Bipartisan Infrastructure Law from this past administration, which is similarly focused on environmental justice concerns and allocating particularly drinking water infrastructure improvements. That's what we focus on in the paper, across all 50 states. And then there's also this notion of, quote unquote, disadvantaged communities. But not to confuse the audience even more, that doesn't necessarily mean the same thing as what we've been talking about with CalEnviroScreen. And even California's definition of disadvantaged for the purposes of the Bipartisan Infrastructure Law is not the same as CalEnviroScreen's. So it just, you know, speaks again to the subjectivity of decisions here. But in that study we're really recognizing that what's interesting about the BIL case is that a kind of part of the, quote unquote, bipartisan negotiation that happened was, you know, this is going to be funding that's nationwide for environmental justice, but every state gets to decide what they mean by disadvantaged community. So we can actually track how individual states potentially updated and changed their definitions post the funding becoming available, and see how the demographic composition of communities that were eligible changed over time, how the composition of communities that got funded changed over time. And the long story short is that there's quite a bit of variation, and that might move in kind of predictable ways based on political affiliations and so forth. And so I think it just shows that besides CalEnviroScreen, the case we focused on in this paper, there are so many potential situations and contexts in which the lessons we can hopefully learn from what we did here in California can provide insights in other cases. But I also do remain, like David, committed to the possibility of directly improving and providing ideas for CalEnviroScreen 5.0, because California really does lead the charge in many of these situations in showing what is possible and exploring the potential of data-driven methods. So I think it's helpful to imagine that if we can get more of these things right, concretely, in whatever version 5.0 is, and if more stakeholders were involved and really understood this notion of adding the word humility next to algorithm development, those two usually don't appear in the same sentence, but I think it's basically about trying to get those two words in the same sentence more often, then that can really propagate, I think, to other places as well.

[00:52:04] Speaker B: I have to say, cynic that I am, that it sounds like greater humility would also open up things for greater litigation. You know, you've got to be careful sometimes what you wish for. So, David Rehkopf and Derek Ouyang, thank you so much for being my guests on Sustainability Now.

[00:52:22] Speaker C: Thanks so much for having us. Thank you.
[00:52:25] Speaker B: You've been listening to a Sustainability Now interview with Professor David Rehkopf of the Department of Epidemiology and Population Health at the Stanford University School of Medicine, and Derek Ouyang, Executive Director of City Systems and Senior Research Manager at Stanford University's Regulation, Evaluation, and Governance Lab, also known as the RegLab, about the operation of algorithms in decision and policy making, with a particular focus on CalEnviroScreen. If you'd like to listen to previous shows, you can find them at KSQD.org, Sustainability Now, and on Spotify, YouTube and Pocket Casts, among other podcast sites. So thanks for listening, and thanks to all the staff and volunteers who make KSQD your community radio station and keep it going. And so, until the next every other Sunday, Sustainability Now.

[00:53:28] Speaker A: Good planets are hard to find now. Temperate zones and tropic climes, through currents and thriving seas, winds blowing through breathing trees, strong ozone, safe sunshine. Good planets are hard to find. Yeah.

Other Episodes

Episode 131

September 29, 2024 00:53:43

Dust to Dust or Earth to Earth? Composting as an Alternative after Death, with Katrina Spade of Recompose

What happens to your corporeal body, if and when it is buried in the earth?  According to Genesis in the Hebrew Torah, we come...


Episode 14

March 08, 2020 00:59:26

Auto-Free Living in Hayward?

 Sustainability Now! March 8, 2020, Auto-Free Living in Hayward? with Dr. Sherman Lewis, Emeritus Professor of Political Science at Cal State East Bay, to...


Episode 40

March 23, 2021 00:54:16

There Otter be a Law! Will the Southern Sea Otter Survive? A conversation with James Estes

The southern sea otter is a keystone species in kelp forest communities, acting to increase the species diversity and providing ecosystem services. Despite federal...
