Turning the Light of Truth: Data Science as a Practice of Resistance

Author

Phil Chodrow

Published

October 15, 2023

About

This is the draft of a speech I gave at Middlebury TEDx on the topic of data science and social justice on November 11th, 2023.

Acknowledgments

Parts of this speech reflect the influence on my thinking of people like Chad Topaz and Nathan Alexander. I’ve attended several of their talks and adapted some of their ideas and insights into this speech. I’m grateful to have learned from them and appreciate the opportunity to pass on some of what I’ve learned to the TEDx audience.

Draft Text

Introduction

Hi everyone.

Thanks for joining me today. My name is Phil Chodrow. I’m a professor of computer science here at Midd. Some of you may know me from classes in introductory computing, math foundations, or machine learning. If you’re in the room and we’ve met, it’s great to see you again. If not, I hope we’ll get the chance to meet in the future.

In college, I had a love of abstract mathematics. I loved all the symbols and how they fit together; the way that when you solve a math problem you really know that you’ve gotten it. When I was doing math, I really felt like I had managed to touch something true, and that felt great. When I left college, I…flailed around a bit…and wound up as a data analyst at a healthcare nonprofit. This was a pretty transformative experience for me in a lot of ways. One of those ways was a new interest: I was excited about using data and data science in order to make the world around me a little better. This interest caught fire in me and eventually pushed me to grad school, where I kept working on problems at the intersection of math, data science, and our social world. These days, my research still includes a lot of fun math, but now it also includes data science for sustainability, equity, and justice.

In the last couple years, I’ve developed new ways for measuring segregation and inequality; written an analysis of law enforcement data that was submitted to the Supreme Court; studied racial disparities in criminal sentencing in Virginia; and written an article on how folks in my scholarly community can contribute to justice causes. I’ve brought all this work here with me to Middlebury.

This work—data science for social justice—doesn’t belong to me, and I didn’t invent it. There’s a long tradition of folks using data science to resist oppression and build a better world. I want to spend most of the rest of our time together sharing a bit about this tradition and about the role of data science in our present day. Then, since I’m a professor here, I’m going to finish up by leaving you with a little homework.

The Light of Truth

The theme of this event is “Truth and Dare.” Not “or”—“and.” When I first heard this phrase, I immediately thought back to a famous line from the civil rights activist and reformer Ida B. Wells:

The way to right wrongs is to turn the light of truth upon them.

Now, you can see our theme running all through this statement. If we dare to right the wrongs of the world, then we do so with the light of truth. We need to tell the world what is happening. For Wells, this looked like telling the world about escalating violence against African Americans—including lynching—in the America of the 1880s and 1890s. The truth in Wells’s work of this time is a combination of in-depth, investigative reporting and carefully compiled statistics describing the brutal extent of lynching and its impact on Black Americans. The dare of this was work was literally putting her safety on the line. After her anti-lynching advocacy began to make waves in both the American South and the North, Wells’s newspaper in Memphis was violently attacked. She herself continued her work in Harlem, New York and in Chicago, Illinois.

There’s a rich history of African American civil rights advocates turning the light of truth on the wrongs of the world with statistics and data science. Here’s another famous example. Leading up to the 1900 World’s Fair, Dr. W. E. B. Du Bois and a team of students at Atlanta University created a series of vivid data visualizations describing the conditions of life for African Americans in Georgia and the greater United States. Du Bois described his vision for this exhibit:

…An honest and straightforward exhibit of a small nation of people, picturing their life and development without apology or gloss, and above all made by themselves.

These visualizations are all hand-drawn representations of the reality for African Americans. Not all of them have an explicit agenda. Some of them are just maps of how much land was owned by African Americans in the state of Georgia, or simple visualizations of who worked in what professions.

Others are much starker descriptions of the impact of violence and systemic oppression. For example, here is a description of the ages of African Americans in Georgia as compared to Black residents of France. Each bar represents the proportion in an age group – for example, the top black bar says that 30.1% of African Americans in Georgia were under 10 years old, compared to 17.5% of Black residents of France in the yellow bar. If you see the black bar shrinking down much faster than the yellow bar as we move down the page, that is telling you a grizzly story – African Americans in Georgia were much less likely to grow to old age when compared to Black residents of France, due to violence, systemic discrimination, and disparity. We can see echoes of this plot in the US to this day, where Black Americans still have lower life expectancies due to systemic disparities in healthcare, educational access, and economic opportunity.

Now, they didn’t have this phrase 100 years ago, but these days we can recognize what Ida B. Wells and Dr. W. E. B. Du Bois were doing data science. Both of them were collecting rich, complicated data; analyzing it through the lenses of expertise and lived experience; and communicating their findings in careful, honest, and effective ways. They use data – representations of collective facts – to turn the light of truth on systemic injustice. Their data science was a practice of resistance.

Our Present

So now let’s turn to the present. These days, we’re seeing data and data science on unprecedented scales. If you have a laptop that holds 256 gigabytes of storage, like mine does, then you would need one billion (with a B) of those just to hold the data that we produce in a single day. Increasingly, corporations are thinking about how to make profit from all this data, and they are hiring people to do it. Job demand for data scientists exploded over the last decade, and is expected to continue to grow.

If you look at your day-to-day life, you are surrounded by data science. When you get a recommended post, or a targeted ad, or even just a spelling suggestion when you text, you are experiencing the work of millions of hours of data science at some of the world’s largest corporations. But here’s the thing. It’s not just the little stuff day-to-day. Data science is going to play a role in some of the biggest decisions of your life.

  • Are you applying for a job? Your resume, or even your face, might get scanned and screened by machine learning systems.
  • Are you trying to rent an apartment? You might be screened by a predictive algorithm that estimates your “willingness to pay” rent each month.
  • Are you sick? Summaries of your visits to the doctor, recommendations for your care plan, and recommendations for what your insurance should cover may be informed by generative language models.
  • Do you live in the jurisdiction of a police department that uses predictive analytics? You might be recommended for a watch-list based on aspects of your identity and residence, regardless of your prior history with the law enforcement system.

So, data science technologies don’t just feed us content and ads – they are also with us for some of the biggest decisions about where we will work, where we will live, what care we will receive, and whether we will be free. And, as you might have seen or even studied, so many of these tools reinforce sexism, racism, and economic inequality. Now, this is not a new thing – data as a tool of oppression and control has a long, dark history. I just want to remind you that this history is not just history – it’s also our present.

Now, this might feel a little different from the data science of Ida B. Wells and Dr. Du Bois, right? When we look to them, we see data science as an act of resistance against oppression and control. This liberatory kind of data science – data science as resistance, data science for social justice – is also in our present. To my mind, this work has taken two different directions into our present day. I want to highlight a little about each of those two directions before I finish off with a challenge.

Critical Perspectives on Data Science Technologies

The first direction I want to highlight for you is critical perspective on data science technologies. When we take a critical perspective, we ask ourselves questions like:

  • What are the actual, measurable impacts of data science technologies on human beings?
  • Who benefits from the spread of data science technologies? Who is harmed?
  • How do benefits and harms of data science technologies align with historical and present racism, misogyny, classism, and other systems of oppression?

Critical perspective most frequently comes from people who have identities or experiences that help them understand marginalization and oppression. With this in mind, I’d like to briefly point towards the work of many expert women of color who are bringing critical perspective on the role of data in our world today.

Dr. Safiya Umoja Noble is the author of the book Algorithms of Oppression, a deep dive into how search–the precursor of contemporary generative machine learning–can reinforce deeply harmful stereotypes related to race and gender. Dr. Joy Buolamwini is a computer scientist who played a leading role in uncovering intersectional gender and racial bias in widely-used computer vision systems. She collaborated in this work with Dr. Timnit Gebru, who has also written on the limitations of large language models and the ideological currents expressed by certain influential leaders in artificial general intelligence.

Data Activism Today

The partner of critical perspective is data activism – using data to make a difference. I started us off with early data activism in the work of Ida B. Wells and Dr. W. E. B. Du Bois. This work continues in our present. One national-scale example of modern data resistance comes from the Mapping Police Violence project. The team behind this project collects data on every instance they can find in which law enforcement takes a life. In addition to publishing this data in raw form for anyone to use, they also publish the dashboard that I’ve highlighted here. An important feature of this project is the attention they pay to data quality: they use a multi-step process in order to gather reliable data. Then, they present it with a focus on the facts. Yes, there is an activist agenda here as well, but it’s separate – the purpose of this project is to produce the high-quality data projects that support activism and advocacy.

Another example, closer to home, comes from folks working to combat homelessness here in Vermont. As you may know, our state have the second-highest rate of homelessness in the US, behind only California. In a 2023 report, the Vermont Coalition to End Homelessness and the Chittenden County Homelessness Alliance presented a pretty damning picture for our state: the number of individuals experiencing homelessness on a single night in our state has doubled compared to similar counts five years ago. This count appears to still be on the rise, due in part to rising rents and the end of pandemic era housing supports. This is challenging data science work, from data collection to interpretation to presentation to advocacy. These organizations are daring to turn the light of truth on a deep social wrong.

My challenges for you

So to you all, especially the students in the room: obviously, since I’m a professor, I have homework for you. My homework comes in the form of challenges. Dares, if you will.

If you are already powerful in data science…

First, some of you in here might already be powerful in data science. Maybe you’ve taken some classes in mathematics, statistics, or computer science. Maybe you’ve declared a major, or done a data science project in one of your classes. My dare for you is to ask yourself:

How can I use my skills in data science contribute to a more just, equitable, peaceful, and sustainable world?

It can actually be pretty difficult to know what data science work is going to make the world more just, equitable, peaceful, and sustainable. That’s the lesson we have to learn from all the stories about bias in modern data systems. No one plans to make a racist machine learning model. It happens through systematic negligence, unaware privilege, and failure to learn about the experiences and identities of people different from you.

So, to use your powers for good, approach your work with humility about the many, many things that you do not know and have not experienced. Stay humble, stay curious, and listen. Listen to the people your data is about. Listen to the folks who might be impacted, for good or for ill, by your work. Listen to activists. Listen to expert women of color. Listen, challenge your assumptions, and learn.

If you are an activist…

Now I also want to address my activists and resisters in the room. Y’all are on the front lines fighting for a world with less oppression, discrimination, inequity, and violence. First of all – thank for your courage and for fighting your fight.

Your insight, from your experience and the experience of the communities you fight for, is needed as we seek a more inclusive and equitable world of data science. Rightly or wrongly, to get yourself in the conversations that matter most, it’s necessary to speak the language – data, computation, statistics, figures.

So, my challenge for you is simple: level yourselves up with data. Seek allies who are gathering the data that your movement needs – or become the person who’s gathering data that your movement needs. Learn to work with your data, understand its power and its limitations. Here at Midd we have a lot of ways for you to do that, and we’re getting more every year. Level yourselves up, get in the room, and be heard.

Wrapping up

Y’all. Data science is a part of our present day, and it’s complicated. Data science a fun and exciting way to apply skills to the real world. Data science is an oppressive and inequitable tool of racialized corporate power. It’s both/and. Today, I’ve encouraged you to look to some folks in our history and our present day to see the promise of data science as a practice of resistance. I try to take some inspiration from their work, and I hope that you will too.

So. You see wrong in the world? Turn the light of truth.

Thanks y’all.

Sources and Inspirations