Ranjit Singh reflects on the possibilities and limits of red-teaming for AI accountability
October 2023 was a busy month for Ranjit Singh, a Senior Researcher at the Data & Society Research Institute. Singh was reappointed as a Siegel Research Fellow. He was recognized by Business Insider as one of the top 100 people in artificial intelligence. And he co-authored an influential policy brief on AI red-teaming, the practice of trying to produce undesirable outcomes in an effort to identify and fix harms in generative AI models. Following up on this whirlwind of activity, Singh and a co-author had a paper on AI governance accepted by Harvard Data Science Review in December.
We sat down with Singh to learn more about what red-teaming looks like in an AI context; the accountability mechanisms that he believes should accompany red-teaming; where he believes responsibility lies for imposing accountability; and how we can make systems work for ordinary people. In our conversation, Singh also discusses his collaborations with other Siegel Research Fellows, his upcoming research projects, and what’s on his reading list.
You occupy a unique position within the computer science world. You work at the intersection of technology and society. What does that perspective offer?
I’m trained as a software engineer. I studied information and communication technology as an undergraduate. But when I started looking at the intersection of technology with social issues, I was increasingly drawn to a field called “science and technology studies,” which I studied as a graduate student. It is an academic discipline focused on studying how science, technology, and society come to shape each other. As a classic text in the discipline puts it: “solutions to the problem of knowledge are solutions to the problem of social order.”
The science and technology studies approach is also embedded in the ethos of the research projects at the Data & Society Research Institute. At Data & Society, I’m focused on thinking through AI issues, including the infrastructure that is needed to build data environments that work for all people, especially people in the Majority World.
I’m also working on understanding the impact of algorithmic systems in the real world. In this context, there are a number of pre-existing regulatory mechanisms for impact assessments in various domains such as environment, finance, human rights, and privacy. I’m interested in what we can learn from these existing evaluation practices. I want to push back against the notion that we need to start from scratch to understand the impact of AI and algorithmic systems because they are too complex.
Red-teaming is one of the strategies that you’ve studied most deeply. What is red-teaming and how is it used in an AI context?
Red-teaming is one way that we can evaluate algorithmic systems and identify their potential impacts and harms. I should start by noting that a consensus or formal definition of red-teaming in the generative AI context specifically is still emerging, and it will take some time to become standardized. Broadly, the purpose of red-teaming in war games or cybersecurity is to attempt to poke holes in a plan or an application, where the undesirable outcome, such as losing the war or getting hacked, seems clear to all parties.
In red-teaming we imagine how things might go wrong in order to identify previously unknown possibilities of failure. We can’t be exactly sure what outcomes will occur, so we rely on consensus-building and deliberation over its results to gauge success.
Red-teaming is often internal to companies; it is such a critical part of the product development cycle that it has also made its way into the process for developing algorithmic systems. For example, Microsoft has been working on the intersection of red-teaming and responsible AI for a while now, but it is not the only company investing its resources in this space. Most companies with foundation models have been working on this front with varying strategies. OpenAI has a red-teaming network; Google DeepMind has been working on using language models to red-team language models; and Anthropic’s focus has been on red-teaming their language models for frontier threats such as designing biological weapons.
Generative AI has broken the barriers around the kind of skill set needed to do red-teaming. Because the prompts are written in natural language, it opens up the possibility of a very broad public participating in identifying flaws within a system. Cybersecurity culture has always allowed anyone from the public to identify bugs, but you needed deep knowledge of how the systems work in order to participate. Now anyone can write a prompt and report that they got a weird output.
That’s led to events like last year’s generative red-teaming event at DEF CON 31—one of the world’s leading cybersecurity conferences—where thousands of attendees participated in a red-teaming contest that was supported by the White House, federal agencies, major AI companies, and non-profits working in the space of AI accountability.
Companies have also set up their own red-teaming games that people can play. A great example is Gandalf, where you’re tasked with getting a chatbot to reveal a password. Another is Adversarial Nibbler, where the focus is on red-teaming text-to-image models.
These are all mechanisms for encouraging people to experiment with how systems produce outputs. These outputs might be offensive or might not represent how the interaction was originally intended to go. From them, we can learn about new failure modes for generative AI models.
Given that auditing and other forms of AI accountability exist, why has red-teaming become such a key part of AI governance?
Let’s step back and consider what red-teaming stands for when it comes to AI governance. We are looking for a governance mechanism that can match the scale at which conversations with models are happening. I am using “conversation” here as a shorthand for prompting and getting a response from a model. We are trying to find a method that can help us first identify problematic responses and then measure how often they occur. Red-teaming addresses the first challenge well. When employed consistently over time and across red-teamers, it can also address the second challenge.
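To make those two steps concrete, here is a minimal, illustrative sketch of a red-teaming harness, not any lab’s actual tooling. The query_model and judge callables are assumptions: placeholders for whatever API returns a model’s response, and for a red-teamer’s judgment (or a simple heuristic) about whether a response is problematic.

```python
# Illustrative red-teaming harness: probe a model with prompts, record flagged
# responses, and compute how often problematic responses occur.

from dataclasses import dataclass
from typing import Callable, List


@dataclass
class RedTeamRecord:
    prompt: str
    response: str
    flagged: bool  # did this response look problematic to the judge/red-teamer?
    note: str = ""  # optional free-text rationale from the red-teamer


def run_red_team(
    prompts: List[str],
    query_model: Callable[[str], str],   # placeholder for the model API call
    judge: Callable[[str, str], bool],   # placeholder for human or heuristic judgment
) -> List[RedTeamRecord]:
    """Step 1: identify problematic responses by probing the model with prompts."""
    records = []
    for prompt in prompts:
        response = query_model(prompt)
        records.append(RedTeamRecord(prompt, response, flagged=judge(prompt, response)))
    return records


def failure_rate(records: List[RedTeamRecord]) -> float:
    """Step 2: measure how often problematic responses occurred in this exercise."""
    if not records:
        return 0.0
    return sum(r.flagged for r in records) / len(records)
```

Running the same prompt set consistently over time and across red-teamers is what turns the first step (identification) into the second (measurement): the flagged fraction becomes something that can be tracked and compared across model versions.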
Once we can identify and measure problematic responses, we can start thinking through whether they indicate a systemic vulnerability of the model that needs to be mitigated. There is so much enthusiasm around red-teaming as an AI governance strategy because it gives us a way to prevent potential harms from these systems to people and society more broadly.
The problem is that red-teaming cannot possibly cover the full gamut of issues that generative AI systems raise. It cannot guarantee that the system is “safe” for all possible interactions, because the universe of things that can go wrong is always going to be broader than what can be tested through red-teaming. We risk creating a false sense of safety if we rely solely on red-teaming as an evaluative mechanism.
Instead, we need to use red-teaming in combination with other accountability mechanisms, such as impact assessments, external audits, participatory governance, and regulation on grievance redressal in high-risk use cases. Despite broadening the field of who can participate, red-teaming is still fairly limited to evaluations at particular moments in time. It does not cover the longer timeframe of engagement with the lived experience of harm that people may face as these systems become increasingly embedded in our everyday lives. So, despite so much interest and enthusiasm around it, we should understand that red-teaming is not a replacement for other forms of public oversight.
What implications does the fluidity of the definition and practice of red-teaming have for standardization and regulation?
This definitional problem is a real challenge for regulation. You can require that companies do generative AI red-teaming, but there’s no clear consensus on what it should entail. Companies can simply say that they did red-teaming. That’s why the focus of the recent Executive Order on AI is so important. It has asked the National Institute of Standards and Technology (NIST) to figure out what the standards around red-teaming should be. That will help clarify, for example, what counts as an audit versus a red-teaming exercise, and how actual usage translates into a form of evaluation.
Consider the work of Virginia Eubanks on automating inequality, which is a form of evaluation of the use of automated systems in organizing welfare services in the U.S. Of course, this kind of deep, on-the-ground engagement does not fall under the scope of red-teaming. It’s talking to people, trying to understand how they’re living with these systems. But given the current fluidity in how red-teaming is positioned, it can feel like the only “legitimate” evaluation practice for generative AI models.
I have been working on a collaborative research project between Data & Society and the AI Risk and Vulnerability Alliance (ARVA) that is trying to understand the spectrum of practices that make up red-teaming. We hope to describe the conditions under which red-teaming produces data and insights that illuminate a representative range of actual harms that generative AI models can produce. We also aim to identify actionable solutions to patch systems that use these models in order to address those harms.
Given the fuzziness around defining red-teaming, who should be responsible for conducting it? Who should hold companies accountable?
That’s a great question. One of the things that we’re working on at Data & Society is what the best structure of accountability is. Many in the field imagined that companies would do the work of evaluation, which would now include red-teaming. Then some type of regulator or public forum would evaluate whether this work was done correctly and appropriately. That’s very similar to how the FDA assesses whether a pharmaceutical company has done its due diligence and then makes a judgment on whether a drug is safe for commercialization.
My colleagues and I have previously argued that we need the courts to provide a counterbalance so that you’re not just relying on a regulator to evaluate the work of companies. Without that counterbalance, you can create conditions for regulatory capture. You might have a person who was harmed because the regulator was too sympathetic to the companies it was meant to regulate. The person who was harmed might say that the regulator didn’t do its job properly. That’s the voice of the public, coming to the court for judgment.
One of the arguments that we were trying to make in our work was that algorithmic systems make it really hard to show that you have been harmed. For example, how do you show that you could potentially be a victim of identity fraud? The harm is not immediate and material; it’s speculative. A lot of court cases are rejected because the harms being argued by the plaintiff are speculative. Actual harm hasn’t happened yet.
We need different accountability measures given the way that these systems work. To some extent, the work of evaluating how these models behave clearly lies with the people who are creating the models. That’s why there’s so much discussion within companies on AI safety. They understand that the impact of their systems can be quite consequential.
The questions of regulation for these AI-driven systems are fundamentally different from the questions of regulation that we have been grappling with in the context of social media platforms and Section 230. AI models generate content. That’s different from social media platforms, which argue that users generate content and thus that the platforms are not responsible for that content. This issue of responsibility becomes even more salient given that each of these AI models lives within a company. It is the OpenAI model, or the Microsoft model, or the Google model that is creating the content.
It becomes even more complicated when we need to distinguish whether a model said something only because a user provoked it, or whether that behavior is baked into the model. In addition, how do we determine if and how people are being harmed by these systems? What does harm look like? These are hard challenges and still open questions.
These challenges are especially hard when you think about how users the world over interact with systems and the varying degree of power that different populations have. You’ve done a lot of work on the “Majority World,” a term that you use to highlight the fact that the majority of the world’s population remains on the receiving end of technological systems. What have you learned from that work?
Yes, we certainly need to acknowledge that most of the people in the world are on the receiving end of digital systems. Systems come into our lives. We use them. We make do with them in one way or the other. Sometimes we have no choice in the matter. At the same time, it also does not mean that we have no agency. I have been broadly interested in ways people exercise their agency as they come to deal with data systems that have become crucial to the organization of their daily lives.
For example, credit scores are something that you have to deal with if you’re living in the United States. That’s an area that I’ve been studying, to try to understand how people engage with credit score systems. Another good example is India’s biometrics-based national ID system that is now increasingly used in organizing all kinds of government services. I have worked extensively on showing that the use of such data systems can become so pervasive that access to your basic rights as a citizen can become dependent on whether you can access your ID.
In the case of credit scores in the United States or the biometrics-based national ID system in India, it’s important that we study how people deal with making systems work for them. That can help us think through the ordinary ethics of making decisions about representing ourselves through data or dealing with decisions made about us by data systems.
This lens can help us think about designing systems so that they work more effectively for more people. The Majority World framing allows us to focus on how ordinary people interface with data systems and make them work for themselves. For example, the easier a form is to fill out, the easier it is for a person to enroll in a system and interface with it. And given that more than half of the public interfaces with the internet on their phones, we could ask: Why are government websites still designed to work well on laptops or desktops, but not on mobile?
What is the relationship between how easy a system is to use and reducing the harms that a system can produce?
The more we move towards new technologies that are imagined to solve these problems, the more we need to focus on the basic infrastructure needed for people to represent themselves well through data. The better the representation in the data on a digital system, the easier it will be for digital governance systems to work effectively for users.
People can use their data to more effectively represent their lives when we design for a broader scope of public participation. That might include investing in building deeper public understanding of the networked logic of data infrastructures or creating redundancies in services so that people can choose how and when they interface with digital services. Most importantly, we can offer opportunities for the public to speak back and seek redress when faced with harms.
A good example of this occurs in the context of credit scores where it’s so complicated to dispute a transaction. Majority World scholarship doesn’t start with the question of how to make a complex system in order to address a particular social problem. Instead, it starts with existing systems and examines the ways in which they work together and the ordinary, yet crucial ways in which they fail. Those failures tell us a lot about where the system is misaligned with how people actually get things done. It also tells us a lot about how AI does not work by itself. Rather, it is made to work and we as humans work with it. I have been thinking a lot about the work that we do in the background of our everyday lives to make data systems work for us.
We need to make systems like those used to dispute transactions that end up on credit reports easier to use. If we do, a lot of the major challenges of representation will be easier to resolve. It becomes easier to say, “The problem is that this dataset is biased in this way.” That only happens when we give people enough opportunities to represent themselves through data.
You’re already extending this work through collaborations with other Siegel Research Fellows. Tell us what you’ve been up to.
New York City now has a law requiring companies to audit their employment hiring tools. This is the first law of its kind anywhere in the world. Discussions about that law at a Siegel Research Fellow convening have led to an interesting collaboration with Nathan Matias, who is now an assistant professor in the Cornell University Department of Communication and a field member in Information Science. Nate’s students are looking at publicly available audit reports published by employers in compliance with the New York City law. Nate is currently collaborating with our team at Data & Society on publishing this research.
I’m also involved in discussions with Siegel Research Fellow Caroline Sinders and Siegel Research Advisor for Emerging Technology Eryk Salvaggio to design a research project at the intersection of art and red-teaming. Do artists red-team image creation tools? How do they work with those systems? We need to take the expertise of artists more seriously.
The AI field is moving so quickly. What are you reading and listening to in order to stay up to date?
There are two podcasts that I listen to regularly: The AI Breakdown and Cognitive Revolution. The AI Breakdown explores what is happening today around AI. Cognitive Revolution offers conversations with practitioners to examine these issues more broadly.
I’ve also been re-reading the works of science fiction writer Isaac Asimov lately. It’s interesting how some of the themes of the present conversation about AI are, in a way, fundamentally science fiction.
Last but not least, I recently came across this collection of works on resisting data colonialism put together by the Tierra Común network. I am really enjoying the stories that they have put together in the collection.
More from Ranjit Singh:
- Policy Brief: AI Red-Teaming Is Not a One-Stop Solution to AI Harms: Recommendations for Using Red-Teaming for AI Accountability by Sorelle Friedler, Ranjit Singh, Borhane Blili-Hamelin, Jacob Metcalf, and Brian J. Chen, published by Data & Society, October 2023
- Paper: Scaling Up Mischief: Red-Teaming AI and Distributing Governance by Jacob Metcalf and Ranjit Singh, published by Harvard Data Science Review, December 2023
- In the news: “The AI 100 2023: The Top People in Artificial Intelligence,” published by Business Insider, October 2023