Q&A with Breanna Green, Siegel Research Fellow at Cornell University’s Department of Information Science

Breanna E. Green is a PhD candidate in the Information Science department at Cornell University. Her background in psychology, paired with an interest in methods from communication, computer science, and political science, has led to doctoral work that uses natural language processing (NLP) to examine online spaces for moral opinions of political violence.

Tell us about yourself and your background. What brought you to this point in your career? What drew you to this area of research and your current projects?

Though I’m currently a PhD candidate in the Information Science department at Cornell University, my academic background is in psychology. I earned both my bachelor’s and master’s degrees from minority-serving institutions. My bachelor’s is from Prairie View A&M University, a Historically Black College or University (HBCU) outside of Houston. My master’s is from the University of Texas at San Antonio, a predominantly Hispanic-serving institution. I began coding and programming only after starting my career as a Research Analyst in the Institutional Research department of Tarrant County College, following my master’s degree. While I always knew I would eventually pursue my doctorate, I never imagined it would be in Information Science or focused on computational social science!

My current research interests grew in tandem with the computational skill sets I acquired. I was initially interested in the psychology of extremism and planned to make that the focus of my doctoral work. Yet, as I started taking courses at Cornell on social media analysis and NLP, I began to shift my focus towards the more extreme and politicized conversations happening online. My interests have always stemmed from wanting to understand opposing group dynamics.

Tell us about your current role. What kind of research are you working on right now? 

My research centers on the moralization of political violence and the study of these types of conversations. By “moralization,” I mean the reasonings, opinions, and phrases that invoke morality, to any degree, to justify the use of political violence. My goal is to explore how U.S. political left- and right-leaning actors morally appraise and oppose harm. Each side of the political spectrum responds to harm-related events in drastically different ways depending on the victims, perpetrators, or situation. Examples of harm-related events include the January 6th storming of the Capitol, Wynn Bruce’s self-immolation outside the Supreme Court, the Unite the Right rally in Charlottesville, Virginia, and the recent assassination attempt on former President Donald Trump. Drawing on methods from information science, political science, and sociolinguistics, I aim to uncover the values, rationales, and methods behind moralizing acts of harm within the U.S. political sphere. By examining online discussions of harm-related events, I seek to broaden our understanding of moral framing around these acts and how such moralization shapes opinions about harm in the U.S. specifically.

I have had the opportunity to work as a data scientist on some exciting collaborations that helped build my computational and programming skills. For example, I’ve collaborated with a great friend and colleague, Aspen Russell, my mentor Dr. Drew Margolin, and other amazing researchers from the Communication Department at Cornell on a paper examining YouTube comments for politicizing conversations – Time to Politicization. We examined conversations that did not start off as political but later became tense for one reason or another. In instances where someone then made the conversation political (e.g., accusing someone of being a liberal or conservative without knowing their political ideology), we examined what happened later within the conversation. My role was to collect, clean, and analyze the comments so the larger research team could distill insights. This work is under review for publication.
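The paper’s pipeline is only described at a high level here, but as a rough illustration, a minimal sketch of the collection step using the official YouTube Data API (v3) via the google-api-python-client library might look like the following; the API key and video ID are placeholders, not values from the study:

```python
from googleapiclient.discovery import build

# Placeholder credentials and target video (hypothetical, not from the paper).
API_KEY = "YOUR_API_KEY"
VIDEO_ID = "SOME_VIDEO_ID"

youtube = build("youtube", "v3", developerKey=API_KEY)

# Fetch one page of top-level comments for the video.
response = youtube.commentThreads().list(
    part="snippet",
    videoId=VIDEO_ID,
    maxResults=100,
    textFormat="plainText",
).execute()

for item in response["items"]:
    comment = item["snippet"]["topLevelComment"]["snippet"]
    print(comment["publishedAt"], comment["textDisplay"])
```

Paging through `nextPageToken` and light cleaning (deduplication, stripping markup) would follow before any analysis.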

You are working on a paper titled “How Polarized are Online Conversations about Childhood?” Can you tell me a bit more about how this research came about?

This was my first opportunity to be lead author on a paper, and I’ve enjoyed the process. For this study, I collaborated with another member of my doctoral committee, Dr. William Hobbs, who has been such a fantastic guide and supporter throughout my journey. This project, part of my advancement to candidacy, was born of my desire to implement NLP methods at scale and to focus on analyzing moral language use. In this case, I chose a topic both of personal interest and timely in the larger political landscape – conversations about children.

In the United States, 2020 through 2023 seemed to be unusually tumultuous years in which children’s welfare was prominent in political debate. Theories in moral psychology suggest political parties might treat concerns for children using different moral frames, and that moral conflict might drive substantial polarization in discussions about children. However, whether this is true needs to be examined. Thus, we utilized tweets mentioning children posted between 2019 and 2023 and focused on expressed morality.
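Our actual measurement pipeline is more involved, but a toy sketch of the general idea, lexicon-based counting of moral vocabulary in tweets that mention children, could look like this; the keyword lists below are made up for illustration and are not the study’s dictionaries:

```python
import re
from collections import Counter

# Illustrative keyword lists (hypothetical, not the study's dictionaries).
CHILD_TERMS = {"child", "children", "kid", "kids", "son", "daughter"}
MORAL_TERMS = {"harm", "care", "protect", "fair", "loyal", "betray", "pure"}

def moral_profile(tweets):
    """Tally moral words appearing in tweets that mention children."""
    counts = Counter()
    for tweet in tweets:
        tokens = set(re.findall(r"[a-z']+", tweet.lower()))
        if tokens & CHILD_TERMS:                 # keep only child mentions
            counts.update(tokens & MORAL_TERMS)  # count moral vocabulary
    return counts

# Comparing profiles across party-linked user groups would then show
# whether moral vocabularies around children diverge.
print(moral_profile(["Protect our kids from harm!",
                     "My daughter starts school today."]))
# e.g., Counter({'protect': 1, 'harm': 1})
```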

Our results showed that mentions of children by Republicans and Democrats (as indicated by linked voter records) were usually similar and tended to contain no large differences in accompanying moral words. When mentions of children did differ across parties, these differences were constrained to topics that were already polarized rather than stemming from new concerns that may have cropped up during the pandemic. These topics reflected a small fraction of conversations about children.

As we enter election season in the U.S., how do you think this research should help us consider the centrality of children’s issues and the partisan divide they may spark?

I will focus on two key takeaways from this work. First, we showed that online conversations mentioning children (at least on Twitter, or X as it is now called) tended to use similar moral language. What might this mean? By and large, mentions of children are not polarized by partisanship. When it comes to discussing children in everyday online settings, Democrats’ and Republicans’ conversations are mostly similar. When these groups connect mentions of children with already polarized topics such as gender, race, immigration, and guns, then we certainly see polarization. However, we would still see polarization whether children were mentioned or not. Ultimately, the topic of children has the potential to be a unifying force rather than a divisive one because adults care about children’s well-being. This is the more optimistic takeaway.

The second, less optimistic takeaway is that there is still a lot we couldn’t answer in this research regarding the impact of evoking children in political conversations on partisan opinions and behaviors, or the types of offline conversations that might be taking place. We cannot (and to be clear, we do not) definitively state that mentions of children in political conversations have no impact. Rather, the premise of this study comes from recognizing that children and children’s issues can, and likely will to a certain extent, be a central issue in the coming election season, whether at the national or local level. Going into the season with such an understanding means we should make more concerted efforts towards scientific research highlighting the best strategies for effective communication in polarized settings.

You are also working on a paper comparing how well human classifiers and GPT-4 classify social media content. Can you tell me a bit more about how this research came about?

This work has been predominantly driven by colleagues and friends from the Communication department – Ashley Shea and Pengfei Zhao – and funded by the NSF. They were exploring objectionable online content, such as toxic speech, uncivil comments, hate speech, and mis-/disinformation. While individual perceptions of what constitutes objectionable content may vary, many users find this behavior harmful and problematic.

One of the most pressing challenges my colleagues faced was the need to examine objectionable content and the ways other individuals confront such behavior. This confrontation of objectionable behavior is what we call a discursive tactic. Conducting this examination at scale can be both costly and time-consuming, especially when having to rely on human annotators. In brainstorming solutions, we considered whether ChatGPT might be able to help alleviate this pain point. I had recently worked on a different project that utilized ChatGPT and wanted to further those efforts in support of this study and my team.

The working title includes an example of misclassification by GPT-4: “It seems to be cheering on someone named Brandon.” What does this example from the data tell us about using AI to make sense of deeply contextual and culturally specific ideas?

This is a fantastic question and exactly the type of conversation we hoped could be sparked by this work (and title)! There is preliminary evidence suggesting large language models (LLMs) such as OpenAI’s ChatGPT, Google Bard, or Anthropic’s Claude can perform text labeling and classification tasks, such as tagging a chunk of text with “True” or “False” for whether it contains specific information, at levels similar to or better than those of human annotators, such as university students or paid Amazon Mechanical Turk workers (MTurkers). But is this true when the content requiring annotation is nuanced and, to your question, both contextually and culturally rich?
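As a purely illustrative example (not the study’s actual prompt or code), a minimal True/False labeling call to GPT-4 through the OpenAI Python SDK might look like:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def label_comment(comment: str) -> str:
    """Ask GPT-4 for a True/False label on a social media comment.

    The prompt wording is hypothetical, not the study's annotation
    instructions.
    """
    response = client.chat.completions.create(
        model="gpt-4",
        temperature=0,  # keep labels as deterministic as possible
        messages=[
            {"role": "system", "content": "Answer only 'True' or 'False'."},
            {"role": "user", "content":
                f"Does this comment confront objectionable behavior "
                f"by another user?\n\n{comment}"},
        ],
    )
    return response.choices[0].message.content.strip()

print(label_comment("Let's go Brandon!"))
```

Comparing such machine labels against human annotations on the same items is what surfaces failure cases like the “Brandon” euphemism in the title.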

Overall, we found GPT-4 had difficulty classifying nuanced language. Qualitative analysis revealed four specific findings: 1) cultural euphemisms are too nuanced for GPT-4 to understand, 2) interpreting the type of language found on social media platforms is also a challenge, 3) GPT-4 has trouble determining who or what is the target of a directed attack (e.g., the content or the user), and 4) the rationale GPT-4 provides contains logical inconsistencies. Our results therefore suggest the use of ChatGPT in classification tasks involving nuanced language should be approached with prudence and caution. The black-box nature of LLMs continues to be a concern when they are applied in computational social science settings. Even if trained on vast amounts of general language from across the web, AI may lack the domain-specific knowledge needed to annotate nuanced social media comments such as the ones my colleagues are looking to investigate at scale. It is surely possible that, as newer models are deployed to the public, LLMs and generative AI will get better at this task over time. The rate of improvement has been staggering, to say the least. But for now, we emphasize caution.

What are the biggest challenges you face in approaching your research? What obstacles stand in the way of this kind of research more broadly?

My most recent obstacle has been getting access to the data sources I had previously, such as Twitter (X) or Reddit. The shift towards closing or limiting API access to these platforms means I have to consider other platforms I am less familiar with, and this shift comes late in the process. But these are not insurmountable obstacles! My entire academic journey has molded me into someone who is flexible and open to adapting, so I won’t let this impact my ability to conduct research.

What’s next for you after finishing your PhD? 

I wish I knew! My eyes are set on graduating in 2025, but I want to remain open to the many opportunities that may cross my path. I started this PhD with the desire to stay in academia and become a professor. If I might have the space to dream a bit here – it would be amazing if I could return to Prairie View A&M University and lead efforts towards building an Information Science program there. Other times I think I could best serve academia by joining a teaching institution, because I love working with students directly. If I were to leave academia, I imagine I would still be in a role related to research and/or data science. I’m motivated by curiosity and challenges, so I know wherever I end up will be the best path for me. I’m looking forward to next year with that in mind.

What are you reading/watching/listening to right now that you would recommend to readers, and why?

Two podcasts I love are Data Skeptic and Freakonomics Radio. Data Skeptic is specifically geared towards conversations about data science, statistics, machine learning, and AI. I enjoy the conversations themselves, and the host’s voice is fun to listen to on a drive. Freakonomics Radio is hosted by a co-author of the book Freakonomics, which I read many years ago. I find he asks interesting questions that make me want to know more.

More from Breanna Green: 
  • Accepted paper at the 2025 International AAAI Conference on Web and Social Media: Green, Breanna E., and William R. Hobbs. 2025. “How Polarized are Online Conversations about Childhood?” Proceedings of the International AAAI Conference on Web and Social Media. Preprint available on arXiv.
  • Submitted to Political Analysis: Green, Breanna, Will Hobbs, Sofia Avila, Pedro L. Rodriguez, Arthur Spirling, and Brandon M. Stewart. 2024. “Measuring Distances in High Dimensional Spaces: Why Average Group Vector Comparisons Exhibit Bias, And What to Do About it.” Previous version (as of Jan. 2024) available on SocArXiv.
  • Anderson, Rajen A., Benjamin C. Ruisch, Breanna E. Green, and Amy R. Krosch. 2024. “Ideological Asymmetries in the Foundations of Affective Polarization: Different Perceived Harms Drive Political Dislike.” Study details as of May 2024 available on OSF, though the full paper is yet to be distributed.