ChatGPT Is Replacing Humans in Studies on Human Behavior—and It Works Surprisingly Well

I’m a huge fan of Anthony Bourdain’s travel show Parts Unknown. In each episode, the chef visits remote villages across the globe, documenting the lives, foods, and cultures of regional tribes with an open heart and mind.

The show provides a glimpse into humanity’s astonishing diversity. Social scientists have a similar goal—understanding the behavior of different people, groups, and cultures—but use a variety of methods in controlled situations. For both, the stars of these pursuits are the subjects: humans.

But what if you replaced humans with AI chatbots?

The idea sounds preposterous. Yet thanks to the advent of ChatGPT and other large language models (LLMs), social scientists are flirting with the idea of using these tools to rapidly construct diverse groups of “simulated humans” and run experiments to probe their behavior and values as a proxy to their biological counterparts.

If you’re imagining digitally recreated human minds, that’s not it. The idea is to tap into ChatGPT’s expertise at mimicking human responses. Because the models scrape enormous amounts of online data—blogs, YouTube comments, fan fiction, books—they readily capture relationships between words in multiple languages. These sophisticated algorithms can also decode nuanced aspects of language, such as irony, sarcasm, metaphors, and emotional tones, a critical aspect of human communication in every culture. These strengths set LLMs up to mimic multiple synthetic personalities with a wide range of beliefs.

Another bonus? Compared to human participants, ChatGPT and other LLMs don’t get tired, allowing scientists to collect data and test theories about human behavior with unprecedented speed.

The idea, though controversial, already has support. A recent article reviewing the nascent field found that in certain carefully designed scenarios, ChatGPT’s responses correlated with those of roughly 95 percent of human participants.

AI “could change the game for social science research,” said Dr. Igor Grossmann at the University of Waterloo, who with colleagues recently penned a look-ahead article in Science. The key for using Homo silicus in research? Careful bias management and data fidelity, said the team.

Probing the Human Societal Mind

What exactly is social science?

Put simply, it’s studying how humans—either as individuals or as a group—behave under different circumstances, how they interact with each other, and how they develop as a culture. It’s an umbrella of academic pursuits with multiple branches: economics, political science, anthropology, and psychology.

The discipline tackles a wide range of topics prominent in the current zeitgeist. What’s the impact of social media on mental health? What are current public attitudes towards climate change as severe weather episodes increase? How do different cultures value methods of communication—and what triggers misunderstandings?

A social science study starts with a question and a hypothesis. One of my favorites: do cultures tolerate body odor differently? (No kidding, the topic has been studied quite a bit, and yes, there is a difference!)

Scientists then use a variety of methods, like questionnaires, behavioral tests, observation, and modeling, to test their ideas. Surveys are an especially popular tool, because the questions can be stringently designed and vetted, and surveys can easily reach a wide range of people when distributed online. Scientists then analyze written responses and draw insights into human behavior. In other words, a participant’s use of language is critical for these studies.

So how does ChatGPT fit in?

The ‘Homo Silicus’

To Grossmann, the LLMs behind chatbots such as ChatGPT or Google’s Bard represent an unprecedented opportunity to redesign social science experiments.

Because they are trained on massive datasets, LLMs “can represent a vast array of human experiences and perspectives,” said the authors. Because the models “roam” freely without borders across the internet—like people who often travel internationally—they may adopt and display a wider range of responses compared to recruited human subjects.

ChatGPT also doesn’t get influenced by other members of a study or get tired, potentially allowing it to generate less biased responses. These traits may be especially useful in “high-risk projects”—for example, mimicking the responses of people living in countries at war or under difficult regimes through social media posts. In turn, the responses could inform real-world interventions.

Similarly, LLMs trained on cultural hot topics such as gender identity or misinformation could reproduce different theoretical or ideological schools of thought to inform policies. Rather than painstakingly polling hundreds of thousands of human participants, the AI can rapidly generate responses based on online discourse.

Potential real-life uses aside, LLMs can also act as digital subjects that interact with human participants in social science experiments, somewhat similar to nonplayer characters (NPCs) in video games. For example, the LLM could adopt different “personalities” and interact via text with human volunteers across the globe, asking each of them the same question. Because algorithms don’t sleep, it could run 24/7. The resulting data may then help scientists explore how diverse cultures evaluate similar information and how opinions—and misinformation—spread.
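To make the idea concrete, here is a minimal sketch of how such “simulated participants” might be set up in code. Everything here is illustrative, not from the Science article: the persona fields, prompt wording, and the `build_survey_prompt` helper are assumptions. A real study would send each message list to an LLM API and collect the in-character replies.

```python
# Illustrative sketch: casting one LLM as many synthetic survey respondents
# by prepending a persona description, then posing the same question to each.

def build_survey_prompt(persona: dict, question: str) -> list[dict]:
    """Return a chat-style message list casting the model as one persona."""
    profile = ", ".join(f"{k}: {v}" for k, v in persona.items())
    return [
        {
            "role": "system",
            "content": (
                f"You are a survey respondent with this profile -- {profile}. "
                "Answer in character, in one or two sentences."
            ),
        },
        {"role": "user", "content": question},
    ]

# Hypothetical personas; a study would generate many more, spanning cultures.
personas = [
    {"age": 34, "country": "Brazil", "occupation": "teacher"},
    {"age": 68, "country": "Japan", "occupation": "retired engineer"},
]

question = "How concerned are you about misinformation on social media?"
prompts = [build_survey_prompt(p, question) for p in personas]
```

Because the persona lives entirely in the prompt, the same question can be fanned out to thousands of simulated respondents far faster than a human survey could be fielded—which is precisely the speed advantage the researchers highlight.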

Baby Steps

The idea of using chatbots in lieu of humans in studies isn’t yet mainstream.

But there’s early proof that it could work. A preprint study released this month from Georgia Tech, Microsoft Research, and Olin College found that an LLM replicated human responses in numerous classical psychology experiments, including the infamous Milgram shock experiments.

Yet a critical question remains: how well can these models really capture a human’s response?

There are several stumbling blocks.

First is the quality of the algorithm and the training data. Most online content is dominated by just a handful of languages. An LLM trained on these data could easily mimic the sentiment, perspective, or even moral judgment of people who use those languages—in turn inheriting bias from the training data.

“This bias reproduction is a major concern because it could amplify the very disparities social scientists strive to uncover in their research,” said Grossmann.

Some scientists also worry that LLMs are just regurgitating what they’re told. It’s the antithesis of a social science study, in which the main point is to capture humanity in all of its diverse and complex beauty. At the same time, ChatGPT and similar models are known to “hallucinate,” making up information that sounds plausible but is false.

For now, “large language models rely on ‘shadows’ of human experiences,” said Grossmann. Because these AI systems are largely black boxes, it’s difficult to understand how or why they generate certain responses—a tad troubling when using them as human proxies in behavioral experiments.

Despite limitations, “LLMs allow social scientists to break from traditional research methods and approach their work in innovative ways,” the authors said. As a first step, Homo silicus could help brainstorm and rapidly test hypotheses, with promising ones being further validated in human populations.

But for the social sciences to truly welcome AI, there will need to be transparency, fairness, and equal access to these powerful systems. LLMs are difficult and expensive to train, with recent models increasingly shut behind hefty paywalls.

“We must ensure that social science LLMs, like all scientific models, are open-source, meaning that their algorithms and ideally data are available to all to scrutinize, test, and modify,” said study author Dr. Dawn Parker at the University of Waterloo. “Only by maintaining transparency and replicability can we ensure that AI-assisted social science research truly contributes to our understanding of human experience.”

Image Credit: Gerd Altmann / Pixabay

* This article was originally published at Singularity Hub