Institute Researchers Develop a More Accessible Way to Conduct Social Media Research Using Volunteers

What role is social media playing in rising political polarization? How does social media activity affect mental health in teens? Does social media play a role in the proliferation of misinformation and conspiracy theories?

Researchers trying to answer these questions often must first get access to data from social media companies. But those platforms have become increasingly restrictive in the data they share following a number of troubling research findings in recent years.

Now researchers from the Institute for Experiential AI at Northeastern University are creating a new way to conduct social media research that relies on data from volunteers. In a new paper published in Nature Computational Science, the researchers explain the advantages of such user-sourced data collection and lay out the beginnings of an ethical framework to preserve the privacy and interests of volunteers.

The researchers say the user-based approach is a necessary response to recent decisions by companies like Twitter (now X) to restrict access to their data. They also believe the approach could democratize access to social media data and unlock a torrent of important social science research.

“Right now, it’s very expensive to collect social media data, and the costs are prohibitive for most junior scholars,” says EAI affiliate faculty member David Lazer. “We should make it possible for anyone to access these data. It could fundamentally change the game for research in this space.”

Lazer, who is also a Distinguished Professor at Northeastern and co-director of the NULab for Texts, Maps, and Networks, coauthored the paper with EAI core faculty member John Basl and affiliate faculty members David Choffnes and Christo Wilson. Michelle Meyer, a researcher with the Geisinger Health System, was lead author of the paper.

A Framework Centered on Ethics

The vast majority of research on how social media affects society has been done using data from Twitter/X. The company long allowed researchers to collect data from its platform for free, but in February it began charging for access, pricing out many researchers.

Using data provided by social media companies for research has always been problematic. Such data were structured for advertisers more than for researchers, and they didn’t reveal the content users actually saw in their feeds.

“None of the existing [internet platforms] let you see the back and forth between people and platforms,” Lazer explains. “You couldn’t see what people searched for in Google and you couldn’t see what prompted users of Twitter or Facebook to see some things and not others.”

The researchers believe that shifting to data collected directly from consenting users will improve access. They call the user-sourced system the National Internet Observatory (NIO). NIO’s data could capture activity across platforms, revealing richer insights about online behavior, and it could be structured so that researchers can easily access the variables they care most about.

Trust will be paramount in such a system, so the researchers also laid out some high-level pillars of an ethics framework for ensuring the data are used ethically.

“In the area of AI and big data analytics, we don’t have a robust ethics ecosystem like in the medical field,” Basl says. “We’re stuck building the ethics tools we need from the ground up because the AI industry doesn’t have that robust ethics ecosystem. We want to enable research to understand internet platforms, but at the same time we lack the ethics infrastructure that’s going to make sure we do it responsibly.”

The researchers say AI and data analytics present new ethical challenges and require a better approach than the 1970s guidelines currently governing research involving human subjects. With that in mind, they set out to establish a more comprehensive ethical framework that corrects for problems like bias and unfairness. The framework involves 18 ways to protect users and addresses concerns around privacy, data misuse, unethical behavior by researchers, and more.

The methods apply to all uses of the NIO dataset, and the researchers believe they hold value for similar research efforts.

“You can’t just grab these interventions and not think carefully about the context in which you’re applying them, but I do think these interventions are useful tools in the sense that they can provide a template,” Basl says.

From Paper to Platform

To explain the impact NIO could have, Lazer tells the story of a recent Nature paper he co-authored. The work investigated whether Google perpetuated an online filter bubble by pointing toward results that reinforced people’s political positions, and found that it did not. Collecting that data took years and hundreds of thousands of dollars.

“It was an important question, but it took a lot of time and effort, and it should be possible to do a better job than we did,” Lazer says. “My goal is by next spring, you should be able to reproduce what we did in that paper at a larger scale and in a more effective way.”

Now the authors are building out the NIO database by recruiting volunteers, and they hope to study how researchers use it.

“We’ve got a broad framework for the interventions we want to deploy, and now it’s a matter of deploying those components, seeing what works and what doesn’t and improving it,” Basl says.

Ultimately, the goal is to build a trusted system that allows for more accessible and revealing research about the interaction between social media and humanity.

“Put simply, social media platforms matter,” Lazer says. “How people get information, how they connect with each other, how they decide what to buy—all that is conducted through a small number of internet platforms. They’re powerful and they’re extremely relevant to society, and there are many ways in which access to their data has been increasingly limited in ways that have made much research impossible.”
