What role is social media playing in rising political polarization? How does social media activity affect mental health in teens? Does social media play a role in the proliferation of misinformation and conspiracy theories?
Researchers trying to answer these questions must often first obtain access to data from social media companies. But those platforms have become increasingly restrictive about the data they share, following a number of troubling research findings in recent years.
Now researchers from the Institute for Experiential AI at Northeastern University are creating a new way to conduct social media research that relies on data from volunteers. In a new paper published in Nature Computational Science, the researchers explain the advantages of such user-sourced data collection and lay out the beginnings of an ethical framework to preserve the privacy and interests of volunteers.
The researchers say the user-based approach is a necessary response to recent decisions by companies like Twitter (now X) to restrict access to their data. They also believe the approach could democratize access to social media data and unlock a torrent of important social science research.
“Right now, it’s very expensive to collect social media data, and the costs are prohibitive for most junior scholars,” says EAI affiliate faculty member David Lazer. “We should make it possible for anyone to access these data. It could fundamentally change the game for research in this space.”
Lazer, who is also a Distinguished Professor at Northeastern and co-director of the NULab for Texts, Maps, and Networks, coauthored the paper with EAI core faculty member John Basl and affiliate faculty members David Choffnes and Christo Wilson. Michelle Meyer, a researcher with the Geisinger Health System, was lead author of the paper.
A Framework Centered on Ethics
The vast majority of research on how social media affects society has been done using data from Twitter/X. The company had long allowed researchers to collect data from its platform for free, but in February it began charging for access, pricing out many researchers.
Relying on data provided by social media companies was always problematic for research. Such data were structured for advertisers more than researchers, and they didn't reveal the content users actually saw in their feeds.
“None of the existing [internet platforms] let you see the back and forth between people and platforms,” Lazer explains. “You couldn’t see what people searched for in Google and you couldn’t see what prompted users of Twitter or Facebook to see some things and not others.”
The researchers believe that shifting to collecting data from consenting users will improve access. They call the proposed user-sourced system the National Internet Observatory (NIO). Its data could span activity across platforms, revealing richer insights about online behavior, and it could be structured so that researchers can easily access the variables they care most about.
Trust will be paramount in such a system, so the researchers also laid out the high-level pillars of a framework for ensuring the data are used ethically.
“In the area of AI and big data analytics, we don’t have a robust ethics ecosystem like in the medical field,” Basl says. “We’re stuck building the ethics tools we need from the ground up because the AI industry doesn’t have that robust ethics ecosystem. We want to enable research to understand internet platforms, but at the same time we lack the ethics infrastructure that’s going to make sure we do it responsibly.”
The researchers say AI and data analytics present new ethical challenges and require a better approach than the 1970s guidelines that currently govern research involving human subjects. With that in mind, they set out to establish a more comprehensive ethical framework that corrects for problems like bias and unfairness. The framework comprises 18 interventions to protect users, addressing concerns around privacy, data misuse, unethical behavior by researchers, and more.
These interventions apply to all uses of the NIO dataset, and the researchers believe they hold value for similar research efforts.
“You can’t just grab these interventions and not think carefully about the context in which you’re applying them, but I do think these interventions are useful tools in the sense that they can provide a template,” Basl says.
From Paper to Platform
To illustrate the impact the NIO could have, Lazer tells the story of a recent Nature paper he co-authored. The work investigated whether Google perpetuated an online filter bubble by steering users toward results that reinforced people's political positions, and found that it did not. Collecting that data took years and hundreds of thousands of dollars.
“It was an important question, but it took a lot of time and effort, and it should be possible to do a better job than we did,” Lazer says. “My goal is by next spring, you should be able to reproduce what we did in that paper at a larger scale and in a more effective way.”
Now the authors are building out the NIO database, recruiting volunteers, and planning to study how researchers use it.
“We’ve got a broad framework for the interventions we want to deploy, and now it’s a matter of deploying those components, seeing what works and what doesn’t and improving it,” Basl says.
Ultimately, the goal is to build a trusted system that allows for more accessible and revealing research about the interaction between social media and humanity.
“Put simply, social media platforms matter,” Lazer says. “How people get information, how they connect with each other, how they decide what to buy—all that is conducted through a small number of internet platforms. They’re powerful and they’re extremely relevant to society, and there are many ways in which access to their data has been increasingly limited in ways that have made much research impossible.”