AI for Coping with Complexity: Sam Scarpino on Core Challenges in AI + Life Sciences

By: Tyler Wells Lynch

In the life sciences, complex systems are sometimes seen as the Achilles' heel of artificial intelligence. The human brain, disease outbreaks, living cells, pharmacology: each is marked by an impossibly vast set of parameters that, when combined, produce functionalities that do not exist at the constituent level. Take, for instance, an Ebola outbreak in Africa. To effectively model an epidemic, you need to know a lot more than just the biomolecular functions of the Ebola virus. You need to know about the environmental conditions of the outbreak, the regional capacity to respond, the pre-existing immunological landscape, variability in social network structure, and myriad other factors that are difficult to model with reliable and consistent predictive power.

So how accurate can we expect predictive algorithms to be if they reduce living systems to static datasets rather than capture them in their natural complexity? For Sam Scarpino, the newly appointed Director of AI + Life Sciences at the Institute for Experiential AI (EAI) at Northeastern University, the goal is to remove the assumption that the relationship between AI and life science is a one-way street.

EAI and Northeastern University: A Unique Opportunity

Sam believes Northeastern is well positioned to link data to action through technology. The “data to action gap,” as he calls it, exists in the life sciences as a kind of rift between “wet labs,” which work with biological and chemical materials, and “dry labs,” which are concerned with more theoretical or computational methodologies. One way to close that gap, which he means to implement as Director of AI + Life Sciences at EAI, is to develop processes in which wet lab and dry lab researchers work as one integrated team. He calls this approach the “wet lab in the loop,” and it’s characteristically Northeastern.

Scarpino’s previous work—initially as a field biologist studying cancer genetics and, later, as Vice President of Pathogen Surveillance at The Rockefeller Foundation—informed his understanding of complex systems and the ideal way to study them.

“One of the things that I have learned along the way is that, given the complexity of living systems and the dynamic nature of living systems, you have to bring large systems level data sets to bear on the problem,” says Sam, who is also Director of NU’s Emergent Epidemics Lab. “You have to be hypothesis-driven and you have to be iterative in terms of re-evaluating what you know or you think you know about the world.”

Synthesizing data is easier said than done, especially when you’re trying to bridge gaps like those that exist between theory and practice or AI and life science. One of the first tasks, he says, is to unite key data sets under one roof, which will allow wet lab and dry lab researchers to connect when trying to predict outcomes from complex data sets. According to Sam, that process is the only way to build AI-based solutions that can adapt and respond to real-world perturbations that are often unforeseeable. He returns to a hypothetical Ebola outbreak as an example:

“If I want to make a prediction about what's going to happen in a pandemic, I have to have data sets at the system level,” he says. “But, importantly, I also have to be continually relearning the rules as things evolve, right? So I can't just make a prediction today about Ebola and expect it to be correct three or four weeks from now, because in two weeks there may be vaccines on the ground, which aren't there now, and a whole host of other things. It requires this iterative training, testing, training, testing of our predictive models.” 
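
The “training, testing, training, testing” loop Sam describes resembles what forecasters often call walk-forward evaluation. The sketch below is only an illustration of that general idea, not a method from Sam's lab: the weekly case counts are made up, the intervention at week 6 is hypothetical, and the function names (fit_growth_rate, walk_forward) are invented for this example. It compares a growth model fit once against one that is refit every week on the most recent data.

```python
import math

def fit_growth_rate(cases):
    """Least-squares slope of log(cases) against week index."""
    n = len(cases)
    xs = range(n)
    ys = [math.log(c) for c in cases]
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    num = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    den = sum((x - x_mean) ** 2 for x in xs)
    return num / den

def walk_forward(cases, window=4, retrain=True):
    """Predict each week from the weeks before it; return mean absolute error."""
    # Initial fit on the first `window` weeks.
    rate = fit_growth_rate(cases[:window])
    errors = []
    for t in range(window, len(cases)):
        if retrain:
            # Relearn the growth rate from the most recent `window` weeks.
            rate = fit_growth_rate(cases[t - window:t])
        predicted = cases[t - 1] * math.exp(rate)
        errors.append(abs(predicted - cases[t]))
    return sum(errors) / len(errors)

# Synthetic data (an assumption for illustration): cases grow 40% per week,
# then shrink 20% per week after a hypothetical intervention at week 6.
cases = [10.0]
for week in range(1, 12):
    factor = 1.4 if week < 6 else 0.8
    cases.append(cases[-1] * factor)

static_error = walk_forward(cases, retrain=False)
rolling_error = walk_forward(cases, retrain=True)
print(f"fit once:     mean absolute error = {static_error:.1f}")
print(f"refit weekly: mean absolute error = {rolling_error:.1f}")
```

In this toy setup the model that is fit once keeps predicting growth after the intervention, while the weekly refit catches up within a few weeks, so its average error is lower. Real outbreak forecasting involves far richer data and models than a single growth rate.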

How to Model Nature Through Data

More than bridging the gap between wet and dry labs, Sam’s research vision promises to advance the state of the art in AI, which along with the life sciences may turn out to be the most consequential research focus of the 21st century. AI’s impact on society has already been felt in numerous ways, from self-driving cars to facial recognition tools. New large language and deep learning models like GPT-3 and DALL-E have stirred debate around copyright infringement, disinformation campaigns, and academic authorship. On the life sciences front, CRISPR-mediated therapies spark similarly heated conversations about the ethics and potential of genetic engineering. But Sam believes a more integrated research framework can advance both fields.

“I am convinced that we can bring more realistic aspects of life science into artificial intelligence,” he says. “Improving our ability to design CRISPR-mediated therapies, improving our ability to diagnose cancers, improving our understanding of the evolution of complex traits like drought adaptation in plants. All these things are going to be critical for climate change.”

Sam also envisions an AI model with “life sciences in the loop,” where developers learn how to improve and iterate on their algorithms by comparing them to living systems. Usama Fayyad, executive director of the Institute for Experiential AI (EAI), shares this unique vision. 

“We believe AI can play a transformative role in many major aspects of the life sciences,” Fayyad says. “Sam has the depth of knowledge of the technology and the market needs to lead our efforts in making EAI a relevant, useful, and necessary ingredient to modern computational approaches that are now dominant in the life sciences.”

Success in the life sciences hinges on deciphering the complexity behind not only biological systems, but also the political, sociological, and physical systems that surround them—a degree of complexity that is overwhelming. But data begets more data. That fact alone suggests a tantalizing promise for human-centric approaches to AI, and a potential roadmap for major breakthroughs in the life sciences.

The Institute for Experiential AI welcomes Sam Scarpino to his new role as Director of AI + Life Sciences. You can learn more about Sam’s work and research here.