By: Tyler Wells Lynch
As we strive to better understand and predict complex phenomena like climate change, it’s clear we need to develop better methods for understanding how systems behave. After all, nature doesn’t distinguish between weather and climate; that boundary is created and imposed by people. It should come as no surprise, then, when our models fail to capture certain natural phenomena. But is there a way AI can help?
Auroop Ganguly is a distinguished professor of civil and environmental engineering at Northeastern University and director of the Sustainability and Data Sciences Laboratory (SDS Lab). He’s also the director of AI for Climate & Sustainability at the Institute for Experiential AI (EAI), where he is working with colleagues to help develop the “AI for climate and sustainability” thrust. In a recent paper in Nature Communications, Ganguly and his co-authors showed how deep learning tools can be used to find connections between distant weather phenomena, specifically the role El Niño plays in droughts and floods around the world.
We sat down with Auroop for a wide-ranging discussion of his paper, complex systems, and how AI can help predict and understand extreme weather events, and why there is still a lot of work to do.
Why is it so hard to model weather patterns?
Climate has been called a big data problem. We’ve seen significant increases in datasets mainly from remote sensing and Earth system models, but also from data-ingested weather models like reanalysis, which are essentially reconstructions of the past weather. When you take all of these data together, it's a huge increase in the availability of data.
The other thing that has increased is our ability to process that data. Even as we started to collect climate data, we couldn't get much of an idea about the physics or the knowledge or predictive power. We could only guess at some of them. Then the data started to exist. But El Niño is a very complex phenomenon. If you're comparing one river flow to the interannual variability of a different river flow, how do you compare the huge mass of spatio-temporal data relevant for sea surface temperature, for example, with this one river flow data? It's impossible with our traditional set of tools. We have novel approaches that originated from machine learning and computer vision, like convolutional neural nets (CNNs) and long short-term memory networks (LSTMs), but now the problem is that we’ve lost the ability to explain. This is difficult for a decision-maker to use, because if I cannot explain it, can I really trust it?
Your paper is about how AI can help to predict El Niño and river flow patterns. What is El Niño and why is it so difficult to predict?
El Niño is a phenomenon that happens at the intersection of ocean and atmosphere, primarily in the eastern tropical part of the Pacific Ocean. If it’s warmer than the long-term average it's called El Niño, and if it's cooler it's called La Niña, but as a phenomenon it has global implications. It impacts regional climate, regional weather, and hydrology across different land regions. Each time there is an El Niño, regions in Peru will likely see flooding, and then if it’s a La Niña year, meaning the water temperatures are cooler, then Peru might see droughts. And we see impacts in faraway places, too: a seesaw effect in the Sahel region of Africa relative to Peru, for example, and across all inhabited continents.
In some of these cases, we don't really see how it's happening. It seems like magic, like a long-distance effect. We call these teleconnections. If we knew all the physics perfectly, we would be able to understand the physics of the flow and perhaps even write down partial differential equations to model it, except we don't know it perfectly. We do have observed and model-simulated data and process knowledge, but both are partial and incomplete. How we blend the power of physics and data-driven sciences to extract information in such systems has intrigued scientists for decades.
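The warm-versus-cool definition given in this answer can be sketched in a few lines of Python. This is only an illustration of the idea of a temperature anomaly relative to a long-term average; the function name, the 0.5 degree cutoff, and the temperatures are assumptions made up for the example, not values from the interview or the paper.

```python
from statistics import mean

def classify_enso(sst_by_year, threshold=0.5):
    """Label each year by its sea surface temperature (SST) anomaly.

    sst_by_year: dict mapping a year to a mean SST in degrees Celsius.
    threshold: hypothetical cutoff separating neutral years from warm
        or cool ones (an assumption for this sketch).
    """
    # The mean of the supplied record stands in for the long-term average.
    baseline = mean(sst_by_year.values())
    labels = {}
    for year, sst in sst_by_year.items():
        anomaly = sst - baseline
        if anomaly >= threshold:
            labels[year] = "el_nino"   # warmer than average
        elif anomaly <= -threshold:
            labels[year] = "la_nina"   # cooler than average
        else:
            labels[year] = "neutral"
    return labels

# Made-up temperatures for five consecutive years, for illustration only.
example = {1: 26.0, 2: 27.5, 3: 26.1, 4: 24.8, 5: 26.1}
print(classify_enso(example))
```

Operational definitions are more involved (they typically use a specific ocean region and multi-month averages), so this should be read as a toy version of the concept rather than a forecasting tool.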
Is climate modeling a black box?
Yes, it’s a black box. And if it's a black box, then in a field where you have to make decisions and stakeholders need to be convinced, how can they ever be convinced? Take the flu, for example. If we observe that people have more flu during winter, then we’ll reason that flu shots should be given just before winter. And to explain it, we might say it’s because in winter people huddle together indoors. That understanding is useful because it needs to be translated into action. It’s the same in other contexts, like in making important water resource decisions. And it speaks to trustworthiness a lot because explainability and interpretability are a first step.
With climate, what we can say is there is information content about sea temperatures, regional hydrology, river flows, and such, which we are leveraging. For the future, however, we don't have any observations. We only have model simulations. We can take the same mapping we learned from the past and use it to improve our predictions of regional hydrology in the future. This helps in making decisions about whether we should build more dams, reservoirs, levees, or seawalls.
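The idea of learning a mapping from past data and then applying it to simulated future conditions can be illustrated with a deliberately simple stand-in. The sketch below uses ordinary least squares on one predictor, which is not the deep learning approach from the paper; the function name and all numbers are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

# Hypothetical past record: SST anomaly (deg C) paired with an observed
# river flow (arbitrary units). These values are invented.
past_sst = [-1.0, -0.5, 0.0, 0.5, 1.0]
past_flow = [80.0, 90.0, 100.0, 110.0, 120.0]

# Step 1: learn the mapping where both inputs and observations exist.
slope, intercept = fit_line(past_sst, past_flow)

# Step 2: apply the same mapping to model-simulated future SST anomalies,
# for which no river flow observations exist yet (also invented values).
future_sst = [0.2, 1.5]
future_flow = [slope * x + intercept for x in future_sst]
print(future_flow)
```

A real application would involve many predictors across space and time and would need to check whether a relationship learned from the past still holds under changed conditions.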
It sounds like the overarching challenge is about complex systems modeling in general, which [EAI Director of AI + Life Science Sam Scarpino] has referred to as one of the biggest challenges in science—this disconnect between emergent phenomena and the constituent data underneath. How does that assessment differ when it comes to modeling climate versus other complex systems, like energy systems or epidemics?
If you think about climate resilience, the complexity comes in a variety of ways. First of all, there’s the physics of fluid flow, the basics of which we understand fairly well and can define through differential equations. When you're talking about weather or climate, it's Newton's laws of motion but on a rotating sphere: the Earth. But then we have things that we don't understand as well but are very important—things like convective storms, things like hurricanes. These things don’t come directly out of Newton's laws of motion alone. And then there is something that has been known since the 1960s: that the weather system is chaotic; it's extremely sensitive to initial conditions with strange patterns in mathematical “spaces”. And then there are questions about climate versus weather—whether they are ultimately the same thing. Nature doesn't say, “Here is the difference between weather and climate.” We say there is a difference. We say that weather is day-to-day phenomena and climate is a long-term average of weather.
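The sensitivity to initial conditions mentioned above is often illustrated with the Lorenz system from 1963, a three-variable toy model of atmospheric convection. The sketch below is not a weather model; it assumes the standard textbook parameters and an arbitrary starting point, and simply tracks how far apart two almost identical starting states drift.

```python
def lorenz(state, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Right-hand side of the Lorenz equations (standard parameters)."""
    x, y, z = state
    return (sigma * (y - x), x * (rho - z) - y, x * y - beta * z)

def rk4_step(state, dt):
    """Advance the state by one fourth-order Runge-Kutta step."""
    def shifted(s, k, h):
        return tuple(si + h * ki for si, ki in zip(s, k))
    k1 = lorenz(state)
    k2 = lorenz(shifted(state, k1, dt / 2))
    k3 = lorenz(shifted(state, k2, dt / 2))
    k4 = lorenz(shifted(state, k3, dt))
    return tuple(
        s + dt / 6 * (a + 2 * b + 2 * c + d)
        for s, a, b, c, d in zip(state, k1, k2, k3, k4)
    )

def separation_over_time(perturbation=1e-8, steps=4000, dt=0.01):
    """Distance between two runs whose starting x values differ slightly."""
    a = (1.0, 1.0, 1.0)                  # arbitrary starting point
    b = (1.0 + perturbation, 1.0, 1.0)   # same point, nudged in x
    distances = []
    for _ in range(steps + 1):
        distances.append(sum((p - q) ** 2 for p, q in zip(a, b)) ** 0.5)
        a, b = rk4_step(a, dt), rk4_step(b, dt)
    return distances

distances = separation_over_time()
print(f"initial gap: {distances[0]:.1e}, largest gap: {max(distances):.1f}")
```

The two runs start almost indistinguishable and end up as far apart as two unrelated states of the system, which is one reason small measurement errors limit how far ahead weather can be predicted.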
The one essence of complex systems is that you cannot easily break down the entire system into small components, solve those small components, add them up, and consider the big thing solved. It just doesn't work that way. That is why we have things like teleconnections, because these are very complex things. Given the datasets that we have, the data-driven tools, the physics and biogeochemistry, all of those together are still not enough to find a good mathematical or quantitative description of nature. We are getting better at understanding and modeling this complex system, but it is absolutely one of the top challenges.
In thinking about EAI’s vision of AI across climate, health, and life science—with inputs from philosophy, ethics, engineering, and other fields—what is the one thing that ties everything together?
From the point of view of how these systems interact, yes, they are not really as independent as we think. They're much more connected. Weather versus climate is one example. Climate change impacts the statistics of weather extremes and weather fluctuations, not just the average weather. Another is how weather and climate variables impact the spread of diseases like flu or COVID. Whether you can make a claim depends on the quality and availability of data. We’ve been studying flu (and influenza-like illness) for a long time and while we do not yet have a full understanding, at least we believe we know enough about flu to be able to design policies that say, for example, flu shots should be taken seasonally over a certain time window.
But still there are so many things we don't know about how weather impacts flu, how climate impacts public health, the transmissibility of disease, industrial production, air pollution. The more we think about each of these different areas that AI is targeting, we see that there are always various levels of connection that we have not focused on simply because they happen to be in different disciplines.
Scientists used to be called natural philosophers and chemistry used to be called alchemy. Would you agree it seems like there’s always been a dynamic process of inventing new categories and systems to adapt to the needs of the time?
And you still have doctors of philosophy. But you're right. From an AI perspective, it’s useful to convey that many of these systems are changing, so we have lots of data and we have lots of process knowledge and physics and parameterizations that have come down through the ages. But sometimes those datasets are imperfect. We may have lots of data, and we think they're great, but they may have all kinds of quality issues, collinearity issues.
How do we best use all of that data to model systems that are fairly complex because of the various time scales, spatial scales, and processes that interact and form these complex systems? It's very difficult to say you’ll just break it down into smaller and smaller components and then solve the problem. It doesn't always work that way. There are, like you said, emergent processes.
But when we talk about using data and physics and moving towards some kind of hybrid approach with AI and machine learning as a guide, might that help us understand different kinds of complex systems and interactions? I think it's a very interesting idea.
Learn more about Ganguly’s research and watch his recent Seminar Series talk.