How We Arrived in ‘A World Ruled by Data’
How did data come to play such a central role in our society? How should we think about the problems and potential posed by that reality? Chris Wiggins, chief data scientist at The New York Times and associate professor of applied mathematics at Columbia University, believes history can help answer both questions.
Wiggins gave a sweeping talk on data on May 29 as part of a Distinguished Lecturer Seminar hosted by the Institute for Experiential AI. The talk was based on a 2023 book he co-authored, “How Data Happened: A History from the Age of Reason to the Age of Algorithms.” The event opened with an introduction by Northeastern University Provost David Madigan and concluded with a fireside chat with institute Executive Director Usama Fayyad. The book grew out of years of lessons Wiggins and his co-author learned while teaching a cross-disciplinary class on data at Columbia.
“The class and the book deal head on with problems on the internet around a reality which is mediated by data-empowered algorithms, or as young people sometimes say, dumpster fires,” Wiggins said.
The Columbia professor gave a multi-century overview of the history of data and the statistical approaches used to understand that data, including artificial intelligence and machine learning. He also discussed some of the most pressing questions around data today, such as privacy, bias, and the use of “black box” algorithms to make high-stakes decisions.
But Wiggins also made sure to infuse his story about what he described as ‘a world ruled by data’ with a spirit of hope. He characterized himself first and foremost as an optimist.
“We’d teach students how regression came to be and then put the original datasets from regression to work,” Wiggins said of the goal for the class. “Then we’d bring them all the way to the present day and explain what we mean when we talk about machine learning and artificial intelligence, etc.
“Most of all, we wanted to give people hope: some sense that these are problems that we will collectively fix.”
History Repeats Itself
Wiggins began his history around 1770, a time when people fought over whether data could usefully describe a society. Describing societies with data became commonplace in the 19th century. Later, the Second World War marked an inflection point for statistics, as codebreaking took on critical importance.
“One of the things that’s useful about history is to look at contests from the past which are similar enough to today that it allows you to look at the present-day contests in a new light,” Wiggins explained.
The presentation also covered early thinking about computational intelligence, including Alan Turing’s famous 1950 paper on the subject, and debates over whether machine intelligence would arise from more data or from explicit programming.
With each story from history, Wiggins showed that people have been wrestling with questions about AI and the use of data far longer than most realize.
“We’re now in an environment in which people talk a lot about whether different things are intelligent and whether different pieces of mathematics are intelligent,” Wiggins said. “That was a fight more than 100 years ago.”
On Algorithms and Ethics
Wiggins’ deep dive across history touched on the origins of digital computers as well as the modern, ubiquitous advertising models that leverage data like browsing history and location to target people.
“Digital computation is really born from data science problems,” Wiggins explained. “[In the book and class] we give students and readers a sense for the present-day milieu, in which there are contests around what is data ethics, and in particular the role of persuasion and the business model which funds it.”
That business model inevitably raised questions of privacy and fairness. Referring to congressional testimony from 1984, Wiggins showed how concerns about the power of datasets have evolved over time.
“Fears people had [back then] were not around private control of data, they were around state control of data,” Wiggins said. “We try to explain how privacy was invented and prosecuted as a way to defend the electorate and defend citizens against too much power in the hands of one [actor].”
Wiggins also discussed instances of unfairness arising from the use of data to make high-stakes decisions, citing a 2016 ProPublica article on algorithms used to inform sentencing decisions, which unfairly penalized Black defendants. He also addressed how ethics should be implemented in different contexts.
The subject is one that experts at the Institute for Experiential AI think about often. As part of its Responsible AI (RAI) practice, the institute works with partners to build AI ethics roadmaps and offers other services, such as technical audits and independent ethics advisory boards, all designed to help organizations navigate the ethical challenges AI presents.
A Conversation on Power and Control
During the fireside chat that followed the talk, Usama Fayyad asked Wiggins what worried him most in today’s data-dominated world, where, as Fayyad put it, “almost everything in our lives is being digitized and recorded.”
“I’m worried about unchecked power, and right now unchecked power happens to be tightly associated with abundant datasets,” Wiggins said.
Fayyad built on Wiggins’ response.
“What worries me in particular is that you collect data, which enables AI, and as you do more AI you end up getting more data, so AI begets data begets AI begets data,” Fayyad explained. “The cycle seems to get worse in terms of [concentrating] power.”
Fayyad then asked Wiggins what we can do to break the cycle and steer AI toward public good. Wiggins said that although people tend to think regulation is the only option, other factors such as societal norms, the ways people spend money (markets), and technological architecture also shape how a technology is used. Regulation, Wiggins argued, often comes only after a societal harm has been exposed and attributed to an actor.
“There’s a lot of things that need to happen for a regulatory force — usually the state — to reshape power that is unchecked,” Wiggins concluded. “But I think it’s a reasonable playbook — it’s just a playbook that requires vigilance in all steps.”
The talk, which took place on Northeastern University’s Boston campus, was the latest in the Distinguished Lecturer Seminar series. Other upcoming events include a June 12 virtual seminar on Responsible AI for Suicide Prevention, led by institute research scientist Annika Marie Schoene, and the recently announced State of AI in Precision Health event on Oct. 10.