by Tyler Wells Lynch
Encoded biases, public misconceptions, demographic misrepresentation — the challenges and controversies surrounding the field of Artificial Intelligence (AI) are often one and the same. On his podcast, Legends of Data & AI, Dr. Usama Fayyad recently spoke with Tina Eliassi-Rad, professor of Computer Science at Northeastern University and faculty member of the Institute for Experiential AI (EAI), about how data scientists can navigate these issues and bring about a fairer, more democratic field of inquiry.
Spanning cutting-edge problems in ethics and AI in both business and academia, the conversation centered on three core challenges and areas of progress: context, uncertainty, and transparency.
Context: How to Tell Good Data from Bad Data
Context is often at the heart of the communication problems between data scientists and the wider public, including students. Any successful application of AI — whether judged by business performance or ethical outcomes — depends in large part on the context in which a machine learning (ML) system is developed: Where did the data come from? What were the developers trying to accomplish? What gamified systems were working against them?
In her talk with Fayyad, Eliassi-Rad highlighted her work at the Catholic University in Belgium as an example of how organizations must consider the context of their data from the very start. The Belgian government contracted Eliassi-Rad and other data scientists to develop an ML system that could detect companies that intentionally declared bankruptcy to avoid paying taxes.
At the outset, the team dissected the problem by identifying its core components — namely, the source of the data, the context in which it was collected, and more sociological considerations like commonality, social networking, dynamic behavior, and game theory. Eventually, the team developed an algorithm that detected up to 55 percent more tax fraud over time.
Uncertainty: Embracing the Unknowns
Successful AI systems depend on a systemic embrace of uncertainty. That’s an easy lesson for data scientists trained in probability theory, but when it comes to convincing executives, students, or the lay public, it’s not such a slam dunk. Eliassi-Rad points to uncertainty and the failure to explain its significance as a major roadblock to democratizing AI.
Machine learning algorithms learn from experience, which means outcomes are often uncertain. Risk assessments, for example, are typically measured on a scale of 0–10, but those assessments aren’t usually provided alongside the uncertainties baked into the algorithm. When it comes to articulating those probabilities, data scientists too often fall short, failing to elucidate where their data came from, how it was distributed, and what processes went into collecting it.
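The point about reporting uncertainty alongside a score can be made concrete with a small sketch. The numbers and the ensemble-of-bootstrapped-models setup below are hypothetical illustrations, not the systems discussed in the podcast: instead of reporting a single 0–10 risk score, a practitioner can report the spread across models trained on resampled data.

```python
import statistics

# Hypothetical 0-10 risk scores for one case, each produced by a model
# trained on a different bootstrap resample of the same dataset.
ensemble_scores = [6.1, 7.4, 5.8, 8.0, 6.5]

point_estimate = statistics.mean(ensemble_scores)
spread = statistics.stdev(ensemble_scores)

# Reporting the score alone hides the model's uncertainty;
# reporting the spread alongside it makes the limits explicit.
print(f"risk score: {point_estimate:.1f} +/- {spread:.1f}")
```

A decision-maker who sees "6.8 +/- 0.9" knows the model is far less sure than a bare "7" would suggest, which is exactly the disclosure Eliassi-Rad argues is usually missing.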
All those factors play into the success or failure of an algorithm, including a mess of ethical concerns serious enough to make headlines. From racialized facial recognition tools to biased recidivism algorithms, ML systems have been shown to carry flaws with myriad social consequences. The problem inspired Eliassi-Rad’s 2017 public lecture at Harvard University, Just Machine Learning, which attempted to answer two critical questions:
Is there such a thing as just machine learning?
If so, is just machine learning possible in our unjust world?
Part of the challenge in answering these questions returns, once again, to the idea of context. Particularly in business and marketing environments, systems managers prefer to hide complexities. Companies want to assure their clients and customers of the reliability of their products. But it's the hidden hyperparameters — the limitations — that determine a model's outcomes, including its impact on social subgroups or marginalized members of society.
Citing Ayanna Howard, dean at The Ohio State University, Eliassi-Rad points out that, when we build a bridge, we build it for everybody, not just, say, white people. The same must apply to machine learning. The fact that we have designed and implemented a variety of systems with harmful encoded biases erodes public trust in those systems and, in turn, makes it harder to democratize them.
Transparency: Clearing the Fog of Big Data
Eliassi-Rad argues for greater transparency at the point of gathering data. One solution is a type of long-form data “birth certificate,” which discloses the motivation behind a dataset, including how it was composed, collected, and pre-processed, as well as any potential future applications or distributions. Comparing the applicability of algorithms to prescription drugs, Eliassi-Rad also suggests a need for “warning labels.”
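A dataset "birth certificate" of the kind described above could be as simple as a structured record attached to the data. This is an illustrative sketch — the field names, dataset name, and `warning_label` helper are hypothetical, chosen to mirror the disclosures listed in the paragraph (motivation, composition, collection, pre-processing, intended uses) plus the prescription-drug-style warnings.

```python
# Hypothetical "birth certificate" for a dataset; all fields are
# illustrative, echoing the disclosures described in the article.
dataset_record = {
    "name": "bankruptcy_filings_demo",
    "motivation": "Detect intentional bankruptcy declared to avoid taxes",
    "composition": "Corporate tax and bankruptcy filings",
    "collection": "Extracted from government records",
    "preprocessing": "Deduplicated; missing fields left blank",
    "intended_uses": ["fraud screening"],
    "known_gaps": ["few records for companies under two years old"],
    "warnings": ["not validated on small businesses"],
}

def warning_label(record):
    """Render the dataset's limitations as a prescription-style label."""
    lines = [f"WARNING: {w}" for w in record["warnings"]]
    lines += [f"KNOWN GAP: {g}" for g in record["known_gaps"]]
    return "\n".join(lines)

print(warning_label(dataset_record))
```

The label surfaces exactly the groups for whom the downstream model may "not predict well," before anyone trains on the data.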
“Machine learning works well for certain groups in our population and doesn’t work well for others,” she says. “If I don’t have data on the elderly then — similar to prescription drugs — an ML algorithm will not predict well for them and may even have adverse effects.”
As more and more algorithms find their way into social settings, the stakes climb even higher. Enterprises may find themselves in a situation where they can either pay now — by investing in audits, data cleaning, and ethical first principles prior to development — or pay much more later, in the form of lawsuits, bad press, and public controversy.
To that end, Eliassi-Rad offers two takeaways. The first is to care about the uncertainty in machine learning and AI systems: Decision-makers need to ask about uncertainty because it is certain that the systems will not be certain. The second takeaway is to care about how products work on marginalized people. Representation matters, and the only thing that’s truly going to make AI more democratic — and, thus, more mainstream — is an industry that reflects the population these systems are designed to benefit.
“There are no quick solutions,” Eliassi-Rad says. “But to make it better we need more representation. To have more representation we need to start young — in elementary schools, in middle schools — and go and talk to those folks and encourage them to pursue STEM. That’s how you build up the representation.”