by Tyler Wells Lynch
The Institute for Experiential AI welcomed Michael I. Jordan, Distinguished Professor of Computer Science and Statistics at the University of California, Berkeley, to speak about emerging problems and breakthroughs on the decision-making front of artificial intelligence. The lecture is part of IEAI’s Distinguished Lecturer Series. Visit the IEAI website to watch the replay or read on for a summary of the event.
Progress moves in stages, and the budding field of Artificial Intelligence (AI) is no exception. For the past decade or so, the AI community has focused on pattern recognition as its algorithmic weapon of choice but, for Michael I. Jordan, pattern recognition doesn’t go far enough. While it has opened some promising research areas, it hasn’t led to the kind of productivity gains industry experts would like to see in deployed settings. Real deployments involve many interacting systems, uncertainties, high stakes, and system-level variables, and pattern recognition alone is a misshapen piece of that puzzle.
What’s missing is the ability for systems to make decisions that are not only accurate but trustworthy. AI has to be about more than just creating a line of autonomous vehicles or machines that can be distributed and forgotten. Core to the principle of autonomy is the overhead view or the social context in which decisions are made, including the ethical, economic, and philosophical inputs that moderate our relationships to technology.
Said another way, it’s not just about how decision-making systems make decisions, but also about how they help people make decisions.
That’s why, according to Jordan, the future of AI research will be about decision-making. Myriad research obstacles stand in the way — some statistical, some economic, and some political — but a flexible decision-making model can help researchers steer clear of some of them. Any successful model will also need to take an overhead view, incorporating data reliably and consistently but in a way that, ultimately, proves helpful to people.
What Does it Mean to Make a Decision?
Consider the individual model of decision-making: the person. People are complicated systems of constant decision-making. Some decisions last seconds while others last days or even months. But the inputs are always moderated by environments, people, and data. Individual autonomy is federated further by medical systems, bureaucracies, governments, transportation networks, and other structures, all of which contribute to the process of making a decision.
A successful machine learning model needs to incorporate this kind of overhead view, but most existing systems are much narrower in scope. Recommendation systems like those at Netflix, Amazon, and Spotify, for example, have delivered a degree of productivity that makes other learning systems blush, but they do not translate to an environment of scarcity and competition. An Amazon-style recommendation algorithm applied to a food delivery app or navigation system would tend to send many users to the same restaurant or road, and the result is congestion. Merely balancing the output loads will not solve the problem either, because modified directives in a purposely balanced system lose sight of individual preferences, tastes, and schedules. Moreover, machines can never fully understand individual preferences, because people are constantly influenced by subconscious biases and impulses that have little bearing on rationality.
The solution, according to Jordan, is to model learning systems after markets, which are essentially decentralized algorithms. Participants can accomplish complex tasks, like feeding whole cities, without any centralized command or need for a high level of intelligence. Agents are also adaptive, responding to social and physical changes; robust, able to work at all times; and scalable, meaning they operate in small villages and giant cities alike. Moreover, agents on both sides of a market gamify their participation, optimizing outputs in a way that is not necessarily perfect but is fluid, and that may be a price worth paying.
Bandits and Markets
There are two helpful models or problems for understanding how machines can learn to make better, more informed, and cooperative decisions. They are the Multi-Armed Bandit (MAB) problem and the concept of Matching Markets.
The MAB problem refers to a situation involving limited resources and agents who, not knowing the right outcome, have to explore to find an optimal action. A classic example of this is a gambler at a slot machine. The gambler’s goal is to maximize their reward by choosing the best machine and pulling the lever at the best time. There is a trade-off between “exploitation” and “exploration.” The gambler has to find the machine with the highest expected payout while continuously seeking information about payouts at other machines.
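The exploration-exploitation trade-off described above can be made concrete with a minimal sketch. It uses UCB1, one standard bandit strategy, on simulated slot machines. The function name `ucb1`, the payout probabilities, and the win-or-lose reward model are assumptions chosen for illustration; they do not come from the lecture.

```python
import math
import random


def ucb1(payout_probs, rounds, seed=0):
    """Simulate a gambler playing UCB1 on Bernoulli slot machines.

    payout_probs: hypothetical true win probability of each machine.
        The gambler never sees these; they only drive the simulation.
    Returns (pulls per machine, total reward collected).
    """
    rng = random.Random(seed)
    n_arms = len(payout_probs)
    counts = [0] * n_arms   # how often each machine was pulled
    means = [0.0] * n_arms  # running average reward per machine
    total = 0.0

    for t in range(1, rounds + 1):
        if t <= n_arms:
            # Pull every machine once so each has an estimate.
            arm = t - 1
        else:
            # Exploitation term (means) plus an exploration bonus that
            # shrinks as a machine is pulled more often.
            def index(a):
                return means[a] + math.sqrt(2 * math.log(t) / counts[a])

            arm = max(range(n_arms), key=index)

        reward = 1.0 if rng.random() < payout_probs[arm] else 0.0
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]
        total += reward

    return counts, total
```

Calling `ucb1([0.2, 0.5, 0.8], rounds=5000)` returns the pull counts and total reward. In a run like this, most pulls should end up on the best machine, while the others still receive occasional exploratory pulls.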
On the other hand, matching markets are markets where price is not the only or primary concern. Think Uber: Participants have to be matched to one another in a way that isn’t determined by whether or not they can afford the transaction.
Michael Jordan proposes a machine learning model that fuses the two ideas. Can you build a learning market using a bandit algorithm — where systems exploit and explore for optimal “lever pulls,” while other agents on the other side of the market are simultaneously doing the same thing?
Jordan argues that this approach imitates real-world conditions: Agents get “noisy” rewards from pulling certain levers. The levers themselves adapt to their skill levels, and when multiple agents pull the same lever, only the most preferred agent gets a reward. This system would seem to mirror the behavior of human decision-making, where outcomes are not necessarily optimal but are indeed decentralized.
To test this theory, Jordan offers several algorithms. One is a straightforward merging of the Gale-Shapley algorithm with the Upper Confidence Bound algorithm. The nice thing about this system is that it’s incentive-compatible — no single agent has an incentive to deviate from the method. Said another way, if everybody is playing the algorithm, then each agent is incentivized to continue playing the algorithm as well.
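One plausible reading of that merger can be sketched as follows. Each round, every agent ranks the levers ("arms") by its current upper confidence bound, a Gale-Shapley pass turns those rankings into a stable matching, and each agent updates its estimate from a noisy reward. This is a simplified illustration, not the algorithm as presented in the lecture. It assumes a central matcher, arm preferences that are fixed and known, Gaussian noise, and at least as many arms as agents. The names `gale_shapley` and `ucb_matching` are hypothetical.

```python
import math
import random


def gale_shapley(agent_prefs, arm_ranks):
    """Agent-proposing deferred acceptance.

    agent_prefs[i]: arms ordered from most to least preferred by agent i.
    arm_ranks[j][i]: rank arm j assigns to agent i (lower is better).
    Returns match, where match[i] is the arm assigned to agent i.
    """
    n = len(agent_prefs)
    next_choice = [0] * n  # next position in each agent's list
    holder = {}            # arm -> agent it currently holds
    free = list(range(n))
    while free:
        i = free.pop()
        j = agent_prefs[i][next_choice[i]]
        next_choice[i] += 1
        if j not in holder:
            holder[j] = i
        elif arm_ranks[j][i] < arm_ranks[j][holder[j]]:
            free.append(holder[j])  # arm trades up, old agent is freed
            holder[j] = i
        else:
            free.append(i)          # proposal rejected
    match = [None] * n
    for j, i in holder.items():
        match[i] = j
    return match


def ucb_matching(true_means, arm_ranks, rounds, noise=0.1, seed=0):
    """Repeat: rank arms by UCB, match with Gale-Shapley, observe rewards.

    true_means[i][j]: hypothetical mean reward of arm j for agent i,
        hidden from the agents and used only to simulate rewards.
    Returns (final matching, per-agent pull counts).
    """
    rng = random.Random(seed)
    n, m = len(true_means), len(true_means[0])
    counts = [[0] * m for _ in range(n)]
    means = [[0.0] * m for _ in range(n)]
    match = None
    for t in range(1, rounds + 1):
        prefs = []
        for i in range(n):
            def index(j, i=i):
                if counts[i][j] == 0:
                    return float("inf")  # untried arms rank first
                bonus = math.sqrt(2 * math.log(t) / counts[i][j])
                return means[i][j] + bonus

            prefs.append(sorted(range(m), key=index, reverse=True))
        match = gale_shapley(prefs, arm_ranks)
        for i, j in enumerate(match):
            reward = true_means[i][j] + rng.gauss(0, noise)
            counts[i][j] += 1
            means[i][j] += (reward - means[i][j]) / counts[i][j]
    return match, counts
```

Because the matching step resolves conflicts using the arms' preferences, two agents never pull the same arm in a round. A decentralized variant, where agents collide and only the preferred one is rewarded, would need additional machinery not shown here.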
Another avenue is strategic classification, which refers to the problem of predictive models in social settings where people are incentivized to game or strategize their inputs. A classic example of this scenario is when people dishonestly fill out health insurance or unemployment forms to secure a potential benefit. Another real-life example is when, a few years ago, Uber drivers in California worked together to create price surges by turning their phones — and, thus, their availability — off and on at predetermined times, allowing them to capitalize on the resulting surge pricing.
Jordan argues that we can’t expect to regulate away this kind of behavior. People are always going to attempt to game systems to their advantage. A successful learning algorithm needs to account for strategic agents seeking favorable predictions and decision-makers seeking to minimize prediction losses. Often, this behavior plays out in the form of a Stackelberg competition, where one player leads and the other follows, adapting their behavior to the leader’s move. The strategy lies in the variable response time, i.e., the time each side is given to form a strategy, among leaders and followers.
One approach is to generalize the system to allow both players to learn gradually within their own timeframes. You can think of this in terms of slow and speedy decision-making systems: An example of a slow or “proactive” decision-maker would be a college admission or credit scoring system, both of which have to accumulate data slowly before making changes. Faster, more reactive decision-makers include online platforms like YouTube, Instagram, and Uber, which often need to make changes quickly in response to new data sets. The latter is better for the central decision-maker and worse for strategic agents because the system is harder to game.
However, by tuning their update frequency appropriately, the decision-maker can, theoretically, drive natural learning dynamics with rational strategic agents to arrive at a Stackelberg equilibrium, i.e., an agreeable balance between both parties. In many learning settings, both players prefer the equilibrium where the strategic agent leads and the decision-maker follows. In other words, a slow decision-maker is preferable to both the strategic agent and the decision-maker.
Jordan and fellow researchers have studied many other algorithms to tackle the problem of decision-making in AI, including those designed to handle the adversarial setting of a live auction. One such algorithm uses an “approximate distribution” of bidder valuations to produce near-optimal revenues for “all true distributions.” They have also tested algorithms designed to learn and identify equilibria in matching markets via MAB feedback, including how to quantify distances from a determined equilibrium.
The Next Era
Despite some promising research avenues, many open problems remain. Jordan cautions that AI or machine learning is still in its infancy, reminiscent of the primordial stages of chemical and electrical engineering. In both of those fields, as in AI today, researchers grasped some fundamental concepts (such as fluid dynamics, optics, and electromagnetism) but lacked the mathematical and design principles needed to scale them.
Right now, with machine learning, we have proto-principles from statistics and computer science. And we have algorithms that allow us to conceive of AI systems encapsulating both data and human aspirations, but they’re going to take decades to mature. The journey has only just begun.
Q: With regards to the optimal routing of vehicles in traffic, how does the asynchronous feedback mechanism used by platforms like Uber and Google fit in with the aforementioned research vignettes?
Michael Jordan: It’s not so much about optimal routes. There’s this notion of “price of anarchy” in algorithmic game theory, and I think that’s a good way of thinking here. There could be some oracle, optimal routing thing — like a Google or supreme being — but I don’t think in human life that’s really what we should be aspiring to. That’s too much centralized control. Rather, we should have some looser, easier to trust and interactive mechanism that involves local transparency, bidding, and awareness. We pay a cost that way — the “price of anarchy” of being in that system — but we have a federated notion of contracts, slots, and interactions among each other that allows the system to exist.
I have worked on this problem, where you have to know, a priori, whether you have a slot or not. Think about the gig economy where people have cars and trucks and bicycles and they want to help move goods around the city, and there’s a matching market to provide it. People want to know in advance whether agents are available, and companies want to know if there’s enough supply to transport goods. Maybe there’s a spot market at the end where people can make extra money, and a whole little economy based around it.
Thinking of that as the “optimal” routing problem I don’t think is right — it’s very asynchronous — but I think the market metaphor is still right on here. What I’m bringing is not just the economics language but an awareness that this is all based on data analysis. Places like Uber and Amazon have this kind of problem emerging, but even there you don’t see the kind of intellectual merging of the full market design perspective with the full machine learning perspective. To me, that is the grand intellectual challenge of this era — how to make this real, scalable, trustable system — where people don’t just rely on Google to tell them where to go, but where they feel engaged and part of the process.
Q: When it comes to public policy decisions, trust becomes a big issue. There seems to be a trade-off between the optimality of those decisions and getting closer to human intuition. Do you think we should sacrifice some of that optimality of decisions in favor of getting closer to the intuition? If that’s the case, is there any framework that we can use or quantify in order to think about these trade-offs?
MJ: Humans trusting a system: Again, it’s not just that there’s an Underwriters Laboratories or Intel Inside brand statement on it that says you should trust us. The goal is to make people trust and believe in a system after they’ve interacted with it, because it works and, more importantly, because it delivers value.
Immediately you’re into the kind of game-theoretical coalitional side of it: Do I really want to be a part of this, or do I just want to disengage? That’s part of trust, and transparency is critically part of it. I’ve got to be able to probe and push and do counterfactuals and see what kinds of things come back before I start to trust it.
For example, if the doctor can’t give me good responses to my questions, or can’t help me identify where the uncertainty is coming from, then I start to not trust the doctor. It’s not just about the prediction that was made. It’s got to be this probing activity with a “what if” back-and-forth dialogue that builds trust and transparency. So, optimality is not the right word here. There’s a little tolerance of chaos or the price of anarchy, where you’re not optimal but you are trustable or believable.
Finally, there’s the interaction with the actual human being: How do trust and transparency arise? A neural network is something that none of us will ever understand: It’s input-output behavior. I don’t think spending a lot more time on all the pictures you can show about the neural network is really the way to go.
Here’s another simple proposal: I’ve just gone to the bank, I put my data into the network, and it denied me credit. Well, why? I need an explanation. You can’t just show me the weights on the neural network or the most important features and expect me to understand. Something else you could’ve done: In parallel with the neural network, on the side, you could’ve had a nearest neighbor system. Maybe you take 50 neighbors around the contours of the neural net, smoothing the surface locally just like a neural net does. The prediction will be about the same. You may lose a little accuracy — optimality, if you will — but it’s probably going to be a good prediction. We don’t use this in the real world because it’s extremely costly to find the nearest neighbors, so we prefer neural nets. On the other hand, for this auditing purpose, if you can return the nearest 50 neighbors to me, and I look at the neighbors and see how they differed, that might help me understand why I was denied credit while another person received it.
We can have systems in parallel. When we quantify them, we might see that we lost a little accuracy, but I actually don’t think we do. I think we just got a little slower, but maybe that’s what we need.
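The nearest-neighbor audit described in this answer can be sketched in a few lines. The sketch assumes past applicants are stored as (features, outcome) pairs, that features are already scaled to comparable ranges, and that Euclidean distance is a sensible notion of similarity. The function names and the record format are hypothetical, and a production system would likely use an indexed search rather than sorting every record.

```python
import math


def nearest_neighbors(query, records, k):
    """Return the k records closest to `query` by Euclidean distance.

    records: list of (features, outcome) pairs, where features is a
        tuple of numbers with the same length as `query`.
    """
    return sorted(records, key=lambda r: math.dist(query, r[0]))[:k]


def knn_explain(query, records, k):
    """Predict by majority vote and return the neighbors as evidence.

    The neighbors are what the applicant would inspect: similar past
    cases, along with the outcome each one received.
    """
    neighbors = nearest_neighbors(query, records, k)
    approvals = sum(1 for _, outcome in neighbors if outcome == "approved")
    # A tie is treated as a denial in this sketch.
    prediction = "approved" if approvals * 2 > len(neighbors) else "denied"
    return prediction, neighbors
```

Run alongside the neural network, `knn_explain` would give a denied applicant a short list of comparable cases to examine, including any neighbors who were approved and the ways their features differed.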
About Michael I. Jordan:
Michael I. Jordan is the Pehong Chen Distinguished Professor in the Department of EECS and the Department of Statistics at the University of California, Berkeley. His research interests include machine learning, optimization, and control theory. Professor Jordan is a member of the National Academy of Sciences and the National Academy of Engineering, and a Foreign Member of the Royal Society (UK). He has given a Plenary Lecture at the International Congress of Mathematicians and has received the IEEE John von Neumann Medal, the IJCAI Research Excellence Award, the AMS Ulf Grenander Prize in Stochastic Theory and Modeling, the David Rumelhart Prize, and the ACM/AAAI Allen Newell Award. He holds an Honorary Doctorate from Yale University.