Imagine two teams on a soccer field. The players can work together to achieve a goal and compete against other players with conflicting interests. That’s how the game works.
Creating artificial intelligence agents that can learn to compete and collaborate as effectively as humans remains a thorny issue. A key challenge is to enable AI agents to anticipate the future behavior of other agents when they are all learning at the same time.
Due to the complexity of this problem, current approaches are short-sighted; the agents can only guess the next moves of their teammates or competitors, leading to poor performance in the long run.
Researchers at MIT, the MIT-IBM Watson AI Lab, and elsewhere have developed a new approach that gives AI agents foresight. Their machine-learning framework enables cooperative or competitive AI agents to consider what other agents will do as time approaches infinity, not just over the next few steps. The agents then adjust their behavior accordingly to influence other agents' future behavior and arrive at an optimal, long-term solution.
This framework could be used by a group of autonomous drones working together to find a lost hiker in a dense forest, or by self-driving cars that strive to keep passengers safe by anticipating the future movements of other vehicles on a busy highway.
“When AI agents cooperate or compete, the most important thing is that their behavior converges at some point in the future. There are many transient behaviors along the way that don’t matter much in the long run. Achieving this converged behavior is what we really care about, and we now have a mathematical way to make that happen,” said Dong-Ki Kim, a graduate student in the MIT Laboratory for Information and Decision Systems (LIDS) and lead author of a paper describing this framework.
The senior author is Jonathan P. How, the Richard C. Maclaurin Professor of Aeronautics and Astronautics and a member of the MIT-IBM Watson AI Lab. Co-authors include researchers at the MIT-IBM Watson AI Lab, IBM Research, the Mila-Quebec Artificial Intelligence Institute, and Oxford University. The research will be presented at the Conference on Neural Information Processing Systems.
In this demo video, the red robot, trained using the researchers’ machine learning system, can beat the green robot by learning more effective behaviors that take advantage of its opponent’s constantly changing strategy.
More agents, more problems
The researchers focused on a problem known as multiagent reinforcement learning. Reinforcement learning is a form of machine learning in which an AI agent learns through trial and error. Researchers give the agent a reward for "good" behavior that helps it achieve a goal. The agent modifies its behavior to maximize that reward until it eventually becomes an expert at a task.
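The trial-and-error loop described above can be sketched with tabular Q-learning, one of the simplest reinforcement-learning algorithms. This is an illustrative, single-agent sketch (not the researchers' method): the agent nudges its estimate of each action's value toward the reward it observes.

```python
import random

def q_learning(env_step, n_states, n_actions, episodes=500,
               alpha=0.1, gamma=0.9, epsilon=0.1):
    """Minimal tabular Q-learning. env_step(state, action) must
    return (next_state, reward, done)."""
    q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy: mostly exploit the best-known action,
            # occasionally explore a random one.
            if random.random() < epsilon:
                action = random.randrange(n_actions)
            else:
                action = max(range(n_actions), key=lambda a: q[state][a])
            next_state, reward, done = env_step(state, action)
            # Move the value estimate toward reward + discounted future value.
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q
```

After enough episodes, the learned values favor the rewarded action, which is the "becomes an expert at a task" behavior described above.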
But when many cooperative or competitive agents learn at the same time, things become increasingly complex. As agents consider more of their fellow agents' future moves, and how their own behavior affects others, the problem soon requires far too much computing power to solve efficiently. That is why other approaches focus only on the short term.
“The AIs really want to think about the end of the game, but they don’t know when the game will end. They need to think about how they can keep adapting their behavior indefinitely so that they can win sometime far into the future. Our paper essentially proposes a new objective that enables an AI to think about infinity,” says Kim.
But since it’s impossible to plug infinity into an algorithm, the researchers designed their system so that agents focus on a future point where their behavior coincides with that of other agents, known as equilibrium. An equilibrium point determines the long-term performance of agents, and multiple equilibria can exist in a multi-agent scenario. Therefore, an effective agent actively influences the future behavior of other agents in such a way that they reach a desirable equilibrium from the agent’s perspective. If all the agents influence each other, they converge to a general concept that the researchers call an “active equilibrium.”
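The idea of behavior settling at an equilibrium can be illustrated with a toy best-response dynamic in a two-action coordination game. This is our own illustrative sketch, not the paper's algorithm: two agents take turns playing the best reply to the other's current action, and their joint behavior converges to a point where neither can gain by deviating.

```python
# Payoff for (my_action, other_action): both agents prefer to match,
# and matching on action 0 is the more desirable equilibrium.
PAYOFF = {(0, 0): 2, (1, 1): 1, (0, 1): 0, (1, 0): 0}

def best_response(other_action):
    """Action with the highest payoff against the opponent's action."""
    return max((0, 1), key=lambda a: PAYOFF[(a, other_action)])

def run_dynamics(a0=1, a1=0, steps=10):
    """Alternating best-response play; returns the joint-action history."""
    history = [(a0, a1)]
    for _ in range(steps):
        a0 = best_response(a1)
        a1 = best_response(a0)
        history.append((a0, a1))
    return history
```

Starting from mismatched actions, the agents settle at (0, 0) and stay there: the early, transient moves stop mattering once play converges, which is the long-run behavior the researchers argue agents should actively steer.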
The machine-learning framework they developed, known as FURTHER (which stands for FUlly Reinforcing Active Influence with Average Reward), allows agents to learn how to adjust their behavior as they interact with other agents in order to achieve this active equilibrium.
FURTHER does this with the help of two machine-learning modules. The first, an inference module, enables an agent to guess the future behavior of other agents, and the learning algorithms they use, based solely on their past actions.
This information feeds into the reinforcement learning module, which the agent uses to modify its behavior and influence other agents in a way that maximizes its reward.
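The two-module structure described above can be sketched as follows. All names and details here are our own illustrative assumptions, not the paper's implementation: the "inference module" is reduced to an empirical count of the opponent's past actions, and the "learning module" simply picks the action with the best expected payoff against that model.

```python
from collections import Counter

class TwoModuleAgent:
    """Toy agent: an inference module models the opponent from past
    actions; a decision module best-responds to that model."""

    def __init__(self, n_actions, payoff):
        self.counts = Counter()   # inference module's memory of opponent actions
        self.payoff = payoff      # payoff[(my_action, opponent_action)]
        self.n_actions = n_actions

    def infer_opponent(self):
        """Estimate the opponent's action distribution from observations."""
        total = sum(self.counts.values()) or 1
        return [self.counts[a] / total for a in range(self.n_actions)]

    def act(self):
        """Choose the action maximizing expected payoff vs. the model."""
        probs = self.infer_opponent()
        def expected(a):
            return sum(p * self.payoff[(a, o)] for o, p in enumerate(probs))
        return max(range(self.n_actions), key=expected)

    def observe(self, opponent_action):
        """Feed a new observation into the inference module."""
        self.counts[opponent_action] += 1
```

The real framework learns far richer opponent models and optimizes an average-reward objective, but the division of labor is the same: infer what the others will do, then act to maximize reward given that inference.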
“The challenge was to think about infinity. We had to use a lot of different mathematical tools to make that happen, and make some assumptions to make it work in practice,” says Kim.
Win in the long run
They tested their approach against other multi-agent reinforcement learning frameworks in a variety of scenarios, including a pair of robots battling sumo-style and a battle pitting two teams of 25 agents against each other. In both cases, the AI agents using FURTHER won the games more often.
Because their approach is decentralized, meaning the agents learn to win the games independently, it’s also more scalable than other methods that require a central computer to control the agents, Kim explains.
The researchers used games to test their approach, but FURTHER could be used to tackle any kind of multi-agent problem. For example, it could be applied by economists seeking to develop sound policy in situations where many interacting entities have behaviors and interests that change over time.
Economics is an application Kim is particularly enthusiastic about studying. He also wants to elaborate on the concept of active equilibrium and further improve the FURTHER framework.
This research is funded in part by the MIT-IBM Watson AI Lab.