Author: Rob Matheson | MIT News Office
An autonomous robotic system invented by researchers at MIT and the Woods Hole Oceanographic Institution (WHOI) efficiently sniffs out the most scientifically interesting — but hard-to-find — sampling spots in vast, unexplored waters.
Environmental scientists are often interested in gathering samples at the most interesting locations, or “maxima,” in an environment. One example could be a source of leaking chemicals, where the concentration is the highest and mostly unspoiled by external factors. But a maximum can be any quantifiable value that researchers want to measure, such as water depth or parts of coral reef most exposed to air.
Efforts to deploy maximum-seeking robots suffer from efficiency and accuracy issues. Commonly, robots will move back and forth like lawnmowers to cover an area, which is time-consuming and collects many uninteresting samples. Some robots sense and follow high-concentration trails to their leak source. But they can be misled. For example, chemicals can get trapped and accumulate in crevices far from a source. Robots may identify those high-concentration spots as the source yet be nowhere close.
In a paper being presented at the International Conference on Intelligent Robots and Systems (IROS), the researchers describe “PLUMES,” a system that enables autonomous mobile robots to zero in on a maximum far faster and more efficiently. PLUMES leverages probabilistic techniques to predict which paths are likely to lead to the maximum, while navigating obstacles, shifting currents, and other variables. As it collects samples, it weighs what it’s learned to determine whether to continue down a promising path or search the unknown — which may harbor more valuable samples.
Importantly, PLUMES reaches its destination without ever getting trapped in those tricky high-concentration spots. “That’s important, because it’s easy to think you’ve found gold, but really you’ve found fool’s gold,” says co-first author Victoria Preston, a PhD student in the Computer Science and Artificial Intelligence Laboratory (CSAIL) and in the MIT-WHOI Joint Program.
The researchers built a PLUMES-powered robotic boat that successfully detected the most exposed coral head in the Bellairs Fringing Reef in Barbados, meaning the one located in the shallowest spot. That finding is useful for studying how sun exposure impacts coral organisms. In 100 simulated trials in diverse underwater environments, a virtual PLUMES robot also consistently collected seven to eight times more samples of maxima than traditional coverage methods in allotted time frames.
“PLUMES does the minimal amount of exploration necessary to find the maximum and then concentrates quickly on collecting valuable samples there,” says co-first author Genevieve Flaspohler, a PhD student in CSAIL and the MIT-WHOI Joint Program.
Joining Preston and Flaspohler on the paper are Anna P.M. Michel and Yogesh Girdhar, both scientists in the Department of Applied Ocean Physics and Engineering at WHOI; and Nicholas Roy, a professor in CSAIL and in the Department of Aeronautics and Astronautics.
Navigating an exploit-explore tradeoff
A key insight behind PLUMES was to use probabilistic techniques to navigate the notoriously complex tradeoff between exploiting what has been learned about the environment and exploring unknown areas that may be more valuable.
“The major challenge in maximum-seeking is allowing the robot to balance exploiting information from places it already knows to have high concentrations and exploring places it doesn’t know much about,” Flaspohler says. “If the robot explores too much, it won’t collect enough valuable samples at the maximum. If it doesn’t explore enough, it may miss the maximum entirely.”
Dropped into a new environment, a PLUMES-powered robot uses a probabilistic statistical model called a Gaussian process to make predictions about environmental variables, such as chemical concentrations, and estimate sensing uncertainties. PLUMES then generates a distribution of possible paths the robot can take, and uses the estimated values and uncertainties to rank each path by how well it allows the robot to explore and exploit.
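To make the Gaussian process step concrete, here is a minimal sketch of how such a model can turn a handful of samples into a predicted value and an uncertainty at an unvisited location. It is an illustration under stated assumptions, not the researchers' implementation: it works along a single line rather than a real survey area, it assumes a zero-mean prior with a squared-exponential kernel, and the names `rbf`, `solve`, and `gp_predict`, along with the length scale and noise values, are hypothetical.

```python
import math

def rbf(a, b, length=1.0, var=1.0):
    """Squared-exponential kernel between two scalar locations (assumed kernel)."""
    return var * math.exp(-0.5 * ((a - b) / length) ** 2)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def gp_predict(xs, ys, x_new, noise=1e-4):
    """Posterior mean and variance at x_new, given samples ys taken at xs."""
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    k_star = [rbf(a, x_new) for a in xs]
    alpha = solve(K, ys)       # (K + noise*I)^-1 y
    v = solve(K, k_star)       # (K + noise*I)^-1 k_star
    mean = sum(k * a for k, a in zip(k_star, alpha))
    variance = rbf(x_new, x_new) - sum(k * w for k, w in zip(k_star, v))
    return mean, max(variance, 0.0)

# Hypothetical concentration readings taken at three positions along a line.
positions = [0.0, 1.0, 2.0]
readings = [0.1, 0.8, 0.3]
for query in (1.0, 1.5, 10.0):
    mean, variance = gp_predict(positions, readings, query)
    print(f"x={query}: predicted value {mean:.3f}, variance {variance:.3f}")
```

Near a sampled position the variance shrinks toward the assumed sensor noise, while far from every sample it returns to the prior variance. That contrast is the kind of signal a planner can use to tell well-understood regions from unexplored ones.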
At first, PLUMES will choose paths that randomly explore the environment. Each sample, however, provides new information about the targeted values in the surrounding environment, such as spots with the highest concentrations of chemicals or the shallowest depths. The Gaussian process model exploits that data to narrow down possible paths the robot can follow from its given position to sample from locations with even higher value. To make the call of whether the robot should exploit past knowledge or explore a new area, PLUMES uses a novel objective function, a type of function commonly used in machine learning to maximize a reward.
The decision where to collect the next sample relies on the system’s ability to “hallucinate” all possible future actions from its current location. To do so, it leverages a modified version of Monte Carlo Tree Search (MCTS), a path-planning technique popularized for powering artificial-intelligence systems that master complex games, such as Go and chess.
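The article does not spell out the PLUMES objective function, so the sketch below swaps in a well-known stand-in, an upper-confidence-bound score, purely to illustrate how a single number can trade off exploiting a high predicted value against exploring an uncertain one. The function names, the candidate locations, and the weight `beta` are all hypothetical.

```python
import math

def ucb_score(mean, variance, beta=2.0):
    """Upper-confidence-bound score: predicted value plus an uncertainty bonus.

    This is a generic stand-in, not the objective used in PLUMES.
    """
    return mean + beta * math.sqrt(variance)

def pick_next(candidates, beta=2.0):
    """Return the name of the candidate location with the highest score.

    candidates maps a location name to a (predicted mean, predicted variance) pair.
    """
    return max(candidates, key=lambda name: ucb_score(*candidates[name], beta=beta))

# Hypothetical model predictions for two candidate sampling locations.
candidates = {
    "known_peak": (0.8, 0.01),  # high predicted value, well understood
    "unexplored": (0.2, 1.0),   # low predicted value, highly uncertain
}
print(pick_next(candidates, beta=2.0))  # large bonus: favors the uncertain spot
print(pick_next(candidates, beta=0.1))  # small bonus: favors the known high value
```

A larger `beta` pushes the choice toward uncertain areas, and a smaller one keeps it near values that are already known to be high. This mirrors the tradeoff Flaspohler describes above.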
MCTS uses a decision tree — a map of connected nodes and lines — to simulate a path, or sequence of moves, needed to reach a final winning action. But in games, the space for possible paths is finite. In unknown environments, with real-time changing dynamics, the space is effectively infinite, making planning extremely difficult. The researchers designed “continuous-observation MCTS,” which leverages the Gaussian process and the novel objective function to search over this unwieldy space of possible real paths.
The root of this MCTS decision tree starts with a “belief” node, which is the next immediate step the robot can take. This node contains the entire history of the robot’s actions and observations up until that point. Then, the system expands the tree from the root into new lines and nodes, looking over several steps of future actions that lead to explored and unexplored areas.
Then, the system simulates what would happen if it took a sample from each of those newly generated nodes, based on some patterns it has learned from previous observations. Depending on the value of the final simulated node, the entire path receives a reward score, with higher values equaling more promising actions. Reward scores from all paths are rolled back to the root node. The robot selects the highest-scoring path, takes a step, and collects a real sample. Then, it uses the real data to update its Gaussian process model and repeats the “hallucination” process.
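The expand, simulate, and roll-back cycle described above can be sketched with a plain, discrete version of Monte Carlo Tree Search. This is a simplified illustration, not the continuous-observation variant the researchers designed. It assumes a robot on a short line of cells that can only step left or right, it uses a fixed list of predicted values as a stand-in for the Gaussian process, and it picks branches with the standard UCT rule. All names and parameter values are hypothetical.

```python
import math
import random

ACTIONS = (-1, +1)  # step left or step right along a line of cells

class Node:
    def __init__(self, pos):
        self.pos = pos
        self.children = {}                  # action -> child Node
        self.n = {a: 0 for a in ACTIONS}    # visit count per action
        self.q = {a: 0.0 for a in ACTIONS}  # mean simulated return per action

def step(pos, action, size):
    """Move one cell, staying inside the line."""
    return min(max(pos + action, 0), size - 1)

def rollout(pos, depth, horizon, field, rng):
    """Random playout: sum the predicted values along a random walk."""
    total = 0.0
    for _ in range(depth, horizon):
        pos = step(pos, rng.choice(ACTIONS), len(field))
        total += field[pos]
    return total

def simulate(node, depth, horizon, field, rng, c):
    """One search iteration from node; returns the simulated return."""
    if depth == horizon:
        return 0.0
    untried = [a for a in ACTIONS if a not in node.children]
    if untried:
        # Expansion: add a new child, then estimate its value with a playout.
        a = rng.choice(untried)
        nxt = step(node.pos, a, len(field))
        node.children[a] = Node(nxt)
        ret = field[nxt] + rollout(nxt, depth + 1, horizon, field, rng)
    else:
        # Selection: UCT balances a high mean return against few visits.
        total = sum(node.n.values())
        a = max(ACTIONS, key=lambda act: node.q[act]
                + c * math.sqrt(math.log(total) / node.n[act]))
        child = node.children[a]
        ret = field[child.pos] + simulate(child, depth + 1, horizon, field, rng, c)
    # Backup: fold the simulated return into the running mean for this action.
    node.n[a] += 1
    node.q[a] += (ret - node.q[a]) / node.n[a]
    return ret

def plan(pos, field, horizon=6, iterations=2000, c=2.0, seed=0):
    """Return the first action to take, chosen as the most-visited root action."""
    rng = random.Random(seed)
    root = Node(pos)
    for _ in range(iterations):
        simulate(root, 0, horizon, field, rng, c)
    return max(ACTIONS, key=lambda act: root.n[act])

# Hypothetical predicted values per cell; the peak sits at index 4.
field = [0.0, 0.1, 0.2, 0.5, 1.0, 0.4]
print(plan(2, field))
```

In a loop closer to the one in the article, the robot would take the returned step, collect a real sample, refresh its predictions, and plan again from the new position. Here the predicted values stay fixed, so the sketch covers only a single planning round.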
“As long as the system continues to hallucinate that there may be a higher value in unseen parts of the world, it must keep exploring,” Flaspohler says. “When it finally converges on a spot it estimates to be the maximum, because it can’t hallucinate a higher value along the path, it then stops exploring.”
Now, the researchers are collaborating with scientists at WHOI to use PLUMES-powered robots to localize chemical plumes at volcanic sites and study methane releases in melting coastal estuaries in the Arctic. Scientists are interested in the source of chemical gases released into the atmosphere, but these test sites can span hundreds of square miles.
“They can [use PLUMES to] spend less time exploring that huge area and really concentrate on collecting scientifically valuable samples,” Preston says.