{"id":6092,"date":"2022-11-10T17:00:00","date_gmt":"2022-11-10T17:00:00","guid":{"rendered":"https:\/\/www.aiproblog.com\/index.php\/2022\/11\/10\/ensuring-ai-works-with-the-right-dose-of-curiosity\/"},"modified":"2022-11-10T17:00:00","modified_gmt":"2022-11-10T17:00:00","slug":"ensuring-ai-works-with-the-right-dose-of-curiosity","status":"publish","type":"post","link":"https:\/\/www.aiproblog.com\/index.php\/2022\/11\/10\/ensuring-ai-works-with-the-right-dose-of-curiosity\/","title":{"rendered":"Ensuring AI works with the right dose of curiosity"},"content":{"rendered":"<p>Author: Rachel Gordon | MIT CSAIL<\/p>\n<div>\n<p>It\u2019s a dilemma as old as time. Friday night has rolled around, and you\u2019re trying to pick a restaurant for dinner. Should you visit your most beloved watering hole or try a new establishment, in the hopes of discovering something superior? Potentially, but that curiosity comes with a risk: If you explore the new option, the food could be worse. On the flip side, if you stick with what you know works well, you won&#8217;t grow out of your narrow pathway.\u00a0<\/p>\n<p>Curiosity drives artificial intelligence to explore the world, now in boundless use cases \u2014 autonomous navigation, robotic decision-making, optimizing health outcomes, and more. Machines, in some cases, use \u201creinforcement learning\u201d to accomplish a goal, where an AI agent iteratively learns from being rewarded for good behavior and punished for bad. Just like the dilemma faced by humans in selecting a restaurant, these agents also struggle with balancing the time spent discovering better actions (exploration) and the time spent taking actions that led to high rewards in the past (exploitation). 
Too much curiosity can distract the agent from making good decisions, while too little means the agent will never discover good decisions.<\/p>\n<p>In the pursuit of making AI agents with just the right dose of curiosity, researchers from MIT\u2019s Improbable AI Laboratory and Computer Science and Artificial Intelligence Laboratory (CSAIL) <a href=\"https:\/\/williamd4112.github.io\/pubs\/neurips22_eipo.pdf\" target=\"_blank\" rel=\"noopener\">created an algorithm<\/a> that overcomes the problem of AI being too \u201ccurious\u201d and getting distracted from a given task. Their algorithm automatically increases curiosity when it&#8217;s needed, and suppresses it if the agent gets enough supervision from the environment to know what to do.<\/p>\n<p>When tested on over 60 video games, the algorithm succeeded at both hard and easy exploration tasks, where previous algorithms had been able to tackle only a hard or an easy domain alone. With this method, AI agents use less data for learning decision-making rules that maximize incentives.\u00a0\u00a0<\/p>\n<p>\u201cIf you master the exploration-exploitation trade-off well, you can learn the right decision-making rules faster \u2014 and anything less will require lots of data, which could mean suboptimal medical treatments, lesser profits for websites, and robots that don&#8217;t learn to do the right thing,\u201d says Pulkit Agrawal, an assistant professor of electrical engineering and computer science (EECS) at MIT, director of the Improbable AI Lab, and CSAIL affiliate who supervised the research. \u201cImagine a website trying to figure out the design or layout of its content that will maximize sales. If one doesn\u2019t perform exploration-exploitation well, converging to the right website design or the right website layout will take a long time, which means profit loss. 
Or in a health care setting, like with Covid-19, there may be a sequence of decisions that need to be made to treat a patient, and if you want to use decision-making algorithms, they need to learn quickly and efficiently \u2014 you don&#8217;t want a suboptimal solution when treating a large number of patients. We hope that this work will apply to real-world problems of that nature.\u201d\u00a0<\/p>\n<p>It\u2019s hard to encompass the nuances of curiosity\u2019s psychological underpinnings; the underlying neural correlates of challenge-seeking behavior remain poorly understood. Attempts to categorize the behavior have spanned studies that delved deeply into our impulses, deprivation sensitivities, and social and stress tolerances.\u00a0<\/p>\n<p>With reinforcement learning, this process is \u201cpruned\u201d emotionally and stripped down to the bare bones, but it\u2019s complicated on the technical side. Essentially, the agent should only be curious when there\u2019s not enough supervision available to try out different things, and if supervision is available, it should lower its curiosity.\u00a0<\/p>\n<p>Since a large subset of gaming is little agents running around fantastical environments looking for rewards and performing a long sequence of actions to achieve some goal, it seemed like the logical test bed for the researchers\u2019 algorithm. In experiments, researchers divided games like \u201cMario Kart\u201d and \u201cMontezuma\u2019s Revenge\u201d into two different buckets: one where supervision was sparse, meaning the agent had less guidance (the \u201chard\u201d exploration games), and a second where supervision was denser (the \u201ceasy\u201d exploration games).\u00a0<\/p>\n<p>Suppose in \u201cMario Kart,\u201d for example, you remove all rewards, so you don\u2019t know when an enemy eliminates you. You\u2019re not given any reward when you collect a coin or jump over pipes. 
The agent is only told in the end how well it did. This would be a case of sparse supervision. Algorithms that incentivize curiosity do really well in this scenario.\u00a0<\/p>\n<p>But now, suppose the agent is provided dense supervision \u2014 a reward for jumping over pipes, collecting coins, and eliminating enemies. Here, an algorithm without curiosity performs really well because it gets rewarded often. But if you instead take the algorithm that also uses curiosity, it learns slowly. This is because the curious agent might attempt to run fast in different ways, dance around, go to every part of the game screen \u2014 things that are interesting, but do not help the agent succeed at the game. The team\u2019s algorithm, however, consistently performed well, irrespective of what environment it was in.\u00a0<\/p>\n<p>Future work might involve circling back to the question that has delighted and plagued psychologists for years: an appropriate metric for curiosity, since no one really knows the right way to mathematically define it.\u00a0<\/p>\n<p>\u201cGetting consistently good performance on a novel problem is extremely challenging \u2014 so by improving exploration algorithms, we can save your effort on tuning an algorithm for your problems of interest,\u201d says Zhang-Wei Hong, an EECS PhD student, CSAIL affiliate, and co-lead author along with Eric Chen\u00a0\u201920, MEng\u00a0\u201921 on a <a href=\"https:\/\/williamd4112.github.io\/pubs\/neurips22_eipo.pdf\" target=\"_blank\" rel=\"noopener\">new paper about the work<\/a>. \u201cWe need curiosity to solve extremely challenging problems, but on some problems it can hurt performance. We propose an algorithm that removes the burden of tuning the balance of exploration and exploitation. 
Previously, solving a problem might have taken, for instance, a week; with this new algorithm, we can get satisfactory results in a few hours.\u201d<\/p>\n<p>\u201cOne of the greatest challenges for current AI and cognitive science is how to balance exploration and exploitation \u2014 the search for information versus the search for reward. Children do this seamlessly, but it is challenging computationally,\u201d notes Alison Gopnik, professor of psychology and affiliate professor of philosophy at the University of California at Berkeley, who was not involved with the project. \u201cThis paper uses impressive new techniques to accomplish this automatically, designing an agent that can systematically balance curiosity about the world and the desire for reward, [thus taking] another step towards making AI agents (almost) as smart as children.\u201d<\/p>\n<p>\u201cIntrinsic rewards like curiosity are fundamental to guiding agents to discover useful, diverse behaviors, but this shouldn\u2019t come at the cost of doing well at the given task. This is an important problem in AI, and the paper provides a way to balance that trade-off,\u201d adds Deepak Pathak, an assistant professor at Carnegie Mellon University, who was also not involved in the work.\u00a0\u201cIt would be interesting to see how such methods scale beyond games to real-world robotic agents.\u201d<\/p>\n<p>Chen, Hong, and Agrawal wrote the paper alongside Joni Pajarinen, assistant professor at Aalto University and research leader at the Intelligent Autonomous Systems Group at TU Darmstadt. The research was supported, in part, by the MIT-IBM Watson AI Lab, DARPA Machine Common Sense Program, the Army Research Office, the United States Air Force Research Laboratory, and the United States Air Force Artificial Intelligence Accelerator. 
The paper will be presented at Neural Information Processing Systems (NeurIPS) 2022.<\/p>\n<\/div>\n<p><a href=\"https:\/\/news.mit.edu\/2022\/ensuring-ai-works-with-right-dose-curiosity-1110\">Go to Source<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Author: Rachel Gordon | MIT CSAIL It\u2019s a dilemma as old as time. Friday night has rolled around, and you\u2019re trying to pick a restaurant [&hellip;] <span class=\"read-more-link\"><a class=\"read-more\" href=\"https:\/\/www.aiproblog.com\/index.php\/2022\/11\/10\/ensuring-ai-works-with-the-right-dose-of-curiosity\/\">Read More<\/a><\/span><\/p>\n","protected":false},"author":1,"featured_media":473,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_bbp_topic_count":0,"_bbp_reply_count":0,"_bbp_total_topic_count":0,"_bbp_total_reply_count":0,"_bbp_voice_count":0,"_bbp_anonymous_reply_count":0,"_bbp_topic_count_hidden":0,"_bbp_reply_count_hidden":0,"_bbp_forum_subforum_count":0,"footnotes":""},"categories":[24],"tags":[],"_links":{"self":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts\/6092"}],"collection":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/comments?post=6092"}],"version-history":[{"count":0,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts\/6092\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/media\/475"}],"wp:attachment":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/media?parent=6092"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.p
hp\/wp-json\/wp\/v2\/categories?post=6092"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/tags?post=6092"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}