{"id":1963,"date":"2019-04-02T15:15:01","date_gmt":"2019-04-02T15:15:01","guid":{"rendered":"https:\/\/www.aiproblog.com\/index.php\/2019\/04\/02\/teaching-machines-to-reason-about-what-they-see\/"},"modified":"2019-04-02T15:15:01","modified_gmt":"2019-04-02T15:15:01","slug":"teaching-machines-to-reason-about-what-they-see","status":"publish","type":"post","link":"https:\/\/www.aiproblog.com\/index.php\/2019\/04\/02\/teaching-machines-to-reason-about-what-they-see\/","title":{"rendered":"Teaching machines to reason about what they see"},"content":{"rendered":"<p>Author: Kim Martineau | MIT Quest for Intelligence<\/p>\n<div>\n<p>A child who has never seen a pink elephant can still describe one \u2014 unlike a computer. \u201cThe computer learns from data,\u201d says Jiajun Wu, a PhD student at MIT.\u00a0\u201cThe ability to generalize and recognize something you\u2019ve never seen before \u2014 a pink elephant \u2014\u00a0is very hard for machines.\u201d<\/p>\n<p>Deep learning systems interpret the world by picking out statistical patterns in data. This form of machine\u00a0learning is now everywhere, automatically tagging friends on Facebook, narrating Alexa\u2019s latest weather forecast, and delivering fun facts via Google search. But statistical learning has its limits. It requires tons of data, has trouble explaining its decisions, and is terrible at applying past knowledge to new situations. It can\u2019t comprehend an elephant that\u2019s pink instead of gray.<\/p>\n<p>To give computers the ability to reason more like us, artificial intelligence (AI) researchers are returning to abstract, or symbolic, programming. Popular in the 1950s and 1960s, symbolic AI wires in the rules and logic that allow machines to make comparisons and interpret how objects and entities relate. 
Symbolic AI uses less data, records the chain of steps it takes to reach a decision, and when combined with the brute processing power of statistical neural networks, it can even beat humans in a complicated image comprehension test.\u00a0<\/p>\n<p>A new\u00a0<a href=\"https:\/\/openreview.net\/pdf?id=rJgMlhRctm\">study<\/a> by a team of researchers at\u00a0<a href=\"http:\/\/web.mit.edu\/\">MIT<\/a>,\u00a0<a href=\"https:\/\/mitibmwatsonailab.mit.edu\/\">MIT-IBM Watson AI Lab<\/a>, and\u00a0<a href=\"https:\/\/deepmind.com\/\">DeepMind<\/a>\u00a0shows the promise of merging statistical and symbolic AI.\u00a0Led by Wu and\u00a0<a href=\"http:\/\/web.mit.edu\/cocosci\/josh.html\">Joshua Tenenbaum<\/a>, a professor in MIT\u2019s\u00a0<a href=\"http:\/\/bcs.mit.edu\/\">Department of Brain and Cognitive Sciences<\/a>\u00a0and the\u00a0<a href=\"https:\/\/www.csail.mit.edu\/\">Computer Science and Artificial Intelligence Laboratory<\/a>, the team shows that its hybrid model can learn object-related concepts like color and shape, and leverage that knowledge to interpret complex object relationships in a scene. With minimal training data and no explicit programming, their model could transfer concepts to larger scenes and answer increasingly tricky questions as well as or better than its state-of-the-art peers. The team presents its results at the\u00a0<a href=\"https:\/\/iclr.cc\/\">International Conference on Learning Representations<\/a>\u00a0in May.<\/p>\n<p>\u201cOne way children learn concepts is by connecting words with images,\u201d says the study\u2019s lead author\u00a0<a href=\"http:\/\/jiayuanm.com\/\">Jiayuan Mao<\/a>, an undergraduate at Tsinghua University who worked on the project as a visiting fellow at MIT. 
\u201cA machine that can learn the same way needs much less data, and is better able to transfer its knowledge to new scenarios.\u201d<\/p>\n<p>The study is a strong argument for moving back toward abstract-program approaches, says\u00a0<a href=\"https:\/\/people.eecs.berkeley.edu\/~jda\/\">Jacob Andreas<\/a>, a recent graduate of the University of California at Berkeley, who starts at MIT as an assistant professor this fall and was not involved in the work. \u201cThe trick, it turns out, is to add more symbolic structure, and to feed the neural networks a representation of the world that\u2019s divided into objects and properties rather than feeding it raw images,\u201d he says. \u201cThis work gives us insight into what machines need to understand before language learning is possible.\u201d<\/p>\n<p>The team trained their model on images paired with related questions and answers, part of the\u00a0<a href=\"https:\/\/cs.stanford.edu\/people\/jcjohns\/clevr\/\">CLEVR<\/a>\u00a0image comprehension test developed at Stanford University. As the model learns, the questions grow progressively harder, from, \u201cWhat\u2019s the color of the object?\u201d to \u201cHow many objects are both right of the green cylinder and have the same material as the small blue ball?\u201d Once object-level concepts are mastered, the model advances to learning how to relate objects and their properties to each other.<\/p>\n<p>Like other hybrid AI models, MIT\u2019s works by splitting up the task. A perception module of neural networks crunches the pixels in each image and maps the objects. A language module, also made of neural nets, extracts a meaning from the words in each sentence and creates symbolic programs, or instructions, that tell the machine how to answer the question. 
A third reasoning module runs the symbolic programs on the scene and gives an answer, updating the model when it makes mistakes.<\/p>\n<p>Key to the team\u2019s approach is a perception module that translates the image into an object-based representation, making the programs easier to execute. Also unique is what they call curriculum learning, or selectively training the model on concepts and scenes that grow progressively more difficult. It turns out that feeding the machine data in a logical way, rather than haphazardly, helps the model learn faster while improving accuracy.<\/p>\n<p>Once the model has a solid foundation, it can interpret new scenes and concepts, and increasingly difficult questions, almost perfectly. Asked to answer an unfamiliar question like, \u201cWhat\u2019s the shape of the big yellow thing?\u201d it outperformed its peers at Stanford and nearby\u00a0<a href=\"https:\/\/www.ll.mit.edu\/\">MIT Lincoln Laboratory<\/a>\u00a0with a fraction of the data.\u00a0<\/p>\n<p>While other models trained on the full CLEVR dataset of 70,000 images and 700,000 questions, the MIT-IBM model used 5,000 images and 100,000 questions. As the model built on previously learned concepts, it absorbed the programs underlying each question, speeding up the training process.\u00a0<\/p>\n<p>Though statistical, deep learning models are now embedded in daily life, much of their decision process remains hidden from view. This lack of transparency makes it difficult to anticipate where the system is susceptible to manipulation, error, or bias. 
Adding a symbolic layer can open the black box, explaining the growing interest in hybrid AI systems.<\/p>\n<p>\u201cSplitting the task up and letting programs do some of the work is the key to building interpretability into deep learning models,\u201d says Lincoln Laboratory researcher\u00a0<a href=\"https:\/\/davidmascharka.com\/\">David Mascharka<\/a>, whose hybrid model,\u00a0<a href=\"http:\/\/news.mit.edu\/2018\/mit-lincoln-laboratory-ai-system-solves-problems-through-human-reasoning-0911\">Transparency by Design Network<\/a>,\u00a0is benchmarked in the MIT-IBM study.<\/p>\n<p>The MIT-IBM team is now working to improve the model\u2019s performance on real-world photos and to extend it to video understanding and robotic manipulation. Other authors of the study are\u00a0<a href=\"http:\/\/people.csail.mit.edu\/ganchuang\/\">Chuang Gan<\/a>\u00a0and\u00a0<a href=\"https:\/\/sites.google.com\/site\/pushmeet\/\">Pushmeet Kohli<\/a>, researchers at the MIT-IBM Watson AI Lab and DeepMind, respectively.<\/p>\n<\/div>\n<p><a href=\"http:\/\/news.mit.edu\/2019\/teaching-machines-to-reason-about-what-they-see-0402\">Go to Source<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Author: Kim Martineau | MIT Quest for Intelligence A child who has never seen a pink elephant can still describe one \u2014 unlike a computer. 
[&hellip;] <span class=\"read-more-link\"><a class=\"read-more\" href=\"https:\/\/www.aiproblog.com\/index.php\/2019\/04\/02\/teaching-machines-to-reason-about-what-they-see\/\">Read More<\/a><\/span><\/p>\n","protected":false},"author":1,"featured_media":457,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_bbp_topic_count":0,"_bbp_reply_count":0,"_bbp_total_topic_count":0,"_bbp_total_reply_count":0,"_bbp_voice_count":0,"_bbp_anonymous_reply_count":0,"_bbp_topic_count_hidden":0,"_bbp_reply_count_hidden":0,"_bbp_forum_subforum_count":0,"footnotes":""},"categories":[24],"tags":[],"_links":{"self":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts\/1963"}],"collection":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/comments?post=1963"}],"version-history":[{"count":0,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts\/1963\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/media\/466"}],"wp:attachment":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/media?parent=1963"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/categories?post=1963"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/tags?post=1963"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}