{"id":5833,"date":"2022-08-16T14:00:00","date_gmt":"2022-08-16T14:00:00","guid":{"rendered":"https:\/\/www.aiproblog.com\/index.php\/2022\/08\/16\/making-robots-more-helpful-with-language\/"},"modified":"2022-08-16T14:00:00","modified_gmt":"2022-08-16T14:00:00","slug":"making-robots-more-helpful-with-language","status":"publish","type":"post","link":"https:\/\/www.aiproblog.com\/index.php\/2022\/08\/16\/making-robots-more-helpful-with-language\/","title":{"rendered":"Making robots more helpful with language"},"content":{"rendered":"<p>Author: <\/p>\n<div>\n<div class=\"block-paragraph\">\n<div class=\"rich-text\">\n<p data-block-key=\"4iyie\">Even the simplest human tasks are unbelievably complex. The way we perceive and interact with the world requires a lifetime of accumulated experience and context. For example, if a person tells you, \u201cI am running out of time,\u201d you don\u2019t immediately worry they are jogging on a street where the space-time continuum ceases to exist. You understand that they\u2019re probably coming up against a deadline. And if they hurriedly walk toward a closed door, you don\u2019t brace for a collision, because you trust this person can open the door, whether by turning a knob or pulling a handle.<\/p>\n<\/div>\n<\/div>\n<div class=\"block-paragraph\">\n<div class=\"rich-text\">\n<p data-block-key=\"4iyie\">A robot doesn\u2019t innately have that understanding. And that\u2019s the inherent challenge of programming helpful robots that can interact with humans. We know it as \u201cMoravec&#8217;s paradox\u201d \u2014 the idea that in robotics, it\u2019s the easiest things that are the most difficult to program a robot to do. 
This is because we\u2019ve had all of human evolution to master our basic motor skills, but relatively speaking, humans have only just learned algebra.<\/p>\n<\/div>\n<\/div>\n<div class=\"block-paragraph\">\n<div class=\"rich-text\">\n<p data-block-key=\"4iyie\">In other words, there\u2019s a genius to human beings \u2014 from understanding idioms to manipulating our physical environments \u2014 where it seems like we just \u201cget it.\u201d The same can\u2019t be said for robots.<\/p>\n<\/div>\n<\/div>\n<div class=\"block-paragraph\">\n<div class=\"rich-text\">\n<p data-block-key=\"4iyie\">Today, robots by and large exist in industrial environments, and are painstakingly coded for narrow tasks. This makes it impossible for them to adapt to the unpredictability of the real world. That\u2019s why <a href=\"https:\/\/research.google\/\">Google Research<\/a> and <a href=\"https:\/\/everydayrobots.com\/\">Everyday Robots<\/a> are working together to combine the best of language models with robot learning.<\/p>\n<\/div>\n<\/div>\n<div class=\"block-paragraph\">\n<div class=\"rich-text\">\n<p data-block-key=\"4iyie\">Called <a href=\"https:\/\/sites.research.google\/palm-saycan\">PaLM-SayCan<\/a>, this joint research uses <a href=\"https:\/\/arxiv.org\/pdf\/2204.02311.pdf\">PaLM<\/a> \u2014 or Pathways Language Model \u2014 in a robot learning model running on an Everyday Robots helper robot. This effort is the first implementation that uses a large-scale language model to plan for a real robot. 
It not only makes it possible for people to communicate with helper robots via text or speech, but also improves the robot\u2019s overall performance and ability to execute more complex and abstract tasks by tapping into the world knowledge encoded in the language model.<\/p>\n<\/div>\n<\/div>\n<div class=\"block-image_full_width\">\n<div class=\"h-c-page\">\n<div class=\"article-image h-c-grid__col--10 h-c-grid__col--offset-1 h-c-grid__col-l--offset-2 h-c-grid__col-l--8\"><video alt=\"I just worked out bring me a snack\" autoplay=\"\" class=\"article-image__media\" loop=\"\" muted=\"\" playsinline=\"\" src=\"https:\/\/storage.googleapis.com\/gweb-uniblog-publish-prod\/original_videos\/I_just_worked_out_bring_me_a_snack_QT.mp4\" tabindex=\"0\" title=\"A helper robot responding to the task \u2018I\u2019m tired. Bring me a snack that\u2019ll give me some energy, please.\" type=\"video\/mp4\">Video format not supported<\/video><\/div>\n<\/div>\n<\/div>\n<div class=\"block-paragraph\">\n<div class=\"rich-text\">\n<h3 data-block-key=\"9g4gd\">Using language to improve robots<\/h3>\n<\/div>\n<\/div>\n<div class=\"block-paragraph\">\n<div class=\"rich-text\">\n<p data-block-key=\"9g4gd\">PaLM-SayCan enables the robot to understand the way we communicate, facilitating more natural interaction. Language is a reflection of the human mind\u2019s ability to assemble tasks, put them in context and even reason through problems. Language models also contain enormous amounts of information about the world, and it turns out that can be pretty helpful to the robot. PaLM can help the robotic system process more complex, open-ended prompts and respond to them in ways that are reasonable and sensible.<\/p>\n<\/div>\n<\/div>\n<div class=\"block-paragraph\">\n<div class=\"rich-text\">\n<p data-block-key=\"9g4gd\">PaLM-SayCan shows that a robot\u2019s performance can be improved simply by enhancing the underlying language model. 
When the system was integrated with PaLM instead of a less powerful baseline model, we saw a 14% improvement in the planning success rate, or the ability to map a viable approach to a task. That amounts to half the number of planning mistakes made by the baseline method. We also saw a 13% improvement in the execution success rate, or the ability to successfully carry out a task. The biggest improvement, at 26%, is in planning long-horizon tasks, or those that involve eight or more steps. Here\u2019s an example: \u201cI left out a soda, an apple and water. Can you throw them away and then bring me a sponge to wipe the table?\u201d Pretty demanding, if you ask me.<\/p>\n<p data-block-key=\"fsadd\">\n<\/div>\n<\/div>\n<div class=\"block-paragraph\">\n<div class=\"rich-text\">\n<h3 data-block-key=\"9g4gd\">Making sense of the world through language<\/h3>\n<\/div>\n<\/div>\n<div class=\"block-paragraph\">\n<div class=\"rich-text\">\n<p data-block-key=\"9g4gd\">With PaLM, we\u2019re seeing new capabilities emerge in the language domain such as reasoning via <a href=\"https:\/\/ai.googleblog.com\/2022\/05\/language-models-perform-reasoning-via.html\">chain of thought prompting<\/a>. This allows us to see and improve how the model interprets the task. For example, if you show the model a handful of examples with the thought process behind how to respond to a query, it learns to reason through those prompts. 
This is similar to how we learn by showing our work on our algebra homework.<\/p>\n<\/div>\n<\/div>\n<div class=\"block-image_full_width\">\n<div class=\"h-c-page\">\n<div class=\"article-image h-c-grid__col--10 h-c-grid__col--offset-1 h-c-grid__col-l--offset-2 h-c-grid__col-l--8\"><img decoding=\"async\" alt=\"PaLM-SayCan uses chain of thought prompting, which interprets the instruction in order to score the likelihood of completing the task\" class=\"article-image--full\" src=\"https:\/\/storage.googleapis.com\/gweb-uniblog-publish-prod\/images\/Chain_of_thought_prompting.max-1000x1000.png\" tabindex=\"0\"><\/div>\n<\/div>\n<\/div>\n<div class=\"block-paragraph\">\n<div class=\"rich-text\">\n<p data-block-key=\"9g4gd\">So if you ask PaLM-SayCan, \u201cBring me a snack and something to wash it down with,\u201d it uses chain of thought prompting to recognize that a bag of chips may be a good snack, and that \u201cwash it down\u201d means bring a drink. Then PaLM-SayCan can respond with a series of steps to accomplish this. While we\u2019re early in our research, this is promising for a future where robots can handle complex requests.<\/p>\n<p data-block-key=\"5jkg1\">\n<\/div>\n<\/div>\n<div class=\"block-paragraph\">\n<div class=\"rich-text\">\n<h3 data-block-key=\"9g4gd\">Grounding language through experience<\/h3>\n<\/div>\n<\/div>\n<div class=\"block-paragraph\">\n<div class=\"rich-text\">\n<p data-block-key=\"9g4gd\">Complexity exists in both language and the environments around us. That\u2019s why grounding artificial intelligence in the real world is a critical part of what we do in Google Research. A language model may suggest something that appears reasonable and helpful, but may not be safe or realistic in a given setting. Robots, on the other hand, have been trained to know what is possible given the environment. 
By fusing language and robotic knowledge, we\u2019re able to improve the overall performance of a robotic system.<\/p>\n<\/div>\n<\/div>\n<div class=\"block-paragraph\">\n<div class=\"rich-text\">\n<p data-block-key=\"9g4gd\">Here\u2019s how this works in PaLM-SayCan: PaLM suggests possible approaches to the task based on language understanding, and the robot models do the same based on the feasible skill set. The combined system then cross-references the two to help identify more helpful and achievable approaches for the robot.<\/p>\n<\/div>\n<\/div>\n<div class=\"block-image_full_width\">\n<div class=\"h-c-page\">\n<div class=\"article-image h-c-grid__col--10 h-c-grid__col--offset-1 h-c-grid__col-l--offset-2 h-c-grid__col-l--8\"><img decoding=\"async\" alt=\"By combining language and robotic affordances, PaLM-SayCan breaks down the requested task to perform it successfully\" class=\"article-image--full\" src=\"https:\/\/storage.googleapis.com\/gweb-uniblog-publish-prod\/original_images\/palm_gif_8_12_22.gif\" tabindex=\"0\"><\/div>\n<\/div>\n<\/div>\n<div class=\"block-paragraph\">\n<div class=\"rich-text\">\n<p data-block-key=\"9g4gd\">For example, if you ask the language model, \u201cI spilled my drink, can you help?,\u201d it may suggest you try using a vacuum. This seems like a perfectly reasonable way to clean up a mess, but generally, it\u2019s probably not a good idea to use a vacuum on a liquid spill. And if the robot can\u2019t pick up a vacuum or operate it, it\u2019s not a particularly helpful way to approach the task. 
Together, the two may instead be able to realize \u201cbring a sponge\u201d is both possible and more helpful.<\/p>\n<p data-block-key=\"da6u8\">\n<\/div>\n<\/div>\n<div class=\"block-paragraph\">\n<div class=\"rich-text\">\n<h3 data-block-key=\"9g4gd\">Experimenting responsibly<\/h3>\n<\/div>\n<\/div>\n<div class=\"block-paragraph\">\n<div class=\"rich-text\">\n<p data-block-key=\"9g4gd\">We take a responsible approach to this research and follow Google\u2019s <a href=\"https:\/\/ai.google\/principles\/\">AI Principles<\/a> in the development of our robots. Safety is our number-one priority and especially important for a learning robot: It may act clumsily while exploring, but it should always be safe. We follow all the tried-and-true principles of robot safety, including risk assessments, physical controls, safety protocols and emergency stops. We also always implement multiple levels of safety, such as force limitations and algorithmic protections, to mitigate risky scenarios. PaLM-SayCan is constrained to commands that are safe for a robot to perform and was also developed to be highly interpretable, so we can clearly examine and learn from every decision the system makes.<\/p>\n<p data-block-key=\"1h6ep\">\n<\/div>\n<\/div>\n<div class=\"block-paragraph\">\n<div class=\"rich-text\">\n<h3 data-block-key=\"9g4gd\">Making sense of our worlds<\/h3>\n<\/div>\n<\/div>\n<div class=\"block-paragraph\">\n<div class=\"rich-text\">\n<p data-block-key=\"9g4gd\">Whether it\u2019s moving about busy offices or understanding common sayings, we still have many mechanical and intelligence challenges to solve in robotics. 
So, for now, these robots are just getting better at grabbing snacks for Googlers in our micro-kitchens.<\/p>\n<\/div>\n<\/div>\n<div class=\"block-paragraph\">\n<div class=\"rich-text\">\n<p data-block-key=\"9g4gd\">But as we continue to uncover ways for robots to interact with our ever-changing world, we\u2019ve found that language and robotics show enormous potential for the helpful, human-centered robots of tomorrow.<\/p>\n<\/div>\n<\/div>\n<\/div>\n<p><a href=\"https:\/\/blog.google\/technology\/ai\/making-robots-more-helpful-with-language\/\">Go to Source<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Author: Even the simplest human tasks are unbelievably complex. The way we perceive and interact with the world requires a lifetime of accumulated experience and [&hellip;] <span class=\"read-more-link\"><a class=\"read-more\" href=\"https:\/\/www.aiproblog.com\/index.php\/2022\/08\/16\/making-robots-more-helpful-with-language\/\">Read More<\/a><\/span><\/p>\n","protected":false},"author":1,"featured_media":5834,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_bbp_topic_count":0,"_bbp_reply_count":0,"_bbp_total_topic_count":0,"_bbp_total_reply_count":0,"_bbp_voice_count":0,"_bbp_anonymous_reply_count":0,"_bbp_topic_count_hidden":0,"_bbp_reply_count_hidden":0,"_bbp_forum_subforum_count":0,"footnotes":""},"categories":[24],"tags":[],"_links":{"self":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts\/5833"}],"collection":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/comments?post=5833"}],"version-history":[{"count":0,"href":"https:\/\/www.aiproblog.com\/index.ph
p\/wp-json\/wp\/v2\/posts\/5833\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/media\/5834"}],"wp:attachment":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/media?parent=5833"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/categories?post=5833"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/tags?post=5833"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}