{"id":2204,"date":"2019-05-29T18:00:01","date_gmt":"2019-05-29T18:00:01","guid":{"rendered":"https:\/\/www.aiproblog.com\/index.php\/2019\/05\/29\/teaching-language-models-grammar-really-does-make-them-smarter\/"},"modified":"2019-05-29T18:00:01","modified_gmt":"2019-05-29T18:00:01","slug":"teaching-language-models-grammar-really-does-make-them-smarter","status":"publish","type":"post","link":"https:\/\/www.aiproblog.com\/index.php\/2019\/05\/29\/teaching-language-models-grammar-really-does-make-them-smarter\/","title":{"rendered":"Teaching language models grammar really does make them smarter"},"content":{"rendered":"<p>Author: Kim Martineau | MIT Quest for Intelligence<\/p>\n<div>\n<p>Voice assistants like Siri and Alexa can tell the weather and crack a good joke, but any 8-year-old can carry on a better conversation.<\/p>\n<p>The deep learning models that power Siri and Alexa learn to understand our commands by picking out patterns in sequences of words and phrases. Their narrow, statistical understanding of language stands in sharp contrast to our own creative, spontaneous ways of speaking, a skill that starts\u00a0developing even before we are born,\u00a0while we&#8217;re\u00a0still in the womb.\u00a0<\/p>\n<p>To give computers some of our innate feel for language, researchers have started training deep learning models on the grammatical rules that most of us grasp intuitively, even if we never learned how to diagram a sentence in school. Grammatical constraints seem to help the models learn faster and perform better, but because neural networks reveal very little about their decision-making process, researchers have struggled to confirm that the gains are due to the grammar, and not the models\u2019 expert ability at finding patterns in sequences of words.\u00a0<\/p>\n<p>Now psycholinguists have stepped in to help. 
To peer inside the models, researchers have taken psycholinguistic tests originally developed to study human language understanding and adapted them to probe what neural networks know about language. In a pair of papers to be presented in June at the\u00a0<a href=\"https:\/\/naacl2019.org\/\">North American Chapter of the Association for Computational Linguistics<\/a>\u00a0conference, researchers from MIT, Harvard University, University of California, IBM Research, and Kyoto University have devised a set of tests to tease out the models\u2019 knowledge of specific grammatical rules. They find evidence that grammar-enriched deep learning models comprehend some fairly sophisticated rules, performing better than models trained on little-to-no grammar, and using a fraction of the data.<\/p>\n<p>\u201cGrammar helps the model behave\u00a0in more human-like ways,\u201d says\u00a0<a href=\"http:\/\/miguelballesteros.com\/index.html\">Miguel Ballesteros<\/a>, an IBM researcher with the\u00a0<a href=\"https:\/\/mitibmwatsonailab.mit.edu\/\">MIT-IBM Watson AI Lab<\/a>, and co-author of both studies. \u201cThe sequential models don\u2019t seem to care if you finish a sentence with a non-grammatical phrase. Why? Because they don\u2019t see that hierarchy.\u201d<\/p>\n<p>As a postdoc at Carnegie Mellon University, Ballesteros helped develop a method for training modern language models on sentence structure called\u00a0<a href=\"https:\/\/arxiv.org\/abs\/1602.07776\">recurrent neural network grammars<\/a>, or RNNGs. In the current research, he and his colleagues exposed the RNNG model, and similar models with little-to-no grammar training, to sentences with good, bad, or ambiguous syntax. When human subjects are asked to read sentences that sound grammatically off, their surprise is registered by longer response times. 
For computers, surprise is expressed in probabilities; when a low-probability word appears where a high-probability word was expected, the model registers a higher surprisal score.<\/p>\n<p>They found that the best-performing model \u2014\u00a0the grammar-enriched RNNG model\u00a0\u2014 showed greater surprisal when exposed to grammatical anomalies; for example, when the word \u201cthat\u201d improperly appears instead of \u201cwhat\u201d to introduce an embedded clause: \u201cI know what the lion devoured at sunrise\u201d is a perfectly natural sentence, but \u201cI know that the lion devoured at sunrise\u201d sounds like it has something missing \u2014 because it does.<\/p>\n<p>Linguists call this type of construction a dependency between a filler (a word like\u00a0who or\u00a0what) and a gap (the absence of a phrase where one is typically required). Even when more complicated constructions of this type are shown to grammar-enriched models, they \u2014 like native speakers of English \u2014 clearly know which ones are wrong.\u00a0<\/p>\n<p>For example, \u201cThe policeman who the criminal shot the politician with his gun shocked during the trial\u201d is anomalous; the gap corresponding to the filler \u201cwho\u201d should come after the\u00a0verb,\u00a0\u201cshot,\u201d not \u201cshocked.\u201d Rewriting the sentence to change the position of the gap, as in \u201cThe policeman who the criminal shot with his gun shocked the jury during the trial,\u201d is long-winded, but perfectly grammatical.<\/p>\n<p>\u201cWithout being trained on tens of millions of words, state-of-the-art sequential models don\u2019t care where the gaps are and aren\u2019t in sentences like those,\u201d says\u00a0<a href=\"http:\/\/www.mit.edu\/~rplevy\/\">Roger Levy<\/a>, a professor in MIT\u2019s\u00a0<a href=\"https:\/\/bcs.mit.edu\/\">Department of Brain and Cognitive Sciences<\/a>, and co-author of both studies. 
\u201cA human would find that really weird, and, apparently, so do grammar-enriched models.\u201d<\/p>\n<p>Bad grammar, of course, not only sounds weird; it can turn an entire sentence into gibberish, underscoring the importance of syntax both in cognition and to psycholinguists, who study it to learn more about the brain\u2019s capacity for symbolic thought.<\/p>\n<p>\u201cGetting the structure right is important to understanding the meaning of the sentence and how to interpret it,\u201d says\u00a0<a href=\"https:\/\/bcs.mit.edu\/users\/pqianmitedu\">Peng Qian<\/a>, a graduate student at MIT and co-author of both studies.\u00a0<\/p>\n<p>The researchers next plan to run their experiments on larger datasets and find out whether grammar-enriched models learn new words and phrases faster. Just as submitting neural networks to psychology tests is helping AI engineers understand and improve language models, psychologists hope to use this information to build better models of the brain.\u00a0<\/p>\n<p>\u201cSome component of our genetic endowment gives us this rich ability to speak,\u201d says\u00a0<a href=\"https:\/\/linguistics.fas.harvard.edu\/people\/ethan-wilcox\">Ethan Wilcox<\/a>, a graduate student at Harvard and co-author of both studies. 
\u201cThese are the sorts of methods that can produce insights into how we learn and understand language when our closest kin cannot.\u201d<\/p>\n<\/div>\n<p><a href=\"http:\/\/news.mit.edu\/2019\/teaching-language-models-grammar-makes-them-smarter-0529\">Go to Source<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Author: Kim Martineau | MIT Quest for Intelligence Voice assistants like Siri and Alexa can tell the weather and crack a good joke, but any [&hellip;] <span class=\"read-more-link\"><a class=\"read-more\" href=\"https:\/\/www.aiproblog.com\/index.php\/2019\/05\/29\/teaching-language-models-grammar-really-does-make-them-smarter\/\">Read More<\/a><\/span><\/p>\n","protected":false},"author":1,"featured_media":466,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_bbp_topic_count":0,"_bbp_reply_count":0,"_bbp_total_topic_count":0,"_bbp_total_reply_count":0,"_bbp_voice_count":0,"_bbp_anonymous_reply_count":0,"_bbp_topic_count_hidden":0,"_bbp_reply_count_hidden":0,"_bbp_forum_subforum_count":0,"footnotes":""},"categories":[24],"tags":[],"_links":{"self":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts\/2204"}],"collection":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/comments?post=2204"}],"version-history":[{"count":0,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts\/2204\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/media\/470"}],"wp:attachment":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/media?parent=2204"}],"wp:term":[{"taxonomy":"category
","embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/categories?post=2204"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/tags?post=2204"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}