{"id":4381,"date":"2021-02-10T05:00:00","date_gmt":"2021-02-10T05:00:00","guid":{"rendered":"https:\/\/www.aiproblog.com\/index.php\/2021\/02\/10\/a-language-learning-system-that-pays-attention-more-efficiently-than-ever-before\/"},"modified":"2021-02-10T05:00:00","modified_gmt":"2021-02-10T05:00:00","slug":"a-language-learning-system-that-pays-attention-more-efficiently-than-ever-before","status":"publish","type":"post","link":"https:\/\/www.aiproblog.com\/index.php\/2021\/02\/10\/a-language-learning-system-that-pays-attention-more-efficiently-than-ever-before\/","title":{"rendered":"A language learning system that pays attention \u2014 more efficiently than ever before"},"content":{"rendered":"<p>Author: Daniel Ackerman | MIT News Office<\/p>\n<div>\n<p>Human language can be inefficient. Some words are vital. Others, expendable.<\/p>\n<\/p>\n<p>Reread the first sentence of this story. Just two words, \u201clanguage\u201d and \u201cinefficient,\u201d convey almost the entire meaning of the sentence. The importance of key words underlies a popular new tool for natural language processing (NLP) by computers: the attention mechanism. When coded into a broader NLP algorithm, the attention mechanism homes in on key words rather than treating every word with equal importance. That yields better results in NLP tasks like detecting positive or negative sentiment or predicting which words should come next in a sentence.<\/p>\n<\/p>\n<p>The attention mechanism\u2019s accuracy often comes at the expense of speed and computing power, however. It runs slowly on general-purpose processors like you might find in consumer-grade computers. So, MIT researchers have designed a combined software-hardware system, dubbed SpAtten, specialized to run the attention mechanism. SpAtten enables more streamlined NLP with less computing power.<\/p>\n<\/p>\n<p>\u201cOur system is similar to how the human brain processes language,\u201d says Hanrui Wang. \u201cWe read very fast and just focus on key words. That\u2019s the idea with SpAtten.\u201d<\/p>\n<\/p>\n<p>The research will be presented this month at the IEEE International Symposium on High-Performance Computer Architecture. Wang is the paper\u2019s lead author and a PhD student in the Department of Electrical Engineering and Computer Science. Co-authors include Zhekai Zhang and their advisor, Assistant Professor Song Han.<\/p>\n<\/p>\n<p>Since its introduction in 2015, the attention mechanism has been a boon for NLP. It\u2019s built into state-of-the-art NLP models like Google\u2019s BERT and OpenAI\u2019s GPT-3. The attention mechanism\u2019s key innovation is selectivity \u2014 it can infer which words or phrases in a sentence are most important, based on comparisons with word patterns the algorithm has previously encountered in a training phase. Despite the attention mechanism\u2019s rapid adoption into NLP models, it\u2019s not without cost.<\/p>\n<\/p>\n<p>NLP models require a hefty load of computer power, thanks in part to the high memory demands of the attention mechanism. \u201cThis part is actually the bottleneck for NLP models,\u201d says Wang. One challenge he points to is the lack of specialized hardware to run NLP models with the attention mechanism. General-purpose processors, like CPUs and GPUs, have trouble with the attention mechanism\u2019s complicated sequence of data movement and arithmetic. And the problem will get worse as NLP models grow more complex, especially for long sentences. 
NLP models require a hefty load of computer power, thanks in part to the high memory demands of the attention mechanism. “This part is actually the bottleneck for NLP models,” says Wang. One challenge he points to is the lack of specialized hardware to run NLP models with the attention mechanism. General-purpose processors, like CPUs and GPUs, have trouble with the attention mechanism’s complicated sequence of data movement and arithmetic. And the problem will get worse as NLP models grow more complex, especially for long sentences. “We need algorithmic optimizations and dedicated hardware to process the ever-increasing computational demand,” says Wang.

The researchers designed SpAtten to run the attention mechanism more efficiently, with both specialized software and specialized hardware. One key software advance is SpAtten’s use of “cascade pruning,” or eliminating unnecessary data from the calculations. Once the attention mechanism helps pick a sentence’s key words (called tokens), SpAtten prunes away unimportant tokens and eliminates the corresponding computations and data movements. The attention mechanism also includes multiple computation branches (called heads). As with tokens, the unimportant heads are identified and pruned away. Once dispatched, the extraneous tokens and heads don’t factor into the algorithm’s downstream calculations, reducing both computational load and memory access.

To further trim memory use, the researchers also developed a technique called “progressive quantization.” The method allows the algorithm to wield data in smaller-bitwidth chunks and to fetch as few of them as possible from memory. Lower data precision, corresponding to smaller bitwidth, is used for simple sentences, and higher precision is used for complicated ones. Intuitively, it’s like fetching the phrase “cmptr progm” as the low-precision version of “computer program.” (Simplified sketches of the pruning and quantization ideas appear after the performance results below.)

Alongside these software advances, the researchers developed a hardware architecture specialized to run SpAtten and the attention mechanism while minimizing memory access. The design employs a high degree of “parallelism,” meaning multiple operations are processed simultaneously on multiple processing elements, which is useful because the attention mechanism analyzes every word of a sentence at once. The parallelism lets SpAtten rank the importance of tokens and heads (for potential pruning) in a small number of computer clock cycles. Overall, the software and hardware components of SpAtten combine to eliminate unnecessary or inefficient data manipulation, focusing only on the tasks needed to complete the user’s goal.

The philosophy behind the system is captured in its name. SpAtten is a portmanteau of “sparse attention,” and the researchers note in the paper that SpAtten is “homophonic with ‘spartan,’ meaning simple and frugal.” Wang says, “that’s just like our technique here: making the sentence more concise.” That concision was borne out in testing.

The researchers coded a simulation of SpAtten’s hardware design (they haven’t fabricated a physical chip yet) and tested it against competing general-purpose processors. SpAtten ran more than 100 times faster than the next best competitor, a TITAN Xp GPU. Further, SpAtten was more than 1,000 times more energy efficient than its competitors, indicating that it could help trim NLP’s substantial electricity demands.
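The cascade pruning described above can be made concrete with a short sketch. This is purely illustrative NumPy code: the function name, the keep ratio, and the importance heuristic (summing the attention each token receives across heads and queries) are assumptions based on the general idea in the article, while SpAtten’s actual criteria are defined in the paper and evaluated in hardware.

```python
import numpy as np

def cascade_token_prune(attn_probs, tokens, keep_ratio=0.5):
    """Illustrative token pruning based on accumulated attention.

    attn_probs: (heads, tokens, tokens) attention probabilities from one layer.
    tokens: list of token strings.
    keep_ratio: fraction of tokens to keep (a made-up knob, not SpAtten's rule).
    """
    # How much total attention each token *receives*, summed over heads and queries.
    importance = attn_probs.sum(axis=(0, 1))          # shape: (tokens,)
    k = max(1, int(len(tokens) * keep_ratio))
    keep = np.sort(np.argsort(importance)[::-1][:k])  # top-k tokens, original order
    return keep, [tokens[i] for i in keep]

# Toy example: 3 attention heads over a 6-token sentence.
rng = np.random.default_rng(1)
probs = rng.dirichlet(np.ones(6), size=(3, 6))        # rows sum to 1, like softmax output
tokens = ["human", "language", "can", "be", "in", "efficient"]
keep, kept = cascade_token_prune(probs, tokens)
print(kept)   # surviving tokens; the rest are dropped from all downstream layers
```

Head pruning works in the same spirit: each head is scored for its contribution, and the weakest heads are discarded so that later computations and memory fetches skip them entirely.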
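Progressive quantization can be sketched in a similar way. In this toy version, the bitwidths and the “dominance” test are arbitrary assumptions, and software arithmetic stands in for what the hardware does with memory fetches: the attention scores are handled at low precision first, and the higher-precision bits are fetched only when the resulting attention distribution is not clearly dominated by a few tokens.

```python
import numpy as np

def quantize(x, bits):
    """Uniformly quantize x to the given bitwidth over its own range (illustrative)."""
    lo, hi = x.min(), x.max()
    if hi == lo:
        return x
    levels = 2 ** bits - 1
    return np.round((x - lo) / (hi - lo) * levels) / levels * (hi - lo) + lo

def softmax(s):
    e = np.exp(s - s.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def progressive_attention_probs(scores, low_bits=4, high_bits=8, dominance=0.4):
    """Try low precision first; refetch at higher precision only if needed.

    'dominance' is a made-up threshold: if every row of the attention matrix has a
    clearly dominant probability, the cheap low-bit result is kept.
    """
    probs = softmax(quantize(scores, low_bits))
    if probs.max(axis=-1).min() >= dominance:
        return probs, low_bits                     # low precision was enough
    return softmax(quantize(scores, high_bits)), high_bits

rng = np.random.default_rng(2)
scores = rng.normal(size=(5, 5)) * 3.0             # toy attention scores for 5 tokens
probs, used_bits = progressive_attention_probs(scores)
print(used_bits)                                   # 4 if low precision sufficed, else 8
```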
The researchers also integrated SpAtten into their previous work to help validate their philosophy that hardware and software are best designed in tandem. They built a specialized NLP model architecture for SpAtten, using their Hardware-Aware Transformer (HAT) framework, and achieved roughly a twofold speedup over a more general model.

The researchers think SpAtten could be useful to companies that employ NLP models for the majority of their artificial intelligence workloads. “Our vision for the future is that new algorithms and hardware that remove the redundancy in languages will reduce cost and save on the power budget for data center NLP workloads,” says Wang.

On the opposite end of the spectrum, SpAtten could bring NLP to smaller, personal devices. “We can improve the battery life for mobile phone or IoT devices,” says Wang, referring to internet-connected “things” such as televisions and smart speakers. “That’s especially important because in the future, numerous IoT devices will interact with humans by voice and natural language, so NLP will be the first application we want to employ.”

Han says SpAtten’s focus on efficiency and redundancy removal is the way forward in NLP research. “Human brains are sparsely activated [by key words]. NLP models that are sparsely activated will be promising in the future,” he says. “Not all words are equal — pay attention only to the important ones.”

Go to Source: https://news.mit.edu/2021/language-learning-efficiency-0210