{"id":2101,"date":"2019-05-06T20:00:00","date_gmt":"2019-05-06T20:00:00","guid":{"rendered":"https:\/\/www.aiproblog.com\/index.php\/2019\/05\/06\/smarter-training-of-neural-networks\/"},"modified":"2019-05-06T20:00:00","modified_gmt":"2019-05-06T20:00:00","slug":"smarter-training-of-neural-networks","status":"publish","type":"post","link":"https:\/\/www.aiproblog.com\/index.php\/2019\/05\/06\/smarter-training-of-neural-networks\/","title":{"rendered":"Smarter training of neural networks"},"content":{"rendered":"<p>Author: Adam Conner-Simons | CSAIL<\/p>\n<div>\n<p>These days, nearly all the artificial intelligence-based products in our lives rely on \u201cdeep neural networks\u201d that automatically learn to process labeled data.<\/p>\n<p>For most organizations and individuals, though, deep learning is tough to break into. To learn well, neural networks normally have to be quite large and need massive datasets. This training process usually requires multiple days of training and expensive graphics processing units (GPUs) \u2014 and sometimes even custom-designed hardware.<\/p>\n<p>But what if they don\u2019t actually have to be all that big, after all?<\/p>\n<p>In a new paper, researchers from MIT\u2019s Computer Science and Artificial Intelligence Lab (CSAIL) have shown that neural networks contain subnetworks that are up to one-tenth the size yet capable of being trained to make equally accurate predictions \u2014 and sometimes can learn to do so even faster than the originals.<\/p>\n<p>The team\u2019s approach isn\u2019t particularly efficient now \u2014 they must train and \u201cprune\u201d the full network several times before finding the successful subnetwork. However, MIT Assistant Professor Michael Carbin says that his team\u2019s findings suggest that, if we can determine precisely which part of the original network is relevant to the final prediction, scientists might one day be able to skip this expensive process altogether. 
Such a revelation has the potential to save hours of work and make it easier for individual programmers, not just huge tech companies, to create meaningful models.<\/p>\n<p>\u201cIf the initial network didn\u2019t have to be that big in the first place, why can\u2019t you just create one that\u2019s the right size at the beginning?\u201d says PhD student Jonathan Frankle, who presented his new paper co-authored with Carbin at the International Conference on Learning Representations (ICLR) in New Orleans. The project was named one of ICLR\u2019s two best papers, out of roughly 1,600 submissions.<\/p>\n<p>The team likens traditional deep learning methods to a lottery. Training large neural networks is like trying to guarantee you will win the lottery by blindly buying every possible ticket. But what if we could select the winning numbers at the very start?<\/p>\n<p>\u201cWith a traditional neural network you randomly initialize this large structure, and after training it on a huge amount of data it magically works,\u201d Carbin says. \u201cThis large structure is like buying a big bag of tickets, even though there\u2019s only a small number of tickets that will actually make you rich. The remaining science is to figure out how to identify the winning tickets without seeing the winning numbers first.\u201d<\/p>\n<p>The team\u2019s work may also have implications for so-called \u201ctransfer learning,\u201d where networks trained for a task like image recognition are built upon to help with a completely different task.<\/p>\n<p>Traditional transfer learning involves training a network and then adding one more layer on top that\u2019s trained for another task. In many cases, a network trained for one purpose can extract some sort of general knowledge that can later be used for another purpose.<\/p>\n<p>For all the hype neural networks have received, little is said about how hard they are to train. 
Because they can be prohibitively expensive to train, data scientists must weigh trade-offs among the size of the model, the time it takes to train, and its final performance.<\/p>\n<p>To test their so-called \u201clottery ticket hypothesis\u201d and demonstrate the existence of these smaller subnetworks, the team needed a way to find them. They began by using a common approach for eliminating unnecessary connections from trained networks to make them fit on low-power devices like smartphones: They \u201cpruned\u201d connections with the lowest \u201cweights\u201d (how much the network prioritizes that connection).<\/p>\n<p>Their key innovation was the idea that connections that were pruned after the network was trained might never have been necessary at all. To test this idea, they trained the exact same network again, but without the pruned connections. Importantly, they \u201creset\u201d each remaining connection to the weight it was assigned at the beginning of training. These initial weights are vital for helping a lottery ticket win: Without them, the pruned networks wouldn\u2019t learn. By pruning more and more connections, they determined how much could be removed without harming the network\u2019s ability to learn.<\/p>\n<p>To validate this hypothesis, they repeated the process tens of thousands of times on many different networks in a wide range of conditions.<\/p>\n<p>\u201cIt was surprising to see that resetting a well-performing network would often result in something better,\u201d says Carbin. 
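The prune-and-reset round described above (train, remove the lowest-magnitude connections, rewind the survivors to their initial values) can be sketched in a few lines. This is a minimal NumPy illustration on a toy weight matrix, not the paper's implementation: the function names are hypothetical, and the "trained" weights are simulated with random noise rather than actual gradient training.

```python
import numpy as np

def magnitude_prune(weights, mask, fraction):
    """Remove the lowest-magnitude surviving connections.

    Among connections still active in `mask`, the `fraction` with the
    smallest absolute weight are dropped (set to False in the mask).
    """
    alive = weights[mask]
    k = int(fraction * alive.size)
    if k == 0:
        return mask
    threshold = np.sort(np.abs(alive))[k - 1]
    return mask & (np.abs(weights) > threshold)

def lottery_ticket_round(init_weights, trained_weights, mask, fraction):
    """One prune-and-reset round of the lottery ticket procedure.

    Prunes the lowest-magnitude trained weights, then "rewinds" the
    survivors to their original initialization -- the step the team
    found essential for the pruned subnetwork to learn.
    """
    mask = magnitude_prune(trained_weights, mask, fraction)
    return init_weights * mask, mask

# Toy example: a 4x4 weight matrix.
rng = np.random.default_rng(0)
init = rng.normal(size=(4, 4))
trained = init + rng.normal(scale=0.5, size=(4, 4))  # stand-in for training
mask = np.ones_like(init, dtype=bool)

ticket, mask = lottery_ticket_round(init, trained, mask, fraction=0.5)
print(f"surviving connections: {mask.sum()} of {mask.size}")
```

Iterating this round (retrain the ticket, prune again) mirrors the repeated prune-and-retrain loop the article describes; the key detail is that `ticket` carries the original initial weights, not the trained ones.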
\u201cThis suggests that whatever we were doing the first time around wasn\u2019t exactly optimal, and that there\u2019s room for improving how these models learn to improve themselves.\u201d<\/p>\n<p>As a next step, the team plans to explore why certain subnetworks are particularly adept at learning, and ways to efficiently find these subnetworks.<\/p>\n<p>\u201cUnderstanding the \u2018lottery ticket hypothesis\u2019 is likely to keep researchers busy for years to come,\u201d says Daniel Roy, an assistant professor of statistics at the University of Toronto, who was not involved in the paper. \u201cThe work may also have applications to network compression and optimization. Can we identify this subnetwork early in training, thus speeding up training? Whether these techniques can be used to build effective compression schemes deserves study.\u201d<\/p>\n<\/div>\n<p><a href=\"http:\/\/news.mit.edu\/2019\/smarter-training-neural-networks-0506\">Go to Source<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Author: Adam Conner-Simons | CSAIL These days, nearly all the artificial intelligence-based products in our lives rely on \u201cdeep neural networks\u201d that automatically learn to [&hellip;] <span class=\"read-more-link\"><a class=\"read-more\" href=\"https:\/\/www.aiproblog.com\/index.php\/2019\/05\/06\/smarter-training-of-neural-networks\/\">Read 
More<\/a><\/span><\/p>\n","protected":false},"author":1,"featured_media":474,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_bbp_topic_count":0,"_bbp_reply_count":0,"_bbp_total_topic_count":0,"_bbp_total_reply_count":0,"_bbp_voice_count":0,"_bbp_anonymous_reply_count":0,"_bbp_topic_count_hidden":0,"_bbp_reply_count_hidden":0,"_bbp_forum_subforum_count":0,"footnotes":""},"categories":[24],"tags":[],"_links":{"self":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts\/2101"}],"collection":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/comments?post=2101"}],"version-history":[{"count":0,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts\/2101\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/media\/457"}],"wp:attachment":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/media?parent=2101"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/categories?post=2101"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/tags?post=2101"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}