{"id":977,"date":"2018-08-28T06:51:31","date_gmt":"2018-08-28T06:51:31","guid":{"rendered":"https:\/\/www.aiproblog.com\/index.php\/2018\/08\/28\/what-comes-after-deep-learning\/"},"modified":"2018-08-28T06:51:31","modified_gmt":"2018-08-28T06:51:31","slug":"what-comes-after-deep-learning","status":"publish","type":"post","link":"https:\/\/www.aiproblog.com\/index.php\/2018\/08\/28\/what-comes-after-deep-learning\/","title":{"rendered":"What Comes After Deep Learning"},"content":{"rendered":"<p>Author: William Vorhies<\/p>\n<div>\n<p><strong><em>Summary:<\/em><\/strong> <em>We\u2019re stuck.\u00a0 There hasn\u2019t been a major breakthrough in algorithms in the last year.\u00a0 Here\u2019s a survey of the leading contenders for that next major advancement.<\/em><\/p>\n<p>\u00a0<\/p>\n<p><a href=\"http:\/\/api.ning.com\/files\/TWPQV8Q8N57gijvqpyRjZH1Zcv6J*AhIrj86GY9*SzGiioW3zQO1HUqBnaCFa6ghFEgnKeLFCDlFev*-cIlkE61Wke0G2-PX\/stuckinarut.jpg\" target=\"_self\"><img decoding=\"async\" src=\"http:\/\/api.ning.com\/files\/TWPQV8Q8N57gijvqpyRjZH1Zcv6J*AhIrj86GY9*SzGiioW3zQO1HUqBnaCFa6ghFEgnKeLFCDlFev*-cIlkE61Wke0G2-PX\/stuckinarut.jpg?width=275\" width=\"275\" class=\"align-right\"><\/a>We\u2019re stuck.\u00a0 Or at least we\u2019re plateaued.\u00a0 Can anyone remember the last time a year went by without a notable advance in algorithms, chips, or data handling?\u00a0 It was so unusual to go to the Strata San Jose conference a few weeks ago and see no new eye-catching developments.<\/p>\n<p>As I <a href=\"https:\/\/www.datasciencecentral.com\/profiles\/blogs\/strata-what-a-difference-a-year-makes\"><em><u>reported earlier<\/u><\/em><\/a>, it seems we\u2019ve hit maturity, and now our major efforts are aimed at either making sure all our powerful new techniques work well together (converged platforms) or making a buck from those massive VC investments in them.<\/p>\n<p>I\u2019m not the only one who noticed.\u00a0 Several attendees and exhibitors said very similar 
things to me.\u00a0 And just the other day I had a note from a team of well-regarded researchers who had been evaluating the relative merits of different advanced analytic platforms and concluded there weren\u2019t any differences worth reporting.<\/p>\n<p>\u00a0<\/p>\n<p><span style=\"font-size: 12pt;\"><strong>Why and Where Are We Stuck?<\/strong><\/span><\/p>\n<p>Where we are right now is actually not such a bad place.\u00a0 Our advances over the last two or three years have all been in the realm of deep learning and reinforcement learning.\u00a0 Deep learning has brought us terrific capabilities in processing speech, text, image, and video.\u00a0 Add reinforcement learning and we get big advances in game play, autonomous vehicles, robotics, and the like.<\/p>\n<p>We\u2019re in the earliest stages of a commercial explosion based on these advances: huge savings from customer interactions through chatbots, new personal convenience apps like personal assistants and Alexa, and Level 2 automation in our personal cars like adaptive cruise control, accident-avoidance braking, and lane keeping.<\/p>\n<p>TensorFlow, Keras, and the other deep learning platforms are more accessible than ever, and thanks to GPUs, more efficient than ever.<\/p>\n<p>However, the known list of deficiencies hasn\u2019t moved at all.<\/p>\n<ol>\n<li>The need for too much labeled training data.<\/li>\n<li>Models that take either too long or too many expensive resources to train, and that still may fail to train at all.<\/li>\n<li>Hyperparameters, especially around nodes and layers, that are still mysterious. 
Automation or even well-accepted rules of thumb are still out of reach.<\/li>\n<li>Transfer learning that works only from the complex to the simple, not from one logical system to another.<\/li>\n<\/ol>\n<p>I\u2019m sure we could make a longer list.\u00a0 It\u2019s in solving these major shortcomings that we\u2019ve become stuck.<\/p>\n<p>\u00a0<\/p>\n<p><span style=\"font-size: 12pt;\"><strong>What\u2019s Stopping Us<\/strong><\/span><\/p>\n<p>In the case of deep neural nets, the conventional wisdom right now is that if we just keep pushing, just keep investing, then these shortfalls will be overcome.\u00a0 For example, from the \u201980s through the \u201900s we knew how to make DNNs work; we just didn\u2019t have the hardware.\u00a0 Once hardware caught up, DNNs combined with the new open source ethos broke open this new field.<\/p>\n<p>All types of research have their own momentum.\u00a0 Especially once you\u2019ve invested huge amounts of time and money in a particular direction, you keep heading in that direction.\u00a0 If you\u2019ve invested years in developing expertise in these skills, you\u2019re not inclined to jump ship.\u00a0<\/p>\n<p>\u00a0<\/p>\n<p><span style=\"font-size: 12pt;\"><strong>Change Direction Even If You\u2019re Not Entirely Sure What Direction That Should Be<\/strong><\/span><\/p>\n<p><a href=\"http:\/\/api.ning.com\/files\/TWPQV8Q8N54-h7Ewd2MGYgb6sahfs9c4UxnVfJaTlbh3IzxFtsCEBjKsleCi82D5oE1zLjeME66eoDwTtZL0PcJw6teLNoWj\/changedirection.jpg\" target=\"_self\"><img decoding=\"async\" src=\"http:\/\/api.ning.com\/files\/TWPQV8Q8N54-h7Ewd2MGYgb6sahfs9c4UxnVfJaTlbh3IzxFtsCEBjKsleCi82D5oE1zLjeME66eoDwTtZL0PcJw6teLNoWj\/changedirection.jpg?width=275\" width=\"275\" class=\"align-right\"><\/a>Sometimes we need to change direction, even if we don\u2019t know exactly what that new direction might be.\u00a0 Recently, leading Canadian and US AI researchers did just that.\u00a0 They decided they were misdirected and needed to essentially start 
over.<\/p>\n<p>This insight was verbalized last fall by Geoffrey Hinton, who gets much of the credit for starting the DNN thrust in the late 80s.\u00a0 Hinton, who is now a professor emeritus at the University of Toronto and a Google researcher, said he is now <a href=\"https:\/\/www.axios.com\/artificial-intelligence-pioneer-says-we-need-to-start-over-1513305524-f619efbd-9db0-4947-a9b2-7a4c310a28fe.html\">&#8220;<em><u>deeply suspicious<\/u><\/em>&#8221;<\/a> of backpropagation, the core method that underlies DNNs.\u00a0 Observing that the human brain doesn\u2019t need all that labeled data to reach a conclusion, Hinton says, &#8220;My view is throw it all away and start again.&#8221;<\/p>\n<p>So with this in mind, here\u2019s a short survey of new directions that fall somewhere between solid probabilities and moon shots, but are not incremental improvements to deep neural nets as we know them.<\/p>\n<p>These descriptions are intentionally short and will undoubtedly lead you to further reading to fully understand them.<\/p>\n<p>\u00a0<\/p>\n<p><span style=\"font-size: 12pt;\"><strong>Things That Look Like DNNs but Are Not<\/strong><\/span><\/p>\n<p>There is a line of research, closely hewing to Hinton\u2019s shot at backpropagation, that believes the fundamental structure of nodes and layers is useful, but that the methods of connection and calculation need to be dramatically revised.<\/p>\n<p>\u00a0<\/p>\n<p><span style=\"font-size: 12pt;\"><strong>Capsule Networks (CapsNet)<\/strong><\/span><\/p>\n<p><a href=\"http:\/\/api.ning.com\/files\/TWPQV8Q8N55YQV5Gny8Yb8Ghom0Kfyhw93x2MzuDlcyRCx2jKWRqvJSdnzr-ULB-pXJrcf3bDCSMVifYy2xZu8*hitXgAm8n\/capsulenetworks.png\" target=\"_self\"><img decoding=\"async\" src=\"http:\/\/api.ning.com\/files\/TWPQV8Q8N55YQV5Gny8Yb8Ghom0Kfyhw93x2MzuDlcyRCx2jKWRqvJSdnzr-ULB-pXJrcf3bDCSMVifYy2xZu8*hitXgAm8n\/capsulenetworks.png?width=275\" width=\"275\" class=\"align-right\"><\/a>It\u2019s only fitting that we start with Hinton\u2019s own 
current new direction in research, CapsNet.\u00a0 This relates to image classification with CNNs, and the problem, simply stated, is that CNNs do not account for the pose of the object.\u00a0 That is, if the same object is to be recognized with differences in position, size, orientation, deformation, velocity, albedo, hue, texture, etc., then training data must be added for each of these cases.<\/p>\n<p>In CNNs, this is handled with massive increases in training data and\/or increases in max pooling layers that can generalize, but only by losing actual information.<\/p>\n<p>The <a href=\"https:\/\/hackernoon.com\/what-is-a-capsnet-or-capsule-network-2bfbe48769cc\"><em><u>following description<\/u><\/em><\/a> comes from one of many good technical descriptions of CapsNets, this one from Hackernoon.<\/p>\n<p><em>Capsule is a nested set of neural layers. So in a regular neural network you keep on adding more layers. In CapsNet you would add more layers inside a single layer. Or in other words nest a neural layer inside another. The state of the neurons inside a capsule capture the above properties of one entity inside an image. A capsule outputs a vector to represent the existence of the entity. The orientation of the vector represents the properties of the entity. The vector is sent to all possible parents in the neural network. Prediction vector is calculated based on multiplying its own weight and a weight matrix. Whichever parent has the largest scalar prediction vector product, increases the capsule bond. Rest of the parents decrease their bond. 
This routing by agreement method is superior to the current mechanism like max-pooling.<\/em><\/p>\n<p>CapsNet dramatically reduces the required training set and shows superior performance in image classification in early tests.<\/p>\n<p>\u00a0<\/p>\n<p><span style=\"font-size: 12pt;\"><strong>gcForest<\/strong><\/span><\/p>\n<p>In February we featured research by Zhi-Hua Zhou and Ji Feng of the National Key Lab for Novel Software Technology, Nanjing University, presenting a technique they call <strong>gcForest<\/strong>.\u00a0 Their research paper shows that gcForest regularly beats CNNs and RNNs at both text and image classification.\u00a0 The benefits are quite significant.<\/p>\n<ul>\n<li>Requires only a fraction of the training data.<\/li>\n<li>Runs on a desktop CPU, with no need for GPUs.<\/li>\n<li>Trains just as rapidly, in many cases even more rapidly, and lends itself to distributed processing.<\/li>\n<li>Has far fewer hyperparameters and performs well on the default settings.<\/li>\n<li>Relies on easily understood random forests instead of completely opaque deep neural nets.<\/li>\n<\/ul>\n<p>In brief, gcForest (multi-Grained Cascade Forest) is a decision tree ensemble approach in which the cascade structure of deep nets is retained, but the opaque edges and node neurons are replaced by groups of random forests paired with completely-random tree forests.\u00a0 <a href=\"https:\/\/www.datasciencecentral.com\/profiles\/blogs\/off-the-beaten-path-using-deep-forests-to-outperform-cnns-and-rnn\"><em><u>Read more about gcForest<\/u><\/em><\/a> in our original article.<\/p>\n<p>\u00a0<\/p>\n<p><span style=\"font-size: 12pt;\"><strong>Pyro and Edward<\/strong><\/span><\/p>\n<p><strong>Pyro and Edward<\/strong> are two new programming languages that merge deep learning frameworks with probabilistic programming.\u00a0 Pyro is the work of Uber, while Edward comes out of Columbia University with funding from DARPA.\u00a0 The result is a 
framework that allows deep learning systems to measure their confidence in a prediction or decision.<\/p>\n<p>In classic predictive analytics, we might approach this by using log loss as the fitness function, penalizing predictions that are confident but wrong.\u00a0 So far there\u2019s been no equivalent for deep learning.<\/p>\n<p>Where this promises to be useful, for example, is in self-driving cars or aircraft, allowing the control system to have some sense of confidence or doubt before making a critical, potentially fatal decision.\u00a0 That\u2019s certainly something you\u2019d like your autonomous Uber to know before you get on board.<\/p>\n<p>Both Pyro and Edward are in the early stages of development.<\/p>\n<p>\u00a0<\/p>\n<p><span style=\"font-size: 12pt;\"><strong>Approaches That Don\u2019t Look Like Deep Nets<\/strong><\/span><\/p>\n<p>I regularly run across small companies that have very unusual algorithms at the core of their platforms.\u00a0 In most of the cases I\u2019ve pursued, they\u2019ve been unwilling to provide sufficient detail to allow me to even describe for you what\u2019s going on in there.\u00a0 This secrecy doesn\u2019t invalidate their utility, but until they provide some benchmarking and some detail, I can\u2019t really tell you what\u2019s going on inside.\u00a0 Think of these as our bench for the future when they do finally lift the veil.<\/p>\n<p>For now, the most advanced non-DNN algorithm and platform I\u2019ve investigated is this:<\/p>\n<p>\u00a0<\/p>\n<p><span style=\"font-size: 12pt;\"><strong>Hierarchical Temporal Memory (HTM)<\/strong><\/span><\/p>\n<p><a href=\"http:\/\/api.ning.com\/files\/TWPQV8Q8N54xVWjFJQgAHv4vz6EV30GfLb6EcdZ2-XjwM811kU2zkyOxjXePL8gU84Cm2bgUwhNMPrlhXC9FgCm13NGDyKax\/HTMneuron.png\" target=\"_self\"><img decoding=\"async\" src=\"http:\/\/api.ning.com\/files\/TWPQV8Q8N54xVWjFJQgAHv4vz6EV30GfLb6EcdZ2-XjwM811kU2zkyOxjXePL8gU84Cm2bgUwhNMPrlhXC9FgCm13NGDyKax\/HTMneuron.png?width=275\" width=\"275\" 
class=\"align-right\"><\/a>Hierarchical Temporal Memory (HTM) uses Sparse Distributed Representation (SDR) to model the neurons in the brain and to perform calculations that outperform CNNs and RNNs at scalar predictions (future values of things like commodity, energy, or stock prices) and at anomaly detection.<\/p>\n<p>This is the dedicated work of Jeff Hawkins, of Palm Pilot fame, at his company Numenta.\u00a0 Hawkins has pursued a strong AI model based on fundamental research into brain function that is not structured with layers and nodes as in DNNs.<\/p>\n<p>HTM has the characteristic that it discovers patterns very rapidly, with as few as roughly 1,000 observations.\u00a0 This compares with the hundreds of thousands or millions of observations necessary to train CNNs or RNNs.<\/p>\n<p>Also, the pattern recognition is unsupervised and can recognize and generalize about changes in the pattern based on changing inputs as soon as they occur.\u00a0 This results in a system that not only trains remarkably quickly but is also self-learning, adaptive, and not confused by changes in the data or by noise.<\/p>\n<p>We featured HTM and Numenta in our February article, and we recommend you <a href=\"https:\/\/www.datasciencecentral.com\/profiles\/blogs\/off-the-beaten-path-htm-based-strong-ai-beats-rnns-and-cnns-at-pr\"><em><u>read more about it there<\/u><\/em><\/a>.<\/p>\n<p>\u00a0<\/p>\n<p><span style=\"font-size: 12pt;\"><strong>Some Incremental Improvements of Note<\/strong><\/span><\/p>\n<p>We set out to focus on true game changers, but there are at least two examples of incremental improvement that are worthy of mention.\u00a0 These are clearly still classical CNNs and RNNs with elements of backprop, but they work better.<\/p>\n<p>\u00a0<\/p>\n<p><span style=\"font-size: 12pt;\"><strong>Network Pruning with Google Cloud AutoML<\/strong><\/span><\/p>\n<p>Google and Nvidia researchers use a process called network pruning to make a neural network smaller and more 
efficient to run by removing the neurons that do not contribute directly to output. This advancement was rolled out recently as a major improvement in the performance of Google\u2019s new AutoML platform.<\/p>\n<p>\u00a0<\/p>\n<p><span style=\"font-size: 12pt;\"><strong>Transformer<\/strong><\/span><\/p>\n<p><strong>Transformer<\/strong> is a novel approach useful initially in language processing tasks such as language-to-language translation, which has been the domain of CNNs, RNNs, and LSTMs.\u00a0 Released late last summer by researchers at Google Brain and the University of Toronto, it has demonstrated significant accuracy improvements in a variety of tests, including this English\/German translation test.\u00a0<a href=\"http:\/\/api.ning.com\/files\/TWPQV8Q8N57XbaPI6SsBrWU2yV6v09PbxJ0NMsNzQji9yl1x5L0mxmepUg10hx*VF1hNWAKtEsSxur7B9tfnAYufrCU9zvjj\/Transformer.png\" target=\"_self\"><img decoding=\"async\" src=\"http:\/\/api.ning.com\/files\/TWPQV8Q8N57XbaPI6SsBrWU2yV6v09PbxJ0NMsNzQji9yl1x5L0mxmepUg10hx*VF1hNWAKtEsSxur7B9tfnAYufrCU9zvjj\/Transformer.png?width=400\" width=\"400\" class=\"align-center\"><\/a><\/p>\n<p>The sequential nature of RNNs makes it more difficult to fully take advantage of modern fast computing devices such as GPUs, which excel at parallel and not sequential processing.\u00a0 CNNs are much less sequential than RNNs, but in CNN architectures the number of steps required to combine information from distant parts of the input still grows with increasing distance.<\/p>\n<p>The accuracy breakthrough comes from the development of a \u2018self-attention function\u2019 that reduces the number of sequential steps required to a small constant. 
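That one-step, all-pairs idea can be sketched in a few lines of NumPy. This is a minimal single-head scaled dot-product self-attention illustration under assumed toy dimensions and random weights, not the configuration from the paper:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    # Project the token sequence into queries, keys, and values.
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    # Every token scores every other token in a single step,
    # regardless of how far apart they sit in the sequence.
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    weights = softmax(scores, axis=-1)  # each row sums to 1
    return weights @ V, weights

# Toy example: a sequence of 4 tokens with model width 8.
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
out, attn = self_attention(X, Wq, Wk, Wv)
print(out.shape, attn.shape)  # (4, 8) (4, 4)
```

Because the attention matrix relates every pair of positions directly, the number of sequential operations does not grow with the distance between words.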
In each step, it applies a self-attention mechanism that directly models relationships between all words in a sentence, regardless of their respective positions.<\/p>\n<p><a href=\"https:\/\/papers.nips.cc\/paper\/7181-attention-is-all-you-need.pdf\"><em><u>Read the original<\/u><\/em><\/a> research paper here.<\/p>\n<p>\u00a0<\/p>\n<p><span style=\"font-size: 12pt;\"><strong>A Closing Thought<\/strong><\/span><\/p>\n<p>If you haven\u2019t thought about it, you should be concerned about the massive investment China is making in AI and its stated goal to overtake the US as the AI leader within a few years.\u00a0<\/p>\n<p>In an <a href=\"https:\/\/www.axios.com\/chinese-ai-isnt-beating-the-us-yet-0cf27b7d-fe89-48e6-a5da-a7a5a3a1b84d.html\"><em><u>article by Steve LeVine<\/u><\/em><\/a>, who is Future Editor at Axios and teaches at Georgetown University, he makes the case that China may be a fast follower but will probably never catch up.\u00a0 The reason: US and Canadian researchers are free to pivot and start over anytime they wish.\u00a0 The institutionally guided Chinese could never do that.\u00a0 This quote from LeVine\u2019s article makes the point:<\/p>\n<p>&#8220;In China, that would be unthinkable,&#8221; said Manny Medina, CEO at Outreach.io in Seattle. \u00a0AI stars like Facebook&#8217;s Yann LeCun and the Vector Institute&#8217;s Geoff Hinton in Toronto, he said, &#8220;don&#8217;t have to ask permission. 
They can start research and move the ball forward.&#8221;<\/p>\n<p>As the VCs say, maybe it\u2019s time to pivot.<\/p>\n<p>\u00a0<\/p>\n<p>About the author:\u00a0 Bill Vorhies is Editorial Director for Data Science Central and has practiced as a data scientist since 2001.\u00a0 He can be reached at:<\/p>\n<p><a href=\"mailto:Bill@DataScienceCentral.com\">Bill@DataScienceCentral.com<\/a><\/p>\n<p><span>\u00a0<\/span><\/p>\n<\/div>\n<p><a href=\"https:\/\/www.datasciencecentral.com\/xn\/detail\/6448529:BlogPost:705697\">Go to Source<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Author: William Vorhies Summary: We\u2019re stuck.\u00a0 There hasn\u2019t been a major breakthrough in algorithms in the last year.\u00a0 Here\u2019s a survey of the leading contenders [&hellip;] <span class=\"read-more-link\"><a class=\"read-more\" href=\"https:\/\/www.aiproblog.com\/index.php\/2018\/08\/28\/what-comes-after-deep-learning\/\">Read More<\/a><\/span><\/p>\n","protected":false},"author":1,"featured_media":473,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_bbp_topic_count":0,"_bbp_reply_count":0,"_bbp_total_topic_count":0,"_bbp_total_reply_count":0,"_bbp_voice_count":0,"_bbp_anonymous_reply_count":0,"_bbp_topic_count_hidden":0,"_bbp_reply_count_hidden":0,"_bbp_forum_subforum_count":0,"footnotes":""},"categories":[26],"tags":[],"_links":{"self":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts\/977"}],"collection":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/comments?post=977"}],"version-history":[{"count":0,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2
\/posts\/977\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/media\/470"}],"wp:attachment":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/media?parent=977"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/categories?post=977"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/tags?post=977"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}