{"id":1858,"date":"2019-03-12T06:33:59","date_gmt":"2019-03-12T06:33:59","guid":{"rendered":"https:\/\/www.aiproblog.com\/index.php\/2019\/03\/12\/the-coming-revolution-in-recurrent-neural-nets-rnns\/"},"modified":"2019-03-12T06:33:59","modified_gmt":"2019-03-12T06:33:59","slug":"the-coming-revolution-in-recurrent-neural-nets-rnns","status":"publish","type":"post","link":"https:\/\/www.aiproblog.com\/index.php\/2019\/03\/12\/the-coming-revolution-in-recurrent-neural-nets-rnns\/","title":{"rendered":"The Coming Revolution in Recurrent Neural Nets (RNNs)"},"content":{"rendered":"<p>Author: William Vorhies<\/p>\n<div>\n<p><strong><em>Summary:<\/em><\/strong> <em>Recurrent Neural Nets (RNNs) are at the core of the most common AI applications in use today, but we are rapidly recognizing broad time series problem types where they don\u2019t fit well.\u00a0 Several alternatives are already in use, and one that\u2019s just been introduced, ODE net, is a radical departure from our way of thinking about the solution.<\/em><\/p>\n<p>\u00a0<\/p>\n<p><a href=\"https:\/\/storage.ning.com\/topology\/rest\/1.0\/file\/get\/1373084380?profile=original\" target=\"_blank\" rel=\"noopener noreferrer\"><img decoding=\"async\" src=\"https:\/\/storage.ning.com\/topology\/rest\/1.0\/file\/get\/1373084380?profile=RESIZE_710x\" width=\"350\" class=\"align-right\"><\/a>Recurrent Neural Nets (RNNs) and their cousins, LSTMs, are at the very core of the most common application of AI, natural language processing (NLP).\u00a0 There are far more real world applications of RNN-NLP than of any other form of AI, including image recognition and processing with Convolutional Neural Nets (CNNs).<\/p>\n<p>In a sense, the army of data scientists has split off into two groups, each pursuing the separate applications that might be developed from these two techniques.\u00a0 In application there is essentially no overlap, since image processing is about processing data that is static (even if only for a second) while 
RNN-NLP has always interpreted speech and text as time series data.<\/p>\n<p>It turns out, though, that while RNN\/LSTMs remain the go-to technique for most NLP, the more we try to expand time series applications the more trouble we run into.\u00a0 What\u2019s on the horizon may not be so much a modification of RNNs but perhaps a hard fork to several other innovative new AI methods.<\/p>\n<p>\u00a0<\/p>\n<p><span style=\"font-size: 12pt;\"><strong>The First Fork<\/strong><\/span><\/p>\n<p>The first fork, which we wrote about last year, is combining <a href=\"https:\/\/www.datasciencecentral.com\/profiles\/blogs\/combining-cnns-and-rnns-crazy-or-genius\"><em><u>CNNs and RNNs in a single neural network<\/u><\/em><\/a>.\u00a0 The problem to be solved relates to images that occur in a time series, that is, video, and the most common task for this odd mashup is video scene labeling.\u00a0 It turns out this technique is also good for recognizing and labeling emotion in a video and for some types of person recognition based on having seen that person in a video before.<\/p>\n<p>\u00a0<\/p>\n<p><span style=\"font-size: 12pt;\"><strong>RNN Not So Good for Massive Parallel Processing (MPP)<\/strong><\/span><\/p>\n<p>Also last year, both Google and Facebook addressed a second type of problem with RNNs.\u00a0 That is, because the data to be analyzed is processed as a sequence, with each step depending on the one before it, you have to wait for the earlier steps to complete before calculating the next.\u00a0 That also means that MPP isn\u2019t really feasible.\u00a0 Yes, this all still happens very fast, but not fast enough for real time language translation apps to avoid noticeable latency.<\/p>\n<p>This second fork caused both these leaders to abandon RNNs for real time translation and adopt a variant of CNNs they labeled <a href=\"https:\/\/www.datasciencecentral.com\/profiles\/blogs\/temporal-convolutional-nets-tcns-take-over-from-rnns-for-nlp-pred\"><em><u>Temporal Convolutional Neural Nets (TCNs)<\/u><\/em>.<\/a>\u00a0 
This looks a lot like a CNN with the addition of an \u2018Attention\u2019 function.\u00a0 Because they\u2019re structured as CNNs they can be easily adapted to MPP, so the latency largely disappears.\u00a0 If you\u2019d like to dig into that, follow the hyperlink.<\/p>\n<p>\u00a0<\/p>\n<p><span style=\"font-size: 12pt;\"><strong>The Third Fork Problem \u2013 Irregular Time Series<\/strong><\/span><\/p>\n<p>There are several other classes of time series problems that are not well addressed by RNNs.\u00a0 Mostly these are characterized by systems having continuous values (think economic or financial variables used to forecast stock prices, or physically analog systems), and those in which you want to combine time series data with different frequencies, durations, and start points.<\/p>\n<p>If this last one seems mysterious, it shouldn\u2019t be.\u00a0 It describes what your medical history looks like as you visit different doctors, have appointments at different intervals, begin or stop medications at different dosages and intervals, have different physical responses (output variables) to these inputs, and simply grow older or stronger or better or worse in some measurable way.<\/p>\n<p>This is at the core of why the vast majority of healthcare applications of AI have been in image recognition: our ability to use AI to predict an outcome from these separate, irregular data series remains quite limited.<\/p>\n<p>One solution might be to divide up your parallel medical records into discrete steps of weeks or days or even hours (adding layers to the DNN to increase granularity).\u00a0 In theory this would adapt to the discretization required by RNNs.\u00a0 But you begin to see the problem.\u00a0 To gain maximum benefit you would have to use very fine time buckets, which would increase computation cost and complexity and rapidly reach a point of impossibility.\u00a0 Then there\u2019s the issue that many of these time buckets would 
contain no data.<\/p>\n<p>So both the forecasting community and the healthcare community need an AI solution superior to what RNNs can currently deliver.<\/p>\n<p>\u00a0<\/p>\n<p><span style=\"font-size: 12pt;\"><strong>ODE net<\/strong><\/span><\/p>\n<p><a href=\"https:\/\/storage.ning.com\/topology\/rest\/1.0\/file\/get\/1373098225?profile=original\" target=\"_blank\" rel=\"noopener noreferrer\"><img decoding=\"async\" src=\"https:\/\/storage.ning.com\/topology\/rest\/1.0\/file\/get\/1373098225?profile=RESIZE_710x\" width=\"350\" class=\"align-right\"><\/a>At the Neural Information Processing Systems (NIPS) conference held in Montreal last December, researchers from Canada\u2019s Vector Institute presented an entirely new concept in AI time series modeling, and their paper was named one of four best papers from the conference.<\/p>\n<p>The name of their system, \u201cODE net\u201d, stands for Ordinary Differential Equation net.\u00a0 Don\u2019t be misled.\u00a0 ODE net doesn\u2019t look anything like a DNN, with no nodes, layers, or interconnects.\u00a0 It is a method of using a black box differential equation solver (DES) with backpropagation and several other clever adaptations to outperform RNNs in both continuous and discrete time series problems.\u00a0 In other words, this is more like a solid slab of computation than anything that might be visualized as a neural net.<\/p>\n<p>There are several interesting changes in mindset that come along with this method.\u00a0 For example, with an RNN you would specify layers and other hyperparameters, run the experiment, and see what accuracy you achieved.\u00a0<\/p>\n<p>With ODE net, there is a direct tradeoff between accuracy and time to train.\u00a0 You specify the desired level of accuracy and the ODE net will find the best way to achieve that, allowing training time to vary.\u00a0 If the training time is unacceptably long, specify a lower accuracy and training is faster.\u00a0 One interesting outcome might be to train at high 
accuracy but to speed testing by specifying a lower accuracy.<\/p>\n<p>The paper, which is available <a href=\"https:\/\/arxiv.org\/abs\/1806.07366\"><em><u>here on arXiv.org<\/u><\/em><\/a>, is quite thorough and reports several experiments in which the results are clearly superior to RNNs.\u00a0 It\u2019s still in its research phase, but as with most things in data science, that won\u2019t necessarily be for long.<\/p>\n<p>\u00a0<\/p>\n<p><span style=\"font-size: 12pt;\"><strong>A New Way Forward<\/strong><\/span><\/p>\n<p>It\u2019s particularly interesting that solutions to some of the most intractable problems with our current deep dive into DNNs may not look anything like neural nets.\u00a0 It also makes me wonder, for example, whatever happened to that promising work in evolutionary algorithms that is no longer mainstream.\u00a0 We may be at the beginning of a very interesting hard fork into entirely new methods of AI.<\/p>\n<p>\u00a0<\/p>\n<p>\u00a0<\/p>\n<p><a href=\"https:\/\/www.datasciencecentral.com\/profiles\/blog\/list?user=0h5qapp2gbuf8\"><em><u>Other articles by Bill Vorhies<\/u><\/em><\/a><\/p>\n<p>\u00a0<\/p>\n<p>About the author:\u00a0 Bill is Contributing Editor for Data Science Central.\u00a0 Bill is also President &#038; Chief Data Scientist at Data-Magnum and has practiced as a data scientist since 2001.\u00a0\u00a0\u00a0 He can be reached at:<\/p>\n<p><a href=\"mailto:Bill@DataScienceCentral.com\">Bill@DataScienceCentral.com<\/a> <span>or<\/span> <a href=\"mailto:Bill@Data-Magnum.com\">Bill@Data-Magnum.com<\/a><\/p>\n<\/div>\n<p><a href=\"https:\/\/www.datasciencecentral.com\/xn\/detail\/6448529:BlogPost:809007\">Go to Source<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Author: William Vorhies Summary: Recurrent Neural Nets (RNNs) are at the core of the most common AI applications in use today but we are rapidly [&hellip;] <span class=\"read-more-link\"><a class=\"read-more\" 
href=\"https:\/\/www.aiproblog.com\/index.php\/2019\/03\/12\/the-coming-revolution-in-recurrent-neural-nets-rnns\/\">Read More<\/a><\/span><\/p>\n","protected":false},"author":1,"featured_media":460,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_bbp_topic_count":0,"_bbp_reply_count":0,"_bbp_total_topic_count":0,"_bbp_total_reply_count":0,"_bbp_voice_count":0,"_bbp_anonymous_reply_count":0,"_bbp_topic_count_hidden":0,"_bbp_reply_count_hidden":0,"_bbp_forum_subforum_count":0,"footnotes":""},"categories":[26],"tags":[],"_links":{"self":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts\/1858"}],"collection":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/comments?post=1858"}],"version-history":[{"count":0,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts\/1858\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/media\/463"}],"wp:attachment":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/media?parent=1858"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/categories?post=1858"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/tags?post=1858"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}