{"id":1811,"date":"2019-03-05T06:41:19","date_gmt":"2019-03-05T06:41:19","guid":{"rendered":"https:\/\/www.aiproblog.com\/index.php\/2019\/03\/05\/faster-better-cheaper-image-recognition\/"},"modified":"2019-03-05T06:41:19","modified_gmt":"2019-03-05T06:41:19","slug":"faster-better-cheaper-image-recognition","status":"publish","type":"post","link":"https:\/\/www.aiproblog.com\/index.php\/2019\/03\/05\/faster-better-cheaper-image-recognition\/","title":{"rendered":"Faster Better Cheaper Image Recognition"},"content":{"rendered":"<p>Author: William Vorhies<\/p>\n<div>\n<p><strong><em>Summary:<\/em><\/strong><em>\u00a0 In the literal blink of an eye, image-based AI has gone from high-cost, high-risk projects to quick and reasonably reliable.\u00a0 C-level execs looking for AI techniques to exploit need to revisit their assumptions and move these up the list.\u00a0 Here\u2019s what\u2019s changed.<\/em><\/p>\n<p>\u00a0<\/p>\n<p>For data scientists these are miraculous times.\u00a0 We tend to think of miracles as something that occurs instantaneously, but in our world that\u2019s not quite so.\u00a0 Still, the rate of change in deep learning, particularly in image recognition, is mind-boggling and way up there on the miraculous scale.<\/p>\n<p>Buried deep in the fascinating charts and graphs of the <a href=\"http:\/\/cdn.aiindex.org\/2018\/AI%20Index%202018%20Annual%20Report.pdf\"><em><u>2018 AI Index Annual Report<\/u><\/em><\/a> is this chart.<\/p>\n<p>\u00a0<a href=\"https:\/\/storage.ning.com\/topology\/rest\/1.0\/file\/get\/1260074809?profile=original\" target=\"_blank\" rel=\"noopener\"><img decoding=\"async\" src=\"https:\/\/storage.ning.com\/topology\/rest\/1.0\/file\/get\/1260074809?profile=RESIZE_710x\" width=\"500\" class=\"align-center\"><\/a><\/p>\n<p>In a year and a half, the time required to train a CNN in the ImageNet competition has fallen from about one hour to less than 4 minutes.\u00a0 That\u2019s a 16X improvement in 18 months.\u00a0 
<em>(Keep in mind that the ImageNet dataset comprises 14 million labeled images in 20,000 categories.)<\/em><\/p>\n<p>That\u2019s remarkable in itself, but compare this to our experience with training CNNs in December 2015, 36 months back, when automated image detection in the ImageNet competition first exceeded human performance.\u00a0 Roughly another 18 months before this chart begins, Microsoft researchers used an extraordinary 152-layer CNN, which was five times the size of any previous system, to beat the human benchmark for the first time <em>(97% for the CNN, 95% for humans)<\/em>.<\/p>\n<p>That effort took Microsoft many months of trial and error as they pioneered the techniques that led to better-than-human accuracy in image recognition.\u00a0 And while we can\u2019t fairly compare all that experimental time to today\u2019s benchmark of less than 4 minutes, it\u2019s accurate to say that in 2016 training a CNN was a task acknowledged to take weeks if not months, and consume compute resources valued in the tens of thousands of dollars for a single model, if it managed to train at all.<\/p>\n<p>So 16X doesn\u2019t begin to express how far we\u2019ve come.\u00a0 For the sake of picking a starting point, let\u2019s try about 40 hours of compute time then, now reduced to 4 minutes, or about a 600X improvement.<\/p>\n<p>It\u2019s worth looking at what factors brought us here.\u00a0 There are three.<\/p>\n<p>\u00a0<\/p>\n<p><span style=\"font-size: 12pt;\"><strong>Tuning the CNN<\/strong><\/span><\/p>\n<p>Two years ago, setting up a CNN was still largely trial and error.\u00a0 Enough simply wasn\u2019t known about picking a starting point of layers, loss functions, node interconnections, and starting weights, much less how varying any one of these factors would impact the others once launched.<\/p>\n<p><a href=\"https:\/\/storage.ning.com\/topology\/rest\/1.0\/file\/get\/1260082110?profile=original\" target=\"_blank\" rel=\"noopener\"><img 
decoding=\"async\" src=\"https:\/\/storage.ning.com\/topology\/rest\/1.0\/file\/get\/1260082110?profile=RESIZE_710x\" width=\"250\" class=\"align-right\"><\/a>Thanks to literally thousands of data scientists piling into deep learning at both the research and application levels, a much better understanding of hyperparameter tuning has been put into practice.<\/p>\n<p>Our understanding has advanced so far that Microsoft, Google, and several startups offer <a href=\"https:\/\/www.datasciencecentral.com\/profiles\/blogs\/automated-deep-learning-so-simple-anyone-can-do-it\"><em><u>fully automated deep learning platforms<\/u><\/em><\/a> that are all but foolproof.<\/p>\n<p>In much the same way that automated selection and hyperparameter tuning of machine learning algorithms has deemphasized the importance of deep technical knowledge of the algorithms themselves, the same has happened for deep learning.\u00a0 For some time now (<em>OK, that means at least a year<\/em>) we\u2019ve been more concerned about adequate labeled training data than about the mechanics of the CNNs themselves.<\/p>\n<p>\u00a0<\/p>\n<p><span style=\"font-size: 12pt;\"><strong>Chips Are Increasingly Where It\u2019s At<\/strong><\/span><\/p>\n<p>Our readers are most likely aware of the many software advancements that have made the techniques around implementing CNNs easier and faster.\u00a0 Fewer may be aware that an equal or greater share of the credit for this acceleration belongs to the chipmakers.\u00a0 Increasingly, all types of AI mean customized silicon optimized for the many different needs of deep learning.<\/p>\n<p>In fact, the dedicated chip track has been evolving for as long as CNNs have been the algorithm of choice for image recognition, given the much longer development time and much greater capital required for such an effort.<\/p>\n<p>While the first great leap in speed and efficiency came from utilizing GPUs and FPGAs, by 2016 Google had announced its own customized chip, the TPU 
(Tensor Processing Unit), optimized for its TensorFlow platform.\u00a0 TensorFlow has captured by far the largest share of AI developers, and the proprietary TPU chip and platform have given Google ownership of the largest share of AI cloud revenue.<\/p>\n<p>Essentially all the other majors, both existing chipmakers and AI platform developers, are following suit, including Intel, NVIDIA, Microsoft, Amazon, IBM, Qualcomm, and a large cohort of startups.<\/p>\n<p>This would seem to be too many competitors for this specialized area, but image recognition, and more broadly deep learning-based AI applications, have many ways in which to specialize:<\/p>\n<ul>\n<li>For specific applications like facial recognition.<\/li>\n<li>For the tradeoff between high-performance training and inference.<\/li>\n<li>Embedded applications needing low power consumption versus enterprise applications needing high power.<\/li>\n<li>Ultra-low power for IoT devices and mid-range power for automotive applications.<\/li>\n<\/ul>\n<p>Where GPUs and FPGAs are programmable, the push is specifically toward AI-embedded silicon dedicated to niche applications.\u00a0 All of these have contributed to the increase in speed and reliability of results in CNN image recognition applications.<\/p>\n<p>\u00a0<\/p>\n<p><span style=\"font-size: 12pt;\"><strong>Transfer Learning<\/strong><\/span><\/p>\n<p>The final contributor is <a href=\"https:\/\/www.datasciencecentral.com\/profiles\/blogs\/transfer-learning-deep-learning-for-everyone\"><em><u>Transfer Learning<\/u><\/em><\/a>, though in this specific case it was not used in the ImageNet competition.<\/p>\n<p>The central concept is to use a more complex but successful pre-trained CNN model to \u2018transfer\u2019 its learning to your more simplified (or equally, but not more, complex) problem.<\/p>\n<p>The existing successful pre-trained model has two important attributes:<\/p>\n<ol>\n<li>Its tuning parameters have already been tested and found to be successful, 
eliminating the experimentation around setting them.<\/li>\n<li>The earlier or shallower layers of a CNN are essentially learning the features of the image set such as edges, shapes, textures, and the like. Only the last one or two layers of a CNN perform the most complex tasks of summarizing the vectorized image data into the classification data for the 10, 100, or 1,000 different image classes they are supposed to identify. These earlier shallow layers of the CNN can be thought of as featurizers, discovering the previously undefined features on which the later classification is based.<\/li>\n<\/ol>\n<p>In simplified TL, the pre-trained transfer model is simply chopped off at the last one or two layers.\u00a0 Once again, the early, shallow layers are those that have identified and vectorized the features, and typically only the last one or two layers need to be replaced.<\/p>\n<p>The output of the truncated \u2018featurizer\u2019 front end is then fed to a standard classifier like an SVM or logistic regression to train against your specific images.<\/p>\n<p>The resulting transfer CNN can be trained with as few as 100 labeled images per class, but as always, more is better.\u00a0 This addresses the problem of the availability and cost of creating sufficient labeled training data, and also greatly reduces the compute time and accelerates the overall project.<\/p>\n<p>Large libraries of pretrained CNNs that can be used to \u2018donate\u2019 their front ends are available as open source from the major cloud providers hoping to capture the cloud compute revenue, and also from academic sources.<\/p>\n<p>\u00a0<\/p>\n<p><span style=\"font-size: 12pt;\"><strong>A Note about the 2018 AI Index Annual Report<\/strong><\/span><\/p>\n<p>This annual report is the work of Stanford\u2019s Human-Centered AI Institute, which for the last few years has been publishing the encyclopedic \u2018AI Index\u2019.\u00a0 To see how the key chart above was developed and the sources that were used, <a 
href=\"http:\/\/cdn.aiindex.org\/2018\/AI%20Index%202018%20Annual%20Report.pdf\"><em><u>see the report<\/u><\/em><\/a> and its equally informative appendices.<\/p>\n<p>\u00a0<\/p>\n<p><span style=\"font-size: 12pt;\"><strong>Your Strategic Business Assumptions May Need to Change<\/strong><\/span><\/p>\n<p>If you\u2019re a C-level executive looking at reasonable applications of AI to leverage in your newly digital enterprise, you need to be careful who you talk to.\u00a0 Not every data scientist will be aware of how rapidly image-based AI is becoming better, faster, and cheaper.<\/p>\n<p>24 months ago I was still advising that image-based AI was a bleeding-edge technique and a project with high costs and a high risk of failure.\u00a0 In the blink of an eye that\u2019s changed.\u00a0<\/p>\n<p>\u00a0<\/p>\n<p>\u00a0<\/p>\n<p><a href=\"https:\/\/www.datasciencecentral.com\/profiles\/blog\/list?user=0h5qapp2gbuf8\"><em><u>Other articles by Bill Vorhies<\/u><\/em><\/a><\/p>\n<p>\u00a0<\/p>\n<p>About the author:\u00a0 Bill is Contributing Editor for Data Science Central.\u00a0 Bill is also President &#038; Chief Data Scientist at Data-Magnum and has practiced as a data scientist since 2001.\u00a0 He can be reached at:<\/p>\n<p><a href=\"mailto:Bill@DataScienceCentral.com\">Bill@DataScienceCentral.com<\/a> <span>or<\/span> <a href=\"mailto:Bill@Data-Magnum.com\">Bill@Data-Magnum.com<\/a><\/p>\n<\/div>\n<p><a href=\"https:\/\/www.datasciencecentral.com\/xn\/detail\/6448529:BlogPost:806976\">Go to Source<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Author: William Vorhies Summary:\u00a0 In the literal blink of an eye, image-based AI has gone from high-cost, high-risk projects to quick and reasonably [&hellip;] <span class=\"read-more-link\"><a class=\"read-more\" href=\"https:\/\/www.aiproblog.com\/index.php\/2019\/03\/05\/faster-better-cheaper-image-recognition\/\">Read 
More<\/a><\/span><\/p>\n","protected":false},"author":1,"featured_media":470,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_bbp_topic_count":0,"_bbp_reply_count":0,"_bbp_total_topic_count":0,"_bbp_total_reply_count":0,"_bbp_voice_count":0,"_bbp_anonymous_reply_count":0,"_bbp_topic_count_hidden":0,"_bbp_reply_count_hidden":0,"_bbp_forum_subforum_count":0,"footnotes":""},"categories":[26],"tags":[],"_links":{"self":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts\/1811"}],"collection":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/comments?post=1811"}],"version-history":[{"count":0,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts\/1811\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/media\/465"}],"wp:attachment":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/media?parent=1811"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/categories?post=1811"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/tags?post=1811"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}