{"id":1644,"date":"2019-01-29T06:37:16","date_gmt":"2019-01-29T06:37:16","guid":{"rendered":"https:\/\/www.aiproblog.com\/index.php\/2019\/01\/29\/one-shot-learning-and-other-strategies-for-reducing-training-data\/"},"modified":"2019-01-29T06:37:16","modified_gmt":"2019-01-29T06:37:16","slug":"one-shot-learning-and-other-strategies-for-reducing-training-data","status":"publish","type":"post","link":"https:\/\/www.aiproblog.com\/index.php\/2019\/01\/29\/one-shot-learning-and-other-strategies-for-reducing-training-data\/","title":{"rendered":"One Shot Learning and Other Strategies for Reducing Training Data"},"content":{"rendered":"<p>Author: William Vorhies<\/p>\n<div>\n<p><strong><em>Summary:<\/em><\/strong> <em>Not enough labeled training data is a huge barrier to getting at the equally large benefits that could be had from deep learning applications.\u00a0 Here are five strategies for getting around the data problem including the latest in One Shot Learning.<\/em><\/p>\n<p>\u00a0<\/p>\n<p><a href=\"https:\/\/storage.ning.com\/topology\/rest\/1.0\/file\/get\/909934010?profile=original\" target=\"_blank\" rel=\"noopener\"><img decoding=\"async\" src=\"https:\/\/storage.ning.com\/topology\/rest\/1.0\/file\/get\/909934010?profile=RESIZE_710x\" width=\"300\" class=\"align-right\"><\/a>For at least the last two years we\u2019ve been in an interesting period in which data, specifically training data is more important than algorithms.\u00a0 In the area of deep learning for problems involving image classification, the cost and effort to obtain enough labeled training data has held back progress for companies without the deep pockets to build their own.<\/p>\n<p>A <a href=\"https:\/\/www.deeplearningbook.org\/\"><em><u>2016 study<\/u><\/em><\/a> by Goodfellow, Bengio and Courville concluded you could get \u2018acceptable\u2019 performance with about 5,000 labeled examples per category BUT it would take <strong>10 Million labeled examples per category<\/strong> to 
\u201cmatch or exceed human performance\u201d.\u00a0<\/p>\n<p>Any time there is a pain point like this, the market (meaning us data scientists) responds by trying to solve the problem and remove the roadblock.\u00a0 Here\u2019s a brief review of the techniques we\u2019ve introduced to help alleviate this, concluding with \u2018One Shot Learning\u2019: what it really means and whether it actually works.<\/p>\n<p>\u00a0<\/p>\n<p><span style=\"font-size: 12pt;\"><strong>Active Learning<\/strong><\/span><\/p>\n<p><a href=\"https:\/\/www.datasciencecentral.com\/profiles\/blogs\/breaking-through-the-cost-barrier-to-deep-learning\"><em><u>Active learning is a data strategy<\/u><\/em><\/a> more than a specific technique.\u00a0 The goal is to determine the best tradeoff between accuracy and the amount of labeled data you\u2019ll need.\u00a0 Properly executed, you can see the gains in accuracy begin to flatten as the amount of training data increases, and it becomes an optimization problem to pick the right point to stop adding new data and control cost.<\/p>\n<p>You can always employ your own manpower to manually label additional training images.\u00a0 Or you might try a company like Figure 8, which has built an entire service industry around its platform for automated label generation with human-in-the-loop correction.\u00a0 Figure 8 is a leading proponent of Active Learning and there\u2019s a great <a href=\"https:\/\/www.datasciencecentral.com\/video\/dsc-webinar-series-ai-models-and-active-learning\"><em><u>explanatory video here<\/u><\/em><\/a>.<\/p>\n<p>New data can be labeled in one of two ways:<\/p>\n<ul>\n<li>Using human labor.<\/li>\n<li>Using a separate CNN to attempt to predict the label for an image automatically.<\/li>\n<\/ul>\n<p>This second idea of using a separate CNN to label unseen data is appealing because it partially eliminates the direct human labor, but it adds complexity and development cost.\u00a0<\/p>\n<p>The folks at <a 
href=\"https:\/\/www.datasciencecentral.com\/profiles\/blogs\/democratizing-deep-learning-the-stanford-dawn-project\"><em><u>the Stanford Dawn project<\/u><\/em><\/a> for democratizing deep learning are working on several aspects of this including their open source application Snorkel, for automated data labeling.<\/p>\n<p>Whether you use humans or machines, there will need to be a process for humans to check the work so that the new data doesn\u2019t introduce errors into your models.\u00a0 All things considered, if you\u2019re building a de novo CNN at scale, you should incorporate Active Learning in your data acquisition strategy.<\/p>\n<p>\u00a0<\/p>\n<p><span style=\"font-size: 12pt;\"><strong>Transfer Learning<\/strong><\/span><\/p>\n<p><a href=\"https:\/\/storage.ning.com\/topology\/rest\/1.0\/file\/get\/909940251?profile=original\" target=\"_blank\" rel=\"noopener\"><img decoding=\"async\" src=\"https:\/\/storage.ning.com\/topology\/rest\/1.0\/file\/get\/909940251?profile=RESIZE_710x\" width=\"250\" class=\"align-right\"><\/a>The central concept is to use a more complex and already successful pre-trained CNN model to <a href=\"https:\/\/www.datasciencecentral.com\/profiles\/blogs\/transfer-learning-deep-learning-for-everyone\">\u2018<em><u>transfer\u2019 its learning<\/u><\/em><\/a> to your more simplified problem.\u00a0 The earlier or shallower convolutional layers of the already successful CNN have learned the features.\u00a0 In short, in TL we retain the successful front end layers and disconnect the backend classifier replacing it with the classifier for your new problem.<\/p>\n<p>Then we retrain the new hybrid TL with your problem data which can be remarkably successful with far less data.\u00a0 Sometimes as few as 100 items per class but 1,000 is a more reasonable estimate.<\/p>\n<p>Transfer Learning services have been rolled out by Microsoft (Microsoft Custom Vision Services, <a 
href=\"https:\/\/www.customvision.ai\/\">https:\/\/www.customvision.ai\/<\/a><span>, and Google (in beta as of last January<\/span> <a href=\"https:\/\/cloud.google.com\/automl\/\"><em><u>Cloud AutoML<\/u><\/em><\/a>) as well as some of the automated ML platforms.\u00a0<\/p>\n<p>\u00a0<\/p>\n<p><span style=\"font-size: 12pt;\"><strong>One Shot Learning \u2013 Facial Recognition<\/strong><\/span><\/p>\n<p>There are currently at least two versions of One Shot Learning, the earliest of which addresses a facial recognition problem \u2013 suppose you have only one picture of an individual, an employee for example, how do you design a CNN to determine if that person\u2019s image is already in your file when they present their face for scanning to gain access to whatever you\u2019re trying to protect.\u00a0 In practice it works for any problem where you need to determine if the new image matches any image already in your database.\u00a0 Have we seen this person or thing before?<\/p>\n<p>Ordinary CNN development would require retraining the CNN for each new employee to create a correct classifier for that specific person, because the CNN does not retain weights that can be applied to new unseen data it means basically training from scratch.\u00a0 It\u2019s easy to see this would be a practical non-starter.<\/p>\n<p><a href=\"https:\/\/storage.ning.com\/topology\/rest\/1.0\/file\/get\/909943678?profile=original\" target=\"_blank\" rel=\"noopener\"><img decoding=\"async\" src=\"https:\/\/storage.ning.com\/topology\/rest\/1.0\/file\/get\/909943678?profile=RESIZE_710x\" width=\"350\" class=\"align-right\"><\/a>The solution is to use two parallel CNNs sometimes called Siamese CNNs and a new discriminator function called \u2018similarity\u2019 to compare the stored image vectors to the new image.<\/p>\n<p>The original CNN is a fully trained network but lacks the classifier layer.\u00a0<\/p>\n<p>The second CNN analyzes the new image and creates the same dense vector layer as the 
original CNN.<\/p>\n<p>The \u2018similarity\u2019 function compares the new image to all available stored image vectors and derives a similarity score (aka a discriminator or distance score).\u00a0 High scores indicate no match and low scores indicate a match.<\/p>\n<p>Although this is called a \u2018One Shot\u2019 learner, like all solutions in this category it works best with a few more examples than one, perhaps five or ten, in the original CNN\u2019s scoring base.\u00a0 Also, this approach solves a specific type of problem, comparing images, but does not create the classifier required for more general problem types.<\/p>\n<p>\u00a0<\/p>\n<p><span style=\"font-size: 12pt;\"><strong>One Shot Learning \u2013 With Memory<\/strong><\/span><\/p>\n<p>In 2016 researchers at Google\u2019s DeepMind published a paper with the results of their work on \u201c<a href=\"https:\/\/arxiv.org\/pdf\/1605.06065.pdf\"><em><u>One Shot Learning with Memory Augmented Neural Networks<\/u><\/em><\/a><em>\u201d<\/em> (MANN).\u00a0 The core of the problem is that deep neural nets don\u2019t retain their node weights from epoch to epoch; if they could, the researchers reasoned, the nets would be able to learn intuitively, more like humans.<\/p>\n<p>Their system worked, using feed-forward neural nets or LSTMs as the controller, married to external memory storage at each node level.<\/p>\n<p><a href=\"https:\/\/storage.ning.com\/topology\/rest\/1.0\/file\/get\/909947850?profile=original\" target=\"_blank\" rel=\"noopener\"><img decoding=\"async\" src=\"https:\/\/storage.ning.com\/topology\/rest\/1.0\/file\/get\/909947850?profile=RESIZE_710x\" width=\"300\" class=\"align-right\"><\/a>This diagram from their paper shows how the external storage is integrated into the network.<\/p>\n<p>The MANN was then trained over more than 10,000 episodes with about 1,600 separate classes, each with only a few examples.\u00a0 After some experimental tuning, the MANN was achieving over 80% accuracy after having seen a 
class object only four times, and over 95% accuracy after having seen it ten times.<\/p>\n<p>Where this has proven valuable is, for example, in correctly classifying handwritten numbers or characters.\u00a0 In a separate example, a MANN trained on numbers (the MNIST database) was repurposed to correctly identify Greek characters that it had not previously seen.<\/p>\n<p>This may sound experimental, but in the last two years it has become possible to implement fairly simply with the \u2018one shot handwritten character classifier\u2019 in Python.<\/p>\n<p>\u00a0<\/p>\n<p><span style=\"font-size: 12pt;\"><strong>GANs \u2013 Generative Adversarial Networks<\/strong><\/span><\/p>\n<p>Correctly classifying unseen objects based on only a few examples is great, but there\u2019s another way to come at creating training data, and that\u2019s <a href=\"https:\/\/www.datasciencecentral.com\/profiles\/blogs\/a-primer-in-adversarial-machine-learning-the-next-advance-in-ai\"><em><u>with GANs<\/u><\/em><\/a>.<\/p>\n<p>You\u2019ll recall that this is about two neural nets battling it out, one (the discriminator) trying to correctly classify images and the second (the generator) trying to create false images that will fool the first.\u00a0 When fully trained, the generator can create images that the discriminator will identify as true images.\u00a0 Voila, instant training data.<\/p>\n<p>You probably know GANs from their more recent applications: creating lifelike fakes of celebrities doing inappropriate things, or realistic images of non-existent people, animals, buildings, and everything else.<\/p>\n<p>Personally, I think GANs are not quite ready for commercial rollout, but they\u2019re getting close.\u00a0 You should think of this as a bleeding-edge application that may or may not work out for you.\u00a0 There\u2019s a lot we don\u2019t yet really know about those generated images.<\/p>\n<p>\u00a0<\/p>\n<p><span style=\"font-size: 12pt;\"><strong>But Do They Generalize?<\/strong><\/span><\/p>\n<p>If you need training data to differentiate dogs from cats, staplers from wastebaskets, or Hondas from Toyotas, then by all means go ahead.\u00a0 However, I was asked by a pediatric cardiologist whether GANs could be used to create training data of infant hearts for the purpose of training a CNN to identify the various lobes of the tiny heart.<\/p>\n<p>She would then use the CNN\u2019s inferenced image to guide an instrument into the infant heart to repair defects.\u00a0 She didn\u2019t yet have enough training data based on real CT scans of infant hearts, and her application wasn\u2019t sufficiently accurate.<\/p>\n<p>It\u2019s tempting to say that GAN-generated images would be legitimate training data.\u00a0 The problem is that if she missed her intended target with her instrument, it would puncture the heart, leading to death.\u00a0<\/p>\n<p>Since all of our techniques are inherently probabilistic, with both false positives and false negatives, there was no way I was comfortable advising that artificially created training data was OK.<\/p>\n<p>So with all of these techniques, from Transfer Learning to One Shot to GANs, be sure to consider whether the small number of images you are using for training is sufficiently representative of the total environment, and what the consequences of error might be.<\/p>\n<p>\u00a0<\/p>\n<p>\u00a0<\/p>\n<p><a href=\"https:\/\/www.datasciencecentral.com\/profiles\/blog\/list?user=0h5qapp2gbuf8\"><em><u>Other articles by Bill Vorhies.<\/u><\/em><\/a><\/p>\n<p><em><u>\u00a0<\/u><\/em><\/p>\n<p>About the author:\u00a0 Bill is Editorial Director for Data Science Central.\u00a0 Bill is also President &#038; Chief Data Scientist at Data-Magnum and has practiced as a data scientist since 2001.\u00a0 He can be reached at:<\/p>\n<p><a href=\"mailto:Bill@DataScienceCentral.com\">Bill@DataScienceCentral.com<\/a> <span>or<\/span> <a 
href=\"mailto:Bill@Data-Magnum.com\">Bill@Data-Magnum.com<\/a><\/p>\n<\/div>\n<p><a href=\"https:\/\/www.datasciencecentral.com\/xn\/detail\/6448529:BlogPost:797082\">Go to Source<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Author: William Vorhies Summary: Not enough labeled training data is a huge barrier to getting at the equally large benefits that could be had from [&hellip;] <span class=\"read-more-link\"><a class=\"read-more\" href=\"https:\/\/www.aiproblog.com\/index.php\/2019\/01\/29\/one-shot-learning-and-other-strategies-for-reducing-training-data\/\">Read More<\/a><\/span><\/p>\n","protected":false},"author":1,"featured_media":468,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_bbp_topic_count":0,"_bbp_reply_count":0,"_bbp_total_topic_count":0,"_bbp_total_reply_count":0,"_bbp_voice_count":0,"_bbp_anonymous_reply_count":0,"_bbp_topic_count_hidden":0,"_bbp_reply_count_hidden":0,"_bbp_forum_subforum_count":0,"footnotes":""},"categories":[26],"tags":[],"_links":{"self":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts\/1644"}],"collection":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/comments?post=1644"}],"version-history":[{"count":0,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts\/1644\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/media\/460"}],"wp:attachment":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/media?parent=1644"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/w
p\/v2\/categories?post=1644"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/tags?post=1644"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}