{"id":1721,"date":"2019-02-14T00:24:44","date_gmt":"2019-02-14T00:24:44","guid":{"rendered":"https:\/\/www.aiproblog.com\/index.php\/2019\/02\/14\/adversarial-attacks-on-deep-neural-networks-an-overview\/"},"modified":"2019-02-14T00:24:44","modified_gmt":"2019-02-14T00:24:44","slug":"adversarial-attacks-on-deep-neural-networks-an-overview","status":"publish","type":"post","link":"https:\/\/www.aiproblog.com\/index.php\/2019\/02\/14\/adversarial-attacks-on-deep-neural-networks-an-overview\/","title":{"rendered":"Adversarial Attacks on Deep Neural Networks: an Overview"},"content":{"rendered":"<p>Author: Anant Jain<\/p>\n<div>\n<p class=\"graf graf--p graf-after--h4\"><span style=\"font-size: 14pt;\"><strong>Introduction<\/strong><\/span><\/p>\n<p id=\"e1f2\" class=\"graf graf--p graf-after--h4\"><span style=\"font-size: 12pt;\">Deep Neural Networks are highly expressive machine learning networks that have been around for many decades. In 2012, with gains in computing power and improved tooling, a family of these machine learning models called <em class=\"markup--em markup--p-em\">ConvNets<\/em> started achieving state of the art performance on visual recognition tasks. Up to this point, machine learning algorithms simply didn\u2019t work well enough for anyone to be surprised when it failed to do the right thing.<\/span><\/p>\n<p id=\"3e76\" class=\"graf graf--p graf-after--p\"><span style=\"font-size: 12pt;\">In 2014, a group of researchers at Google and NYU found that it was far too easy to fool ConvNets with an imperceivable, but carefully constructed nudge in the input [1]. Let\u2019s look at an example. We start with an image of a panda, which our neural network correctly recognizes as a \u201cpanda\u201d with 57.7% confidence. Add a little bit of carefully constructed noise and the same neural network now thinks this is an image of a gibbon<\/span><span style=\"font-size: 12pt;\">\u00a0with 99.3% confidence! This is, clearly, an optical illusion \u2014 but for the neural network. 
![A panda image plus imperceptible noise, classified as a gibbon with 99.3% confidence](https://storage.ning.com/topology/rest/1.0/file/get/1023441257?profile=RESIZE_710x)

In 2017, another group demonstrated that these adversarial examples generalize to the real world: when printed out, an adversarially constructed image continues to fool neural networks under different lighting conditions and orientations [2].

![Printed adversarial examples photographed through a camera still fool the classifier](https://storage.ning.com/topology/rest/1.0/file/get/1023444471?profile=RESIZE_710x)

Another interesting work, titled "Accessorize to a Crime: Real and Stealthy Attacks on State-of-the-Art Face Recognition" [3], showed that one can fool facial recognition software with adversarially constructed glasses that dodge face detection altogether. The same glasses can even let you impersonate someone else.

![Adversarial glasses that cause a face recognition system to misidentify the wearer](https://storage.ning.com/topology/rest/1.0/file/get/1023448388?profile=RESIZE_710x)

Shortly after, another research group demonstrated methods for fooling models by placing stickers on a stop sign [4]. The perturbations were designed to mimic graffiti and thus "hide in the human psyche."

![A stop sign with sticker perturbations that cause the classifier to mislabel it](https://storage.ning.com/topology/rest/1.0/file/get/1023449767?profile=RESIZE_710x)

"Adversarial Patch" [5], a paper published at NIPS 2017, demonstrated how to generate a patch that can be placed anywhere within the field of view of a classifier and cause it to output a targeted class. In the demo below, a banana is correctly classified as a banana. Placing a sticker with a toaster printed on it is not enough to fool the network; it still classifies the image as a banana. However, with a carefully constructed "adversarial patch", it's easy to trick the network into thinking it is looking at a toaster:

![The classifier labels the banana scene as "toaster" once the adversarial patch is placed next to it](https://storage.ning.com/topology/rest/1.0/file/get/1023450836?profile=RESIZE_710x)
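The patch in [5] is optimised over many images and many random transformations (placements, rotations, scales). The heavily simplified sketch below keeps only the core idea, pastes the patch at a fixed location, and uses hypothetical `model`, `data_loader`, and `TOASTER_CLASS` placeholders:

```python
import torch
import torch.nn.functional as F

# Simplified patch optimisation: only the patch pixels are trained; the
# classifier stays frozen. The real attack also randomises the patch's
# location, rotation, and scale on every step.
patch = torch.rand(3, 50, 50, requires_grad=True)
optimizer = torch.optim.Adam([patch], lr=0.01)

for images, _ in data_loader:                        # images: N x 3 x H x W in [0, 1]
    target = torch.full((images.size(0),), TOASTER_CLASS, dtype=torch.long)
    patched = images.clone()
    patched[:, :, :50, :50] = patch.clamp(0.0, 1.0)  # paste the patch into a corner
    loss = F.cross_entropy(model(patched), target)   # push every prediction toward "toaster"
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

Because the loss is averaged over many different images, the optimised patch does not depend on any particular scene, which is what makes it reusable by anyone who prints it out.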
To quote the authors, "this attack was significant because the attacker does not need to know what image they are attacking when constructing the attack. After generating an adversarial patch, the patch could be widely distributed across the Internet for other attackers to print out and use."

What these examples show us is that our neural networks are still quite fragile when explicitly attacked by an adversary. Let's dive deeper!

## What's so remarkable about these attacks?

First, as we saw above, it's easy to attain **high confidence** in the incorrect classification of an adversarial example. Recall that in the "panda" example, the network was less sure of the actual image being a panda (57.7%) than of the adversarial example being a gibbon (99.3%). Another intriguing point is how **imperceptibly little noise** we needed to add to fool the system; the added noise is clearly not enough to fool us humans.

Second, adversarial examples don't depend much on the specific deep neural network used for the task: an adversarial example crafted against one network tends to confuse other networks as well. In other words, multiple classifiers assign the same (wrong) class to an adversarial example. This "**transferability**" enables "black-box attacks", in which attackers fool a system without access to the model's architecture, parameters, or even the training data used to train it.

## How do you defend against these attacks?

Let's quickly look at two categories of defenses that have been proposed so far.

### 1. Adversarial training

One of the easiest and most brute-force ways to defend against these attacks is to pretend to be the attacker: generate a number of adversarial examples against your own network, and then explicitly train the model not to be fooled by them. This improves the generalization of the model but hasn't provided a meaningful level of robustness; in practice it turns into a game of whack-a-mole in which attackers and defenders keep trying to one-up each other.
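As a rough sketch of that recipe (reusing the hypothetical `fgsm_attack` helper from earlier, plus placeholder `model`, `train_loader`, and `optimizer` objects), each training batch is simply augmented with freshly generated adversarial examples:

```python
import torch
import torch.nn.functional as F

for images, labels in train_loader:
    # Generate adversarial examples against the current state of the model...
    adv_images = fgsm_attack(model, images, labels, epsilon=0.007)
    # ...and train on the clean and adversarial versions of the batch together.
    inputs = torch.cat([images, adv_images])
    targets = torch.cat([labels, labels])
    loss = F.cross_entropy(model(inputs), targets)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

The weakness is built into the loop: the model only learns to resist the particular attack used to generate the examples, so a new attack restarts the game.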
### 2. Defensive distillation

In defensive distillation, we train a secondary model whose decision surface is smoothed in the directions an attacker will typically try to exploit, making it difficult to discover adversarial input tweaks that lead to incorrect classification. The reason this works is that, unlike the first model, the second model is trained on the primary model's "soft" probability outputs rather than on the "hard" (0/1) true labels from the training data. The technique showed some success against early variants of adversarial attacks, but it has been beaten by more recent ones, such as the Carlini-Wagner attack, which is the current benchmark for evaluating the robustness of a neural network against adversarial attacks.
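A bare-bones sketch of that second training stage (with hypothetical `teacher`, `student`, `train_loader`, and `optimizer` objects; the temperature value is illustrative) looks like this:

```python
import torch
import torch.nn.functional as F

T = 20.0  # softmax temperature; the distilled model is later deployed at T = 1
for images, _ in train_loader:
    with torch.no_grad():
        # "Soft" targets: the teacher's softened probability distribution.
        soft_targets = F.softmax(teacher(images) / T, dim=1)
    log_probs = F.log_softmax(student(images) / T, dim=1)
    loss = -(soft_targets * log_probs).sum(dim=1).mean()  # cross-entropy with soft labels
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

Training at a high temperature flattens the probabilities the student has to match, which smooths its gradients with respect to the input and makes naive gradient-based attacks harder to run against it.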
## An intuition behind Adversarial Attacks

Let's try to develop an intuition for what's going on here. Most of the time, machine learning models work very well, but only on a very small subset of all the possible inputs they might encounter. In a high-dimensional space, a very small perturbation to each individual input pixel can be enough to cause a dramatic change in the dot products computed down the network, so it's very easy to nudge the input image to a point in high-dimensional space that our networks have never seen before. This is a key point to keep in mind: high-dimensional spaces are so sparse that most of our training data is concentrated in a very small region known as the *manifold*. And although our neural networks are nonlinear by definition, the most common activation function we use to train them, the Rectified Linear Unit (ReLU), is linear for all inputs greater than 0.

![Activation functions](https://storage.ning.com/topology/rest/1.0/file/get/1023456803?profile=RESIZE_710x)

ReLU became the preferred activation function because of how easy it makes training. Sigmoid and tanh activations saturate to a capped value at high activations, so their gradients get "stuck" very close to 0; ReLU, by contrast, has a non-zero gradient everywhere to the right of 0, which makes it much more stable and faster to train. But that also makes it possible to push a ReLU activation to arbitrarily high values.

Looking at this **trade-off between trainability and robustness to adversarial attacks**, we can conclude that the neural network models we have been using are intrinsically flawed: ease of optimization has come at the cost of models that are easily misled.
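To make the linearity argument concrete, here is a toy illustration in the spirit of [1]: an FGSM-style nudge of at most epsilon per pixel changes a single linear unit's activation by epsilon times the L1 norm of its weights, which grows with the input dimension.

```python
import numpy as np

rng = np.random.default_rng(0)
epsilon = 0.007
for n in (10, 1_000, 1_000_000):          # input dimensionality
    w = rng.normal(size=n)                # weights of one linear unit
    x = rng.uniform(size=n)               # an input with "pixel" values in [0, 1]
    delta = epsilon * np.sign(w)          # each pixel moves by at most epsilon
    # Change in activation = epsilon * ||w||_1, which scales linearly with n.
    print(n, w @ (x + delta) - w @ x)
```

No individual pixel moves by more than 0.007, yet for a million-dimensional input the unit's output shifts by thousands, which is why imperceptible perturbations can dominate a network's decision.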
## What's next in this space?

The real problem here is that our machine learning models exhibit unpredictable and overly confident behavior outside of the training distribution; adversarial examples are just a subset of this broader problem. We would like our models to exhibit appropriately low confidence when operating in regions they have never seen before, and to "fail gracefully" when used in production.

According to Ian Goodfellow, one of the pioneers of this field, *"many of the most important problems still remain open, both in terms of theory and in terms of applications. We do not yet know whether defending against adversarial examples is a theoretically hopeless endeavor or if an optimal strategy would give the defender an upper ground. On the applied side, no one has yet designed a truly powerful defense algorithm that can resist a wide variety of adversarial example attack algorithms."*

If nothing else, the topic of adversarial examples shows what most researchers have been saying for a while: despite the breakthroughs, we are still in the infancy of machine learning and have a long way to go. Machine learning is just another tool, susceptible to adversarial attacks, and that can have huge implications in a world where we trust it with human lives through self-driving cars and other automation.

## References

Here are links to the papers referenced above. I also highly recommend checking out Ian Goodfellow's [blog](http://www.cleverhans.io/) on the topic.

1. [Explaining and Harnessing Adversarial Examples](https://arxiv.org/abs/1412.6572). Goodfellow et al., ICLR 2015.
2. [Adversarial Examples in the Physical World](https://arxiv.org/pdf/1607.02533.pdf). Kurakin et al., ICLR 2017.
3. [Accessorize to a Crime: Real and Stealthy Attacks on State-of-the-Art Face Recognition](https://www.cs.cmu.edu/~sbhagava/papers/face-rec-ccs16.pdf). Sharif et al., ACM CCS 2016.
4. [Robust Physical-World Attacks on Deep Learning Visual Classification](https://arxiv.org/pdf/1707.08945.pdf). Eykholt et al., CVPR 2018.
5. [Adversarial Patch](https://arxiv.org/pdf/1712.09665.pdf). Brown et al., NIPS 2017.

## About the author

Anant Jain is a co-founder of Commonlounge.com, an educational platform with wiki-based short courses on topics like deep learning, web development, and UX/UI design. Before Commonlounge, Anant co-founded EagerPanda, which raised over $4M in venture funding; prior to that, he had a short stint at Microsoft Research during his undergraduate studies. You can find more about him at [https://index.anantja.in](https://index.anantja.in/).

[Go to Source](https://www.datasciencecentral.com/xn/detail/6448529:BlogPost:800780)