{"id":2493,"date":"2019-08-23T06:36:26","date_gmt":"2019-08-23T06:36:26","guid":{"rendered":"https:\/\/www.aiproblog.com\/index.php\/2019\/08\/23\/neural-machine-translation-with-attention-mechanism-step-by-step-guide\/"},"modified":"2019-08-23T06:36:26","modified_gmt":"2019-08-23T06:36:26","slug":"neural-machine-translation-with-attention-mechanism-step-by-step-guide","status":"publish","type":"post","link":"https:\/\/www.aiproblog.com\/index.php\/2019\/08\/23\/neural-machine-translation-with-attention-mechanism-step-by-step-guide\/","title":{"rendered":"Neural Machine Translation With Attention Mechanism: Step-by-step Guide"},"content":{"rendered":"<p>Author: Olha Zhydik<\/p>\n<div>\n<p><a href=\"https:\/\/storage.ning.com\/topology\/rest\/1.0\/file\/get\/3429611682?profile=original\" target=\"_blank\" rel=\"noopener noreferrer\"><img decoding=\"async\" src=\"https:\/\/storage.ning.com\/topology\/rest\/1.0\/file\/get\/3429611682?profile=RESIZE_710x\" class=\"align-center\"><\/a><\/p>\n<\/p>\n<p>Neural networks have made significant leaps in the<span>\u00a0<\/span><a href=\"https:\/\/eleks.com\/expertise\/data-science\/\" target=\"_blank\" rel=\"noopener noreferrer\">image and natural language processing<\/a><span>\u00a0<\/span>(NLP) recently. They\u2019ve not only learned to recognise, localise and segment images; they\u2019re now able to effectively translate natural language and answer complex questions. 
One of the precursors to such massive progress was the introduction of Seq2Seq and<span>\u00a0<\/span><a href=\"https:\/\/eleks.com\/blog\/attention-models-amplifying-machine-learning-benefits\/\" target=\"_blank\" rel=\"noopener noreferrer\">neural attention models<\/a><span>\u00a0<\/span>\u2013 enabling neural networks to become more selective about the data they\u2019re working with at any given time.<\/p>\n<p>The core focus of the neural attention mechanism is to learn to recognise where to find important information. 
Here\u2019s an example of neural machine translation:<\/p>\n<p><a href=\"https:\/\/labs.eleks.com\/wp-content\/uploads\/2019\/06\/image5.gif\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-11577 size-full\" src=\"https:\/\/labs.eleks.com\/wp-content\/uploads\/2019\/06\/image5.gif\" alt=\"neural machine translation\" width=\"800\" height=\"407\"><\/a><\/p>\n<p>Source:<span>\u00a0<\/span><a href=\"https:\/\/google.github.io\/seq2seq\/\" target=\"_blank\" rel=\"nofollow noopener noreferrer\">Google seq2seq<\/a><\/p>\n<h4>The cycle runs as follows:<\/h4>\n<ul>\n<li>The words from the input sentence are fed into the encoder to produce the sentence\u2019s meaning: the so-called \u2018thought vector\u2019.<\/li>\n<li>Based on this vector, the decoder produces words one by one to create the output sentence.<\/li>\n<li>Throughout this process, the attention mechanism helps the decoder focus on different fragments of the input sentence.<\/li>\n<\/ul>\n<h4>Neural machine translation\u2019s current success can be attributed to:<\/h4>\n<ol>\n<li>Sepp Hochreiter and Jurgen Schmidhuber\u2019s 1997 creation of the LSTM (long short-term memory) neural cell. This presented the opportunity to work with relatively long sequences, using a machine learning paradigm.<\/li>\n<li>The realisation of sequence-to-sequence models (Sutskever et al., 2014; Cho et al., 2014), based on LSTM. The concept is to \u201ceat\u201d part of a sequence and \u201creturn\u201d another.<\/li>\n<li>The creation of the \u2018attention mechanism\u2019, first introduced by Bahdanau et al. (2015).<\/li>\n<\/ol>\n<p>But why is this so technologically important? 
In this blog, we describe the most promising real-life use cases for neural machine translation, with a link to an extended tutorial on a neural machine translation algorithm with an attention mechanism.<\/p>\n<h2>Seq2Seq algorithm\u2019s real-world applications<\/h2>\n<p>The Seq2Seq algorithm can perform several core tasks, all of them grounded in \u2018translation\u2019 but each with distinct differences. Let\u2019s take a closer look at some of them.<\/p>\n<h3>Neural machine translation<\/h3>\n<p>Machine translation took a huge step forward in 2017, with the introduction of a bidirectional residual Seq2Seq (sequence-to-sequence) neural network, complete with an attention mechanism. The mechanism\u2019s role is to determine the importance of each word in the input sentence, then to extract additional context around each word. It\u2019s thanks to this development that modern tools are now able to produce high-quality translations of lengthy, complex sentences.<\/p>\n<p><a href=\"https:\/\/labs.eleks.com\/wp-content\/uploads\/2019\/06\/TensorFlow-seq2seq-tutorial-compressor.png\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-11591 size-full\" src=\"https:\/\/labs.eleks.com\/wp-content\/uploads\/2019\/06\/TensorFlow-seq2seq-tutorial-compressor.png\" alt=\"TensorFlow seq2seq tutorial\" width=\"666\" height=\"494\"><\/a><\/p>\n<p>Source:<span>\u00a0<\/span><a href=\"https:\/\/google.github.io\/seq2seq\/\" target=\"_blank\" rel=\"nofollow noopener noreferrer\">TensorFlow seq2seq tutorial<\/a><\/p>\n<p>We can peek under the hood of Google Translate for one of the best illustrations of neural attention in practice.<\/p>\n<p>The above is a prime example of the distribution of attention when the neural network translates English into French. 
The decoder\u2019s language model and the attention mechanism learn the correct word order for the output sentence.<\/p>\n<p><a href=\"https:\/\/labs.eleks.com\/wp-content\/uploads\/2019\/06\/Bahdanau-compressor.png\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-11592 size-large\" src=\"https:\/\/labs.eleks.com\/wp-content\/uploads\/2019\/06\/Bahdanau-compressor-968x1024.png\" alt=\"Bahdanau\" width=\"968\" height=\"1024\"><\/a><\/p>\n<p>Source:<span>\u00a0<\/span><a href=\"https:\/\/arxiv.org\/abs\/1409.0473\" target=\"_blank\" rel=\"nofollow noopener noreferrer\">Bahdanau et al., 2015<\/a><\/p>\n<h3>Text summarisation<\/h3>\n<p>Annotating text and articles is a laborious process, especially if the data\u2019s vast and heterogeneous. Attention models can be used to pinpoint the most important textual elements and compose a meaningful headline, allowing the reader to skim the text and still capture the basic meaning. What\u2019s more, text summarisation can do this almost instantly. And it can be used to generate titles for web pages and perform high-level information research, or information segmentation, for rapid reading.<\/p>\n<p><a href=\"https:\/\/labs.eleks.com\/wp-content\/uploads\/2019\/06\/Text-summarisation-with-TensorFlow-compressor.png\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-11593 size-large\" src=\"https:\/\/labs.eleks.com\/wp-content\/uploads\/2019\/06\/Text-summarisation-with-TensorFlow-compressor-1024x509.png\" alt=\"Text summarisation with TensorFlow\" width=\"1024\" height=\"509\"><\/a><\/p>\n<p>Source:<span>\u00a0<\/span><a href=\"https:\/\/ai.googleblog.com\/2016\/08\/text-summarization-with-tensorflow.html\" target=\"_blank\" rel=\"nofollow noopener noreferrer\">Text summarisation with TensorFlow<\/a><\/p>\n<h3>Chatbots with question-answering capabilities<\/h3>\n<p>In the constant quest for efficiency, businesses are trying to automate as many routine processes as possible. 
As yet, however, the perfect tool for human-machine interaction hasn\u2019t been created. Natural language processing (NLP) isn\u2019t yet flawless but, with the addition of the attention mechanism,<span>\u00a0<\/span><a href=\"https:\/\/labs.eleks.com\/2018\/02\/how-to-build-nlp-engine-that-wont-screw-up.html\" target=\"_blank\" rel=\"noopener noreferrer\">its accuracy is greatly improved<\/a>.<\/p>\n<p>An attention mechanism can detect the most significant (key) words from all kinds of questions \u2013 even those that are lengthy and complex \u2013 to produce the right answer. And the mechanism can be implemented as an add-on, to work in conjunction with the neural network on a common knowledge base. With chatbots, the mechanism transcends machine translation and takes on a higher level of abstraction \u2013 allowing it to translate one verbal sequence into another.<\/p>\n<h3>Natural language image captioning (Img2Seq)<\/h3>\n<p>The idea here is the same as it is for image recognition. The difference, however, is that when captioning an image, the attention heat map changes depending on which word of the sentence is currently in focus.<\/p>\n<p><a href=\"https:\/\/labs.eleks.com\/wp-content\/uploads\/2019\/06\/Neural-Image-Caption-compressor.png\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-11594 size-large\" src=\"https:\/\/labs.eleks.com\/wp-content\/uploads\/2019\/06\/Neural-Image-Caption-compressor-1024x423.png\" alt=\"Neural Image Caption\" width=\"1024\" height=\"423\"><\/a><\/p>\n<p>Source:<span>\u00a0<\/span><a href=\"https:\/\/arxiv.org\/pdf\/1502.03044.pdf\" target=\"_blank\" rel=\"nofollow noopener noreferrer\">Show, Attend and Tell: Neural Image Caption Generation with Visual Attention<\/a><\/p>\n<p>A neural network can \u2018translate\u2019 everything it sees in the image into words. 
The above example shows us how the network distributes its attention while formulating the description.<\/p>\n<p>This image captioning functionality has big practical potential in the real world: from automating hashtags and subtitle creation, to writing descriptions for the visually impaired \u2013 even producing daily surveillance reports for security firms.<\/p>\n<p><a href=\"https:\/\/labs.eleks.com\/wp-content\/uploads\/2019\/06\/Neural-Image-Caption-Generation-1.png\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-11595 size-large\" src=\"https:\/\/labs.eleks.com\/wp-content\/uploads\/2019\/06\/Neural-Image-Caption-Generation-1-1024x443.png\" alt=\"Neural Image Caption Generation with Visual Attention\" width=\"1024\" height=\"443\"><\/a><\/p>\n<p>Source:<span>\u00a0<\/span><a href=\"https:\/\/arxiv.org\/pdf\/1502.03044.pdf\" target=\"_blank\" rel=\"nofollow noopener noreferrer\">Show, Attend and Tell: Neural Image Caption Generation with Visual Attention<\/a><\/p>\n<h3>CAPTCHA solving<\/h3>\n<p>The attention mechanism turned out to be hugely successful in solving CAPTCHAs that require recognising and segmenting noisy or distorted pictures, then typing in the recognised text. 
This functionality finds a potential application with chatbots that have to parse third-party sites to answer a query.<\/p>\n<p><a href=\"https:\/\/labs.eleks.com\/wp-content\/uploads\/2019\/06\/CAPTCHA-solving.gif\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-11583 size-full\" src=\"https:\/\/labs.eleks.com\/wp-content\/uploads\/2019\/06\/CAPTCHA-solving.gif\" alt=\"CAPTCHA solving\" width=\"320\" height=\"116\"><\/a><\/p>\n<h3>Image-based question-answering systems<\/h3>\n<p>Like conventional linguistic question-answering systems, image-based question-answering functionality takes a natural language input but, instead of accessing a knowledge base, it uses the attention mechanism to find the answer within the image.<\/p>\n<p><a href=\"https:\/\/labs.eleks.com\/wp-content\/uploads\/2019\/06\/Stacked-Attention-Networks-for-Image-Question-Answering-compressor.png\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-11596 size-full\" src=\"https:\/\/labs.eleks.com\/wp-content\/uploads\/2019\/06\/Stacked-Attention-Networks-for-Image-Question-Answering-compressor.png\" alt=\"Stacked Attention Networks for Image Question Answering\" width=\"712\" height=\"656\"><\/a><\/p>\n<p>Source:<span>\u00a0<\/span><a href=\"https:\/\/arxiv.org\/pdf\/1511.02274.pdf\" target=\"_blank\" rel=\"nofollow noopener noreferrer\">Stacked Attention Networks for Image Question Answering<\/a><\/p>\n<h2>TensorFlow neural machine translation Seq2Seq with attention mechanism: A step-by-step guide<\/h2>\n<p>There are many online tutorials covering neural machine translation, including the official<span>\u00a0<\/span><a href=\"https:\/\/github.com\/tensorflow\/nmt\" target=\"_blank\" rel=\"nofollow noopener noreferrer\">TensorFlow<\/a><span>\u00a0<\/span>and<span>\u00a0<\/span><a href=\"https:\/\/pytorch.org\/tutorials\/intermediate\/seq2seq_translation_tutorial.html\" target=\"_blank\" rel=\"nofollow noopener noreferrer\">PyTorch<\/a><span>\u00a0<\/span>tutorials. 
However, neither of these addresses the implementation of the attention mechanism itself (they rely on a ready-made attention wrapper), which is a pivotal component of modern neural translation.<\/p>\n<p><a href=\"https:\/\/github.com\/mikonst\/seq2seq-attention-tensorflow\/blob\/master\/seq2seq_att.ipynb\" target=\"_blank\" rel=\"nofollow noopener noreferrer\">Here\u2019s the link to our tutorial on neural machine translation<\/a>, based on a modern Seq2Seq algorithm with an attention mechanism, built from scratch. By comparison to what\u2019s out there, this should offer an in-depth overview of all aspects of seq2seq, including the attention algorithm.<\/p>\n<p>We used the TensorFlow framework to offer a usable, low-level working example of the concept, based on the<span>\u00a0<\/span><a href=\"https:\/\/github.com\/ematvey\/tensorflow-seq2seq-tutorials\" target=\"_blank\" rel=\"nofollow noopener noreferrer\">Dynamic Seq2Seq in TensorFlow tutorial<\/a>. And we aim to make it as good as the original PyTorch version.<\/p>\n<p>We hope this will help you get the most out of your machine translation projects and, ultimately, pay dividends for your outcomes. 
Feel free to leave any questions and feedback in the comments box below or, if you\u2019d like to discuss how Data Science and neural machine translation can help you address your needs,<span>\u00a0<\/span><a href=\"https:\/\/eleks.com\/contact-us\/\" target=\"_blank\" rel=\"noopener noreferrer\">get in touch with us<\/a>!<\/p>\n<p><em>By Michael Konstantinov<\/em><br \/><em>Deep Learning Specialist at Eleks<\/em><\/p>\n<p><a href=\"https:\/\/labs.eleks.com\/2019\/06\/neural-machine-translation-attention-mechanism.html?utm_source=DataScienceCentral%20&#038;utm_medium=refferal&#038;utm_campaign=REpubl-NeuralMT-Labs\" target=\"_blank\" rel=\"noopener noreferrer\">Read the full story here.<\/a><\/p>\n<\/div>\n<p><a href=\"https:\/\/www.datasciencecentral.com\/xn\/detail\/6448529:BlogPost:869809\">Go to Source<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Author: Olha Zhydik Neural networks have made significant leaps in the\u00a0image and natural language processing\u00a0(NLP) recently. 
They\u2019ve not only learned to recognise, localise and segment [&hellip;] <span class=\"read-more-link\"><a class=\"read-more\" href=\"https:\/\/www.aiproblog.com\/index.php\/2019\/08\/23\/neural-machine-translation-with-attention-mechanism-step-by-step-guide\/\">Read More<\/a><\/span><\/p>\n","protected":false},"author":1,"featured_media":466,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_bbp_topic_count":0,"_bbp_reply_count":0,"_bbp_total_topic_count":0,"_bbp_total_reply_count":0,"_bbp_voice_count":0,"_bbp_anonymous_reply_count":0,"_bbp_topic_count_hidden":0,"_bbp_reply_count_hidden":0,"_bbp_forum_subforum_count":0,"footnotes":""},"categories":[26],"tags":[],"_links":{"self":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts\/2493"}],"collection":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/comments?post=2493"}],"version-history":[{"count":0,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts\/2493\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/media\/466"}],"wp:attachment":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/media?parent=2493"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/categories?post=2493"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/tags?post=2493"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}