{"id":5474,"date":"2022-03-09T06:29:22","date_gmt":"2022-03-09T06:29:22","guid":{"rendered":"https:\/\/www.aiproblog.com\/index.php\/2022\/03\/09\/the-transformer-positional-encoding-layer-in-keras-part-2\/"},"modified":"2022-03-09T06:29:22","modified_gmt":"2022-03-09T06:29:22","slug":"the-transformer-positional-encoding-layer-in-keras-part-2","status":"publish","type":"post","link":"https:\/\/www.aiproblog.com\/index.php\/2022\/03\/09\/the-transformer-positional-encoding-layer-in-keras-part-2\/","title":{"rendered":"The Transformer Positional Encoding Layer in Keras, Part 2"},"content":{"rendered":"<p>Author: Mehreen Saeed<\/p>\n<div>\n<p>In <a href=\"https:\/\/machinelearningmastery.com\/a-gentle-introduction-to-positional-encoding-in-transformer-models-part-1\">part 1: A gentle introduction to positional encoding in transformer models<\/a>, we discussed the positional encoding layer of the transformer model. We also showed how you can implement this layer and its functions yourself in Python. In this tutorial, we\u2019ll implement the positional encoding layer in Keras and Tensorflow. You can then use this layer in a complete transformer model.<\/p>\n<p>After completing this tutorial, you will know:<\/p>\n<ul>\n<li>Text vectorization in Keras<\/li>\n<li>Embedding layer in Keras<\/li>\n<li>How to subclass the embedding layer and write your own positional encoding layer.<\/li>\n<\/ul>\n<p>Let\u2019s get started.<\/p>\n<div id=\"attachment_13252\" style=\"width: 527px\" class=\"wp-caption aligncenter\">\n<a href=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2022\/02\/ijaz-rafi-photo-1551102076-9f8bb5f3f897.jpg\"><img decoding=\"async\" aria-describedby=\"caption-attachment-13252\" loading=\"lazy\" class=\"wp-image-13252 \" src=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2022\/02\/ijaz-rafi-photo-1551102076-9f8bb5f3f897-300x200.jpg\" alt=\"\" width=\"517\" height=\"344\" srcset=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2022\/02\/ijaz-rafi-photo-1551102076-9f8bb5f3f897-300x200.jpg 300w, https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2022\/02\/ijaz-rafi-photo-1551102076-9f8bb5f3f897-1024x683.jpg 1024w, https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2022\/02\/ijaz-rafi-photo-1551102076-9f8bb5f3f897-768x512.jpg 768w, https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2022\/02\/ijaz-rafi-photo-1551102076-9f8bb5f3f897-1536x1024.jpg 1536w, https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2022\/02\/ijaz-rafi-photo-1551102076-9f8bb5f3f897-2048x1365.jpg 2048w, https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2022\/02\/ijaz-rafi-photo-1551102076-9f8bb5f3f897-600x400.jpg 600w\" sizes=\"(max-width: 517px) 100vw, 517px\"><\/a><\/p>\n<p id=\"caption-attachment-13252\" class=\"wp-caption-text\">The Transformer Positional Encoding Layer in Keras, Part 2. <br \/>Photo by Ijaz Rafi. 
## Tutorial Overview

This tutorial is divided into three parts; they are:

1. Text vectorization and embedding layer in Keras
2. Writing your own positional encoding layer in Keras
   1. Randomly initialized and tunable embeddings
   2. Fixed weight embeddings from [Attention Is All You Need](https://arxiv.org/abs/1706.03762)
3. Graphical view of the output of the positional encoding layer

## The Import Section

First, let's write the section to import all the required libraries:

```python
import tensorflow as tf
from tensorflow import convert_to_tensor, string
from tensorflow.keras.layers import TextVectorization, Embedding, Layer
from tensorflow.data import Dataset
import numpy as np
import matplotlib.pyplot as plt
```

## The Text Vectorization Layer

We'll start with a set of English phrases that are already preprocessed and cleaned. The text vectorization layer creates a dictionary of words and replaces each word with its corresponding index in the dictionary. Let's see how we can map these two sentences using the text vectorization layer:

1. I am a robot
2. you too robot

Note that the text has already been converted to lowercase, with all punctuation and noise removed. We'll convert these two phrases to vectors of a fixed length of 5. The `TextVectorization` layer of Keras requires a maximum vocabulary size and the required length of the output sequence for initialization. The output of the layer is a tensor of shape:

`(number of sentences, output sequence length)`

The following code snippet uses the `adapt` method to generate a vocabulary. It then creates a vectorized representation of the text.

```python
output_sequence_length = 5
vocab_size = 10
sentences = [["I am a robot"], ["you too robot"]]
sentence_data = Dataset.from_tensor_slices(sentences)
# Create the TextVectorization layer
vectorize_layer = TextVectorization(
                  output_sequence_length=output_sequence_length,
                  max_tokens=vocab_size)
# Train the layer to create a dictionary
vectorize_layer.adapt(sentence_data)
# Convert all sentences to tensors
word_tensors = convert_to_tensor(sentences, dtype=tf.string)
# Use the word tensors to get vectorized phrases
vectorized_words = vectorize_layer(word_tensors)
print("Vocabulary: ", vectorize_layer.get_vocabulary())
print("Vectorized words: ", vectorized_words)
```

```
Vocabulary:  ['', '[UNK]', 'robot', 'you', 'too', 'i', 'am', 'a']
Vectorized words:  tf.Tensor(
[[5 6 7 2 0]
 [3 4 2 0 0]], shape=(2, 5), dtype=int64)
```
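Note that index 0 is reserved for padding and index 1 for out-of-vocabulary words. As a quick check (this snippet is my own addition, not from the original post), any word the layer did not see during `adapt` maps to the `[UNK]` index 1:

```python
# Words outside the adapted vocabulary map to [UNK] (index 1);
# shorter phrases are padded with index 0.
oov_example = vectorize_layer(convert_to_tensor([["you are a cat"]], dtype=tf.string))
print(oov_example)  # expected: [[3 1 7 1 0]] -- "are" and "cat" are unknown
```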
## The Embedding Layer

The Keras `Embedding` layer converts integers to dense vectors. The layer initially maps these integers to random numbers, which are later tuned during the training phase. However, you also have the option to set the mapping to some predefined weight values (shown later). To initialize this layer, you need to specify the maximum value of an integer to map, along with the dimensionality of the output vectors.

### The Word Embeddings

Let's see how the layer converts our `vectorized_words` to tensors.

```python
output_length = 6
word_embedding_layer = Embedding(vocab_size, output_length)
embedded_words = word_embedding_layer(vectorized_words)
print(embedded_words)
```

The output is shown in the annotated figure below. Note that you will see a different output every time you run this code because the weights are initialized randomly.

[Figure: Word embeddings. This output will be different every time you run the code because of the random numbers involved.]

### The Position Embeddings

We also need the embeddings for the corresponding positions. The maximum position corresponds to the output sequence length of the `TextVectorization` layer.

```python
position_embedding_layer = Embedding(output_sequence_length, output_length)
position_indices = tf.range(output_sequence_length)
embedded_indices = position_embedding_layer(position_indices)
print(embedded_indices)
```

[Figure: Position indices embedding.]
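Before adding the two embeddings together, it helps to confirm the shapes involved. A small check (my own addition, not from the post): an `Embedding` layer is just a lookup table whose weight matrix has one row per input index.

```python
# The word embedding table has one row per vocabulary index,
# the position table one row per position.
print(word_embedding_layer.get_weights()[0].shape)      # (10, 6): (vocab_size, output_length)
print(position_embedding_layer.get_weights()[0].shape)  # (5, 6): (output_sequence_length, output_length)
print(embedded_words.shape)    # (2, 5, 6): (sentences, positions, output_length)
print(embedded_indices.shape)  # (5, 6)
```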
\" width=\"1476\" height=\"244\" srcset=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2022\/02\/PEKeras_b.png 1476w, https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2022\/02\/PEKeras_b-300x50.png 300w, https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2022\/02\/PEKeras_b-1024x169.png 1024w, https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2022\/02\/PEKeras_b-768x127.png 768w\" sizes=\"(max-width: 1476px) 100vw, 1476px\"><\/a><\/p>\n<p id=\"caption-attachment-13250\" class=\"wp-caption-text\">Position Indices Embedding.<\/p>\n<\/div>\n<\/div>\n<div>\n<h3 class=\"text-cell-section-header\">The Output of Positional Encoding Layer in Transformers<\/h3>\n<p>In a transformer model the final output is the sum of both the word embeddings and the position embeddings. Hence, when you set up both embedding layers, you need to make sure that the <code>output_length<\/code> is the same for both.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\">final_output_embedding = embedded_words + embedded_indices\r\nprint(\"Final output: \", final_output_embedding)<\/pre>\n<p>The output is shown below, annotated with my comments. Again, this will be different from your run of the code because of the random weight initialization.<\/p>\n<div id=\"attachment_13249\" style=\"width: 1254px\" class=\"wp-caption aligncenter\">\n<a href=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2022\/02\/PEKeras_c.png\"><img decoding=\"async\" aria-describedby=\"caption-attachment-13249\" loading=\"lazy\" class=\"wp-image-13249 size-full\" src=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2022\/02\/PEKeras_c.png\" alt=\"\" width=\"1244\" height=\"700\" srcset=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2022\/02\/PEKeras_c.png 1244w, https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2022\/02\/PEKeras_c-300x169.png 300w, https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2022\/02\/PEKeras_c-1024x576.png 1024w, https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2022\/02\/PEKeras_c-768x432.png 768w\" sizes=\"(max-width: 1244px) 100vw, 1244px\"><\/a><\/p>\n<p id=\"caption-attachment-13249\" class=\"wp-caption-text\">The Final Output After Adding Word Embedding and Position Embedding<\/p>\n<\/div>\n<h2 class=\"text-cell-section-header\">SubClassing the Keras Embedding Layer<\/h2>\n<p>When implementing a transformer model, you\u2019ll have to write your own position encoding layer. This is quite simple as the basic functionality is already provided for you. This\u00a0<a href=\"https:\/\/keras.io\/examples\/nlp\/neural_machine_translation_with_transformer\/\" target=\"_blank\" rel=\"nofollow noopener\">Keras example<\/a> shows how you can subclass the <code>Embedding<\/code> layer to implement your own functionality. 
## Subclassing the Keras Embedding Layer

When implementing a transformer model, you'll have to write your own position encoding layer. This is quite simple, as the basic functionality is already provided for you. This [Keras example](https://keras.io/examples/nlp/neural_machine_translation_with_transformer/) shows how you can subclass the `Embedding` layer to implement your own functionality. You can add more methods to it as you require.

```python
class PositionEmbeddingLayer(Layer):
    def __init__(self, sequence_length, vocab_size, output_dim, **kwargs):
        super(PositionEmbeddingLayer, self).__init__(**kwargs)
        self.word_embedding_layer = Embedding(
            input_dim=vocab_size, output_dim=output_dim
        )
        self.position_embedding_layer = Embedding(
            input_dim=sequence_length, output_dim=output_dim
        )

    def call(self, inputs):
        position_indices = tf.range(tf.shape(inputs)[-1])
        embedded_words = self.word_embedding_layer(inputs)
        embedded_indices = self.position_embedding_layer(position_indices)
        return embedded_words + embedded_indices
```

Let's run this layer.

```python
my_embedding_layer = PositionEmbeddingLayer(output_sequence_length,
                                            vocab_size, output_length)
embedded_layer_output = my_embedding_layer(vectorized_words)
print("Output from my_embedded_layer: ", embedded_layer_output)
```

```
Output from my_embedded_layer:  tf.Tensor(
[[[ 0.06798736 -0.02821309  0.00571618  0.00314623 -0.03060734
    0.01111387]
  [-0.06097465  0.03966043 -0.05164248  0.06578685  0.03638128
   -0.03397174]
  [ 0.06715029 -0.02453769  0.02205854  0.01110986  0.02345785
    0.05879898]
  [-0.04625867  0.07500569 -0.05690887 -0.07615659  0.01962536
    0.00035865]
  [ 0.01423577 -0.03938593 -0.08625181  0.04841495  0.06951572
    0.08811047]]

 [[ 0.0163899   0.06895607 -0.01131684  0.01810524 -0.05857501
    0.01811318]
  [ 0.01915303 -0.0163289  -0.04133433  0.06810946  0.03736673
    0.04218033]
  [ 0.00795418 -0.00143972 -0.01627307 -0.00300788 -0.02759011
    0.09251165]
  [ 0.0028762   0.04526488 -0.05222676 -0.02007698  0.07879823
    0.00541583]
  [ 0.01423577 -0.03938593 -0.08625181  0.04841495  0.06951572
    0.08811047]]], shape=(2, 5, 6), dtype=float32)
```

## Positional Encoding in Transformers: Attention Is All You Need

Note that the above class creates an embedding layer with trainable weights, so the weights are initialized randomly and tuned during the training phase. The authors of [Attention Is All You Need](https://arxiv.org/abs/1706.03762) specified a positional encoding scheme, shown below. You can read the full details in [part 1](https://machinelearningmastery.com/a-gentle-introduction-to-positional-encoding-in-transformer-models-part-1) of this tutorial:

$$
\begin{eqnarray}
P(k, 2i) &=& \sin\Big(\frac{k}{n^{2i/d}}\Big)\\
P(k, 2i+1) &=& \cos\Big(\frac{k}{n^{2i/d}}\Big)
\end{eqnarray}
$$

If you want to use the same positional encoding scheme, you can specify your own embedding matrix, as discussed in [part 1](https://machinelearningmastery.com/a-gentle-introduction-to-positional-encoding-in-transformer-models-part-1), which shows how to create your own embeddings in NumPy.
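For reference, here is a vectorized NumPy sketch of the same sinusoidal table (my own formulation, not from the post; like the loop version used in the class below, it assumes the embedding dimension `d` is even):

```python
import numpy as np

def positional_encoding(seq_len, d, n=10000):
    """Sinusoidal positional encoding table of shape (seq_len, d)."""
    k = np.arange(seq_len)[:, np.newaxis]   # positions, shape (seq_len, 1)
    i = np.arange(d // 2)[np.newaxis, :]    # dimension pairs, shape (1, d//2)
    angles = k / np.power(n, 2 * i / d)     # shape (seq_len, d//2)
    P = np.zeros((seq_len, d))
    P[:, 0::2] = np.sin(angles)             # even indices get the sine
    P[:, 1::2] = np.cos(angles)             # odd indices get the cosine
    return P
```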
When specifying the `Embedding` layer, you need to provide the positional encoding matrix as weights along with `trainable=False`. Let's create another positional embedding class that does exactly this.

```python
class PositionEmbeddingFixedWeights(Layer):
    def __init__(self, sequence_length, vocab_size, output_dim, **kwargs):
        super(PositionEmbeddingFixedWeights, self).__init__(**kwargs)
        word_embedding_matrix = self.get_position_encoding(vocab_size, output_dim)
        position_embedding_matrix = self.get_position_encoding(sequence_length, output_dim)
        self.word_embedding_layer = Embedding(
            input_dim=vocab_size, output_dim=output_dim,
            weights=[word_embedding_matrix],
            trainable=False
        )
        self.position_embedding_layer = Embedding(
            input_dim=sequence_length, output_dim=output_dim,
            weights=[position_embedding_matrix],
            trainable=False
        )

    def get_position_encoding(self, seq_len, d, n=10000):
        P = np.zeros((seq_len, d))
        for k in range(seq_len):
            for i in np.arange(int(d/2)):
                denominator = np.power(n, 2*i/d)
                P[k, 2*i] = np.sin(k/denominator)
                P[k, 2*i+1] = np.cos(k/denominator)
        return P

    def call(self, inputs):
        position_indices = tf.range(tf.shape(inputs)[-1])
        embedded_words = self.word_embedding_layer(inputs)
        embedded_indices = self.position_embedding_layer(position_indices)
        return embedded_words + embedded_indices
```

Next, we set up everything to run this layer.

```python
attnisallyouneed_embedding = PositionEmbeddingFixedWeights(output_sequence_length,
                                            vocab_size, output_length)
attnisallyouneed_output = attnisallyouneed_embedding(vectorized_words)
print("Output from my_embedded_layer: ", attnisallyouneed_output)
```

```
Output from my_embedded_layer:  tf.Tensor(
[[[-0.9589243   1.2836622   0.23000172  1.9731903   0.01077196
    1.9999421 ]
  [ 0.56205547  1.5004725   0.3213085   1.9603932   0.01508068
    1.9999142 ]
  [ 1.566284    0.3377554   0.41192317  1.9433732   0.01938933
    1.999877  ]
  [ 1.0504174  -1.4061394   0.2314966   1.9860148   0.01077211
    1.9999698 ]
  [-0.7568025   0.3463564   0.18459873  1.982814    0.00861763
    1.9999628 ]]

 [[ 0.14112     0.0100075   0.1387981   1.9903207   0.00646326
    1.9999791 ]
  [ 0.08466846 -0.11334133  0.23099795  1.9817369   0.01077207
    1.9999605 ]
  [ 1.8185948  -0.8322937   0.185397    1.9913884   0.00861771
    1.9999814 ]
  [ 0.14112     0.0100075   0.1387981   1.9903207   0.00646326
    1.9999791 ]
  [-0.7568025   0.3463564   0.18459873  1.982814    0.00861763
    1.9999628 ]]], shape=(2, 5, 6), dtype=float32)
```
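Because the weights are fixed, this layer behaves deterministically, unlike the tunable version above. A quick sanity check (my own addition, not from the post):

```python
# The fixed layer exposes no trainable weights, and repeated calls
# produce identical outputs.
print(attnisallyouneed_embedding.trainable_weights)         # -> []
repeat_output = attnisallyouneed_embedding(vectorized_words)
print(np.allclose(attnisallyouneed_output, repeat_output))  # -> True
```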
## Visualizing the Final Embedding

In order to visualize the embeddings, let's take two bigger sentences: one technical and the other one just a quote. We'll set up the `TextVectorization` layer along with the positional encoding layer and see what the final output looks like.

```python
technical_phrase = ("to understand machine learning algorithms you need"
                    + " to understand concepts such as gradient of a function "
                    + "Hessians of a matrix and optimization etc")
wise_phrase = ("patrick henry said give me liberty or give me death "
               + "when he addressed the second virginia convention in march")

total_vocabulary = 200
sequence_length = 20
final_output_len = 50
phrase_vectorization_layer = TextVectorization(
                  output_sequence_length=sequence_length,
                  max_tokens=total_vocabulary)
# Learn the dictionary
phrase_vectorization_layer.adapt([technical_phrase, wise_phrase])
# Convert all sentences to tensors
phrase_tensors = convert_to_tensor([technical_phrase, wise_phrase],
                                   dtype=tf.string)
# Use the word tensors to get vectorized phrases
vectorized_phrases = phrase_vectorization_layer(phrase_tensors)

random_weights_embedding_layer = PositionEmbeddingLayer(sequence_length,
                                                        total_vocabulary,
                                                        final_output_len)
fixed_weights_embedding_layer = PositionEmbeddingFixedWeights(sequence_length,
                                                        total_vocabulary,
                                                        final_output_len)
random_embedding = random_weights_embedding_layer(vectorized_phrases)
fixed_embedding = fixed_weights_embedding_layer(vectorized_phrases)
```

Now let's see what the random embeddings look like for both phrases.

```python
fig = plt.figure(figsize=(15, 5))
title = ["Tech Phrase", "Wise Phrase"]
for i in range(2):
    ax = plt.subplot(1, 2, 1+i)
    matrix = tf.reshape(random_embedding[i, :, :], (sequence_length, final_output_len))
    cax = ax.matshow(matrix)
    plt.gcf().colorbar(cax)
    plt.title(title[i], y=1.2)
fig.suptitle("Random Embedding")
plt.show()
```

[Figure: Random embeddings.]
The embeddings from the fixed weights layer are visualized below.

```python
fig = plt.figure(figsize=(15, 5))
title = ["Tech Phrase", "Wise Phrase"]
for i in range(2):
    ax = plt.subplot(1, 2, 1+i)
    matrix = tf.reshape(fixed_embedding[i, :, :], (sequence_length, final_output_len))
    cax = ax.matshow(matrix)
    plt.gcf().colorbar(cax)
    plt.title(title[i], y=1.2)
fig.suptitle("Fixed Weight Embedding from Attention is All You Need")
plt.show()
```

[Figure: Embedding using sinusoidal positional encoding.]

We can see that the embedding layer initialized with default parameters outputs random values. On the other hand, the fixed weights generated using sinusoids create a unique signature for every phrase, with information about each word's position encoded within it.

You can experiment with both tunable and fixed weight implementations for your particular application.
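Since the two plotting blocks above differ only in the tensor and the title, they could be folded into one helper. This is a purely cosmetic refactor of my own, not from the post:

```python
def plot_embedding(embedding, suptitle, titles=("Tech Phrase", "Wise Phrase")):
    """Show each phrase's (sequence_length, final_output_len) embedding as a heatmap."""
    fig = plt.figure(figsize=(15, 5))
    for i, t in enumerate(titles):
        ax = plt.subplot(1, 2, 1 + i)
        cax = ax.matshow(tf.reshape(embedding[i, :, :],
                                    (sequence_length, final_output_len)))
        plt.gcf().colorbar(cax)
        plt.title(t, y=1.2)
    fig.suptitle(suptitle)
    plt.show()

plot_embedding(random_embedding, "Random Embedding")
plot_embedding(fixed_embedding, "Fixed Weight Embedding from Attention is All You Need")
```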
## Further Reading

This section provides more resources on the topic if you are looking to go deeper.

### Books

- [Transformers for Natural Language Processing](https://www.amazon.com/Transformers-Natural-Language-Processing-architectures/dp/1800565798), by Denis Rothman

### Papers

- [Attention Is All You Need](https://arxiv.org/abs/1706.03762), 2017

### Articles

- [The Transformer Attention Mechanism](https://machinelearningmastery.com/the-transformer-attention-mechanism/)
- [The Transformer Model](https://machinelearningmastery.com/the-transformer-model/)
- [Transformer Model for Language Understanding](https://www.tensorflow.org/text/tutorials/transformer)
- [Using Pre-Trained Word Embeddings in a Keras Model](https://blog.keras.io/using-pre-trained-word-embeddings-in-a-keras-model.html)
- [English-to-Spanish translation with a sequence-to-sequence Transformer](https://keras.io/examples/nlp/neural_machine_translation_with_transformer/)
- [A Gentle Introduction to Positional Encoding in Transformer Models, Part 1](https://machinelearningmastery.com/a-gentle-introduction-to-positional-encoding-in-transformer-models-part-1)

## Summary

In this tutorial, you discovered the implementation of the positional encoding layer in Keras.

Specifically, you learned:

- Text vectorization layer in Keras
- Positional encoding layer in Keras
- Creating your own class for positional encoding
- Setting your own weights for the positional encoding layer in Keras

Do you have any questions about positional encoding discussed in this post? Ask your questions in the comments below, and I will do my best to answer.