{"id":2191,"date":"2019-05-26T19:00:34","date_gmt":"2019-05-26T19:00:34","guid":{"rendered":"https:\/\/www.aiproblog.com\/index.php\/2019\/05\/26\/how-to-perform-object-detection-with-yolov3-in-keras\/"},"modified":"2019-05-26T19:00:34","modified_gmt":"2019-05-26T19:00:34","slug":"how-to-perform-object-detection-with-yolov3-in-keras","status":"publish","type":"post","link":"https:\/\/www.aiproblog.com\/index.php\/2019\/05\/26\/how-to-perform-object-detection-with-yolov3-in-keras\/","title":{"rendered":"How to Perform Object Detection With YOLOv3 in Keras"},"content":{"rendered":"<p>Author: Jason Brownlee<\/p>\n<div>\n<p>Object detection is a task in computer vision that involves identifying the presence, location, and type of one or more objects in a given photograph.<\/p>\n<p>It is a challenging problem that builds upon methods for object localization (e.g. where are they and what is their extent) and object classification (e.g. what are they).<\/p>\n<p>In recent years, deep learning techniques have achieved state-of-the-art results for object detection, such as on standard benchmark datasets and in computer vision competitions. 
Notable is the \u201cYou Only Look Once,\u201d or YOLO, family of Convolutional Neural Networks that achieve near state-of-the-art results with a single end-to-end model that can perform object detection in real-time.<\/p>\n<p>In this tutorial, you will discover how to develop a YOLOv3 model for object detection on new photographs.<\/p>\n<p>After completing this tutorial, you will know:<\/p>\n<ul>\n<li>YOLO-based Convolutional Neural Network family of models for object detection and the most recent variation called YOLOv3.<\/li>\n<li>The best-of-breed open source library implementation of the YOLOv3 for the Keras deep learning library.<\/li>\n<li>How to use a pre-trained YOLOv3 to perform object localization and detection on new photographs.<\/li>\n<\/ul>\n<p>Let\u2019s get started.<\/p>\n<div id=\"attachment_7715\" style=\"width: 650px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-7715\" class=\"size-full wp-image-7715\" src=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2019\/05\/How-to-Perform-Object-Detection-With-YOLOv3-in-Keras.jpg\" alt=\"How to Perform Object Detection With YOLOv3 in Keras\" width=\"640\" height=\"480\" srcset=\"http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2019\/05\/How-to-Perform-Object-Detection-With-YOLOv3-in-Keras.jpg 640w, http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2019\/05\/How-to-Perform-Object-Detection-With-YOLOv3-in-Keras-300x225.jpg 300w\" sizes=\"(max-width: 640px) 100vw, 640px\"><\/p>\n<p id=\"caption-attachment-7715\" class=\"wp-caption-text\">How to Perform Object Detection With YOLOv3 in Keras<br \/>Photo by <a href=\"https:\/\/www.flickr.com\/photos\/davidberkowitz\/5699832418\/\">David Berkowitz<\/a>, some rights reserved.<\/p>\n<\/div>\n<h2>Tutorial Overview<\/h2>\n<p>This tutorial is divided into three parts; they are:<\/p>\n<ol>\n<li>YOLO for Object 
Detection<\/li>\n<li>Experiencor YOLO3 Project<\/li>\n<li>Object Detection With YOLOv3<\/li>\n<\/ol>\n<div class=\"woo-sc-hr\"><\/div>\n<h2>YOLO for Object Detection<\/h2>\n<p>Object detection is a computer vision task that involves both localizing one or more objects within an image and classifying each object in the image.<\/p>\n<p>It is a challenging computer vision task that requires both successful object localization in order to locate and draw a bounding box around each object in an image, and object classification to predict the correct class of object that was localized.<\/p>\n<p>The \u201c<em>You Only Look Once<\/em>,\u201d or YOLO, family of models is a series of end-to-end deep learning models designed for fast object detection, developed by <a 
href=\"https:\/\/pjreddie.com\/\">Joseph Redmon<\/a>, et al. and first described in the 2015 paper titled \u201c<a href=\"https:\/\/arxiv.org\/abs\/1506.02640\">You Only Look Once: Unified, Real-Time Object Detection<\/a>.\u201d<\/p>\n<p>The approach involves a single deep convolutional neural network (originally a version of GoogLeNet, later updated and called DarkNet based on VGG) that splits the input into a grid of cells and each cell directly predicts a bounding box and object classification. The result is a large number of candidate bounding boxes that are consolidated into a final prediction by a post-processing step.<\/p>\n<p>There are three main variations of the approach, at the time of writing; they are YOLOv1, YOLOv2, and YOLOv3. The first version proposed the general architecture, whereas the second version refined the design and made use of predefined anchor boxes to improve bounding box proposal, and version three further refined the model architecture and training process.<\/p>\n<p>Although the accuracy of the models is close but not as good as Region-Based Convolutional Neural Networks (R-CNNs), they are popular for object detection because of their detection speed, often demonstrated in real-time on video or with camera feed input.<\/p>\n<blockquote>\n<p>A single neural network predicts bounding boxes and class probabilities directly from full images in one evaluation. 
Since the whole detection pipeline is a single network, it can be optimized end-to-end directly on detection performance.<\/p>\n<\/blockquote>\n<p>\u2014 <a href=\"https:\/\/arxiv.org\/abs\/1506.02640\">You Only Look Once: Unified, Real-Time Object Detection<\/a>, 2015.<\/p>\n<p>In this tutorial, we will focus on using YOLOv3.<\/p>\n<h2>Experiencor YOLO3 for Keras Project<\/h2>\n<p>Source code for each version of YOLO is available, as well as pre-trained models.<\/p>\n<p>The official <a href=\"https:\/\/github.com\/pjreddie\/darknet\">DarkNet GitHub<\/a> repository contains the source code for the YOLO versions mentioned in the papers, written in C. The repository provides a step-by-step tutorial on how to use the code for object detection.<\/p>\n<p>It is a challenging model to implement from scratch, especially for beginners, as it requires the development of many customized model elements for training and for prediction. For example, even using a pre-trained model directly requires sophisticated code to distill and interpret the predicted bounding boxes output by the model.<\/p>\n<p>Instead of developing this code from scratch, we can use a third-party implementation. There are many third-party implementations designed for using YOLO with Keras, although none appear to be standardized and designed to be used as a library.<\/p>\n<p>The <a href=\"https:\/\/github.com\/allanzelener\/YAD2K\">YAD2K project<\/a> was a de facto standard for YOLOv2 and provided scripts to convert the pre-trained weights into Keras format, use the pre-trained model to make predictions, and distill and interpret the predicted bounding boxes. 
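Why does interpreting the predictions take work at all? The raw network output is a dense grid encoding rather than a list of boxes. For a 416×416 input, YOLOv3's three detection heads downsample by strides of 32, 16, and 8, and each grid cell predicts 3 boxes, each carrying 4 coordinates, 1 objectness score, and 80 MSCOCO class probabilities. A minimal sketch of this arithmetic (an illustration only, not code from any of the projects above):

```python
# Sketch of the YOLOv3 output-tensor shapes for a square input image.
input_size = 416          # network input width/height in pixels
strides = [32, 16, 8]     # downsampling factor of each detection head
num_anchors = 3           # boxes predicted per grid cell
num_classes = 80          # MSCOCO class count

# each box: 4 coordinates + 1 objectness score + class probabilities
channels = num_anchors * (4 + 1 + num_classes)

for stride in strides:
    grid = input_size // stride
    print((1, grid, grid, channels))  # batch of one image
```

This yields shapes of (1, 13, 13, 255), (1, 26, 26, 255), and (1, 52, 52, 255), which is exactly what we will see when we make a prediction later in the tutorial.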
Many other third-party developers have used this code as a starting point and updated it to support YOLOv3.<\/p>\n<p>Perhaps the most widely used project for using pre-trained YOLO models is called \u201c<a href=\"https:\/\/github.com\/experiencor\/keras-yolo3\">keras-yolo3: Training and Detecting Objects with YOLO3<\/a>\u201d by <a href=\"https:\/\/www.linkedin.com\/in\/ngoca\/\">Huynh Ngoc Anh<\/a>, or experiencor. The code in the project has been made available under a permissive MIT open source license. Like YAD2K, it provides scripts to load and use pre-trained YOLO models, as well as to apply transfer learning when developing YOLOv3 models on new datasets.<\/p>\n<p>He also has a <a href=\"https:\/\/github.com\/experiencor\/keras-yolo2\">keras-yolo2<\/a> project that provides similar code for YOLOv2, as well as detailed tutorials on how to use the code in the repository. The <a href=\"https:\/\/github.com\/experiencor\/keras-yolo3\">keras-yolo3<\/a> project appears to be an updated version of that project.<\/p>\n<p>Interestingly, experiencor has used the model as the basis for some experiments and has trained versions of YOLOv3 on standard object detection problems such as a kangaroo dataset, a raccoon dataset, red blood cell detection, and others. He has listed model performance, provided the model weights for download, and provided YouTube videos of model behavior. 
For example:<\/p>\n<ul>\n<li><a href=\"https:\/\/www.youtube.com\/watch?v=lxLyLIL7OsU\">Raccoon Detection using YOLO 3<\/a><\/li>\n<\/ul>\n<p><iframe loading=\"lazy\" width=\"500\" height=\"281\" src=\"https:\/\/www.youtube.com\/embed\/lxLyLIL7OsU?feature=oembed\" frameborder=\"0\" allow=\"accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture\" allowfullscreen><\/iframe><\/p>\n<p>We will use experiencor\u2019s keras-yolo3 project as the basis for performing object detection with a YOLOv3 model in this tutorial.<\/p>\n<p>In case the repository changes or is removed (which can happen with third-party open source projects), a <a href=\"https:\/\/github.com\/jbrownlee\/keras-yolo3\">fork of the code at the time of writing<\/a> is provided.<\/p>\n<h2>Object Detection With YOLOv3<\/h2>\n<p>The <a href=\"https:\/\/github.com\/experiencor\/keras-yolo3\">keras-yolo3<\/a> project provides a lot of capability for using YOLOv3 models, including object detection, transfer learning, and training new models from scratch.<\/p>\n<p>In this section, we will use a pre-trained model to perform object detection on an unseen photograph. This capability is available in a single Python file in the repository called \u201c<a href=\"https:\/\/raw.githubusercontent.com\/experiencor\/keras-yolo3\/master\/yolo3_one_file_to_detect_them_all.py\">yolo3_one_file_to_detect_them_all.py<\/a>\u201d that has about 435 lines. This script is, in fact, a program that will use pre-trained weights to prepare a model and use that model to perform object detection, writing out an image with the detected objects drawn. 
It also depends upon OpenCV.<\/p>\n<p>Instead of using this program directly, we will reuse elements from this program and develop our own scripts to first prepare and save a Keras YOLOv3 model, and then load the model to make a prediction for a new photograph.<\/p>\n<h3>Create and Save Model<\/h3>\n<p>The first step is to download the pre-trained model weights.<\/p>\n<p>These were trained using the DarkNet code base on the MSCOCO dataset. Download the model weights and place them into your current working directory with the filename \u201c<em>yolov3.weights<\/em>.\u201d It is a large file and may take a moment to download depending on the speed of your internet connection.<\/p>\n<ul>\n<li><a href=\"https:\/\/pjreddie.com\/media\/files\/yolov3.weights\">YOLOv3 Pre-trained Model Weights (yolov3.weights) (237 MB)<\/a><\/li>\n<\/ul>\n<p>Next, we need to define a Keras model that has the right number and type of layers to match the downloaded model weights. The model architecture is called a \u201c<em>DarkNet<\/em>\u201d and was originally loosely based on the VGG-16 model.<\/p>\n<p>The \u201c<a href=\"https:\/\/raw.githubusercontent.com\/experiencor\/keras-yolo3\/master\/yolo3_one_file_to_detect_them_all.py\">yolo3_one_file_to_detect_them_all.py<\/a>\u201d script provides the <em>make_yolov3_model()<\/em> function to create the model for us, and the helper function <em>_conv_block()<\/em> that is used to create blocks of layers. These two functions can be copied directly from the script.<\/p>\n<p>We can now define the Keras model for YOLOv3.<\/p>\n<pre class=\"crayon-plain-tag\"># define the model\r\nmodel = make_yolov3_model()<\/pre>\n<p>Next, we need to load the model weights. The model weights are stored in whatever format that was used by DarkNet. Rather than trying to decode the file manually, we can use the <em>WeightReader<\/em> class provided in the script.<\/p>\n<p>To use the <em>WeightReader<\/em>, it is instantiated with the path to our weights file (e.g. 
\u2018<em>yolov3.weights<\/em>\u2018). This will parse the file and load the model weights into memory in a format that we can set into our Keras model.<\/p>\n<pre class=\"crayon-plain-tag\"># load the model weights\r\nweight_reader = WeightReader('yolov3.weights')<\/pre>\n<p>We can then call the <em>load_weights()<\/em> function of the <em>WeightReader<\/em> instance, passing in our defined Keras model to set the weights into the layers.<\/p>\n<pre class=\"crayon-plain-tag\"># set the model weights into the model\r\nweight_reader.load_weights(model)<\/pre>\n<p>That\u2019s it; we now have a YOLOv3 model for use.<\/p>\n<p>We can save this model to a Keras compatible .h5 model file ready for later use.<\/p>\n<pre class=\"crayon-plain-tag\"># save the model to file\r\nmodel.save('model.h5')<\/pre>\n<p>We can tie all of this together; the complete code example including functions copied directly from the \u201c<em>yolo3_one_file_to_detect_them_all.py<\/em>\u201d script is listed below.<\/p>\n<pre class=\"crayon-plain-tag\"># create a YOLOv3 Keras model and save it to file\r\n# based on https:\/\/github.com\/experiencor\/keras-yolo3\r\nimport struct\r\nimport numpy as np\r\nfrom keras.layers import Conv2D\r\nfrom keras.layers import Input\r\nfrom keras.layers import BatchNormalization\r\nfrom keras.layers import LeakyReLU\r\nfrom keras.layers import ZeroPadding2D\r\nfrom keras.layers import UpSampling2D\r\nfrom keras.layers.merge import add, concatenate\r\nfrom keras.models import Model\r\n\r\ndef _conv_block(inp, convs, skip=True):\r\n\tx = inp\r\n\tcount = 0\r\n\tfor conv in convs:\r\n\t\tif count == (len(convs) - 2) and skip:\r\n\t\t\tskip_connection = x\r\n\t\tcount += 1\r\n\t\tif conv['stride'] > 1: x = ZeroPadding2D(((1,0),(1,0)))(x) # peculiar padding as darknet prefer left and top\r\n\t\tx = Conv2D(conv['filter'],\r\n\t\t\t\t   conv['kernel'],\r\n\t\t\t\t   strides=conv['stride'],\r\n\t\t\t\t   padding='valid' if conv['stride'] > 1 else 'same', # peculiar 
padding as darknet prefer left and top\r\n\t\t\t\t   name='conv_' + str(conv['layer_idx']),\r\n\t\t\t\t   use_bias=False if conv['bnorm'] else True)(x)\r\n\t\tif conv['bnorm']: x = BatchNormalization(epsilon=0.001, name='bnorm_' + str(conv['layer_idx']))(x)\r\n\t\tif conv['leaky']: x = LeakyReLU(alpha=0.1, name='leaky_' + str(conv['layer_idx']))(x)\r\n\treturn add([skip_connection, x]) if skip else x\r\n\r\ndef make_yolov3_model():\r\n\tinput_image = Input(shape=(None, None, 3))\r\n\t# Layer  0 => 4\r\n\tx = _conv_block(input_image, [{'filter': 32, 'kernel': 3, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 0},\r\n\t\t\t\t\t\t\t\t  {'filter': 64, 'kernel': 3, 'stride': 2, 'bnorm': True, 'leaky': True, 'layer_idx': 1},\r\n\t\t\t\t\t\t\t\t  {'filter': 32, 'kernel': 1, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 2},\r\n\t\t\t\t\t\t\t\t  {'filter': 64, 'kernel': 3, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 3}])\r\n\t# Layer  5 => 8\r\n\tx = _conv_block(x, [{'filter': 128, 'kernel': 3, 'stride': 2, 'bnorm': True, 'leaky': True, 'layer_idx': 5},\r\n\t\t\t\t\t\t{'filter':  64, 'kernel': 1, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 6},\r\n\t\t\t\t\t\t{'filter': 128, 'kernel': 3, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 7}])\r\n\t# Layer  9 => 11\r\n\tx = _conv_block(x, [{'filter':  64, 'kernel': 1, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 9},\r\n\t\t\t\t\t\t{'filter': 128, 'kernel': 3, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 10}])\r\n\t# Layer 12 => 15\r\n\tx = _conv_block(x, [{'filter': 256, 'kernel': 3, 'stride': 2, 'bnorm': True, 'leaky': True, 'layer_idx': 12},\r\n\t\t\t\t\t\t{'filter': 128, 'kernel': 1, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 13},\r\n\t\t\t\t\t\t{'filter': 256, 'kernel': 3, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 14}])\r\n\t# Layer 16 => 36\r\n\tfor i in range(7):\r\n\t\tx = _conv_block(x, [{'filter': 128, 'kernel': 1, 
'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 16+i*3},\r\n\t\t\t\t\t\t\t{'filter': 256, 'kernel': 3, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 17+i*3}])\r\n\tskip_36 = x\r\n\t# Layer 37 => 40\r\n\tx = _conv_block(x, [{'filter': 512, 'kernel': 3, 'stride': 2, 'bnorm': True, 'leaky': True, 'layer_idx': 37},\r\n\t\t\t\t\t\t{'filter': 256, 'kernel': 1, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 38},\r\n\t\t\t\t\t\t{'filter': 512, 'kernel': 3, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 39}])\r\n\t# Layer 41 => 61\r\n\tfor i in range(7):\r\n\t\tx = _conv_block(x, [{'filter': 256, 'kernel': 1, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 41+i*3},\r\n\t\t\t\t\t\t\t{'filter': 512, 'kernel': 3, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 42+i*3}])\r\n\tskip_61 = x\r\n\t# Layer 62 => 65\r\n\tx = _conv_block(x, [{'filter': 1024, 'kernel': 3, 'stride': 2, 'bnorm': True, 'leaky': True, 'layer_idx': 62},\r\n\t\t\t\t\t\t{'filter':  512, 'kernel': 1, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 63},\r\n\t\t\t\t\t\t{'filter': 1024, 'kernel': 3, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 64}])\r\n\t# Layer 66 => 74\r\n\tfor i in range(3):\r\n\t\tx = _conv_block(x, [{'filter':  512, 'kernel': 1, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 66+i*3},\r\n\t\t\t\t\t\t\t{'filter': 1024, 'kernel': 3, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 67+i*3}])\r\n\t# Layer 75 => 79\r\n\tx = _conv_block(x, [{'filter':  512, 'kernel': 1, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 75},\r\n\t\t\t\t\t\t{'filter': 1024, 'kernel': 3, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 76},\r\n\t\t\t\t\t\t{'filter':  512, 'kernel': 1, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 77},\r\n\t\t\t\t\t\t{'filter': 1024, 'kernel': 3, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 78},\r\n\t\t\t\t\t\t{'filter':  512, 'kernel': 1, 'stride': 1, 'bnorm': True, 
'leaky': True, 'layer_idx': 79}], skip=False)\r\n\t# Layer 80 => 82\r\n\tyolo_82 = _conv_block(x, [{'filter': 1024, 'kernel': 3, 'stride': 1, 'bnorm': True,  'leaky': True,  'layer_idx': 80},\r\n\t\t\t\t\t\t\t  {'filter':  255, 'kernel': 1, 'stride': 1, 'bnorm': False, 'leaky': False, 'layer_idx': 81}], skip=False)\r\n\t# Layer 83 => 86\r\n\tx = _conv_block(x, [{'filter': 256, 'kernel': 1, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 84}], skip=False)\r\n\tx = UpSampling2D(2)(x)\r\n\tx = concatenate([x, skip_61])\r\n\t# Layer 87 => 91\r\n\tx = _conv_block(x, [{'filter': 256, 'kernel': 1, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 87},\r\n\t\t\t\t\t\t{'filter': 512, 'kernel': 3, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 88},\r\n\t\t\t\t\t\t{'filter': 256, 'kernel': 1, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 89},\r\n\t\t\t\t\t\t{'filter': 512, 'kernel': 3, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 90},\r\n\t\t\t\t\t\t{'filter': 256, 'kernel': 1, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 91}], skip=False)\r\n\t# Layer 92 => 94\r\n\tyolo_94 = _conv_block(x, [{'filter': 512, 'kernel': 3, 'stride': 1, 'bnorm': True,  'leaky': True,  'layer_idx': 92},\r\n\t\t\t\t\t\t\t  {'filter': 255, 'kernel': 1, 'stride': 1, 'bnorm': False, 'leaky': False, 'layer_idx': 93}], skip=False)\r\n\t# Layer 95 => 98\r\n\tx = _conv_block(x, [{'filter': 128, 'kernel': 1, 'stride': 1, 'bnorm': True, 'leaky': True,   'layer_idx': 96}], skip=False)\r\n\tx = UpSampling2D(2)(x)\r\n\tx = concatenate([x, skip_36])\r\n\t# Layer 99 => 106\r\n\tyolo_106 = _conv_block(x, [{'filter': 128, 'kernel': 1, 'stride': 1, 'bnorm': True,  'leaky': True,  'layer_idx': 99},\r\n\t\t\t\t\t\t\t   {'filter': 256, 'kernel': 3, 'stride': 1, 'bnorm': True,  'leaky': True,  'layer_idx': 100},\r\n\t\t\t\t\t\t\t   {'filter': 128, 'kernel': 1, 'stride': 1, 'bnorm': True,  'leaky': True,  'layer_idx': 101},\r\n\t\t\t\t\t\t\t   {'filter': 256, 
'kernel': 3, 'stride': 1, 'bnorm': True,  'leaky': True,  'layer_idx': 102},\r\n\t\t\t\t\t\t\t   {'filter': 128, 'kernel': 1, 'stride': 1, 'bnorm': True,  'leaky': True,  'layer_idx': 103},\r\n\t\t\t\t\t\t\t   {'filter': 256, 'kernel': 3, 'stride': 1, 'bnorm': True,  'leaky': True,  'layer_idx': 104},\r\n\t\t\t\t\t\t\t   {'filter': 255, 'kernel': 1, 'stride': 1, 'bnorm': False, 'leaky': False, 'layer_idx': 105}], skip=False)\r\n\tmodel = Model(input_image, [yolo_82, yolo_94, yolo_106])\r\n\treturn model\r\n\r\nclass WeightReader:\r\n\tdef __init__(self, weight_file):\r\n\t\twith open(weight_file, 'rb') as w_f:\r\n\t\t\tmajor,\t= struct.unpack('i', w_f.read(4))\r\n\t\t\tminor,\t= struct.unpack('i', w_f.read(4))\r\n\t\t\trevision, = struct.unpack('i', w_f.read(4))\r\n\t\t\tif (major*10 + minor) >= 2 and major < 1000 and minor < 1000:\r\n\t\t\t\tw_f.read(8)\r\n\t\t\telse:\r\n\t\t\t\tw_f.read(4)\r\n\t\t\ttranspose = (major > 1000) or (minor > 1000)\r\n\t\t\tbinary = w_f.read()\r\n\t\tself.offset = 0\r\n\t\tself.all_weights = np.frombuffer(binary, dtype='float32')\r\n\r\n\tdef read_bytes(self, size):\r\n\t\tself.offset = self.offset + size\r\n\t\treturn self.all_weights[self.offset-size:self.offset]\r\n\r\n\tdef load_weights(self, model):\r\n\t\tfor i in range(106):\r\n\t\t\ttry:\r\n\t\t\t\tconv_layer = model.get_layer('conv_' + str(i))\r\n\t\t\t\tprint(\"loading weights of convolution #\" + str(i))\r\n\t\t\t\tif i not in [81, 93, 105]:\r\n\t\t\t\t\tnorm_layer = model.get_layer('bnorm_' + str(i))\r\n\t\t\t\t\tsize = np.prod(norm_layer.get_weights()[0].shape)\r\n\t\t\t\t\tbeta  = self.read_bytes(size) # bias\r\n\t\t\t\t\tgamma = self.read_bytes(size) # scale\r\n\t\t\t\t\tmean  = self.read_bytes(size) # mean\r\n\t\t\t\t\tvar   = self.read_bytes(size) # variance\r\n\t\t\t\t\tweights = norm_layer.set_weights([gamma, beta, mean, var])\r\n\t\t\t\tif len(conv_layer.get_weights()) > 1:\r\n\t\t\t\t\tbias   = 
self.read_bytes(np.prod(conv_layer.get_weights()[1].shape))\r\n\t\t\t\t\tkernel = self.read_bytes(np.prod(conv_layer.get_weights()[0].shape))\r\n\t\t\t\t\tkernel = kernel.reshape(list(reversed(conv_layer.get_weights()[0].shape)))\r\n\t\t\t\t\tkernel = kernel.transpose([2,3,1,0])\r\n\t\t\t\t\tconv_layer.set_weights([kernel, bias])\r\n\t\t\t\telse:\r\n\t\t\t\t\tkernel = self.read_bytes(np.prod(conv_layer.get_weights()[0].shape))\r\n\t\t\t\t\tkernel = kernel.reshape(list(reversed(conv_layer.get_weights()[0].shape)))\r\n\t\t\t\t\tkernel = kernel.transpose([2,3,1,0])\r\n\t\t\t\t\tconv_layer.set_weights([kernel])\r\n\t\t\texcept ValueError:\r\n\t\t\t\tprint(\"no convolution #\" + str(i))\r\n\r\n\tdef reset(self):\r\n\t\tself.offset = 0\r\n\r\n# define the model\r\nmodel = make_yolov3_model()\r\n# load the model weights\r\nweight_reader = WeightReader('yolov3.weights')\r\n# set the model weights into the model\r\nweight_reader.load_weights(model)\r\n# save the model to file\r\nmodel.save('model.h5')<\/pre>\n<p>Running the example may take a little less than one minute to execute on modern hardware.<\/p>\n<p>As the weight file is loaded, you will see debug information reported about what was loaded, output by the <em>WeightReader<\/em> class.<\/p>\n<pre class=\"crayon-plain-tag\">...\r\nloading weights of convolution #99\r\nloading weights of convolution #100\r\nloading weights of convolution #101\r\nloading weights of convolution #102\r\nloading weights of convolution #103\r\nloading weights of convolution #104\r\nloading weights of convolution #105<\/pre>\n<p>At the end of the run, the <em>model.h5<\/em> file is saved in your current working directory with approximately the same size as the original weight file (237MB), but ready to be loaded and used directly as a Keras model.<\/p>\n<h3>Make a Prediction<\/h3>\n<p>We need a new photo for object detection, ideally with objects that we know that the model knows about from the <a href=\"http:\/\/cocodataset.org\/\">MSCOCO 
dataset<\/a>.<\/p>\n<p>We will use a photograph of three zebras taken by <a href=\"https:\/\/www.flickr.com\/photos\/boegh\/5676993427\/\">Boegh<\/a> on safari, and released under a permissive license.<\/p>\n<div id=\"attachment_7712\" style=\"width: 650px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-7712\" class=\"size-full wp-image-7712\" src=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2019\/03\/zebra.jpg\" alt=\"Photograph of Three Zebras\" width=\"640\" height=\"386\" srcset=\"http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2019\/03\/zebra.jpg 640w, http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2019\/03\/zebra-300x181.jpg 300w\" sizes=\"(max-width: 640px) 100vw, 640px\"><\/p>\n<p id=\"caption-attachment-7712\" class=\"wp-caption-text\">Photograph of Three Zebras<br \/>Taken by Boegh, some rights reserved.<\/p>\n<\/div>\n<ul>\n<li><a href=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2019\/03\/zebra.jpg\">Photograph of Three Zebras (zebra.jpg)<\/a><\/li>\n<\/ul>\n<p>Download the photograph and place it in your current working directory with the filename \u2018<em>zebra.jpg<\/em>\u2018.<\/p>\n<p>Making a prediction is straightforward, although interpreting the prediction requires some work.<\/p>\n<p>The first step is to load the YOLOv3 Keras model. This might be the slowest part of making a prediction.<\/p>\n<pre class=\"crayon-plain-tag\"># load yolov3 model\r\nmodel = load_model('model.h5')<\/pre>\n<p>Next, we need to load our new photograph and prepare it as suitable input to the model. The model expects inputs to be color images with the square shape of 416\u00d7416 pixels.<\/p>\n<p>We can use the <em>load_img()<\/em> Keras function to load the image and the target_size argument to resize the image after loading. 
We can also use the <em>img_to_array()<\/em> function to convert the loaded PIL image object into a NumPy array, and then rescale the pixel values from 0-255 to 0-1 32-bit floating point values.<\/p>\n<pre class=\"crayon-plain-tag\"># load the image with the required size\r\nimage = load_img('zebra.jpg', target_size=(416, 416))\r\n# convert to numpy array\r\nimage = img_to_array(image)\r\n# scale pixel values to [0, 1]\r\nimage = image.astype('float32')\r\nimage \/= 255.0<\/pre>\n<p>We will want to show the original photo again later, which means we will need to scale the bounding boxes of all detected objects from the square shape back to the original shape. As such, we can load the image and retrieve the original shape.<\/p>\n<pre class=\"crayon-plain-tag\"># load the image to get its shape\r\nimage = load_img('zebra.jpg')\r\nwidth, height = image.size<\/pre>\n<p>We can tie all of this together into a convenience function named <em>load_image_pixels()<\/em> that takes the filename and target size and returns the scaled pixel data ready to provide as input to the Keras model, as well as the original width and height of the image.<\/p>\n<pre class=\"crayon-plain-tag\"># load and prepare an image\r\ndef load_image_pixels(filename, shape):\r\n    # load the image to get its shape\r\n    image = load_img(filename)\r\n    width, height = image.size\r\n    # load the image with the required size\r\n    image = load_img(filename, target_size=shape)\r\n    # convert to numpy array\r\n    image = img_to_array(image)\r\n    # scale pixel values to [0, 1]\r\n    image = image.astype('float32')\r\n    image \/= 255.0\r\n    # add a dimension so that we have one sample\r\n    image = expand_dims(image, 0)\r\n    return image, width, height<\/pre>\n<p>We can then call this function to load our photo of zebras.<\/p>\n<pre class=\"crayon-plain-tag\"># define the expected input shape for the model\r\ninput_w, input_h = 416, 416\r\n# define our new photo\r\nphoto_filename = 
'zebra.jpg'\r\n# load and prepare image\r\nimage, image_w, image_h = load_image_pixels(photo_filename, (input_w, input_h))<\/pre>\n<p>We can now feed the photo into the Keras model and make a prediction.<\/p>\n<pre class=\"crayon-plain-tag\"># make prediction\r\nyhat = model.predict(image)\r\n# summarize the shape of the list of arrays\r\nprint([a.shape for a in yhat])<\/pre>\n<p>That\u2019s it, at least for making a prediction. The complete example is listed below.<\/p>\n<pre class=\"crayon-plain-tag\"># load yolov3 model and perform object detection\r\n# based on https:\/\/github.com\/experiencor\/keras-yolo3\r\nfrom numpy import expand_dims\r\nfrom keras.models import load_model\r\nfrom keras.preprocessing.image import load_img\r\nfrom keras.preprocessing.image import img_to_array\r\n\r\n# load and prepare an image\r\ndef load_image_pixels(filename, shape):\r\n    # load the image to get its shape\r\n    image = load_img(filename)\r\n    width, height = image.size\r\n    # load the image with the required size\r\n    image = load_img(filename, target_size=shape)\r\n    # convert to numpy array\r\n    image = img_to_array(image)\r\n    # scale pixel values to [0, 1]\r\n    image = image.astype('float32')\r\n    image \/= 255.0\r\n    # add a dimension so that we have one sample\r\n    image = expand_dims(image, 0)\r\n    return image, width, height\r\n\r\n# load yolov3 model\r\nmodel = load_model('model.h5')\r\n# define the expected input shape for the model\r\ninput_w, input_h = 416, 416\r\n# define our new photo\r\nphoto_filename = 'zebra.jpg'\r\n# load and prepare image\r\nimage, image_w, image_h = load_image_pixels(photo_filename, (input_w, input_h))\r\n# make prediction\r\nyhat = model.predict(image)\r\n# summarize the shape of the list of arrays\r\nprint([a.shape for a in yhat])<\/pre>\n<p>Running the example returns a list of three NumPy arrays, the shape of which is displayed as output.<\/p>\n<p>These arrays predict both the bounding boxes and class 
labels but are encoded. They must be interpreted.<\/p>\n<pre class=\"crayon-plain-tag\">[(1, 13, 13, 255), (1, 26, 26, 255), (1, 52, 52, 255)]<\/pre>\n<h3>Make a Prediction and Interpret Result<\/h3>\n<p>The output of the model is, in fact, encoded candidate bounding boxes from three different grid sizes, and the boxes are defined in the context of anchor boxes, carefully chosen based on an analysis of the size of objects in the MSCOCO dataset.<\/p>\n<p>The script provided by experiencor includes a function called <em>decode_netout()<\/em> that will take each one of the NumPy arrays, one at a time, and decode the candidate bounding boxes and class predictions. Further, any bounding boxes that don\u2019t confidently describe an object (e.g. all class probabilities are below a threshold) are ignored. We will use a probability threshold of 60% or 0.6. The function returns a list of <em>BoundBox<\/em> instances that define the corners of each bounding box in the context of the input image shape, along with the class probabilities.<\/p>\n<pre class=\"crayon-plain-tag\"># define the anchors\r\nanchors = [[116,90, 156,198, 373,326], [30,61, 62,45, 59,119], [10,13, 16,30, 33,23]]\r\n# define the probability threshold for detected objects\r\nclass_threshold = 0.6\r\nboxes = list()\r\nfor i in range(len(yhat)):\r\n\t# decode the output of the network\r\n\tboxes += decode_netout(yhat[i][0], anchors[i], class_threshold, input_h, input_w)<\/pre>\n<p>Next, the bounding boxes can be stretched back into the shape of the original image. This is helpful as it means that later we can plot the original image and draw the bounding boxes, hopefully detecting real objects.<\/p>\n<p>The experiencor script provides the <em>correct_yolo_boxes()<\/em> function to perform this translation of bounding box coordinates, taking the list of bounding boxes, the original shape of our loaded photograph, and the shape of the input to the network as arguments. 
The coordinates of the bounding boxes are updated directly.<\/p>\n<pre class=\"crayon-plain-tag\"># correct the sizes of the bounding boxes for the shape of the image\r\ncorrect_yolo_boxes(boxes, image_h, image_w, input_h, input_w)<\/pre>\n<p>The model has predicted a lot of candidate bounding boxes, and most of the boxes will refer to the same objects. The list of bounding boxes can be filtered and those boxes that overlap and refer to the same object can be merged. We can define the amount of overlap as a configuration parameter, in this case, 50% or 0.5. This filtering of bounding box regions is generally referred to as non-maximal suppression and is a required post-processing step.<\/p>\n<p>The experiencor script provides this via the <em>do_nms()<\/em> function that takes the list of bounding boxes and a threshold parameter. Rather than purging the overlapping boxes, their predicted probability for the overlapping class is cleared. This allows the boxes to remain and be used if they also detect another object type.<\/p>\n<pre class=\"crayon-plain-tag\"># suppress non-maximal boxes\r\ndo_nms(boxes, 0.5)<\/pre>\n<p>This will leave us with the same number of boxes, but only a few of interest. We can retrieve just those boxes that strongly predict the presence of an object: that is, those that are more than 60% confident. This can be achieved by enumerating over all boxes and checking the class prediction values. We can then look up the corresponding class label for the box and add it to the list. 
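To make the 50% overlap criterion behind non-maximal suppression concrete, the test is the intersection-over-union (IoU) of two boxes. A toy illustration with hypothetical box coordinates:

```python
# Toy illustration of the intersection-over-union (IoU) overlap test used
# during non-maximal suppression; the box coordinates are hypothetical.

def iou(box1, box2):
    # each box is (xmin, ymin, xmax, ymax)
    ix = max(0, min(box1[2], box2[2]) - max(box1[0], box2[0]))
    iy = max(0, min(box1[3], box2[3]) - max(box1[1], box2[1]))
    inter = ix * iy
    area1 = (box1[2] - box1[0]) * (box1[3] - box1[1])
    area2 = (box2[2] - box2[0]) * (box2[3] - box2[1])
    return inter / (area1 + area2 - inter)

a = (0, 0, 100, 100)
b = (50, 0, 150, 100)   # shifted half a box width to the right
print(iou(a, b))        # 0.333..., below the 0.5 threshold, so both survive
```

Boxes whose IoU meets or exceeds the threshold are treated as duplicates of the same object and have their class probability cleared.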
Each box must be considered for each class label, just in case the same box strongly predicts more than one object.<\/p>\n<p>We can develop a <em>get_boxes()<\/em> function that does this and takes the list of boxes, known labels, and our classification threshold as arguments and returns parallel lists of boxes, labels, and scores.<\/p>\n<pre class=\"crayon-plain-tag\"># get all of the results above a threshold\r\ndef get_boxes(boxes, labels, thresh):\r\n\tv_boxes, v_labels, v_scores = list(), list(), list()\r\n\t# enumerate all boxes\r\n\tfor box in boxes:\r\n\t\t# enumerate all possible labels\r\n\t\tfor i in range(len(labels)):\r\n\t\t\t# check if the threshold for this label is high enough\r\n\t\t\tif box.classes[i] > thresh:\r\n\t\t\t\tv_boxes.append(box)\r\n\t\t\t\tv_labels.append(labels[i])\r\n\t\t\t\tv_scores.append(box.classes[i]*100)\r\n\t\t\t\t# don't break, many labels may trigger for one box\r\n\treturn v_boxes, v_labels, v_scores<\/pre>\n<p>We can call this function with our list of boxes.<\/p>\n<p>We also need a list of strings containing the class labels known to the model in the correct order used during training, specifically those class labels from the MSCOCO dataset. 
Thankfully, this is provided in the experiencor script.<\/p>\n<pre class=\"crayon-plain-tag\"># define the labels\r\nlabels = [\"person\", \"bicycle\", \"car\", \"motorbike\", \"aeroplane\", \"bus\", \"train\", \"truck\",\r\n    \"boat\", \"traffic light\", \"fire hydrant\", \"stop sign\", \"parking meter\", \"bench\",\r\n    \"bird\", \"cat\", \"dog\", \"horse\", \"sheep\", \"cow\", \"elephant\", \"bear\", \"zebra\", \"giraffe\",\r\n    \"backpack\", \"umbrella\", \"handbag\", \"tie\", \"suitcase\", \"frisbee\", \"skis\", \"snowboard\",\r\n    \"sports ball\", \"kite\", \"baseball bat\", \"baseball glove\", \"skateboard\", \"surfboard\",\r\n    \"tennis racket\", \"bottle\", \"wine glass\", \"cup\", \"fork\", \"knife\", \"spoon\", \"bowl\", \"banana\",\r\n    \"apple\", \"sandwich\", \"orange\", \"broccoli\", \"carrot\", \"hot dog\", \"pizza\", \"donut\", \"cake\",\r\n    \"chair\", \"sofa\", \"pottedplant\", \"bed\", \"diningtable\", \"toilet\", \"tvmonitor\", \"laptop\", \"mouse\",\r\n    \"remote\", \"keyboard\", \"cell phone\", \"microwave\", \"oven\", \"toaster\", \"sink\", \"refrigerator\",\r\n    \"book\", \"clock\", \"vase\", \"scissors\", \"teddy bear\", \"hair drier\", \"toothbrush\"]\r\n# get the details of the detected objects\r\nv_boxes, v_labels, v_scores = get_boxes(boxes, labels, class_threshold)<\/pre>\n<p>Now that we have those few boxes of strongly predicted objects, we can summarize them.<\/p>\n<pre class=\"crayon-plain-tag\"># summarize what we found\r\nfor i in range(len(v_boxes)):\r\n    print(v_labels[i], v_scores[i])<\/pre>\n<p>We can also plot our original photograph and draw the bounding box around each detected object. 
This can be achieved by retrieving the coordinates from each bounding box and creating a Rectangle object.<\/p>\n<pre class=\"crayon-plain-tag\">box = v_boxes[i]\r\n# get coordinates\r\ny1, x1, y2, x2 = box.ymin, box.xmin, box.ymax, box.xmax\r\n# calculate width and height of the box\r\nwidth, height = x2 - x1, y2 - y1\r\n# create the shape\r\nrect = Rectangle((x1, y1), width, height, fill=False, color='white')\r\n# draw the box\r\nax.add_patch(rect)<\/pre>\n<p>We can also draw a string with the class label and confidence.<\/p>\n<pre class=\"crayon-plain-tag\"># draw text and score in top left corner\r\nlabel = \"%s (%.3f)\" % (v_labels[i], v_scores[i])\r\npyplot.text(x1, y1, label, color='white')<\/pre>\n<p>The <em>draw_boxes()<\/em> function below implements this, taking the filename of the original photograph and the parallel lists of bounding boxes, labels and scores, and creates a plot showing all detected objects.<\/p>\n<pre class=\"crayon-plain-tag\"># draw all results\r\ndef draw_boxes(filename, v_boxes, v_labels, v_scores):\r\n\t# load the image\r\n\tdata = pyplot.imread(filename)\r\n\t# plot the image\r\n\tpyplot.imshow(data)\r\n\t# get the context for drawing boxes\r\n\tax = pyplot.gca()\r\n\t# plot each box\r\n\tfor i in range(len(v_boxes)):\r\n\t\tbox = v_boxes[i]\r\n\t\t# get coordinates\r\n\t\ty1, x1, y2, x2 = box.ymin, box.xmin, box.ymax, box.xmax\r\n\t\t# calculate width and height of the box\r\n\t\twidth, height = x2 - x1, y2 - y1\r\n\t\t# create the shape\r\n\t\trect = Rectangle((x1, y1), width, height, fill=False, color='white')\r\n\t\t# draw the box\r\n\t\tax.add_patch(rect)\r\n\t\t# draw text and score in top left corner\r\n\t\tlabel = \"%s (%.3f)\" % (v_labels[i], v_scores[i])\r\n\t\tpyplot.text(x1, y1, label, color='white')\r\n\t# show the plot\r\n\tpyplot.show()<\/pre>\n<p>We can then call this function to plot our final result.<\/p>\n<pre class=\"crayon-plain-tag\"># draw what we found\r\ndraw_boxes(photo_filename, v_boxes, v_labels, 
v_scores)<\/pre>\n<p>We now have all of the elements required to make a prediction using the YOLOv3 model, interpret the results, and plot them for review.<\/p>\n<p>The full code listing, including the original and modified functions taken from the experiencor script, are listed below for completeness.<\/p>\n<pre class=\"crayon-plain-tag\"># load yolov3 model and perform object detection\r\n# based on https:\/\/github.com\/experiencor\/keras-yolo3\r\nimport numpy as np\r\nfrom numpy import expand_dims\r\nfrom keras.models import load_model\r\nfrom keras.preprocessing.image import load_img\r\nfrom keras.preprocessing.image import img_to_array\r\nfrom matplotlib import pyplot\r\nfrom matplotlib.patches import Rectangle\r\n\r\nclass BoundBox:\r\n\tdef __init__(self, xmin, ymin, xmax, ymax, objness = None, classes = None):\r\n\t\tself.xmin = xmin\r\n\t\tself.ymin = ymin\r\n\t\tself.xmax = xmax\r\n\t\tself.ymax = ymax\r\n\t\tself.objness = objness\r\n\t\tself.classes = classes\r\n\t\tself.label = -1\r\n\t\tself.score = -1\r\n\r\n\tdef get_label(self):\r\n\t\tif self.label == -1:\r\n\t\t\tself.label = np.argmax(self.classes)\r\n\r\n\t\treturn self.label\r\n\r\n\tdef get_score(self):\r\n\t\tif self.score == -1:\r\n\t\t\tself.score = self.classes[self.get_label()]\r\n\r\n\t\treturn self.score\r\n\r\ndef _sigmoid(x):\r\n\treturn 1. \/ (1. 
+ np.exp(-x))\r\n\r\ndef decode_netout(netout, anchors, obj_thresh, net_h, net_w):\r\n\tgrid_h, grid_w = netout.shape[:2]\r\n\tnb_box = 3\r\n\tnetout = netout.reshape((grid_h, grid_w, nb_box, -1))\r\n\tnb_class = netout.shape[-1] - 5\r\n\tboxes = []\r\n\tnetout[..., :2]  = _sigmoid(netout[..., :2])\r\n\tnetout[..., 4:]  = _sigmoid(netout[..., 4:])\r\n\tnetout[..., 5:]  = netout[..., 4][..., np.newaxis] * netout[..., 5:]\r\n\tnetout[..., 5:] *= netout[..., 5:] > obj_thresh\r\n\r\n\tfor i in range(grid_h*grid_w):\r\n\t\trow = i \/ grid_w\r\n\t\tcol = i % grid_w\r\n\t\tfor b in range(nb_box):\r\n\t\t\t# 4th element is objectness score\r\n\t\t\tobjectness = netout[int(row)][int(col)][b][4]\r\n\t\t\tif(objectness.all() <= obj_thresh): continue\r\n\t\t\t# first 4 elements are x, y, w, and h\r\n\t\t\tx, y, w, h = netout[int(row)][int(col)][b][:4]\r\n\t\t\tx = (col + x) \/ grid_w # center position, unit: image width\r\n\t\t\ty = (row + y) \/ grid_h # center position, unit: image height\r\n\t\t\tw = anchors[2 * b + 0] * np.exp(w) \/ net_w # unit: image width\r\n\t\t\th = anchors[2 * b + 1] * np.exp(h) \/ net_h # unit: image height\r\n\t\t\t# last elements are class probabilities\r\n\t\t\tclasses = netout[int(row)][col][b][5:]\r\n\t\t\tbox = BoundBox(x-w\/2, y-h\/2, x+w\/2, y+h\/2, objectness, classes)\r\n\t\t\tboxes.append(box)\r\n\treturn boxes\r\n\r\ndef correct_yolo_boxes(boxes, image_h, image_w, net_h, net_w):\r\n\tnew_w, new_h = net_w, net_h\r\n\tfor i in range(len(boxes)):\r\n\t\tx_offset, x_scale = (net_w - new_w)\/2.\/net_w, float(new_w)\/net_w\r\n\t\ty_offset, y_scale = (net_h - new_h)\/2.\/net_h, float(new_h)\/net_h\r\n\t\tboxes[i].xmin = int((boxes[i].xmin - x_offset) \/ x_scale * image_w)\r\n\t\tboxes[i].xmax = int((boxes[i].xmax - x_offset) \/ x_scale * image_w)\r\n\t\tboxes[i].ymin = int((boxes[i].ymin - y_offset) \/ y_scale * image_h)\r\n\t\tboxes[i].ymax = int((boxes[i].ymax - y_offset) \/ y_scale * image_h)\r\n\r\ndef _interval_overlap(interval_a, 
interval_b):\r\n\tx1, x2 = interval_a\r\n\tx3, x4 = interval_b\r\n\tif x3 < x1:\r\n\t\tif x4 < x1:\r\n\t\t\treturn 0\r\n\t\telse:\r\n\t\t\treturn min(x2,x4) - x1\r\n\telse:\r\n\t\tif x2 < x3:\r\n\t\t\t return 0\r\n\t\telse:\r\n\t\t\treturn min(x2,x4) - x3\r\n\r\ndef bbox_iou(box1, box2):\r\n\tintersect_w = _interval_overlap([box1.xmin, box1.xmax], [box2.xmin, box2.xmax])\r\n\tintersect_h = _interval_overlap([box1.ymin, box1.ymax], [box2.ymin, box2.ymax])\r\n\tintersect = intersect_w * intersect_h\r\n\tw1, h1 = box1.xmax-box1.xmin, box1.ymax-box1.ymin\r\n\tw2, h2 = box2.xmax-box2.xmin, box2.ymax-box2.ymin\r\n\tunion = w1*h1 + w2*h2 - intersect\r\n\treturn float(intersect) \/ union\r\n\r\ndef do_nms(boxes, nms_thresh):\r\n\tif len(boxes) > 0:\r\n\t\tnb_class = len(boxes[0].classes)\r\n\telse:\r\n\t\treturn\r\n\tfor c in range(nb_class):\r\n\t\tsorted_indices = np.argsort([-box.classes[c] for box in boxes])\r\n\t\tfor i in range(len(sorted_indices)):\r\n\t\t\tindex_i = sorted_indices[i]\r\n\t\t\tif boxes[index_i].classes[c] == 0: continue\r\n\t\t\tfor j in range(i+1, len(sorted_indices)):\r\n\t\t\t\tindex_j = sorted_indices[j]\r\n\t\t\t\tif bbox_iou(boxes[index_i], boxes[index_j]) >= nms_thresh:\r\n\t\t\t\t\tboxes[index_j].classes[c] = 0\r\n\r\n# load and prepare an image\r\ndef load_image_pixels(filename, shape):\r\n\t# load the image to get its shape\r\n\timage = load_img(filename)\r\n\twidth, height = image.size\r\n\t# load the image with the required size\r\n\timage = load_img(filename, target_size=shape)\r\n\t# convert to numpy array\r\n\timage = img_to_array(image)\r\n\t# scale pixel values to [0, 1]\r\n\timage = image.astype('float32')\r\n\timage \/= 255.0\r\n\t# add a dimension so that we have one sample\r\n\timage = expand_dims(image, 0)\r\n\treturn image, width, height\r\n\r\n# get all of the results above a threshold\r\ndef get_boxes(boxes, labels, thresh):\r\n\tv_boxes, v_labels, v_scores = list(), list(), list()\r\n\t# enumerate all boxes\r\n\tfor box in 
boxes:\r\n\t\t# enumerate all possible labels\r\n\t\tfor i in range(len(labels)):\r\n\t\t\t# check if the threshold for this label is high enough\r\n\t\t\tif box.classes[i] > thresh:\r\n\t\t\t\tv_boxes.append(box)\r\n\t\t\t\tv_labels.append(labels[i])\r\n\t\t\t\tv_scores.append(box.classes[i]*100)\r\n\t\t\t\t# don't break, many labels may trigger for one box\r\n\treturn v_boxes, v_labels, v_scores\r\n\r\n# draw all results\r\ndef draw_boxes(filename, v_boxes, v_labels, v_scores):\r\n\t# load the image\r\n\tdata = pyplot.imread(filename)\r\n\t# plot the image\r\n\tpyplot.imshow(data)\r\n\t# get the context for drawing boxes\r\n\tax = pyplot.gca()\r\n\t# plot each box\r\n\tfor i in range(len(v_boxes)):\r\n\t\tbox = v_boxes[i]\r\n\t\t# get coordinates\r\n\t\ty1, x1, y2, x2 = box.ymin, box.xmin, box.ymax, box.xmax\r\n\t\t# calculate width and height of the box\r\n\t\twidth, height = x2 - x1, y2 - y1\r\n\t\t# create the shape\r\n\t\trect = Rectangle((x1, y1), width, height, fill=False, color='white')\r\n\t\t# draw the box\r\n\t\tax.add_patch(rect)\r\n\t\t# draw text and score in top left corner\r\n\t\tlabel = \"%s (%.3f)\" % (v_labels[i], v_scores[i])\r\n\t\tpyplot.text(x1, y1, label, color='white')\r\n\t# show the plot\r\n\tpyplot.show()\r\n\r\n# load yolov3 model\r\nmodel = load_model('model.h5')\r\n# define the expected input shape for the model\r\ninput_w, input_h = 416, 416\r\n# define our new photo\r\nphoto_filename = 'zebra.jpg'\r\n# load and prepare image\r\nimage, image_w, image_h = load_image_pixels(photo_filename, (input_w, input_h))\r\n# make prediction\r\nyhat = model.predict(image)\r\n# summarize the shape of the list of arrays\r\nprint([a.shape for a in yhat])\r\n# define the anchors\r\nanchors = [[116,90, 156,198, 373,326], [30,61, 62,45, 59,119], [10,13, 16,30, 33,23]]\r\n# define the probability threshold for detected objects\r\nclass_threshold = 0.6\r\nboxes = list()\r\nfor i in range(len(yhat)):\r\n\t# decode the output of the network\r\n\tboxes += 
decode_netout(yhat[i][0], anchors[i], class_threshold, input_h, input_w)\r\n# correct the sizes of the bounding boxes for the shape of the image\r\ncorrect_yolo_boxes(boxes, image_h, image_w, input_h, input_w)\r\n# suppress non-maximal boxes\r\ndo_nms(boxes, 0.5)\r\n# define the labels\r\nlabels = [\"person\", \"bicycle\", \"car\", \"motorbike\", \"aeroplane\", \"bus\", \"train\", \"truck\",\r\n\t\"boat\", \"traffic light\", \"fire hydrant\", \"stop sign\", \"parking meter\", \"bench\",\r\n\t\"bird\", \"cat\", \"dog\", \"horse\", \"sheep\", \"cow\", \"elephant\", \"bear\", \"zebra\", \"giraffe\",\r\n\t\"backpack\", \"umbrella\", \"handbag\", \"tie\", \"suitcase\", \"frisbee\", \"skis\", \"snowboard\",\r\n\t\"sports ball\", \"kite\", \"baseball bat\", \"baseball glove\", \"skateboard\", \"surfboard\",\r\n\t\"tennis racket\", \"bottle\", \"wine glass\", \"cup\", \"fork\", \"knife\", \"spoon\", \"bowl\", \"banana\",\r\n\t\"apple\", \"sandwich\", \"orange\", \"broccoli\", \"carrot\", \"hot dog\", \"pizza\", \"donut\", \"cake\",\r\n\t\"chair\", \"sofa\", \"pottedplant\", \"bed\", \"diningtable\", \"toilet\", \"tvmonitor\", \"laptop\", \"mouse\",\r\n\t\"remote\", \"keyboard\", \"cell phone\", \"microwave\", \"oven\", \"toaster\", \"sink\", \"refrigerator\",\r\n\t\"book\", \"clock\", \"vase\", \"scissors\", \"teddy bear\", \"hair drier\", \"toothbrush\"]\r\n# get the details of the detected objects\r\nv_boxes, v_labels, v_scores = get_boxes(boxes, labels, class_threshold)\r\n# summarize what we found\r\nfor i in range(len(v_boxes)):\r\n\tprint(v_labels[i], v_scores[i])\r\n# draw what we found\r\ndraw_boxes(photo_filename, v_boxes, v_labels, v_scores)<\/pre>\n<p>Running the example again prints the shape of the raw output from the model.<\/p>\n<p>This is followed by a summary of the objects detected by the model and their confidence. 
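The three output shapes can be sanity-checked: the 13, 26, and 52 grids are the 416-pixel input downsampled by strides of 32, 16, and 8, and the 255 channels per grid cell are 3 anchor boxes x (4 box coordinates + 1 objectness score + 80 class probabilities):

```python
# Sanity check on the three YOLOv3 output shapes: each grid is the
# 416-pixel input divided by the stride, and each cell predicts
# 3 anchor boxes x (4 coordinates + 1 objectness + 80 classes) = 255 values.
input_size = 416
for stride in (32, 16, 8):
    grid = input_size // stride
    print((1, grid, grid, 3 * (4 + 1 + 80)))
# (1, 13, 13, 255)
# (1, 26, 26, 255)
# (1, 52, 52, 255)
```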
We can see that the model has detected three zebra, all above 90% likelihood.<\/p>\n<pre class=\"crayon-plain-tag\">[(1, 13, 13, 255), (1, 26, 26, 255), (1, 52, 52, 255)]\r\nzebra 94.91060376167297\r\nzebra 99.86329674720764\r\nzebra 96.8708872795105<\/pre>\n<p>A plot of the photograph is created and the three bounding boxes are plotted. We can see that the model has indeed successfully detected the three zebra in the photograph.<\/p>\n<div id=\"attachment_7713\" style=\"width: 1034px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-7713\" class=\"size-large wp-image-7713\" src=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2019\/03\/Photograph-of-Three-Zebra-Each-Detected-with-the-YOLOv3-Model-and-Localized-with-Bounding-Boxes-1024x768.png\" alt=\"Photograph of Three Zebra Each Detected with the YOLOv3 Model and Localized with Bounding Boxes\" width=\"1024\" height=\"768\" srcset=\"http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2019\/03\/Photograph-of-Three-Zebra-Each-Detected-with-the-YOLOv3-Model-and-Localized-with-Bounding-Boxes-1024x768.png 1024w, http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2019\/03\/Photograph-of-Three-Zebra-Each-Detected-with-the-YOLOv3-Model-and-Localized-with-Bounding-Boxes-300x225.png 300w, http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2019\/03\/Photograph-of-Three-Zebra-Each-Detected-with-the-YOLOv3-Model-and-Localized-with-Bounding-Boxes-768x576.png 768w, http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2019\/03\/Photograph-of-Three-Zebra-Each-Detected-with-the-YOLOv3-Model-and-Localized-with-Bounding-Boxes.png 1280w\" sizes=\"(max-width: 1024px) 100vw, 1024px\"><\/p>\n<p id=\"caption-attachment-7713\" class=\"wp-caption-text\">Photograph of Three Zebra Each Detected with the YOLOv3 Model and Localized with Bounding 
Boxes<\/p>\n<\/div>\n<h2>Further Reading<\/h2>\n<p>This section provides more resources on the topic if you are looking to go deeper.<\/p>\n<h3>Papers<\/h3>\n<ul>\n<li><a href=\"https:\/\/arxiv.org\/abs\/1506.02640\">You Only Look Once: Unified, Real-Time Object Detection<\/a>, 2015.<\/li>\n<li><a href=\"https:\/\/arxiv.org\/abs\/1612.08242\">YOLO9000: Better, Faster, Stronger<\/a>, 2016.<\/li>\n<li><a href=\"https:\/\/arxiv.org\/abs\/1804.02767\">YOLOv3: An Incremental Improvement<\/a>, 2018.<\/li>\n<\/ul>\n<h3>API<\/h3>\n<ul>\n<li><a href=\"https:\/\/matplotlib.org\/api\/_as_gen\/matplotlib.patches.Rectangle.html\">matplotlib.patches.Rectangle API<\/a><\/li>\n<\/ul>\n<h3>Resources<\/h3>\n<ul>\n<li><a href=\"https:\/\/pjreddie.com\/darknet\/yolo\/\">YOLO: Real-Time Object Detection, Homepage<\/a>.<\/li>\n<li><a href=\"https:\/\/github.com\/pjreddie\/darknet\">Official DarkNet and YOLO Source Code, GitHub<\/a>.<\/li>\n<li><a href=\"https:\/\/github.com\/pjreddie\/darknet\/wiki\/YOLO:-Real-Time-Object-Detection\">Official YOLO: Real Time Object Detection<\/a>.<\/li>\n<li><a href=\"https:\/\/experiencor.github.io\/\">Huynh Ngoc Anh, experiencor, Home Page<\/a>.<\/li>\n<li><a href=\"https:\/\/github.com\/experiencor\/keras-yolo3\">experiencor\/keras-yolo3, GitHub<\/a>.<\/li>\n<\/ul>\n<h3>Other YOLO for Keras Projects<\/h3>\n<ul>\n<li><a href=\"https:\/\/github.com\/allanzelener\/YAD2K\">allanzelener\/YAD2K, GitHub<\/a>.<\/li>\n<li><a href=\"https:\/\/github.com\/qqwweee\/keras-yolo3\">qqwweee\/keras-yolo3, GitHub<\/a>.<\/li>\n<li><a href=\"https:\/\/github.com\/xiaochus\/YOLOv3\">xiaochus\/YOLOv3 GitHub<\/a>.<\/li>\n<\/ul>\n<h2>Summary<\/h2>\n<p>In this tutorial, you discovered how to develop a YOLOv3 model for object detection on new photographs.<\/p>\n<p>Specifically, you learned:<\/p>\n<ul>\n<li>YOLO-based Convolutional Neural Network family of models for object detection and the most recent variation called YOLOv3.<\/li>\n<li>The best-of-breed open source library 
implementation of the YOLOv3 for the Keras deep learning library.<\/li>\n<li>How to use a pre-trained YOLOv3 to perform object localization and detection on new photographs.<\/li>\n<\/ul>\n<p>Do you have any questions?<br \/>\nAsk your questions in the comments below and I will do my best to answer.<\/p>\n<p>The post <a rel=\"nofollow\" href=\"https:\/\/machinelearningmastery.com\/how-to-perform-object-detection-with-yolov3-in-keras\/\">How to Perform Object Detection With YOLOv3 in Keras<\/a> appeared first on <a rel=\"nofollow\" href=\"https:\/\/machinelearningmastery.com\/\">Machine Learning Mastery<\/a>.<\/p>\n<\/div>\n<p><a href=\"https:\/\/machinelearningmastery.com\/how-to-perform-object-detection-with-yolov3-in-keras\/\">Go to Source<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Author: Jason Brownlee Object detection is a task in computer vision that involves identifying the presence, location, and type of one or more objects in [&hellip;] <span class=\"read-more-link\"><a class=\"read-more\" href=\"https:\/\/www.aiproblog.com\/index.php\/2019\/05\/26\/how-to-perform-object-detection-with-yolov3-in-keras\/\">Read 
More<\/a><\/span><\/p>\n","protected":false},"author":1,"featured_media":2192,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_bbp_topic_count":0,"_bbp_reply_count":0,"_bbp_total_topic_count":0,"_bbp_total_reply_count":0,"_bbp_voice_count":0,"_bbp_anonymous_reply_count":0,"_bbp_topic_count_hidden":0,"_bbp_reply_count_hidden":0,"_bbp_forum_subforum_count":0,"footnotes":""},"categories":[24],"tags":[],"_links":{"self":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts\/2191"}],"collection":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/comments?post=2191"}],"version-history":[{"count":0,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts\/2191\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/media\/2192"}],"wp:attachment":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/media?parent=2191"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/categories?post=2191"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/tags?post=2191"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}