{"id":2198,"date":"2019-05-28T19:00:57","date_gmt":"2019-05-28T19:00:57","guid":{"rendered":"https:\/\/www.aiproblog.com\/index.php\/2019\/05\/28\/how-to-train-an-object-detection-model-to-find-kangaroos-in-photographs-r-cnn-with-keras\/"},"modified":"2019-05-28T19:00:57","modified_gmt":"2019-05-28T19:00:57","slug":"how-to-train-an-object-detection-model-to-find-kangaroos-in-photographs-r-cnn-with-keras","status":"publish","type":"post","link":"https:\/\/www.aiproblog.com\/index.php\/2019\/05\/28\/how-to-train-an-object-detection-model-to-find-kangaroos-in-photographs-r-cnn-with-keras\/","title":{"rendered":"How to Train an Object Detection Model to Find Kangaroos in Photographs (R-CNN with Keras)"},"content":{"rendered":"<p>Author: Jason Brownlee<\/p>\n<div>\n<p>Object detection is a challenging computer vision task that involves predicting both where the objects are in the image and what type of objects were detected.<\/p>\n<p>The Mask Region-based Convolutional Neural Network, or Mask R-CNN, model is one of the state-of-the-art approaches for object recognition tasks. The Matterport Mask R-CNN project provides a library that allows you to develop and train Mask R-CNN Keras models for your own object detection tasks. 
Using the library can be tricky for beginners and requires the careful preparation of the dataset, although it allows fast training via transfer learning with top performing models trained on challenging object detection tasks, such as MS COCO.<\/p>\n<p>In this tutorial, you will discover how to develop a Mask R-CNN model for kangaroo object detection in photographs.<\/p>\n<p>After completing this tutorial, you will know:<\/p>\n<ul>\n<li>How to prepare an object detection dataset ready for modeling with an R-CNN.<\/li>\n<li>How to use transfer learning to train an object detection model on a new dataset.<\/li>\n<li>How to evaluate a fit Mask R-CNN model on a test dataset and make predictions on new photos.<\/li>\n<\/ul>\n<p>Let\u2019s get started.<\/p>\n<div id=\"attachment_7732\" style=\"width: 650px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-7732\" class=\"size-full wp-image-7732\" src=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2019\/05\/How-to-Train-an-Object-Detection-Model-to-Find-Kangaroos-in-Photographs-R-CNN-with-Keras.jpg\" alt=\"How to Train an Object Detection Model to Find Kangaroos in Photographs (R-CNN with Keras)\" width=\"640\" height=\"358\" srcset=\"http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2019\/05\/How-to-Train-an-Object-Detection-Model-to-Find-Kangaroos-in-Photographs-R-CNN-with-Keras.jpg 640w, http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2019\/05\/How-to-Train-an-Object-Detection-Model-to-Find-Kangaroos-in-Photographs-R-CNN-with-Keras-300x168.jpg 300w\" sizes=\"(max-width: 640px) 100vw, 640px\"><\/p>\n<p id=\"caption-attachment-7732\" class=\"wp-caption-text\">How to Train an Object Detection Model to Find Kangaroos in Photographs (R-CNN with Keras)<br \/>Photo by <a href=\"https:\/\/www.flickr.com\/photos\/16633132@N04\/16146584567\/\">Ronnie Robertson<\/a>, some rights 
reserved.<\/p>\n<\/div>\n<h2>Tutorial Overview<\/h2>\n<p>This tutorial is divided into five parts; they are:<\/p>\n<ol>\n<li>How to Install Mask R-CNN for Keras<\/li>\n<li>How to Prepare a Dataset for Object Detection<\/li>\n<li>How to Train a Mask R-CNN Model for Kangaroo Detection<\/li>\n<li>How to Evaluate a Mask R-CNN Model<\/li>\n<li>How to Detect Kangaroos in New Photos<\/li>\n<\/ol>\n<h2>How to Install Mask R-CNN for Keras<\/h2>\n<p>Object detection is a task in computer vision that involves identifying the presence, location, and type of one or more objects in a given image.<\/p>\n<p>It is a challenging problem that involves building upon methods for object recognition (e.g. are objects present), object localization (e.g. where are they and what is their extent), and object classification (e.g. what are they).<\/p>\n<p>The Region-Based Convolutional Neural Network, or R-CNN, is a family of convolutional neural network models designed for object detection, developed by <a href=\"http:\/\/www.rossgirshick.info\/\">Ross Girshick<\/a>, et al. There are perhaps four main variations of the approach, culminating in Mask R-CNN. Introduced in the 2017 paper titled \u201c<a href=\"https:\/\/arxiv.org\/abs\/1703.06870\">Mask R-CNN<\/a>,\u201d it is the most recent variation in the family of models and supports both object detection and object segmentation. Object segmentation not only involves localizing objects in the image but also specifies a mask for each object, indicating exactly which pixels in the image belong to it.<\/p>\n<p>Mask R-CNN is a sophisticated model to implement, especially as compared to a simple or even state-of-the-art deep convolutional neural network model. 
Instead of developing an implementation of the R-CNN or Mask R-CNN model from scratch, we can use a reliable third-party implementation built on top of the Keras deep learning framework.<\/p>\n<p>The best-of-breed third-party implementation of Mask R-CNN is the <a href=\"https:\/\/github.com\/matterport\/Mask_RCNN\">Mask R-CNN Project<\/a> developed by <a href=\"https:\/\/matterport.com\/\">Matterport<\/a>. The project is open source, released under a permissive license (the MIT license), and the code has been widely used on a variety of projects and Kaggle competitions.<\/p>\n<p>The first step is to install the library.<\/p>\n<p>At the time of writing, there is no distributed version of the library, so we have to install it manually. The good news is that this is very easy.<\/p>\n<p>Installation involves cloning the GitHub repository and running the installation script on your workstation. If you are having trouble, see the <a href=\"https:\/\/github.com\/matterport\/Mask_RCNN#installation\">installation instructions<\/a> buried in the library\u2019s readme file.<\/p>\n<h3>Step 1. Clone the Mask R-CNN GitHub Repository<\/h3>\n<p>This is as simple as running the following command from your command line:<\/p>\n<pre class=\"crayon-plain-tag\">git clone https:\/\/github.com\/matterport\/Mask_RCNN.git<\/pre>\n<p>This will create a new local directory with the name <em>Mask_RCNN<\/em> that looks as follows:<\/p>\n<pre class=\"crayon-plain-tag\">Mask_RCNN\r\n\u251c\u2500\u2500 assets\r\n\u251c\u2500\u2500 build\r\n\u2502   \u251c\u2500\u2500 bdist.macosx-10.13-x86_64\r\n\u2502   \u2514\u2500\u2500 lib\r\n\u2502       \u2514\u2500\u2500 mrcnn\r\n\u251c\u2500\u2500 dist\r\n\u251c\u2500\u2500 images\r\n\u251c\u2500\u2500 mask_rcnn.egg-info\r\n\u251c\u2500\u2500 mrcnn\r\n\u2514\u2500\u2500 samples\r\n    \u251c\u2500\u2500 balloon\r\n    \u251c\u2500\u2500 coco\r\n    \u251c\u2500\u2500 nucleus\r\n    \u2514\u2500\u2500 shapes<\/pre>\n<h3>Step 2. 
Install the Mask R-CNN Library<\/h3>\n<p>The library can be installed directly via pip.<\/p>\n<p>Change directory into the <em>Mask_RCNN<\/em> directory and run the installation script.<\/p>\n<p>From the command line, type the following:<\/p>\n<pre class=\"crayon-plain-tag\">cd Mask_RCNN\r\npython setup.py install<\/pre>\n<p>On Linux or macOS, you may need to install the software with sudo permissions; for example, you may see an error such as:<\/p>\n<pre class=\"crayon-plain-tag\">error: can't create or remove files in install directory<\/pre>\n<p>In that case, install the software with sudo:<\/p>\n<pre class=\"crayon-plain-tag\">sudo python setup.py install<\/pre>\n<p>If you are using a Python virtual environment (<a href=\"https:\/\/virtualenv.pypa.io\/en\/latest\/\">virtualenv<\/a>), such as on an <a href=\"https:\/\/aws.amazon.com\/marketplace\/pp\/B077GF11NF\">EC2 Deep Learning AMI instance<\/a> (recommended for this tutorial), you can install Mask_RCNN into your environment as follows:<\/p>\n<pre class=\"crayon-plain-tag\">sudo ~\/anaconda3\/envs\/tensorflow_p36\/bin\/python setup.py install<\/pre>\n<p>The library will then install directly and you will see a lot of successful installation messages ending with the following:<\/p>\n<pre class=\"crayon-plain-tag\">...\r\nFinished processing dependencies for mask-rcnn==2.1<\/pre>\n<p>This confirms that you installed the library successfully and that you have the latest version, which at the time of writing is version 2.1.<\/p>\n<h3>Step 3. Confirm the Library Was Installed<\/h3>\n<p>It is always a good idea to confirm that the library was installed correctly.<\/p>\n<p>You can do this by querying the library via the pip command; for example:<\/p>\n<pre class=\"crayon-plain-tag\">pip show mask-rcnn<\/pre>\n<p>You should see output informing you of the version and installation location; for example:<\/p>\n<pre class=\"crayon-plain-tag\">Name: mask-rcnn\r\nVersion: 2.1\r\nSummary: 
Mask R-CNN for object detection and instance segmentation\r\nHome-page: https:\/\/github.com\/matterport\/Mask_RCNN\r\nAuthor: Matterport\r\nAuthor-email: waleed.abdulla@gmail.com\r\nLicense: MIT\r\nLocation: ...\r\nRequires:\r\nRequired-by:<\/pre>\n<p>We are now ready to use the library.<\/p>\n<h2>How to Prepare a Dataset for Object Detection<\/h2>\n<p>Next, we need a dataset to model.<\/p>\n<p>In this tutorial, we will use the <a href=\"https:\/\/github.com\/experiencor\/kangaroo\">kangaroo dataset<\/a>, made available by <a href=\"https:\/\/www.linkedin.com\/in\/ngoca\">Huynh Ngoc Anh<\/a> (experiencor). The dataset comprises 183 photographs that contain kangaroos, and XML annotation files that provide bounding boxes for the kangaroos in each photograph.<\/p>\n<p>Mask R-CNN is designed to predict both bounding boxes for objects and masks for those detected objects, but the kangaroo dataset does not provide masks. As such, we will use the dataset to learn a kangaroo object detection task, ignoring the masks and the image segmentation capabilities of the model.<\/p>\n<p>A few steps are required to prepare this dataset for modeling, and we will work through each in turn in this section: downloading the dataset, parsing the annotation files, developing a <em>KangarooDataset<\/em> object that can be used by the <em>Mask_RCNN<\/em> library, then testing the dataset object to confirm that we are loading images and annotations correctly.<\/p>\n<h3>Install Dataset<\/h3>\n<p>The first step is to download the dataset into your current working directory.<\/p>\n<p>This can be achieved by cloning the GitHub repository directly, as follows:<\/p>\n<pre class=\"crayon-plain-tag\">git clone https:\/\/github.com\/experiencor\/kangaroo.git<\/pre>\n<p>This will create a new directory called \u201c<em>kangaroo<\/em>\u201d with a subdirectory called \u2018<em>images\/<\/em>\u2018 that contains all of the JPEG photos of 
kangaroos and a subdirectory called \u2018<em>annots\/<\/em>\u2018 that contains all of the XML files that describe the locations of kangaroos in each photo.<\/p>\n<pre class=\"crayon-plain-tag\">kangaroo\r\n\u251c\u2500\u2500 annots\r\n\u2514\u2500\u2500 images<\/pre>\n<p>Looking in each subdirectory, you can see that the photos and annotation files use a consistent naming convention, with filenames using a 5-digit zero-padded numbering system; for example:<\/p>\n<pre class=\"crayon-plain-tag\">images\/00001.jpg\r\nimages\/00002.jpg\r\nimages\/00003.jpg\r\n...\r\nannots\/00001.xml\r\nannots\/00002.xml\r\nannots\/00003.xml\r\n...<\/pre>\n<p>This makes it very easy to match each photograph with its annotation file.<\/p>\n<p>We can also see that the numbering system is not contiguous and that some photos are missing; e.g. there is no \u2018<em>00007<\/em>\u2018 JPG or XML.<\/p>\n<p>This means that we should load the list of actual files in the directory rather than rely on the numbering system.<\/p>\n<h3>Parse Annotation File<\/h3>\n<p>The next step is to figure out how to load the annotation files.<\/p>\n<p>First, open the first annotation file (<em>annots\/00001.xml<\/em>) and take a look; you should see:<\/p>\n<pre 
class=\"crayon-plain-tag\"><annotation>\r\n\t<folder>Kangaroo<\/folder>\r\n\t<filename>00001.jpg<\/filename>\r\n\t<path>...<\/path>\r\n\t<source>\r\n\t\t<database>Unknown<\/database>\r\n\t<\/source>\r\n\t<size>\r\n\t\t<width>450<\/width>\r\n\t\t<height>319<\/height>\r\n\t\t<depth>3<\/depth>\r\n\t<\/size>\r\n\t<segmented>0<\/segmented>\r\n\t<object>\r\n\t\t<name>kangaroo<\/name>\r\n\t\t<pose>Unspecified<\/pose>\r\n\t\t<truncated>0<\/truncated>\r\n\t\t<difficult>0<\/difficult>\r\n\t\t<bndbox>\r\n\t\t\t<xmin>233<\/xmin>\r\n\t\t\t<ymin>89<\/ymin>\r\n\t\t\t<xmax>386<\/xmax>\r\n\t\t\t<ymax>262<\/ymax>\r\n\t\t<\/bndbox>\r\n\t<\/object>\r\n\t<object>\r\n\t\t<name>kangaroo<\/name>\r\n\t\t<pose>Unspecified<\/pose>\r\n\t\t<truncated>0<\/truncated>\r\n\t\t<difficult>0<\/difficult>\r\n\t\t<bndbox>\r\n\t\t\t<xmin>134<\/xmin>\r\n\t\t\t<ymin>105<\/ymin>\r\n\t\t\t<xmax>341<\/xmax>\r\n\t\t\t<ymax>253<\/ymax>\r\n\t\t<\/bndbox>\r\n\t<\/object>\r\n<\/annotation><\/pre>\n<p>We can see that the annotation file contains a \u201c<em>size<\/em>\u201d element that describes the shape of the photograph, and one or more \u201c<em>object<\/em>\u201d elements that describe the bounding boxes for the kangaroo objects in the photograph.<\/p>\n<p>The size and the bounding boxes are the minimum information that we require from each annotation file. We could write some careful XML parsing code to process these annotation files, and that would be a good idea for a production system. Instead, we will short-cut development and use XPath queries to directly extract the data that we need from each file, e.g. 
a <em>\/\/size<\/em> query to extract the size element and a <em>\/\/object<\/em> or a <em>\/\/bndbox<\/em> query to extract the bounding box elements.<\/p>\n<p>Python provides the <a href=\"https:\/\/docs.python.org\/3\/library\/xml.etree.elementtree.html\">ElementTree API<\/a> that can be used to load and parse an XML file and we can use the <a href=\"https:\/\/docs.python.org\/3\/library\/xml.etree.elementtree.html#xml.etree.ElementTree.Element.find\">find()<\/a> and <a href=\"https:\/\/docs.python.org\/3\/library\/xml.etree.elementtree.html#xml.etree.ElementTree.Element.findall\">findall()<\/a> functions to perform the XPath queries on a loaded document.<\/p>\n<p>First, the annotation file must be loaded and parsed as an <em>ElementTree<\/em> object.<\/p>\n<pre class=\"crayon-plain-tag\"># load and parse the file\r\ntree = ElementTree.parse(filename)<\/pre>\n<p>Once loaded, we can retrieve the root element of the document from which we can perform our XPath queries.<\/p>\n<pre class=\"crayon-plain-tag\"># get the root of the document\r\nroot = tree.getroot()<\/pre>\n<p>We can use the findall() function with a query for \u2018<em>.\/\/bndbox<\/em>\u2018 to find all \u2018<em>bndbox<\/em>\u2018 elements, then enumerate each to extract the <em>x<\/em> and <em>y,<\/em> <em>min<\/em> and <em>max<\/em> values that define each bounding box.<\/p>\n<p>The element text can also be parsed to integer values.<\/p>\n<pre class=\"crayon-plain-tag\"># extract each bounding box\r\nfor box in root.findall('.\/\/bndbox'):\r\n\txmin = int(box.find('xmin').text)\r\n\tymin = int(box.find('ymin').text)\r\n\txmax = int(box.find('xmax').text)\r\n\tymax = int(box.find('ymax').text)\r\n\tcoors = [xmin, ymin, xmax, ymax]<\/pre>\n<p>We can then collect the definition of each bounding box into a list.<\/p>\n<p>The dimensions of the image may also be helpful, which can be queried directly.<\/p>\n<pre class=\"crayon-plain-tag\"># extract image dimensions\r\nwidth = 
int(root.find('.\/\/size\/width').text)\r\nheight = int(root.find('.\/\/size\/height').text)<\/pre>\n<p>We can tie all of this together into a function that will take the annotation filename as an argument, extract the bounding box and image dimension details, and return them for use.<\/p>\n<p>The <em>extract_boxes()<\/em> function below implements this behavior.<\/p>\n<pre class=\"crayon-plain-tag\"># function to extract bounding boxes from an annotation file\r\ndef extract_boxes(filename):\r\n\t# load and parse the file\r\n\ttree = ElementTree.parse(filename)\r\n\t# get the root of the document\r\n\troot = tree.getroot()\r\n\t# extract each bounding box\r\n\tboxes = list()\r\n\tfor box in root.findall('.\/\/bndbox'):\r\n\t\txmin = int(box.find('xmin').text)\r\n\t\tymin = int(box.find('ymin').text)\r\n\t\txmax = int(box.find('xmax').text)\r\n\t\tymax = int(box.find('ymax').text)\r\n\t\tcoors = [xmin, ymin, xmax, ymax]\r\n\t\tboxes.append(coors)\r\n\t# extract image dimensions\r\n\twidth = int(root.find('.\/\/size\/width').text)\r\n\theight = int(root.find('.\/\/size\/height').text)\r\n\treturn boxes, width, height<\/pre>\n<p>We can test out this function on our annotation files, for example, on the first annotation file in the directory.<\/p>\n<p>The complete example is listed below.<\/p>\n<pre class=\"crayon-plain-tag\"># example of extracting bounding boxes from an annotation file\r\nfrom xml.etree import ElementTree\r\n\r\n# function to extract bounding boxes from an annotation file\r\ndef extract_boxes(filename):\r\n\t# load and parse the file\r\n\ttree = ElementTree.parse(filename)\r\n\t# get the root of the document\r\n\troot = tree.getroot()\r\n\t# extract each bounding box\r\n\tboxes = list()\r\n\tfor box in root.findall('.\/\/bndbox'):\r\n\t\txmin = int(box.find('xmin').text)\r\n\t\tymin = int(box.find('ymin').text)\r\n\t\txmax = int(box.find('xmax').text)\r\n\t\tymax = int(box.find('ymax').text)\r\n\t\tcoors = [xmin, ymin, xmax, 
ymax]\r\n\t\tboxes.append(coors)\r\n\t# extract image dimensions\r\n\twidth = int(root.find('.\/\/size\/width').text)\r\n\theight = int(root.find('.\/\/size\/height').text)\r\n\treturn boxes, width, height\r\n\r\n# extract details from annotation file\r\nboxes, w, h = extract_boxes('kangaroo\/annots\/00001.xml')\r\n# summarize extracted details\r\nprint(boxes, w, h)<\/pre>\n<p>Running the example returns a list that contains the details of each bounding box in the annotation file, as well as two integers for the width and height of the photograph.<\/p>\n<pre class=\"crayon-plain-tag\">[[233, 89, 386, 262], [134, 105, 341, 253]] 450 319<\/pre>\n<p>Now that we know how to load the annotation file, we can look at using this functionality to develop a Dataset object.<\/p>\n<h3>Develop KangarooDataset Object<\/h3>\n<p>The mask-rcnn library requires that train, validation, and test datasets be managed by a <a href=\"https:\/\/github.com\/matterport\/Mask_RCNN\/blob\/master\/mrcnn\/utils.py\">mrcnn.utils.Dataset object<\/a>.<\/p>\n<p>This means that a new class must be defined that extends the <em>mrcnn.utils.Dataset<\/em> class, defines a function to load the dataset with any name you like, such as <em>load_dataset()<\/em>, and overrides two functions: one for loading a mask, called <em>load_mask()<\/em>, and one for loading an image reference (path or URL), called <em>image_reference()<\/em>.<\/p>\n<pre class=\"crayon-plain-tag\"># class that defines and loads the kangaroo dataset\r\nclass KangarooDataset(Dataset):\r\n\t# load the dataset definitions\r\n\tdef load_dataset(self, dataset_dir, is_train=True):\r\n\t\t# ...\r\n\r\n\t# load the masks for an image\r\n\tdef load_mask(self, image_id):\r\n\t\t# ...\r\n\r\n\t# load an image reference\r\n\tdef image_reference(self, image_id):\r\n\t\t# ...<\/pre>\n<p>To use a <em>Dataset<\/em> object, it is instantiated, your custom load function is called, and finally the built-in <em>prepare()<\/em> function is 
called.<\/p>\n<p>For example, we will create a new class called <em>KangarooDataset<\/em> that will be used as follows:<\/p>\n<pre class=\"crayon-plain-tag\"># prepare the dataset\r\ntrain_set = KangarooDataset()\r\ntrain_set.load_dataset(...)\r\ntrain_set.prepare()<\/pre>\n<p>The custom load function, e.g. <em>load_dataset()<\/em>, is responsible for both defining the classes and defining the images in the dataset.<\/p>\n<p>Classes are defined by calling the built-in <em>add_class()<\/em> function and specifying the \u2018<em>source<\/em>\u2018 (the name of the dataset), the \u2018<em>class_id<\/em>\u2018 or integer for the class (e.g. 1 for the first class, as 0 is reserved for the background class), and the \u2018<em>class_name<\/em>\u2018 (e.g. \u2018<em>kangaroo<\/em>\u2018).<\/p>\n<pre class=\"crayon-plain-tag\"># define one class\r\nself.add_class(\"dataset\", 1, \"kangaroo\")<\/pre>\n<p>Objects are defined by a call to the built-in <em>add_image()<\/em> function and specifying the \u2018<em>source<\/em>\u2018 (the name of the dataset), a unique \u2018<em>image_id<\/em>\u2018 (e.g. the filename without the file extension like \u2018<em>00001<\/em>\u2018), and the path for where the image can be loaded (e.g. \u2018<em>kangaroo\/images\/00001.jpg<\/em>\u2018).<\/p>\n<p>This will define an \u201c<em>image info<\/em>\u201d dictionary for the image that can be retrieved later via the index or order in which the image was added to the dataset. 
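To make this bookkeeping concrete, below is a minimal pure-Python sketch of what a call to <em>add_image()<\/em> stores; it mimics, rather than uses, the real <em>Dataset<\/em> class, and the field names and paths are illustrative only.

```python
# Pure-Python sketch (not the mrcnn library itself) of the bookkeeping
# performed by add_image(): each call appends an "image info" dict that
# is later retrieved by its integer index in the dataset.
image_info = []

def add_image(source, image_id, path, **kwargs):
	# store the standard fields plus any extra keyword arguments
	info = {'source': source, 'id': image_id, 'path': path}
	info.update(kwargs)
	image_info.append(info)

add_image('dataset', image_id='00001', path='kangaroo/images/00001.jpg',
	annotation='kangaroo/annots/00001.xml')

# the extra 'annotation' keyword is stored alongside the standard fields
print(image_info[0]['path'])        # kangaroo/images/00001.jpg
print(image_info[0]['annotation'])  # kangaroo/annots/00001.xml
```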
You can also specify other arguments that will be added to the image info dictionary, such as an \u2018<em>annotation<\/em>\u2018 to define the annotation path.<\/p>\n<pre class=\"crayon-plain-tag\"># add to dataset\r\nself.add_image('dataset', image_id='00001', path='kangaroo\/images\/00001.jpg', annotation='kangaroo\/annots\/00001.xml')<\/pre>\n<p>For example, we can implement a <em>load_dataset()<\/em> function that takes the path to the dataset directory and loads all images in the dataset.<\/p>\n<p>Note: testing revealed that there is an issue with image number \u2018<em>00090<\/em>\u2018, so we will exclude it from the dataset.<\/p>\n<pre class=\"crayon-plain-tag\"># load the dataset definitions\r\ndef load_dataset(self, dataset_dir):\r\n\t# define one class\r\n\tself.add_class(\"dataset\", 1, \"kangaroo\")\r\n\t# define data locations\r\n\timages_dir = dataset_dir + '\/images\/'\r\n\tannotations_dir = dataset_dir + '\/annots\/'\r\n\t# find all images\r\n\tfor filename in listdir(images_dir):\r\n\t\t# extract image id\r\n\t\timage_id = filename[:-4]\r\n\t\t# skip bad images\r\n\t\tif image_id in ['00090']:\r\n\t\t\tcontinue\r\n\t\timg_path = images_dir + filename\r\n\t\tann_path = annotations_dir + image_id + '.xml'\r\n\t\t# add to dataset\r\n\t\tself.add_image('dataset', image_id=image_id, path=img_path, annotation=ann_path)<\/pre>\n<p>We can go one step further and add one more argument to the function to define whether the <em>Dataset<\/em> instance is for training or test\/validation. We have about 160 photos, so we can use about 20%, or the last 32 photos, as a test or validation dataset and the first 131, or 80%, as the training dataset.<\/p>\n<p>This division can be made using the integer in the filename: all photos numbered below 150 are used for training, and those numbered 150 or above are used for testing. 
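As a sanity check, the split rule can be exercised on its own, independent of the library; the image ids below are illustrative, using the dataset's 5-digit naming.

```python
# Stand-alone sketch of the train/test split rule: ids below 150 go to
# train, 150 and above go to test/val, and the known-bad image '00090'
# is always skipped.
def split_ids(image_ids, is_train=True):
	kept = []
	for image_id in image_ids:
		# skip bad images
		if image_id in ['00090']:
			continue
		# skip all images numbered 150 or above for the train set
		if is_train and int(image_id) >= 150:
			continue
		# skip all images numbered below 150 for the test/val set
		if not is_train and int(image_id) < 150:
			continue
		kept.append(image_id)
	return kept

ids = ['00001', '00090', '00149', '00150', '00183']
print(split_ids(ids, is_train=True))   # ['00001', '00149']
print(split_ids(ids, is_train=False))  # ['00150', '00183']
```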
The updated <em>load_dataset()<\/em> with support for train and test datasets is provided below.<\/p>\n<pre class=\"crayon-plain-tag\"># load the dataset definitions\r\ndef load_dataset(self, dataset_dir, is_train=True):\r\n\t# define one class\r\n\tself.add_class(\"dataset\", 1, \"kangaroo\")\r\n\t# define data locations\r\n\timages_dir = dataset_dir + '\/images\/'\r\n\tannotations_dir = dataset_dir + '\/annots\/'\r\n\t# find all images\r\n\tfor filename in listdir(images_dir):\r\n\t\t# extract image id\r\n\t\timage_id = filename[:-4]\r\n\t\t# skip bad images\r\n\t\tif image_id in ['00090']:\r\n\t\t\tcontinue\r\n\t\t# skip all images after 150 if we are building the train set\r\n\t\tif is_train and int(image_id) >= 150:\r\n\t\t\tcontinue\r\n\t\t# skip all images before 150 if we are building the test\/val set\r\n\t\tif not is_train and int(image_id) < 150:\r\n\t\t\tcontinue\r\n\t\timg_path = images_dir + filename\r\n\t\tann_path = annotations_dir + image_id + '.xml'\r\n\t\t# add to dataset\r\n\t\tself.add_image('dataset', image_id=image_id, path=img_path, annotation=ann_path)<\/pre>\n<p>Next, we need to define the <em>load_mask()<\/em> function for loading the mask for a given \u2018<em>image_id<\/em>\u2018.<\/p>\n<p>In this case, the \u2018<em>image_id<\/em>\u2018 is the integer index for an image in the dataset, assigned based on the order that the image was added via a call to <em>add_image()<\/em> when loading the dataset. The function must return an array of one or more masks for the photo associated with the <em>image_id<\/em>, and the classes for each mask.<\/p>\n<p>We don\u2019t have masks, but we do have bounding boxes. We can load the bounding boxes for a given photo and return them as masks. The library will then infer bounding boxes from our \u201c<em>masks<\/em>\u201d which will be the same size.<\/p>\n<p>First, we must load the annotation file for the <em>image_id<\/em>. 
This involves first retrieving the \u2018<em>image info<\/em>\u2018 dict for the <em>image_id<\/em>, then retrieving the annotation path that we stored for the image via our prior call to <em>add_image()<\/em>. We can then use the path in our call to <em>extract_boxes()<\/em> developed in the previous section to get the list of bounding boxes and the dimensions of the image.<\/p>\n<pre class=\"crayon-plain-tag\"># get details of image\r\ninfo = self.image_info[image_id]\r\n# define box file location\r\npath = info['annotation']\r\n# load XML\r\nboxes, w, h = self.extract_boxes(path)<\/pre>\n<p>We can now define a mask for each bounding box, and an associated class.<\/p>\n<p>A mask is a two-dimensional array with the same dimensions as the photograph, containing zero values everywhere the object is absent and one values everywhere the object is present.<\/p>\n<p>We can achieve this by creating a NumPy array with all zero values for the known size of the image and one channel for each bounding box.<\/p>\n<pre class=\"crayon-plain-tag\"># create one array for all masks, each on a different channel\r\nmasks = zeros([h, w, len(boxes)], dtype='uint8')<\/pre>\n<p>Each bounding box is defined by the <em>min<\/em> and <em>max<\/em> <em>x<\/em> and <em>y<\/em> coordinates of the box.<\/p>\n<p>These can be used directly to define row and column ranges in the array that can then be marked as 1.<\/p>\n<pre class=\"crayon-plain-tag\"># create masks\r\nfor i in range(len(boxes)):\r\n\tbox = boxes[i]\r\n\trow_s, row_e = box[1], box[3]\r\n\tcol_s, col_e = box[0], box[2]\r\n\tmasks[row_s:row_e, col_s:col_e, i] = 1<\/pre>\n<p>All objects have the same class in this dataset. 
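To see the effect of this logic, the same mask-building steps can be run stand-alone on a small made-up example: a hypothetical 10x10 image with two boxes.

```python
from numpy import zeros

# build per-channel box "masks" for a hypothetical 10x10 image
w, h = 10, 10
boxes = [[2, 1, 5, 4], [6, 6, 9, 9]]  # each box is [xmin, ymin, xmax, ymax]
masks = zeros([h, w, len(boxes)], dtype='uint8')
for i in range(len(boxes)):
	box = boxes[i]
	# rows come from the y coordinates, columns from the x coordinates
	row_s, row_e = box[1], box[3]
	col_s, col_e = box[0], box[2]
	masks[row_s:row_e, col_s:col_e, i] = 1

print(masks.shape)                # (10, 10, 2)
print(int(masks[:, :, 0].sum()))  # 9: the first box covers a 3x3 region
```

Each box becomes its own channel, so overlapping objects do not interfere with one another.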
We can retrieve the class index via the \u2018<em>class_names<\/em>\u2018 list, then add it to a list to be returned alongside the masks.<\/p>\n<pre class=\"crayon-plain-tag\">self.class_names.index('kangaroo')<\/pre>\n<p>Tying this together, the complete <em>load_mask()<\/em> function is listed below.<\/p>\n<pre class=\"crayon-plain-tag\"># load the masks for an image\r\ndef load_mask(self, image_id):\r\n\t# get details of image\r\n\tinfo = self.image_info[image_id]\r\n\t# define box file location\r\n\tpath = info['annotation']\r\n\t# load XML\r\n\tboxes, w, h = self.extract_boxes(path)\r\n\t# create one array for all masks, each on a different channel\r\n\tmasks = zeros([h, w, len(boxes)], dtype='uint8')\r\n\t# create masks\r\n\tclass_ids = list()\r\n\tfor i in range(len(boxes)):\r\n\t\tbox = boxes[i]\r\n\t\trow_s, row_e = box[1], box[3]\r\n\t\tcol_s, col_e = box[0], box[2]\r\n\t\tmasks[row_s:row_e, col_s:col_e, i] = 1\r\n\t\tclass_ids.append(self.class_names.index('kangaroo'))\r\n\treturn masks, asarray(class_ids, dtype='int32')<\/pre>\n<p>Finally, we must implement the <em>image_reference()<\/em> function.<\/p>\n<p>This function is responsible for returning the path or URL for a given \u2018<em>image_id<\/em>\u2018, which we know is just the \u2018<em>path<\/em>\u2018 property on the \u2018<em>image info<\/em>\u2018 dict.<\/p>\n<pre class=\"crayon-plain-tag\"># load an image reference\r\ndef image_reference(self, image_id):\r\n\tinfo = self.image_info[image_id]\r\n\treturn info['path']<\/pre>\n<p>And that\u2019s it. 
We have successfully defined a <em>Dataset<\/em> object for the <em>mask-rcnn<\/em> library for our Kangaroo dataset.<\/p>\n<p>The complete listing of the class and creating a train and test dataset is provided below.<\/p>\n<pre class=\"crayon-plain-tag\"># split into train and test set\r\nfrom os import listdir\r\nfrom xml.etree import ElementTree\r\nfrom numpy import zeros\r\nfrom numpy import asarray\r\nfrom mrcnn.utils import Dataset\r\n\r\n# class that defines and loads the kangaroo dataset\r\nclass KangarooDataset(Dataset):\r\n\t# load the dataset definitions\r\n\tdef load_dataset(self, dataset_dir, is_train=True):\r\n\t\t# define one class\r\n\t\tself.add_class(\"dataset\", 1, \"kangaroo\")\r\n\t\t# define data locations\r\n\t\timages_dir = dataset_dir + '\/images\/'\r\n\t\tannotations_dir = dataset_dir + '\/annots\/'\r\n\t\t# find all images\r\n\t\tfor filename in listdir(images_dir):\r\n\t\t\t# extract image id\r\n\t\t\timage_id = filename[:-4]\r\n\t\t\t# skip bad images\r\n\t\t\tif image_id in ['00090']:\r\n\t\t\t\tcontinue\r\n\t\t\t# skip all images after 150 if we are building the train set\r\n\t\t\tif is_train and int(image_id) >= 150:\r\n\t\t\t\tcontinue\r\n\t\t\t# skip all images before 150 if we are building the test\/val set\r\n\t\t\tif not is_train and int(image_id) < 150:\r\n\t\t\t\tcontinue\r\n\t\t\timg_path = images_dir + filename\r\n\t\t\tann_path = annotations_dir + image_id + '.xml'\r\n\t\t\t# add to dataset\r\n\t\t\tself.add_image('dataset', image_id=image_id, path=img_path, annotation=ann_path)\r\n\r\n\t# extract bounding boxes from an annotation file\r\n\tdef extract_boxes(self, filename):\r\n\t\t# load and parse the file\r\n\t\ttree = ElementTree.parse(filename)\r\n\t\t# get the root of the document\r\n\t\troot = tree.getroot()\r\n\t\t# extract each bounding box\r\n\t\tboxes = list()\r\n\t\tfor box in root.findall('.\/\/bndbox'):\r\n\t\t\txmin = int(box.find('xmin').text)\r\n\t\t\tymin = int(box.find('ymin').text)\r\n\t\t\txmax = 
int(box.find('xmax').text)\r\n\t\t\tymax = int(box.find('ymax').text)\r\n\t\t\tcoors = [xmin, ymin, xmax, ymax]\r\n\t\t\tboxes.append(coors)\r\n\t\t# extract image dimensions\r\n\t\twidth = int(root.find('.\/\/size\/width').text)\r\n\t\theight = int(root.find('.\/\/size\/height').text)\r\n\t\treturn boxes, width, height\r\n\r\n\t# load the masks for an image\r\n\tdef load_mask(self, image_id):\r\n\t\t# get details of image\r\n\t\tinfo = self.image_info[image_id]\r\n\t\t# define box file location\r\n\t\tpath = info['annotation']\r\n\t\t# load XML\r\n\t\tboxes, w, h = self.extract_boxes(path)\r\n\t\t# create one array for all masks, each on a different channel\r\n\t\tmasks = zeros([h, w, len(boxes)], dtype='uint8')\r\n\t\t# create masks\r\n\t\tclass_ids = list()\r\n\t\tfor i in range(len(boxes)):\r\n\t\t\tbox = boxes[i]\r\n\t\t\trow_s, row_e = box[1], box[3]\r\n\t\t\tcol_s, col_e = box[0], box[2]\r\n\t\t\tmasks[row_s:row_e, col_s:col_e, i] = 1\r\n\t\t\tclass_ids.append(self.class_names.index('kangaroo'))\r\n\t\treturn masks, asarray(class_ids, dtype='int32')\r\n\r\n\t# load an image reference\r\n\tdef image_reference(self, image_id):\r\n\t\tinfo = self.image_info[image_id]\r\n\t\treturn info['path']\r\n\r\n# train set\r\ntrain_set = KangarooDataset()\r\ntrain_set.load_dataset('kangaroo', is_train=True)\r\ntrain_set.prepare()\r\nprint('Train: %d' % len(train_set.image_ids))\r\n\r\n# test\/val set\r\ntest_set = KangarooDataset()\r\ntest_set.load_dataset('kangaroo', is_train=False)\r\ntest_set.prepare()\r\nprint('Test: %d' % len(test_set.image_ids))<\/pre>\n<p>Running the example successfully loads and prepares the train and test dataset and prints the number of images in each.<\/p>\n<pre class=\"crayon-plain-tag\">Train: 131\r\nTest: 32<\/pre>\n<p>Now that we have defined the dataset, let\u2019s confirm that the images, masks, and bounding boxes are handled correctly.<\/p>\n<h3>Test KangarooDataset Object<\/h3>\n<p>The first useful test is to confirm that the images 
and masks can be loaded correctly.<\/p>\n<p>We can test this by creating a dataset and loading an image via a call to the <em>load_image()<\/em> function with an <em>image_id<\/em>, then loading the mask for the image via a call to the <em>load_mask()<\/em> function with the same <em>image_id<\/em>.<\/p>\n<pre class=\"crayon-plain-tag\"># load an image\r\nimage_id = 0\r\nimage = train_set.load_image(image_id)\r\nprint(image.shape)\r\n# load image mask\r\nmask, class_ids = train_set.load_mask(image_id)\r\nprint(mask.shape)<\/pre>\n<p>Next, we can plot the photograph using the Matplotlib API, then plot the first mask over the top with an alpha value so that the photograph underneath can still be seen.<\/p>\n<pre class=\"crayon-plain-tag\"># plot image\r\npyplot.imshow(image)\r\n# plot mask\r\npyplot.imshow(mask[:, :, 0], cmap='gray', alpha=0.5)\r\npyplot.show()<\/pre>\n<p>The complete example is listed below.<\/p>\n<pre class=\"crayon-plain-tag\"># plot one photograph and mask\r\nfrom os import listdir\r\nfrom xml.etree import ElementTree\r\nfrom numpy import zeros\r\nfrom numpy import asarray\r\nfrom mrcnn.utils import Dataset\r\nfrom matplotlib import pyplot\r\n\r\n# class that defines and loads the kangaroo dataset\r\nclass KangarooDataset(Dataset):\r\n\t# load the dataset definitions\r\n\tdef load_dataset(self, dataset_dir, is_train=True):\r\n\t\t# define one class\r\n\t\tself.add_class(\"dataset\", 1, \"kangaroo\")\r\n\t\t# define data locations\r\n\t\timages_dir = dataset_dir + '\/images\/'\r\n\t\tannotations_dir = dataset_dir + '\/annots\/'\r\n\t\t# find all images\r\n\t\tfor filename in listdir(images_dir):\r\n\t\t\t# extract image id\r\n\t\t\timage_id = filename[:-4]\r\n\t\t\t# skip bad images\r\n\t\t\tif image_id in ['00090']:\r\n\t\t\t\tcontinue\r\n\t\t\t# skip all images after 150 if we are building the train set\r\n\t\t\tif is_train and int(image_id) >= 150:\r\n\t\t\t\tcontinue\r\n\t\t\t# skip all images before 150 if we are building the test\/val
set\r\n\t\t\tif not is_train and int(image_id) < 150:\r\n\t\t\t\tcontinue\r\n\t\t\timg_path = images_dir + filename\r\n\t\t\tann_path = annotations_dir + image_id + '.xml'\r\n\t\t\t# add to dataset\r\n\t\t\tself.add_image('dataset', image_id=image_id, path=img_path, annotation=ann_path)\r\n\r\n\t# extract bounding boxes from an annotation file\r\n\tdef extract_boxes(self, filename):\r\n\t\t# load and parse the file\r\n\t\ttree = ElementTree.parse(filename)\r\n\t\t# get the root of the document\r\n\t\troot = tree.getroot()\r\n\t\t# extract each bounding box\r\n\t\tboxes = list()\r\n\t\tfor box in root.findall('.\/\/bndbox'):\r\n\t\t\txmin = int(box.find('xmin').text)\r\n\t\t\tymin = int(box.find('ymin').text)\r\n\t\t\txmax = int(box.find('xmax').text)\r\n\t\t\tymax = int(box.find('ymax').text)\r\n\t\t\tcoors = [xmin, ymin, xmax, ymax]\r\n\t\t\tboxes.append(coors)\r\n\t\t# extract image dimensions\r\n\t\twidth = int(root.find('.\/\/size\/width').text)\r\n\t\theight = int(root.find('.\/\/size\/height').text)\r\n\t\treturn boxes, width, height\r\n\r\n\t# load the masks for an image\r\n\tdef load_mask(self, image_id):\r\n\t\t# get details of image\r\n\t\tinfo = self.image_info[image_id]\r\n\t\t# define box file location\r\n\t\tpath = info['annotation']\r\n\t\t# load XML\r\n\t\tboxes, w, h = self.extract_boxes(path)\r\n\t\t# create one array for all masks, each on a different channel\r\n\t\tmasks = zeros([h, w, len(boxes)], dtype='uint8')\r\n\t\t# create masks\r\n\t\tclass_ids = list()\r\n\t\tfor i in range(len(boxes)):\r\n\t\t\tbox = boxes[i]\r\n\t\t\trow_s, row_e = box[1], box[3]\r\n\t\t\tcol_s, col_e = box[0], box[2]\r\n\t\t\tmasks[row_s:row_e, col_s:col_e, i] = 1\r\n\t\t\tclass_ids.append(self.class_names.index('kangaroo'))\r\n\t\treturn masks, asarray(class_ids, dtype='int32')\r\n\r\n\t# load an image reference\r\n\tdef image_reference(self, image_id):\r\n\t\tinfo = self.image_info[image_id]\r\n\t\treturn info['path']\r\n\r\n# train set\r\ntrain_set = 
KangarooDataset()\r\ntrain_set.load_dataset('kangaroo', is_train=True)\r\ntrain_set.prepare()\r\n# load an image\r\nimage_id = 0\r\nimage = train_set.load_image(image_id)\r\nprint(image.shape)\r\n# load image mask\r\nmask, class_ids = train_set.load_mask(image_id)\r\nprint(mask.shape)\r\n# plot image\r\npyplot.imshow(image)\r\n# plot mask\r\npyplot.imshow(mask[:, :, 0], cmap='gray', alpha=0.5)\r\npyplot.show()<\/pre>\n<p>Running the example first prints the shape of the photograph and mask NumPy arrays.<\/p>\n<p>We can confirm that both arrays have the same width and height and only differ in terms of the number of channels. We can also see that the first photograph (e.g. <em>image_id=0<\/em>) in this case only has one mask.<\/p>\n<pre class=\"crayon-plain-tag\">(626, 899, 3)\r\n(626, 899, 1)<\/pre>\n<p>A plot of the photograph is also created with the first mask overlaid.<\/p>\n<p>In this case, we can see that one kangaroo is present in the photo and that the mask correctly bounds the kangaroo.<\/p>\n<div id=\"attachment_7726\" style=\"width: 1034px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-7726\" class=\"size-large wp-image-7726\" src=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2019\/03\/Photograph-of-Kangaroo-with-Object-Detection-Mask-Overlaid-1024x768.png\" alt=\"Photograph of Kangaroo With Object Detection Mask Overlaid\" width=\"1024\" height=\"768\" srcset=\"http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2019\/03\/Photograph-of-Kangaroo-with-Object-Detection-Mask-Overlaid-1024x768.png 1024w, http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2019\/03\/Photograph-of-Kangaroo-with-Object-Detection-Mask-Overlaid-300x225.png 300w, http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2019\/03\/Photograph-of-Kangaroo-with-Object-Detection-Mask-Overlaid-768x576.png 768w, 
http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2019\/03\/Photograph-of-Kangaroo-with-Object-Detection-Mask-Overlaid.png 1280w\" sizes=\"(max-width: 1024px) 100vw, 1024px\"><\/p>\n<p id=\"caption-attachment-7726\" class=\"wp-caption-text\">Photograph of Kangaroo With Object Detection Mask Overlaid<\/p>\n<\/div>\n<p>We could repeat this for the first nine photos in the dataset, plotting each photo in one figure as a subplot and plotting all masks for each photo.<\/p>\n<pre class=\"crayon-plain-tag\"># plot first few images\r\nfor i in range(9):\r\n\t# define subplot\r\n\tpyplot.subplot(330 + 1 + i)\r\n\t# plot raw pixel data\r\n\timage = train_set.load_image(i)\r\n\tpyplot.imshow(image)\r\n\t# plot all masks\r\n\tmask, _ = train_set.load_mask(i)\r\n\tfor j in range(mask.shape[2]):\r\n\t\tpyplot.imshow(mask[:, :, j], cmap='gray', alpha=0.3)\r\n# show the figure\r\npyplot.show()<\/pre>\n<p>Running the example shows that photos are loaded correctly and that those photos with multiple objects correctly have separate masks defined.<\/p>\n<div id=\"attachment_7728\" style=\"width: 1034px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-7728\" class=\"size-large wp-image-7728\" src=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2019\/03\/Plot-of-First-Nine-Photos-of-Kangaroos-in-the-Training-Dataset-with-Object-Detection-Masks-1024x768.png\" alt=\"Plot of First Nine Photos of Kangaroos in the Training Dataset With Object Detection Masks\" width=\"1024\" height=\"768\" srcset=\"http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2019\/03\/Plot-of-First-Nine-Photos-of-Kangaroos-in-the-Training-Dataset-with-Object-Detection-Masks-1024x768.png 1024w, 
http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2019\/03\/Plot-of-First-Nine-Photos-of-Kangaroos-in-the-Training-Dataset-with-Object-Detection-Masks-300x225.png 300w, http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2019\/03\/Plot-of-First-Nine-Photos-of-Kangaroos-in-the-Training-Dataset-with-Object-Detection-Masks-768x576.png 768w, http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2019\/03\/Plot-of-First-Nine-Photos-of-Kangaroos-in-the-Training-Dataset-with-Object-Detection-Masks.png 1280w\" sizes=\"(max-width: 1024px) 100vw, 1024px\"><\/p>\n<p id=\"caption-attachment-7728\" class=\"wp-caption-text\">Plot of First Nine Photos of Kangaroos in the Training Dataset With Object Detection Masks<\/p>\n<\/div>\n<p>Another useful debugging step might be to load all of the \u2018<em>image info<\/em>\u2018 objects in the dataset and print them to the console.<\/p>\n<p>This can help to confirm that all of the calls to the <em>add_image()<\/em> function in the <em>load_dataset()<\/em> function worked as expected.<\/p>\n<pre class=\"crayon-plain-tag\"># enumerate all images in the dataset\r\nfor image_id in train_set.image_ids:\r\n\t# load image info\r\n\tinfo = train_set.image_info[image_id]\r\n\t# display on the console\r\n\tprint(info)<\/pre>\n<p>Running this code on the loaded training dataset will then show all of the \u2018<em>image info<\/em>\u2018 dictionaries, showing the paths and ids for each image in the dataset.<\/p>\n<pre class=\"crayon-plain-tag\">{'id': '00132', 'source': 'dataset', 'path': 'kangaroo\/images\/00132.jpg', 'annotation': 'kangaroo\/annots\/00132.xml'}\r\n{'id': '00046', 'source': 'dataset', 'path': 'kangaroo\/images\/00046.jpg', 'annotation': 'kangaroo\/annots\/00046.xml'}\r\n{'id': '00052', 'source': 'dataset', 'path': 'kangaroo\/images\/00052.jpg', 'annotation': 'kangaroo\/annots\/00052.xml'}\r\n...<\/pre>\n<p>Finally, the <em>mask-rcnn<\/em> 
library provides utilities for displaying images and masks. We can use some of these built-in functions to confirm that the Dataset is operating correctly.<\/p>\n<p>For example, the <em>mask-rcnn<\/em> library provides the <em>mrcnn.visualize.display_instances()<\/em> function that will show a photograph with bounding boxes, masks, and class labels. This requires that the bounding boxes are extracted from the masks via the <em>extract_bboxes()<\/em> function.<\/p>\n<pre class=\"crayon-plain-tag\"># define image id\r\nimage_id = 1\r\n# load the image\r\nimage = train_set.load_image(image_id)\r\n# load the masks and the class ids\r\nmask, class_ids = train_set.load_mask(image_id)\r\n# extract bounding boxes from the masks\r\nbbox = extract_bboxes(mask)\r\n# display image with masks and bounding boxes\r\ndisplay_instances(image, bbox, mask, class_ids, train_set.class_names)<\/pre>\n<p>For completeness, the full code listing is provided below.<\/p>\n<pre class=\"crayon-plain-tag\"># display image with masks and bounding boxes\r\nfrom os import listdir\r\nfrom xml.etree import ElementTree\r\nfrom numpy import zeros\r\nfrom numpy import asarray\r\nfrom mrcnn.utils import Dataset\r\nfrom mrcnn.visualize import display_instances\r\nfrom mrcnn.utils import extract_bboxes\r\n\r\n# class that defines and loads the kangaroo dataset\r\nclass KangarooDataset(Dataset):\r\n\t# load the dataset definitions\r\n\tdef load_dataset(self, dataset_dir, is_train=True):\r\n\t\t# define one class\r\n\t\tself.add_class(\"dataset\", 1, \"kangaroo\")\r\n\t\t# define data locations\r\n\t\timages_dir = dataset_dir + '\/images\/'\r\n\t\tannotations_dir = dataset_dir + '\/annots\/'\r\n\t\t# find all images\r\n\t\tfor filename in listdir(images_dir):\r\n\t\t\t# extract image id\r\n\t\t\timage_id = filename[:-4]\r\n\t\t\t# skip bad images\r\n\t\t\tif image_id in ['00090']:\r\n\t\t\t\tcontinue\r\n\t\t\t# skip all images after 150 if we are building the train set\r\n\t\t\tif is_train and int(image_id) 
>= 150:\r\n\t\t\t\tcontinue\r\n\t\t\t# skip all images before 150 if we are building the test\/val set\r\n\t\t\tif not is_train and int(image_id) < 150:\r\n\t\t\t\tcontinue\r\n\t\t\timg_path = images_dir + filename\r\n\t\t\tann_path = annotations_dir + image_id + '.xml'\r\n\t\t\t# add to dataset\r\n\t\t\tself.add_image('dataset', image_id=image_id, path=img_path, annotation=ann_path)\r\n\r\n\t# extract bounding boxes from an annotation file\r\n\tdef extract_boxes(self, filename):\r\n\t\t# load and parse the file\r\n\t\ttree = ElementTree.parse(filename)\r\n\t\t# get the root of the document\r\n\t\troot = tree.getroot()\r\n\t\t# extract each bounding box\r\n\t\tboxes = list()\r\n\t\tfor box in root.findall('.\/\/bndbox'):\r\n\t\t\txmin = int(box.find('xmin').text)\r\n\t\t\tymin = int(box.find('ymin').text)\r\n\t\t\txmax = int(box.find('xmax').text)\r\n\t\t\tymax = int(box.find('ymax').text)\r\n\t\t\tcoors = [xmin, ymin, xmax, ymax]\r\n\t\t\tboxes.append(coors)\r\n\t\t# extract image dimensions\r\n\t\twidth = int(root.find('.\/\/size\/width').text)\r\n\t\theight = int(root.find('.\/\/size\/height').text)\r\n\t\treturn boxes, width, height\r\n\r\n\t# load the masks for an image\r\n\tdef load_mask(self, image_id):\r\n\t\t# get details of image\r\n\t\tinfo = self.image_info[image_id]\r\n\t\t# define box file location\r\n\t\tpath = info['annotation']\r\n\t\t# load XML\r\n\t\tboxes, w, h = self.extract_boxes(path)\r\n\t\t# create one array for all masks, each on a different channel\r\n\t\tmasks = zeros([h, w, len(boxes)], dtype='uint8')\r\n\t\t# create masks\r\n\t\tclass_ids = list()\r\n\t\tfor i in range(len(boxes)):\r\n\t\t\tbox = boxes[i]\r\n\t\t\trow_s, row_e = box[1], box[3]\r\n\t\t\tcol_s, col_e = box[0], box[2]\r\n\t\t\tmasks[row_s:row_e, col_s:col_e, i] = 1\r\n\t\t\tclass_ids.append(self.class_names.index('kangaroo'))\r\n\t\treturn masks, asarray(class_ids, dtype='int32')\r\n\r\n\t# load an image reference\r\n\tdef image_reference(self, image_id):\r\n\t\tinfo = 
self.image_info[image_id]\r\n\t\treturn info['path']\r\n\r\n# train set\r\ntrain_set = KangarooDataset()\r\ntrain_set.load_dataset('kangaroo', is_train=True)\r\ntrain_set.prepare()\r\n# define image id\r\nimage_id = 1\r\n# load the image\r\nimage = train_set.load_image(image_id)\r\n# load the masks and the class ids\r\nmask, class_ids = train_set.load_mask(image_id)\r\n# extract bounding boxes from the masks\r\nbbox = extract_bboxes(mask)\r\n# display image with masks and bounding boxes\r\ndisplay_instances(image, bbox, mask, class_ids, train_set.class_names)<\/pre>\n<p>Running the example creates a plot showing the photograph with the mask for each object in a separate color.<\/p>\n<p>The bounding boxes match the masks exactly, by design, and are shown with dotted outlines. Finally, each object is marked with the class label, which in this case is \u2018<em>kangaroo<\/em>\u2018.<\/p>\n<div id=\"attachment_7729\" style=\"width: 1034px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-7729\" class=\"size-large wp-image-7729\" src=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2019\/03\/Photograph-Showing-Object-Detection-Masks-Bounding-Boxes-and-Class-Labels-1024x576.png\" alt=\"Photograph Showing Object Detection Masks, Bounding Boxes, and Class Labels\" width=\"1024\" height=\"576\" srcset=\"http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2019\/03\/Photograph-Showing-Object-Detection-Masks-Bounding-Boxes-and-Class-Labels-1024x576.png 1024w, http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2019\/03\/Photograph-Showing-Object-Detection-Masks-Bounding-Boxes-and-Class-Labels-300x169.png 300w, http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2019\/03\/Photograph-Showing-Object-Detection-Masks-Bounding-Boxes-and-Class-Labels-768x432.png 768w\" sizes=\"(max-width: 1024px) 100vw, 
1024px\"><\/p>\n<p id=\"caption-attachment-7729\" class=\"wp-caption-text\">Photograph Showing Object Detection Masks, Bounding Boxes, and Class Labels<\/p>\n<\/div>\n<p>Now that we are confident that our dataset is being loaded correctly, we can use it to fit a Mask R-CNN model.<\/p>\n<h2>How to Train Mask R-CNN Model for Kangaroo Detection<\/h2>\n<p>A Mask R-CNN model can be fit from scratch, although like other computer vision applications, time can be saved and performance can be improved by using transfer learning.<\/p>\n<p>The Mask R-CNN model pre-fit on the MS COCO object detection dataset can be used as a starting point and then tailored to the specific dataset, in this case, the kangaroo dataset.<\/p>\n<p>The first step is to download the model file (architecture and weights) for the pre-fit Mask R-CNN model. The weights are available from the GitHub project and the file is about 250 megabytes.<\/p>\n<p>Download the model weights to a file with the name \u2018<em>mask_rcnn_coco.h5<\/em>\u2018 in your current working directory.<\/p>\n<ul>\n<li><a href=\"https:\/\/github.com\/matterport\/Mask_RCNN\/releases\/download\/v2.0\/mask_rcnn_coco.h5\">Download Weights (mask_rcnn_coco.h5) 246M<\/a><\/li>\n<\/ul>\n<p>Next, a configuration object for the model must be defined.<\/p>\n<p>This is a new class that extends the <em>mrcnn.config.Config<\/em> class and defines properties of both the prediction problem (such as name and the number of classes) and the algorithm for training the model (such as the learning rate).<\/p>\n<p>The configuration must define the name of the configuration via the \u2018<em>NAME<\/em>\u2018 attribute, e.g. \u2018<em>kangaroo_cfg<\/em>\u2018, that will be used to save details and models to file during the run. The configuration must also define the number of classes in the prediction problem via the \u2018<em>NUM_CLASSES<\/em>\u2018 attribute. 
In this case, we have only one object type, kangaroo, although there is always an additional class for the background.<\/p>\n<p>Finally, we must define the number of samples (photos) used in each training epoch. This will be the number of photos in the training dataset, in this case, 131.<\/p>\n<p>Tying this together, our custom <em>KangarooConfig<\/em> class is defined below.<\/p>\n<pre class=\"crayon-plain-tag\"># define a configuration for the model\r\nclass KangarooConfig(Config):\r\n\t# Give the configuration a recognizable name\r\n\tNAME = \"kangaroo_cfg\"\r\n\t# Number of classes (background + kangaroo)\r\n\tNUM_CLASSES = 1 + 1\r\n\t# Number of training steps per epoch\r\n\tSTEPS_PER_EPOCH = 131\r\n\r\n# prepare config\r\nconfig = KangarooConfig()<\/pre>\n<p>Next, we can define our model.<\/p>\n<p>This is achieved by creating an instance of the <em>mrcnn.model.MaskRCNN<\/em> class and specifying that the model will be used for training by setting the \u2018<em>mode<\/em>\u2018 argument to \u2018<em>training<\/em>\u2018.<\/p>\n<p>The \u2018<em>config<\/em>\u2018 argument must also be specified with an instance of our <em>KangarooConfig<\/em> class.<\/p>\n<p>Finally, a directory is needed where configuration files can be saved and where checkpoint models can be saved at the end of each epoch. We will use the current working directory.<\/p>\n<pre class=\"crayon-plain-tag\"># define the model\r\nmodel = MaskRCNN(mode='training', model_dir='.\/', config=config)<\/pre>\n<p>Next, the pre-defined model architecture and weights can be loaded. This can be achieved by calling the <em>load_weights()<\/em> function on the model and specifying the path to the downloaded \u2018<em>mask_rcnn_coco.h5<\/em>\u2018 file.<\/p>\n<p>The model will be used as-is, although the class-specific output layers will be removed so that new output layers can be defined and trained.
This can be done by specifying the \u2018<em>exclude<\/em>\u2018 argument and listing all of the output layers to exclude or remove from the model after it is loaded. This includes the output layers for the classification label, bounding boxes, and masks.<\/p>\n<pre class=\"crayon-plain-tag\"># load weights (mscoco)\r\nmodel.load_weights('mask_rcnn_coco.h5', by_name=True, exclude=[\"mrcnn_class_logits\", \"mrcnn_bbox_fc\",  \"mrcnn_bbox\", \"mrcnn_mask\"])<\/pre>\n<p>Next, the model can be fit on the training dataset by calling the <em>train()<\/em> function and passing in both the training dataset and the validation dataset. We can also specify the learning rate as the default learning rate in the configuration (0.001).<\/p>\n<p>We can also specify what layers to train. In this case, we will only train the heads, that is, the output layers of the model.<\/p>\n<pre class=\"crayon-plain-tag\"># train weights (output layers or 'heads')\r\nmodel.train(train_set, test_set, learning_rate=config.LEARNING_RATE, epochs=5, layers='heads')<\/pre>\n<p>We could follow this training with further epochs that fine-tune all of the weights in the model. This could be achieved by using a smaller learning rate and changing the \u2018layers\u2019 argument from \u2018heads\u2019 to \u2018all\u2019.<\/p>\n<p>The complete example of training a Mask R-CNN on the kangaroo dataset is listed below.<\/p>\n<p>This may take some time to execute on the CPU, even with modern hardware.
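As a sketch of that optional fine-tuning stage described above, a second call to <em>train()<\/em> might look like the fragment below. It assumes the <em>model<\/em>, <em>config<\/em>, <em>train_set<\/em>, and <em>test_set<\/em> objects from the listing; the ten-times-smaller learning rate and the number of epochs are assumptions for illustration, not values prescribed by the tutorial.

```python
# optional second stage: fine-tune all layers with a smaller learning rate
# (illustrative fragment; assumes model, config, train_set, test_set already exist,
# and that the learning rate divisor and epoch count are reasonable guesses)
model.train(train_set, test_set, learning_rate=config.LEARNING_RATE / 10,
	epochs=10, layers='all')
```
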
I recommend running the code with a GPU, such as on <a href=\"https:\/\/machinelearningmastery.com\/develop-evaluate-large-deep-learning-models-keras-amazon-web-services\/\">Amazon EC2<\/a>, where it will finish in about five minutes on a P3 type hardware.<\/p>\n<pre class=\"crayon-plain-tag\"># fit a mask rcnn on the kangaroo dataset\r\nfrom os import listdir\r\nfrom xml.etree import ElementTree\r\nfrom numpy import zeros\r\nfrom numpy import asarray\r\nfrom mrcnn.utils import Dataset\r\nfrom mrcnn.config import Config\r\nfrom mrcnn.model import MaskRCNN\r\n\r\n# class that defines and loads the kangaroo dataset\r\nclass KangarooDataset(Dataset):\r\n\t# load the dataset definitions\r\n\tdef load_dataset(self, dataset_dir, is_train=True):\r\n\t\t# define one class\r\n\t\tself.add_class(\"dataset\", 1, \"kangaroo\")\r\n\t\t# define data locations\r\n\t\timages_dir = dataset_dir + '\/images\/'\r\n\t\tannotations_dir = dataset_dir + '\/annots\/'\r\n\t\t# find all images\r\n\t\tfor filename in listdir(images_dir):\r\n\t\t\t# extract image id\r\n\t\t\timage_id = filename[:-4]\r\n\t\t\t# skip bad images\r\n\t\t\tif image_id in ['00090']:\r\n\t\t\t\tcontinue\r\n\t\t\t# skip all images after 150 if we are building the train set\r\n\t\t\tif is_train and int(image_id) >= 150:\r\n\t\t\t\tcontinue\r\n\t\t\t# skip all images before 150 if we are building the test\/val set\r\n\t\t\tif not is_train and int(image_id) < 150:\r\n\t\t\t\tcontinue\r\n\t\t\timg_path = images_dir + filename\r\n\t\t\tann_path = annotations_dir + image_id + '.xml'\r\n\t\t\t# add to dataset\r\n\t\t\tself.add_image('dataset', image_id=image_id, path=img_path, annotation=ann_path)\r\n\r\n\t# extract bounding boxes from an annotation file\r\n\tdef extract_boxes(self, filename):\r\n\t\t# load and parse the file\r\n\t\ttree = ElementTree.parse(filename)\r\n\t\t# get the root of the document\r\n\t\troot = tree.getroot()\r\n\t\t# extract each bounding box\r\n\t\tboxes = list()\r\n\t\tfor box in 
root.findall('.\/\/bndbox'):\r\n\t\t\txmin = int(box.find('xmin').text)\r\n\t\t\tymin = int(box.find('ymin').text)\r\n\t\t\txmax = int(box.find('xmax').text)\r\n\t\t\tymax = int(box.find('ymax').text)\r\n\t\t\tcoors = [xmin, ymin, xmax, ymax]\r\n\t\t\tboxes.append(coors)\r\n\t\t# extract image dimensions\r\n\t\twidth = int(root.find('.\/\/size\/width').text)\r\n\t\theight = int(root.find('.\/\/size\/height').text)\r\n\t\treturn boxes, width, height\r\n\r\n\t# load the masks for an image\r\n\tdef load_mask(self, image_id):\r\n\t\t# get details of image\r\n\t\tinfo = self.image_info[image_id]\r\n\t\t# define box file location\r\n\t\tpath = info['annotation']\r\n\t\t# load XML\r\n\t\tboxes, w, h = self.extract_boxes(path)\r\n\t\t# create one array for all masks, each on a different channel\r\n\t\tmasks = zeros([h, w, len(boxes)], dtype='uint8')\r\n\t\t# create masks\r\n\t\tclass_ids = list()\r\n\t\tfor i in range(len(boxes)):\r\n\t\t\tbox = boxes[i]\r\n\t\t\trow_s, row_e = box[1], box[3]\r\n\t\t\tcol_s, col_e = box[0], box[2]\r\n\t\t\tmasks[row_s:row_e, col_s:col_e, i] = 1\r\n\t\t\tclass_ids.append(self.class_names.index('kangaroo'))\r\n\t\treturn masks, asarray(class_ids, dtype='int32')\r\n\r\n\t# load an image reference\r\n\tdef image_reference(self, image_id):\r\n\t\tinfo = self.image_info[image_id]\r\n\t\treturn info['path']\r\n\r\n# define a configuration for the model\r\nclass KangarooConfig(Config):\r\n\t# define the name of the configuration\r\n\tNAME = \"kangaroo_cfg\"\r\n\t# number of classes (background + kangaroo)\r\n\tNUM_CLASSES = 1 + 1\r\n\t# number of training steps per epoch\r\n\tSTEPS_PER_EPOCH = 131\r\n\r\n# prepare train set\r\ntrain_set = KangarooDataset()\r\ntrain_set.load_dataset('kangaroo', is_train=True)\r\ntrain_set.prepare()\r\nprint('Train: %d' % len(train_set.image_ids))\r\n# prepare test\/val set\r\ntest_set = KangarooDataset()\r\ntest_set.load_dataset('kangaroo', is_train=False)\r\ntest_set.prepare()\r\nprint('Test: %d' % 
len(test_set.image_ids))\r\n# prepare config\r\nconfig = KangarooConfig()\r\nconfig.display()\r\n# define the model\r\nmodel = MaskRCNN(mode='training', model_dir='.\/', config=config)\r\n# load weights (mscoco) and exclude the output layers\r\nmodel.load_weights('mask_rcnn_coco.h5', by_name=True, exclude=[\"mrcnn_class_logits\", \"mrcnn_bbox_fc\",  \"mrcnn_bbox\", \"mrcnn_mask\"])\r\n# train weights (output layers or 'heads')\r\nmodel.train(train_set, test_set, learning_rate=config.LEARNING_RATE, epochs=5, layers='heads')<\/pre>\n<p>Running the example will report progress using the standard Keras progress bars.<\/p>\n<p>We can see that there are many different train and test loss scores reported for each of the output heads of the network. It can be quite confusing as to which loss to pay attention to.<\/p>\n<p>In this example where we are interested in object detection instead of object segmentation, I recommend paying attention to the loss for the classification output on the train and validation datasets (e.g. 
<em>mrcnn_class_loss<\/em> and <em>val_mrcnn_class_loss<\/em>), as well as the loss for the bounding box output for the train and validation datasets (<em>mrcnn_bbox_loss<\/em> and <em>val_mrcnn_bbox_loss<\/em>).<\/p>\n<pre class=\"crayon-plain-tag\">Epoch 1\/5\r\n131\/131 [==============================] - 106s 811ms\/step - loss: 0.8491 - rpn_class_loss: 0.0044 - rpn_bbox_loss: 0.1452 - mrcnn_class_loss: 0.0420 - mrcnn_bbox_loss: 0.2874 - mrcnn_mask_loss: 0.3701 - val_loss: 1.3402 - val_rpn_class_loss: 0.0160 - val_rpn_bbox_loss: 0.7913 - val_mrcnn_class_loss: 0.0092 - val_mrcnn_bbox_loss: 0.2263 - val_mrcnn_mask_loss: 0.2975\r\nEpoch 2\/5\r\n131\/131 [==============================] - 69s 526ms\/step - loss: 0.4774 - rpn_class_loss: 0.0025 - rpn_bbox_loss: 0.1159 - mrcnn_class_loss: 0.0170 - mrcnn_bbox_loss: 0.1134 - mrcnn_mask_loss: 0.2285 - val_loss: 0.6261 - val_rpn_class_loss: 8.9502e-04 - val_rpn_bbox_loss: 0.1624 - val_mrcnn_class_loss: 0.0197 - val_mrcnn_bbox_loss: 0.2148 - val_mrcnn_mask_loss: 0.2282\r\nEpoch 3\/5\r\n131\/131 [==============================] - 67s 515ms\/step - loss: 0.4471 - rpn_class_loss: 0.0029 - rpn_bbox_loss: 0.1153 - mrcnn_class_loss: 0.0234 - mrcnn_bbox_loss: 0.0958 - mrcnn_mask_loss: 0.2097 - val_loss: 1.2998 - val_rpn_class_loss: 0.0144 - val_rpn_bbox_loss: 0.6712 - val_mrcnn_class_loss: 0.0372 - val_mrcnn_bbox_loss: 0.2645 - val_mrcnn_mask_loss: 0.3125\r\nEpoch 4\/5\r\n131\/131 [==============================] - 66s 502ms\/step - loss: 0.3934 - rpn_class_loss: 0.0026 - rpn_bbox_loss: 0.1003 - mrcnn_class_loss: 0.0171 - mrcnn_bbox_loss: 0.0806 - mrcnn_mask_loss: 0.1928 - val_loss: 0.6709 - val_rpn_class_loss: 0.0016 - val_rpn_bbox_loss: 0.2012 - val_mrcnn_class_loss: 0.0244 - val_mrcnn_bbox_loss: 0.1942 - val_mrcnn_mask_loss: 0.2495\r\nEpoch 5\/5\r\n131\/131 [==============================] - 65s 493ms\/step - loss: 0.3357 - rpn_class_loss: 0.0024 - rpn_bbox_loss: 0.0804 - mrcnn_class_loss: 0.0193 - mrcnn_bbox_loss: 0.0616 - 
mrcnn_mask_loss: 0.1721 - val_loss: 0.8878 - val_rpn_class_loss: 0.0030 - val_rpn_bbox_loss: 0.4409 - val_mrcnn_class_loss: 0.0174 - val_mrcnn_bbox_loss: 0.1752 - val_mrcnn_mask_loss: 0.2513<\/pre>\n<p>A model file is created and saved at the end of each epoch in a subdirectory that starts with \u2018<em>kangaroo_cfg<\/em>\u2018 followed by random characters.<\/p>\n<p>A model must be selected for use; in this case, the loss continues to decrease for the bounding boxes on each epoch, so we will use the final model at the end of the run (\u2018<em>mask_rcnn_kangaroo_cfg_0005.h5<\/em>\u2018).<\/p>\n<p>Copy the model file from the config directory into your current working directory. We will use it in the following sections to evaluate the model and make predictions.<\/p>\n<p>The results suggest that more training epochs could be useful, perhaps with fine-tuning of all of the layers in the model; this might make an interesting extension to the tutorial.<\/p>\n<p>Next, let\u2019s look at evaluating the performance of this model.<\/p>\n<h2>How to Evaluate a Mask R-CNN Model<\/h2>\n<p>The performance of a model for an object recognition task is often evaluated using the mean average precision, or mAP.<\/p>\n<p>We are predicting bounding boxes, so we can determine whether a bounding box prediction is good or not based on how well the predicted and actual bounding boxes overlap. This can be calculated by dividing the area of the overlap by the total area of both bounding boxes, or the intersection divided by the union, referred to as \u201c<em>intersection over union<\/em>,\u201d or IoU. A perfect bounding box prediction will have an IoU of 1.<\/p>\n<p>It is standard to assume a positive prediction of a bounding box if the IoU is greater than 0.5, e.g. the boxes overlap by 50% or more.<\/p>\n<p>Precision refers to the percentage of the correctly predicted bounding boxes (IoU > 0.5) out of all bounding boxes predicted.
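The IoU calculation just described can be sketched as a small, self-contained function. This is for intuition only, not how the library computes it internally; the boxes use the same [xmin, ymin, xmax, ymax] format as the annotation files.

```python
# compute intersection over union for two boxes in [xmin, ymin, xmax, ymax] format
def iou(box_a, box_b):
	# coordinates of the intersection rectangle
	ix_min = max(box_a[0], box_b[0])
	iy_min = max(box_a[1], box_b[1])
	ix_max = min(box_a[2], box_b[2])
	iy_max = min(box_a[3], box_b[3])
	# intersection area is zero if the boxes do not overlap at all
	inter = max(0, ix_max - ix_min) * max(0, iy_max - iy_min)
	# union is the sum of both areas minus the overlap counted twice
	area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
	area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
	return inter / float(area_a + area_b - inter)

# two boxes sharing half their area give an IoU of 1/3
print(iou([0, 0, 10, 10], [5, 0, 15, 10]))
```

Identical boxes return 1.0 and disjoint boxes return 0.0, matching the description above.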
Recall is the percentage of the correctly predicted bounding boxes (IoU > 0.5) out of all objects in the photo.<\/p>\n<p>As we make more predictions, the recall percentage will increase, but precision will drop or become erratic as we start making false positive predictions. The recall (<em>x<\/em>) can be plotted against the precision (<em>y<\/em>) for each number of predictions to create a curve or line. We can take the maximum precision at each point on this line and calculate the average of these precision values, or AP, across the values of recall.<\/p>\n<p><strong>Note<\/strong>: there are variations on how AP is calculated, e.g. the way it is calculated for the widely used PASCAL VOC dataset and the MS COCO dataset differ.<\/p>\n<p>The average or mean of the average precision (AP) across all of the images in a dataset is called the mean average precision, or mAP.<\/p>\n<p>The mask-rcnn library provides an <em>mrcnn.utils.compute_ap<\/em> function to calculate the AP and other metrics for a given image. These AP scores can be collected across a dataset and the mean calculated to give an idea of how good the model is at detecting objects in a dataset.<\/p>\n<p>First, we must define a new <em>Config<\/em> object to use for making predictions, instead of training. We could extend our previously defined <em>KangarooConfig<\/em> to reuse the parameters. Instead, we will define a new object with the same values to keep the code compact.
The config must change some of the GPU defaults used for inference, which differ from how they are set for training a model (regardless of whether you are running on a GPU or CPU).<\/p>\n<pre class=\"crayon-plain-tag\"># define the prediction configuration\r\nclass PredictionConfig(Config):\r\n\t# define the name of the configuration\r\n\tNAME = \"kangaroo_cfg\"\r\n\t# number of classes (background + kangaroo)\r\n\tNUM_CLASSES = 1 + 1\r\n\t# simplify GPU config\r\n\tGPU_COUNT = 1\r\n\tIMAGES_PER_GPU = 1<\/pre>\n<p>Next, we can define the model with the config and set the \u2018<em>mode<\/em>\u2018 argument to \u2018<em>inference<\/em>\u2018 instead of \u2018<em>training<\/em>\u2018.<\/p>\n<pre class=\"crayon-plain-tag\"># create config\r\ncfg = PredictionConfig()\r\n# define the model\r\nmodel = MaskRCNN(mode='inference', model_dir='.\/', config=cfg)<\/pre>\n<p>Next, we can load the weights from our saved model.<\/p>\n<p>We can do that by specifying the path to the model file. In this case, the model file is \u2018<em>mask_rcnn_kangaroo_cfg_0005.h5<\/em>\u2018 in the current working directory.<\/p>\n<pre class=\"crayon-plain-tag\"># load model weights\r\nmodel.load_weights('mask_rcnn_kangaroo_cfg_0005.h5', by_name=True)<\/pre>\n<p>Next, we can evaluate the model. This involves enumerating the images in a dataset, making a prediction, and calculating the AP for the prediction before calculating the mean AP across all images.<\/p>\n<p>First, the image and ground truth mask can be loaded from the dataset for a given <em>image_id<\/em>. This can be achieved using the <em>load_image_gt()<\/em> convenience function.<\/p>\n<pre class=\"crayon-plain-tag\"># load image, bounding boxes and masks for the image id\r\nimage, image_meta, gt_class_id, gt_bbox, gt_mask = load_image_gt(dataset, cfg, image_id, use_mini_mask=False)<\/pre>\n<p>Next, the pixel values of the loaded image must be scaled in the same way as was performed on the training data, e.g. centered.
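To make the idea of centering concrete: conceptually, this step subtracts a per-channel mean pixel value from the image so the values are centered around zero, as in the minimal sketch below. The mean pixel values shown are assumed library defaults, used here for illustration only.

```python
from numpy import array, full, float32

# assumed per-channel mean pixel values (RGB), for illustration only
MEAN_PIXEL = array([123.7, 116.8, 103.9])

def center_image(image):
	# subtract the per-channel mean so pixel values are centered around zero
	return image.astype(float32) - MEAN_PIXEL

# a dummy 2x2 RGB image with every pixel set to 128
image = full((2, 2, 3), 128, dtype='uint8')
scaled = center_image(image)
print(scaled[0, 0])  # approximately [4.3, 11.2, 24.1]
```
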
This can be achieved using the <em>mold_image()<\/em> convenience function.<\/p>\n<pre class=\"crayon-plain-tag\"># convert pixel values (e.g. center)\r\nscaled_image = mold_image(image, cfg)<\/pre>\n<p>The dimensions of the image then need to be expanded to represent one sample in a dataset, and the result used as input to make a prediction with the model.<\/p>\n<pre class=\"crayon-plain-tag\">sample = expand_dims(scaled_image, 0)\r\n# make prediction\r\nyhat = model.detect(sample, verbose=0)\r\n# extract results for first sample\r\nr = yhat[0]<\/pre>\n<p>Next, the prediction can be compared to the ground truth and metrics calculated using the <em>compute_ap()<\/em> function.<\/p>\n<pre class=\"crayon-plain-tag\"># calculate statistics, including AP\r\nAP, _, _, _ = compute_ap(gt_bbox, gt_class_id, gt_mask, r[\"rois\"], r[\"class_ids\"], r[\"scores\"], r['masks'])<\/pre>\n<p>The AP values can be added to a list, then the mean value calculated.<\/p>\n<p>Tying this together, the <em>evaluate_model()<\/em> function below implements this and calculates the mAP given a dataset, model and configuration.<\/p>\n<pre class=\"crayon-plain-tag\"># calculate the mAP for a model on a given dataset\r\ndef evaluate_model(dataset, model, cfg):\r\n\tAPs = list()\r\n\tfor image_id in dataset.image_ids:\r\n\t\t# load image, bounding boxes and masks for the image id\r\n\t\timage, image_meta, gt_class_id, gt_bbox, gt_mask = load_image_gt(dataset, cfg, image_id, use_mini_mask=False)\r\n\t\t# convert pixel values (e.g. 
center)\r\n\t\tscaled_image = mold_image(image, cfg)\r\n\t\t# convert image into one sample\r\n\t\tsample = expand_dims(scaled_image, 0)\r\n\t\t# make prediction\r\n\t\tyhat = model.detect(sample, verbose=0)\r\n\t\t# extract results for first sample\r\n\t\tr = yhat[0]\r\n\t\t# calculate statistics, including AP\r\n\t\tAP, _, _, _ = compute_ap(gt_bbox, gt_class_id, gt_mask, r[\"rois\"], r[\"class_ids\"], r[\"scores\"], r['masks'])\r\n\t\t# store\r\n\t\tAPs.append(AP)\r\n\t# calculate the mean AP across all images\r\n\tmAP = mean(APs)\r\n\treturn mAP<\/pre>\n<p>We can now calculate the mAP for the model on the train and test datasets.<\/p>\n<pre class=\"crayon-plain-tag\"># evaluate model on training dataset\r\ntrain_mAP = evaluate_model(train_set, model, cfg)\r\nprint(\"Train mAP: %.3f\" % train_mAP)\r\n# evaluate model on test dataset\r\ntest_mAP = evaluate_model(test_set, model, cfg)\r\nprint(\"Test mAP: %.3f\" % test_mAP)<\/pre>\n<p>The full code listing is provided below for completeness.<\/p>\n<pre class=\"crayon-plain-tag\"># evaluate the mask rcnn model on the kangaroo dataset\r\nfrom os import listdir\r\nfrom xml.etree import ElementTree\r\nfrom numpy import zeros\r\nfrom numpy import asarray\r\nfrom numpy import expand_dims\r\nfrom numpy import mean\r\nfrom mrcnn.config import Config\r\nfrom mrcnn.model import MaskRCNN\r\nfrom mrcnn.utils import Dataset\r\nfrom mrcnn.utils import compute_ap\r\nfrom mrcnn.model import load_image_gt\r\nfrom mrcnn.model import mold_image\r\n\r\n# class that defines and loads the kangaroo dataset\r\nclass KangarooDataset(Dataset):\r\n\t# load the dataset definitions\r\n\tdef load_dataset(self, dataset_dir, is_train=True):\r\n\t\t# define one class\r\n\t\tself.add_class(\"dataset\", 1, \"kangaroo\")\r\n\t\t# define data locations\r\n\t\timages_dir = dataset_dir + '\/images\/'\r\n\t\tannotations_dir = dataset_dir + '\/annots\/'\r\n\t\t# find all images\r\n\t\tfor filename in listdir(images_dir):\r\n\t\t\t# extract image 
id\r\n\t\t\timage_id = filename[:-4]\r\n\t\t\t# skip bad images\r\n\t\t\tif image_id in ['00090']:\r\n\t\t\t\tcontinue\r\n\t\t\t# skip all images after 150 if we are building the train set\r\n\t\t\tif is_train and int(image_id) >= 150:\r\n\t\t\t\tcontinue\r\n\t\t\t# skip all images before 150 if we are building the test\/val set\r\n\t\t\tif not is_train and int(image_id) < 150:\r\n\t\t\t\tcontinue\r\n\t\t\timg_path = images_dir + filename\r\n\t\t\tann_path = annotations_dir + image_id + '.xml'\r\n\t\t\t# add to dataset\r\n\t\t\tself.add_image('dataset', image_id=image_id, path=img_path, annotation=ann_path)\r\n\r\n\t# extract bounding boxes from an annotation file\r\n\tdef extract_boxes(self, filename):\r\n\t\t# load and parse the file\r\n\t\ttree = ElementTree.parse(filename)\r\n\t\t# get the root of the document\r\n\t\troot = tree.getroot()\r\n\t\t# extract each bounding box\r\n\t\tboxes = list()\r\n\t\tfor box in root.findall('.\/\/bndbox'):\r\n\t\t\txmin = int(box.find('xmin').text)\r\n\t\t\tymin = int(box.find('ymin').text)\r\n\t\t\txmax = int(box.find('xmax').text)\r\n\t\t\tymax = int(box.find('ymax').text)\r\n\t\t\tcoors = [xmin, ymin, xmax, ymax]\r\n\t\t\tboxes.append(coors)\r\n\t\t# extract image dimensions\r\n\t\twidth = int(root.find('.\/\/size\/width').text)\r\n\t\theight = int(root.find('.\/\/size\/height').text)\r\n\t\treturn boxes, width, height\r\n\r\n\t# load the masks for an image\r\n\tdef load_mask(self, image_id):\r\n\t\t# get details of image\r\n\t\tinfo = self.image_info[image_id]\r\n\t\t# define box file location\r\n\t\tpath = info['annotation']\r\n\t\t# load XML\r\n\t\tboxes, w, h = self.extract_boxes(path)\r\n\t\t# create one array for all masks, each on a different channel\r\n\t\tmasks = zeros([h, w, len(boxes)], dtype='uint8')\r\n\t\t# create masks\r\n\t\tclass_ids = list()\r\n\t\tfor i in range(len(boxes)):\r\n\t\t\tbox = boxes[i]\r\n\t\t\trow_s, row_e = box[1], box[3]\r\n\t\t\tcol_s, col_e = box[0], box[2]\r\n\t\t\tmasks[row_s:row_e, 
col_s:col_e, i] = 1\r\n\t\t\tclass_ids.append(self.class_names.index('kangaroo'))\r\n\t\treturn masks, asarray(class_ids, dtype='int32')\r\n\r\n\t# load an image reference\r\n\tdef image_reference(self, image_id):\r\n\t\tinfo = self.image_info[image_id]\r\n\t\treturn info['path']\r\n\r\n# define the prediction configuration\r\nclass PredictionConfig(Config):\r\n\t# define the name of the configuration\r\n\tNAME = \"kangaroo_cfg\"\r\n\t# number of classes (background + kangaroo)\r\n\tNUM_CLASSES = 1 + 1\r\n\t# simplify GPU config\r\n\tGPU_COUNT = 1\r\n\tIMAGES_PER_GPU = 1\r\n\r\n# calculate the mAP for a model on a given dataset\r\ndef evaluate_model(dataset, model, cfg):\r\n\tAPs = list()\r\n\tfor image_id in dataset.image_ids:\r\n\t\t# load image, bounding boxes and masks for the image id\r\n\t\timage, image_meta, gt_class_id, gt_bbox, gt_mask = load_image_gt(dataset, cfg, image_id, use_mini_mask=False)\r\n\t\t# convert pixel values (e.g. center)\r\n\t\tscaled_image = mold_image(image, cfg)\r\n\t\t# convert image into one sample\r\n\t\tsample = expand_dims(scaled_image, 0)\r\n\t\t# make prediction\r\n\t\tyhat = model.detect(sample, verbose=0)\r\n\t\t# extract results for first sample\r\n\t\tr = yhat[0]\r\n\t\t# calculate statistics, including AP\r\n\t\tAP, _, _, _ = compute_ap(gt_bbox, gt_class_id, gt_mask, r[\"rois\"], r[\"class_ids\"], r[\"scores\"], r['masks'])\r\n\t\t# store\r\n\t\tAPs.append(AP)\r\n\t# calculate the mean AP across all images\r\n\tmAP = mean(APs)\r\n\treturn mAP\r\n\r\n# load the train dataset\r\ntrain_set = KangarooDataset()\r\ntrain_set.load_dataset('kangaroo', is_train=True)\r\ntrain_set.prepare()\r\nprint('Train: %d' % len(train_set.image_ids))\r\n# load the test dataset\r\ntest_set = KangarooDataset()\r\ntest_set.load_dataset('kangaroo', is_train=False)\r\ntest_set.prepare()\r\nprint('Test: %d' % len(test_set.image_ids))\r\n# create config\r\ncfg = PredictionConfig()\r\n# define the model\r\nmodel = MaskRCNN(mode='inference', 
model_dir='.\/', config=cfg)\r\n# load model weights\r\nmodel.load_weights('mask_rcnn_kangaroo_cfg_0005.h5', by_name=True)\r\n# evaluate model on training dataset\r\ntrain_mAP = evaluate_model(train_set, model, cfg)\r\nprint(\"Train mAP: %.3f\" % train_mAP)\r\n# evaluate model on test dataset\r\ntest_mAP = evaluate_model(test_set, model, cfg)\r\nprint(\"Test mAP: %.3f\" % test_mAP)<\/pre>\n<p>Running the example will make a prediction for each image in the train and test datasets and calculate the mAP for each.<\/p>\n<p>A mAP above 90% or 95% is a good score. We can see that the mAP score is good on both datasets, and perhaps slightly better on the test dataset than on the train dataset.<\/p>\n<p>This may be because the dataset is very small, and\/or because the model could benefit from further training.<\/p>\n<pre class=\"crayon-plain-tag\">Train mAP: 0.929\r\nTest mAP: 0.958<\/pre>\n<p>Now that we have some confidence that the model is sensible, we can use it to make some predictions.<\/p>\n<h2>How to Detect Kangaroos in New Photos<\/h2>\n<p>We can use the trained model to detect kangaroos in new photographs, specifically, in photos that we expect to have kangaroos.<\/p>\n<p>First, we need a new photo of a kangaroo.<\/p>\n<p>We could go to Flickr and find a random photo of a kangaroo. Alternatively, we can use any of the photos in the test dataset that were not used to train the model.<\/p>\n<p>We have already seen in the previous section how to make a prediction with an image. Specifically, by scaling the pixel values and calling <em>model.detect()<\/em>. For example:<\/p>\n<pre class=\"crayon-plain-tag\"># example of making a prediction\r\n...\r\n# load image\r\nimage = ...\r\n# convert pixel values (e.g. 
center)\r\nscaled_image = mold_image(image, cfg)\r\n# convert image into one sample\r\nsample = expand_dims(scaled_image, 0)\r\n# make prediction\r\nyhat = model.detect(sample, verbose=0)\r\n...<\/pre>\n<p>Let\u2019s take it one step further and make predictions for a number of images in a dataset, then plot the photo with the ground truth bounding boxes side by side with the same photo and the predicted bounding boxes. This will provide a visual guide to how good the model is at making predictions.<\/p>\n<p>The first step is to load the image and mask from the dataset.<\/p>\n<pre class=\"crayon-plain-tag\"># load the image and mask\r\nimage = dataset.load_image(image_id)\r\nmask, _ = dataset.load_mask(image_id)<\/pre>\n<p>Next, we can make a prediction for the image.<\/p>\n<pre class=\"crayon-plain-tag\"># convert pixel values (e.g. center)\r\nscaled_image = mold_image(image, cfg)\r\n# convert image into one sample\r\nsample = expand_dims(scaled_image, 0)\r\n# make prediction\r\nyhat = model.detect(sample, verbose=0)[0]<\/pre>\n<p>Next, we can create a subplot for the ground truth and plot the image with the known bounding boxes.<\/p>\n<pre class=\"crayon-plain-tag\"># define subplot\r\npyplot.subplot(n_images, 2, i*2+1)\r\n# plot raw pixel data\r\npyplot.imshow(image)\r\npyplot.title('Actual')\r\n# plot masks\r\nfor j in range(mask.shape[2]):\r\n\tpyplot.imshow(mask[:, :, j], cmap='gray', alpha=0.3)<\/pre>\n<p>We can then create a second subplot beside the first, plot the photo again, and this time draw the predicted bounding boxes in red.<\/p>\n<pre class=\"crayon-plain-tag\"># get the context for drawing boxes\r\npyplot.subplot(n_images, 2, i*2+2)\r\n# plot raw pixel data\r\npyplot.imshow(image)\r\npyplot.title('Predicted')\r\nax = pyplot.gca()\r\n# plot each box\r\nfor box in yhat['rois']:\r\n\t# get coordinates\r\n\ty1, x1, y2, x2 = box\r\n\t# calculate width and height of the box\r\n\twidth, height = x2 - x1, y2 - y1\r\n\t# create the shape\r\n\trect = 
Rectangle((x1, y1), width, height, fill=False, color='red')\r\n\t# draw the box\r\n\tax.add_patch(rect)<\/pre>\n<p>We can tie all of this together into a function that takes a dataset, model, and config and creates a plot of the first five photos in the dataset with ground truth and predicted bounding boxes.<\/p>\n<pre class=\"crayon-plain-tag\"># plot a number of photos with ground truth and predictions\r\ndef plot_actual_vs_predicted(dataset, model, cfg, n_images=5):\r\n\t# load image and mask\r\n\tfor i in range(n_images):\r\n\t\t# load the image and mask\r\n\t\timage = dataset.load_image(i)\r\n\t\tmask, _ = dataset.load_mask(i)\r\n\t\t# convert pixel values (e.g. center)\r\n\t\tscaled_image = mold_image(image, cfg)\r\n\t\t# convert image into one sample\r\n\t\tsample = expand_dims(scaled_image, 0)\r\n\t\t# make prediction\r\n\t\tyhat = model.detect(sample, verbose=0)[0]\r\n\t\t# define subplot\r\n\t\tpyplot.subplot(n_images, 2, i*2+1)\r\n\t\t# plot raw pixel data\r\n\t\tpyplot.imshow(image)\r\n\t\tpyplot.title('Actual')\r\n\t\t# plot masks\r\n\t\tfor j in range(mask.shape[2]):\r\n\t\t\tpyplot.imshow(mask[:, :, j], cmap='gray', alpha=0.3)\r\n\t\t# get the context for drawing boxes\r\n\t\tpyplot.subplot(n_images, 2, i*2+2)\r\n\t\t# plot raw pixel data\r\n\t\tpyplot.imshow(image)\r\n\t\tpyplot.title('Predicted')\r\n\t\tax = pyplot.gca()\r\n\t\t# plot each box\r\n\t\tfor box in yhat['rois']:\r\n\t\t\t# get coordinates\r\n\t\t\ty1, x1, y2, x2 = box\r\n\t\t\t# calculate width and height of the box\r\n\t\t\twidth, height = x2 - x1, y2 - y1\r\n\t\t\t# create the shape\r\n\t\t\trect = Rectangle((x1, y1), width, height, fill=False, color='red')\r\n\t\t\t# draw the box\r\n\t\t\tax.add_patch(rect)\r\n\t# show the figure\r\n\tpyplot.show()<\/pre>\n<p>The complete example of loading the trained model and making a prediction for the first few images in the train and test datasets is listed below.<\/p>\n<pre class=\"crayon-plain-tag\"># detect kangaroos in photos with mask rcnn 
model\r\nfrom os import listdir\r\nfrom xml.etree import ElementTree\r\nfrom numpy import zeros\r\nfrom numpy import asarray\r\nfrom numpy import expand_dims\r\nfrom matplotlib import pyplot\r\nfrom matplotlib.patches import Rectangle\r\nfrom mrcnn.config import Config\r\nfrom mrcnn.model import MaskRCNN\r\nfrom mrcnn.model import mold_image\r\nfrom mrcnn.utils import Dataset\r\n\r\n# class that defines and loads the kangaroo dataset\r\nclass KangarooDataset(Dataset):\r\n\t# load the dataset definitions\r\n\tdef load_dataset(self, dataset_dir, is_train=True):\r\n\t\t# define one class\r\n\t\tself.add_class(\"dataset\", 1, \"kangaroo\")\r\n\t\t# define data locations\r\n\t\timages_dir = dataset_dir + '\/images\/'\r\n\t\tannotations_dir = dataset_dir + '\/annots\/'\r\n\t\t# find all images\r\n\t\tfor filename in listdir(images_dir):\r\n\t\t\t# extract image id\r\n\t\t\timage_id = filename[:-4]\r\n\t\t\t# skip bad images\r\n\t\t\tif image_id in ['00090']:\r\n\t\t\t\tcontinue\r\n\t\t\t# skip all images after 150 if we are building the train set\r\n\t\t\tif is_train and int(image_id) >= 150:\r\n\t\t\t\tcontinue\r\n\t\t\t# skip all images before 150 if we are building the test\/val set\r\n\t\t\tif not is_train and int(image_id) < 150:\r\n\t\t\t\tcontinue\r\n\t\t\timg_path = images_dir + filename\r\n\t\t\tann_path = annotations_dir + image_id + '.xml'\r\n\t\t\t# add to dataset\r\n\t\t\tself.add_image('dataset', image_id=image_id, path=img_path, annotation=ann_path)\r\n\r\n\t# load all bounding boxes for an image\r\n\tdef extract_boxes(self, filename):\r\n\t\t# load and parse the file\r\n\t\troot = ElementTree.parse(filename)\r\n\t\tboxes = list()\r\n\t\t# extract each bounding box\r\n\t\tfor box in root.findall('.\/\/bndbox'):\r\n\t\t\txmin = int(box.find('xmin').text)\r\n\t\t\tymin = int(box.find('ymin').text)\r\n\t\t\txmax = int(box.find('xmax').text)\r\n\t\t\tymax = int(box.find('ymax').text)\r\n\t\t\tcoors = [xmin, ymin, xmax, 
ymax]\r\n\t\t\tboxes.append(coors)\r\n\t\t# extract image dimensions\r\n\t\twidth = int(root.find('.\/\/size\/width').text)\r\n\t\theight = int(root.find('.\/\/size\/height').text)\r\n\t\treturn boxes, width, height\r\n\r\n\t# load the masks for an image\r\n\tdef load_mask(self, image_id):\r\n\t\t# get details of image\r\n\t\tinfo = self.image_info[image_id]\r\n\t\t# define box file location\r\n\t\tpath = info['annotation']\r\n\t\t# load XML\r\n\t\tboxes, w, h = self.extract_boxes(path)\r\n\t\t# create one array for all masks, each on a different channel\r\n\t\tmasks = zeros([h, w, len(boxes)], dtype='uint8')\r\n\t\t# create masks\r\n\t\tclass_ids = list()\r\n\t\tfor i in range(len(boxes)):\r\n\t\t\tbox = boxes[i]\r\n\t\t\trow_s, row_e = box[1], box[3]\r\n\t\t\tcol_s, col_e = box[0], box[2]\r\n\t\t\tmasks[row_s:row_e, col_s:col_e, i] = 1\r\n\t\t\tclass_ids.append(self.class_names.index('kangaroo'))\r\n\t\treturn masks, asarray(class_ids, dtype='int32')\r\n\r\n\t# load an image reference\r\n\tdef image_reference(self, image_id):\r\n\t\tinfo = self.image_info[image_id]\r\n\t\treturn info['path']\r\n\r\n# define the prediction configuration\r\nclass PredictionConfig(Config):\r\n\t# define the name of the configuration\r\n\tNAME = \"kangaroo_cfg\"\r\n\t# number of classes (background + kangaroo)\r\n\tNUM_CLASSES = 1 + 1\r\n\t# simplify GPU config\r\n\tGPU_COUNT = 1\r\n\tIMAGES_PER_GPU = 1\r\n\r\n# plot a number of photos with ground truth and predictions\r\ndef plot_actual_vs_predicted(dataset, model, cfg, n_images=5):\r\n\t# load image and mask\r\n\tfor i in range(n_images):\r\n\t\t# load the image and mask\r\n\t\timage = dataset.load_image(i)\r\n\t\tmask, _ = dataset.load_mask(i)\r\n\t\t# convert pixel values (e.g. 
center)\r\n\t\tscaled_image = mold_image(image, cfg)\r\n\t\t# convert image into one sample\r\n\t\tsample = expand_dims(scaled_image, 0)\r\n\t\t# make prediction\r\n\t\tyhat = model.detect(sample, verbose=0)[0]\r\n\t\t# define subplot\r\n\t\tpyplot.subplot(n_images, 2, i*2+1)\r\n\t\t# plot raw pixel data\r\n\t\tpyplot.imshow(image)\r\n\t\tpyplot.title('Actual')\r\n\t\t# plot masks\r\n\t\tfor j in range(mask.shape[2]):\r\n\t\t\tpyplot.imshow(mask[:, :, j], cmap='gray', alpha=0.3)\r\n\t\t# get the context for drawing boxes\r\n\t\tpyplot.subplot(n_images, 2, i*2+2)\r\n\t\t# plot raw pixel data\r\n\t\tpyplot.imshow(image)\r\n\t\tpyplot.title('Predicted')\r\n\t\tax = pyplot.gca()\r\n\t\t# plot each box\r\n\t\tfor box in yhat['rois']:\r\n\t\t\t# get coordinates\r\n\t\t\ty1, x1, y2, x2 = box\r\n\t\t\t# calculate width and height of the box\r\n\t\t\twidth, height = x2 - x1, y2 - y1\r\n\t\t\t# create the shape\r\n\t\t\trect = Rectangle((x1, y1), width, height, fill=False, color='red')\r\n\t\t\t# draw the box\r\n\t\t\tax.add_patch(rect)\r\n\t# show the figure\r\n\tpyplot.show()\r\n\r\n# load the train dataset\r\ntrain_set = KangarooDataset()\r\ntrain_set.load_dataset('kangaroo', is_train=True)\r\ntrain_set.prepare()\r\nprint('Train: %d' % len(train_set.image_ids))\r\n# load the test dataset\r\ntest_set = KangarooDataset()\r\ntest_set.load_dataset('kangaroo', is_train=False)\r\ntest_set.prepare()\r\nprint('Test: %d' % len(test_set.image_ids))\r\n# create config\r\ncfg = PredictionConfig()\r\n# define the model\r\nmodel = MaskRCNN(mode='inference', model_dir='.\/', config=cfg)\r\n# load model weights\r\nmodel_path = 'mask_rcnn_kangaroo_cfg_0005.h5'\r\nmodel.load_weights(model_path, by_name=True)\r\n# plot predictions for train dataset\r\nplot_actual_vs_predicted(train_set, model, cfg)\r\n# plot predictions for test dataset\r\nplot_actual_vs_predicted(test_set, model, cfg)<\/pre>\n<p>Running the example first creates a figure showing five photos from the training dataset with 
the ground truth bounding boxes, with the same photo and the predicted bounding boxes alongside.<\/p>\n<p>We can see that the model has done well on these examples, finding all of the kangaroos, even in the case where there are two or three in one photo. The second photo down (in the right column) does show a slip-up where the model has predicted a bounding box around the same kangaroo twice.<\/p>\n<div id=\"attachment_7730\" style=\"width: 1034px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-7730\" class=\"size-large wp-image-7730\" src=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2019\/03\/Plot-of-Photos-of-Kangaroos-From-the-Training-Dataset-with-Ground-Truth-and-Predicted-Bounding-Boxes-1024x768.png\" alt=\"Plot of Photos of Kangaroos From the Training Dataset With Ground Truth and Predicted Bounding Boxes\" width=\"1024\" height=\"768\" srcset=\"http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2019\/03\/Plot-of-Photos-of-Kangaroos-From-the-Training-Dataset-with-Ground-Truth-and-Predicted-Bounding-Boxes-1024x768.png 1024w, http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2019\/03\/Plot-of-Photos-of-Kangaroos-From-the-Training-Dataset-with-Ground-Truth-and-Predicted-Bounding-Boxes-300x225.png 300w, http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2019\/03\/Plot-of-Photos-of-Kangaroos-From-the-Training-Dataset-with-Ground-Truth-and-Predicted-Bounding-Boxes-768x576.png 768w, http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2019\/03\/Plot-of-Photos-of-Kangaroos-From-the-Training-Dataset-with-Ground-Truth-and-Predicted-Bounding-Boxes.png 1280w\" sizes=\"(max-width: 1024px) 100vw, 1024px\"><\/p>\n<p id=\"caption-attachment-7730\" class=\"wp-caption-text\">Plot of Photos of Kangaroos From the Training Dataset With Ground Truth and Predicted Bounding 
Boxes<\/p>\n<\/div>\n<p>A second figure is created showing five photos from the test dataset with ground truth bounding boxes and predicted bounding boxes.<\/p>\n<p>These are images not seen during training, and again, in each photo, the model has detected the kangaroo. We can see that in the case of the second-to-last photo, a minor mistake was made. Specifically, the same kangaroo was detected multiple times.<\/p>\n<p>No doubt these differences can be ironed out with more training, perhaps with a larger dataset and\/or data augmentation, to encourage the model to detect people as background and to detect a given kangaroo once only.<\/p>\n<div id=\"attachment_7878\" style=\"width: 1034px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-7878\" class=\"size-large wp-image-7878\" src=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2019\/05\/Plot-of-Photos-of-Kangaroos-From-the-Test-Dataset-with-Ground-Truth-and-Predicted-Bounding-Boxes-1024x768.png\" alt=\"Plot of Photos of Kangaroos From the Test Dataset With Ground Truth and Predicted Bounding Boxes\" width=\"1024\" height=\"768\" srcset=\"http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2019\/05\/Plot-of-Photos-of-Kangaroos-From-the-Test-Dataset-with-Ground-Truth-and-Predicted-Bounding-Boxes-1024x768.png 1024w, http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2019\/05\/Plot-of-Photos-of-Kangaroos-From-the-Test-Dataset-with-Ground-Truth-and-Predicted-Bounding-Boxes-300x225.png 300w, http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2019\/05\/Plot-of-Photos-of-Kangaroos-From-the-Test-Dataset-with-Ground-Truth-and-Predicted-Bounding-Boxes-768x576.png 768w, 
http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2019\/05\/Plot-of-Photos-of-Kangaroos-From-the-Test-Dataset-with-Ground-Truth-and-Predicted-Bounding-Boxes.png 1280w\" sizes=\"(max-width: 1024px) 100vw, 1024px\"><\/p>\n<p id=\"caption-attachment-7878\" class=\"wp-caption-text\">Plot of Photos of Kangaroos From the Test Dataset With Ground Truth and Predicted Bounding Boxes<\/p>\n<\/div>\n<h2>Further Reading<\/h2>\n<p>This section provides more resources on the topic if you are looking to go deeper.<\/p>\n<h3>Papers<\/h3>\n<ul>\n<li><a href=\"https:\/\/arxiv.org\/abs\/1703.06870\">Mask R-CNN, 2017<\/a>.<\/li>\n<\/ul>\n<h3>Projects<\/h3>\n<ul>\n<li><a href=\"https:\/\/github.com\/experiencor\/kangaroo\">Kangaroo Dataset, GitHub<\/a>.<\/li>\n<li><a href=\"https:\/\/github.com\/matterport\/Mask_RCNN\">Mask RCNN Project, GitHub<\/a>.<\/li>\n<\/ul>\n<h3>APIs<\/h3>\n<ul>\n<li><a href=\"https:\/\/docs.python.org\/3\/library\/xml.etree.elementtree.html\">xml.etree.ElementTree API<\/a><\/li>\n<li><a href=\"https:\/\/matplotlib.org\/api\/_as_gen\/matplotlib.patches.Rectangle.html\">matplotlib.patches.Rectangle API<\/a><\/li>\n<li><a href=\"https:\/\/matplotlib.org\/api\/_as_gen\/matplotlib.pyplot.subplot.html\">matplotlib.pyplot.subplot API<\/a><\/li>\n<li><a href=\"https:\/\/matplotlib.org\/api\/_as_gen\/matplotlib.pyplot.imshow.html\">matplotlib.pyplot.imshow API<\/a><\/li>\n<\/ul>\n<h3>Articles<\/h3>\n<ul>\n<li><a href=\"https:\/\/engineering.matterport.com\/splash-of-color-instance-segmentation-with-mask-r-cnn-and-tensorflow-7c761e238b46\">Splash of Color: Instance Segmentation with Mask R-CNN and TensorFlow, 2018<\/a>.<\/li>\n<li><a href=\"https:\/\/github.com\/matterport\/Mask_RCNN\/blob\/master\/samples\/balloon\/inspect_balloon_model.ipynb\">Mask R-CNN \u2013 Inspect Balloon Trained Model, Notebook<\/a>.<\/li>\n<li><a href=\"https:\/\/github.com\/matterport\/Mask_RCNN\/blob\/master\/samples\/shapes\/train_shapes.ipynb\">Mask R-CNN 
\u2013 Train on Shapes Dataset, Notebook<\/a>.<\/li>\n<li><a href=\"https:\/\/medium.com\/@jonathan_hui\/map-mean-average-precision-for-object-detection-45c121a31173\">mAP (mean Average Precision) for Object Detection, 2018<\/a>.<\/li>\n<\/ul>\n<h2>Summary<\/h2>\n<p>In this tutorial, you discovered how to develop a Mask R-CNN model for kangaroo object detection in photographs.<\/p>\n<p>Specifically, you learned:<\/p>\n<ul>\n<li>How to prepare an object detection dataset ready for modeling with an R-CNN.<\/li>\n<li>How to use transfer learning to train an object detection model on a new dataset.<\/li>\n<li>How to evaluate a fit Mask R-CNN model on a test dataset and make predictions on new photos.<\/li>\n<\/ul>\n<p>Do you have any questions?<br \/>\nAsk your questions in the comments below and I will do my best to answer.<\/p>\n<p>The post <a rel=\"nofollow\" href=\"https:\/\/machinelearningmastery.com\/how-to-train-an-object-detection-model-with-keras\/\">How to Train an Object Detection Model to Find Kangaroos in Photographs (R-CNN with Keras)<\/a> appeared first on <a rel=\"nofollow\" href=\"https:\/\/machinelearningmastery.com\/\">Machine Learning Mastery<\/a>.<\/p>\n<\/div>\n<p><a href=\"https:\/\/machinelearningmastery.com\/how-to-train-an-object-detection-model-with-keras\/\">Go to Source<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Author: Jason Brownlee Object detection is a challenging computer vision task that involves predicting both where the objects are in the image and what type [&hellip;] <span class=\"read-more-link\"><a class=\"read-more\" href=\"https:\/\/www.aiproblog.com\/index.php\/2019\/05\/28\/how-to-train-an-object-detection-model-to-find-kangaroos-in-photographs-r-cnn-with-keras\/\">Read 
More<\/a><\/span><\/p>\n","protected":false},"author":1,"featured_media":2199,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_bbp_topic_count":0,"_bbp_reply_count":0,"_bbp_total_topic_count":0,"_bbp_total_reply_count":0,"_bbp_voice_count":0,"_bbp_anonymous_reply_count":0,"_bbp_topic_count_hidden":0,"_bbp_reply_count_hidden":0,"_bbp_forum_subforum_count":0,"footnotes":""},"categories":[24],"tags":[],"_links":{"self":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts\/2198"}],"collection":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/comments?post=2198"}],"version-history":[{"count":0,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts\/2198\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/media\/2199"}],"wp:attachment":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/media?parent=2198"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/categories?post=2198"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/tags?post=2198"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}