{"id":2912,"date":"2019-12-10T16:00:01","date_gmt":"2019-12-10T16:00:01","guid":{"rendered":"https:\/\/www.aiproblog.com\/index.php\/2019\/12\/10\/this-object-recognition-dataset-stumped-the-worlds-best-computer-vision-models\/"},"modified":"2019-12-10T16:00:01","modified_gmt":"2019-12-10T16:00:01","slug":"this-object-recognition-dataset-stumped-the-worlds-best-computer-vision-models","status":"publish","type":"post","link":"https:\/\/www.aiproblog.com\/index.php\/2019\/12\/10\/this-object-recognition-dataset-stumped-the-worlds-best-computer-vision-models\/","title":{"rendered":"This object-recognition dataset stumped the world\u2019s best computer vision models"},"content":{"rendered":"<p>Author: Kim Martineau | MIT Quest for Intelligence<\/p>\n<div>\n<p>Computer vision models have learned to identify objects in photos so accurately that some can outperform humans on some datasets. But when those same object detectors are turned loose in the real world, their performance noticeably drops, creating reliability concerns for self-driving cars and other safety-critical systems that use machine vision.<\/p>\n<p>In an effort to close this performance gap, a team of MIT and IBM researchers set out to create a very different kind of object-recognition dataset. It\u2019s called <a href=\"http:\/\/objectnet.dev\/\">ObjectNet,<\/a> a play on ImageNet, the crowdsourced database of photos responsible for launching much of the modern boom in artificial intelligence.\u00a0<\/p>\n<p>Unlike ImageNet, which features photos taken from Flickr and other social media sites, ObjectNet features photos taken by paid freelancers. Objects are shown tipped on their side, shot at odd angles, and displayed in clutter-strewn rooms. 
When leading object-detection models were tested on ObjectNet, their accuracy rates fell from a high of 97 percent on ImageNet to just 50-55 percent.<\/p>\n<p>\u201cWe created this dataset to tell people the object-recognition problem continues to be a hard problem,\u201d says <a href=\"https:\/\/www.csail.mit.edu\/person\/boris-katz\">Boris Katz<\/a>, a research scientist at MIT\u2019s <a href=\"https:\/\/www.csail.mit.edu\/\">Computer Science and Artificial Intelligence Laboratory<\/a> (CSAIL) and <a href=\"https:\/\/cbmm.mit.edu\/\">Center for Brains, Minds and Machines<\/a> (CBMM). \u201cWe need better, smarter algorithms.\u201d Katz and his colleagues will present ObjectNet and their results at the <a href=\"https:\/\/nips.cc\/\">Conference on Neural Information Processing Systems (NeurIPS)<\/a>.<\/p>\n<p>Deep learning, the technique driving much of the recent progress in AI, uses layers of artificial &#8220;neurons&#8221; to find patterns in vast amounts of raw data. It learns to pick out, say, the chair in a photo after training on hundreds to thousands of examples. But even datasets with millions of images can\u2019t show each object in all of its possible orientations and settings, creating problems when the models encounter these objects in real life.<\/p>\n<p>ObjectNet is different from conventional image datasets in another important way: it contains no training images. Most datasets are divided into data for training the models and testing their performance. But the training set often shares subtle similarities with the test set, in effect giving the models a sneak peek at the test.\u00a0<\/p>\n<p>At first glance, <a href=\"http:\/\/www.image-net.org\/\">ImageNet<\/a>, at 14 million images, seems enormous. 
But when its training set is excluded, it\u2019s comparable in size to ObjectNet, at 50,000 photos.\u00a0<\/p>\n<p>\u201cIf we want to know how well algorithms will perform in the real world, we should test them on images that are unbiased and that they\u2019ve never seen before,\u201d says study co-author <a href=\"https:\/\/cbmm.mit.edu\/about\/people\/barbu\">Andrei Barbu<\/a>, a research scientist at CSAIL and CBMM.<\/p>\n<p><strong>A dataset that tries to capture the complexity of real-world objects\u00a0<\/strong><\/p>\n<p>Few people would think to share the photos from ObjectNet with their friends, and that\u2019s the point. The researchers hired freelancers from Amazon Mechanical Turk to take photographs of hundreds of randomly posed household objects. Workers received photo assignments on an app, with animated instructions telling them how to orient the assigned object, what angle to shoot from, and whether to pose the object in the kitchen, bathroom, bedroom, or living room.\u00a0<\/p>\n<p>They wanted to eliminate three common biases: objects shown head-on, in iconic positions, and in highly correlated settings \u2014 for example, plates stacked in the kitchen.\u00a0<\/p>\n<p>It took three years to conceive of the dataset and design an app that would standardize the data-gathering process. 
\u201cDiscovering how to gather data in a way that controls for various biases was incredibly tricky,\u201d says study co-author <a href=\"http:\/\/david-mayo.com\/\">David Mayo<\/a>, a graduate student at MIT\u2019s <a href=\"https:\/\/www.eecs.mit.edu\/\">Department of Electrical Engineering and Computer Science.<\/a> \u201cWe also had to run experiments to make sure our instructions were clear and that the workers knew exactly what was being asked of them.\u201d\u00a0<\/p>\n<p>It took another year to gather the actual data, and in the end, half of all the photos freelancers submitted had to be discarded for failing to meet the researchers\u2019 specifications. In an attempt to be helpful, some workers added labels to their objects, staged them on white backgrounds, or otherwise tried to improve on the aesthetics of the photos they were assigned to shoot.<\/p>\n<p>Many of the photos were taken outside of the United States, and thus, some objects may look unfamiliar. Ripe oranges are green, bananas come in different sizes, and clothing appears in a variety of shapes and textures.<\/p>\n<p><strong>ObjectNet vs. ImageNet: how leading object-recognition models compare<\/strong><\/p>\n<p>When the researchers tested state-of-the-art computer vision models on ObjectNet, they found a performance drop of 40-45 percentage points from ImageNet. The results show that object detectors still struggle to understand that objects are three-dimensional and can be rotated and moved into new contexts, the researchers say. \u201cThese notions are not built into the architecture of modern object detectors,\u201d says study co-author <a href=\"https:\/\/researcher.watson.ibm.com\/researcher\/view.php?person=us-dgutfre\">Dan Gutfreund<\/a>, a researcher at IBM.<\/p>\n<p>To show that ObjectNet is difficult precisely because of how objects are viewed and positioned, the researchers allowed the models to train on half of the ObjectNet data before testing them on the remaining half. 
Training and testing on the same dataset typically improves performance, but here the models improved only slightly, suggesting that object detectors have yet to fully comprehend how objects exist in the real world.<\/p>\n<p>Computer vision models have progressively improved since 2012, when an object detector called AlexNet crushed the competition at the annual ImageNet contest. As datasets have gotten bigger, performance has also improved.<\/p>\n<p>But designing bigger versions of ObjectNet, with its added viewing angles and orientations, won\u2019t necessarily lead to better results, the researchers warn. The goal of ObjectNet is to motivate researchers to come up with the next wave of revolutionary techniques, much as the initial launch of the ImageNet challenge did.<\/p>\n<p>\u201cPeople feed these detectors huge amounts of data, but there are diminishing returns,\u201d says Katz. \u201cYou can\u2019t view an object from every angle and in every context. Our hope is that this new dataset will result in robust computer vision without surprising failures in the real world.\u201d<\/p>\n<p>The study\u2019s other authors are Julian Alvero, William Luo, Chris Wang, and Joshua Tenenbaum of MIT. 
The research was funded by the National Science Foundation, MIT\u2019s Center for Brains, Minds, and Machines, the MIT-IBM Watson AI Lab, Toyota Research Institute, and the SystemsThatLearn@CSAIL initiative.<\/p>\n<\/div>\n<p><a href=\"http:\/\/news.mit.edu\/2019\/object-recognition-dataset-stumped-worlds-best-computer-vision-models-1210\">Go to Source<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Author: Kim Martineau | MIT Quest for Intelligence Computer vision models have learned to identify objects in photos so accurately that some can outperform humans [&hellip;] <span class=\"read-more-link\"><a class=\"read-more\" href=\"https:\/\/www.aiproblog.com\/index.php\/2019\/12\/10\/this-object-recognition-dataset-stumped-the-worlds-best-computer-vision-models\/\">Read More<\/a><\/span><\/p>\n","protected":false},"author":1,"featured_media":465,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_bbp_topic_count":0,"_bbp_reply_count":0,"_bbp_total_topic_count":0,"_bbp_total_reply_count":0,"_bbp_voice_count":0,"_bbp_anonymous_reply_count":0,"_bbp_topic_count_hidden":0,"_bbp_reply_count_hidden":0,"_bbp_forum_subforum_count":0,"footnotes":""},"categories":[24],"tags":[],"_links":{"self":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts\/2912"}],"collection":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/comments?post=2912"}],"version-history":[{"count":0,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts\/2912\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/media\/473"}],"wp:attachm
ent":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/media?parent=2912"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/categories?post=2912"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/tags?post=2912"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}