{"id":7832,"date":"2024-12-20T22:00:00","date_gmt":"2024-12-20T22:00:00","guid":{"rendered":"https:\/\/www.aiproblog.com\/index.php\/2024\/12\/20\/ecologists-find-computer-vision-models-blind-spots-in-retrieving-wildlife-images\/"},"modified":"2024-12-20T22:00:00","modified_gmt":"2024-12-20T22:00:00","slug":"ecologists-find-computer-vision-models-blind-spots-in-retrieving-wildlife-images","status":"publish","type":"post","link":"https:\/\/www.aiproblog.com\/index.php\/2024\/12\/20\/ecologists-find-computer-vision-models-blind-spots-in-retrieving-wildlife-images\/","title":{"rendered":"Ecologists find computer vision models\u2019 blind spots in retrieving wildlife images"},"content":{"rendered":"<p>Author: Alex Shipps | MIT CSAIL<\/p>\n<div>\n<p>Try taking a picture of each of North America&#8217;s\u00a0<a href=\"https:\/\/blogs.ifas.ufl.edu\/news\/2022\/05\/05\/11000-tree-varieties-in-north-america-but-only-a-few-species-dot-cityscapes\/\">roughly<\/a> 11,000 tree species, and you\u2019ll have a mere fraction of the millions of photos within nature image datasets. These massive collections of snapshots \u2014 ranging from\u00a0<a href=\"http:\/\/www.naba.org\/\">butterflies<\/a> to\u00a0<a href=\"https:\/\/happywhale.com\/home\">humpback whales<\/a> \u2014 are a great research tool for ecologists because they provide evidence of organisms\u2019 unique behaviors, rare conditions, migration patterns, and responses to pollution and other forms of climate change.<\/p>\n<p>While comprehensive, nature image datasets aren\u2019t yet as useful as they could be. It\u2019s time-consuming to search these databases and retrieve the images most relevant to your hypothesis. You\u2019d be better off with an automated research assistant \u2014 or perhaps artificial intelligence systems called multimodal vision language models (VLMs). They\u2019re trained on both text and images, making it easier for them to pinpoint finer details, like the specific trees in the background of a photo.<\/p>\n<p>But just how well can VLMs assist nature researchers with image retrieval? A team from MIT\u2019s Computer Science and Artificial Intelligence Laboratory (CSAIL), University College London, iNaturalist, and elsewhere designed a performance test to find out. Each VLM\u2019s task: locate and reorganize the most relevant results within the team\u2019s \u201cINQUIRE\u201d dataset, composed of 5 million wildlife pictures and 250 search prompts from ecologists and other biodiversity experts.\u00a0<\/p>\n<p><strong>Looking for that special frog<\/strong><\/p>\n<p>In these evaluations, the researchers found that larger, more advanced VLMs, which are trained on far more data, can sometimes get researchers the results they want to see. The models performed reasonably well on straightforward queries about visual content, like identifying debris on a reef, but struggled significantly with queries requiring expert knowledge, like identifying specific biological conditions or behaviors. For example, VLMs somewhat easily uncovered examples of jellyfish on the beach, but struggled with more technical prompts like \u201caxanthism in a green frog,\u201d a condition that limits their ability to make their skin yellow.<\/p>\n<p>Their findings indicate that the models need much more domain-specific training data to process difficult queries. 
Vendrow and his colleagues also evaluated how well multimodal models could re-rank those 100 results, reorganizing which images were most pertinent to a search. In these tests, even huge LLMs trained on more curated data, like GPT-4o, struggled: its precision score was only 59.6 percent, the highest achieved by any model.
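One simple way to implement such a re-ranking stage is to ask a multimodal LLM for a relevance judgment on each retrieved image and sort on the answers. The sketch below is an assumed scheme, not the paper’s protocol; the prompt wording and yes/no scoring are mine.

```python
# Sketch: re-ranking retrieved images with a multimodal LLM (assumed scheme).
import base64
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

def relevance_vote(query: str, image_path: str) -> float:
    """Ask the model whether one image matches the query; map yes/no to 1/0."""
    with open(image_path, "rb") as f:
        encoded = base64.b64encode(f.read()).decode()
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": f'Does this image match the query "{query}"? Answer yes or no.'},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{encoded}"}},
            ],
        }],
        max_tokens=3,
    )
    answer = response.choices[0].message.content.strip().lower()
    return 1.0 if answer.startswith("yes") else 0.0

def rerank(query: str, top_candidates: list[str]) -> list[str]:
    # Stable sort: images voted relevant move ahead; retrieval order breaks ties.
    return sorted(top_candidates, key=lambda p: relevance_vote(query, p), reverse=True)
```

Binary votes are crude, and graded scores or pairwise comparisons are common refinements, but any scheme like this is bounded by how well the underlying model understands the scientific terminology in the query.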
The researchers presented these results at the Conference on Neural Information Processing Systems (NeurIPS) earlier this month.

**Inquiring for INQUIRE**

The INQUIRE dataset includes search queries based on discussions with ecologists, biologists, oceanographers, and other experts about the types of images they’d look for, including animals’ unique physical conditions and behaviors. A team of annotators then spent 180 hours searching the iNaturalist dataset with these prompts, carefully combing through roughly 200,000 results to label 33,000 matches that fit the prompts.

For instance, the annotators used queries like “a hermit crab using plastic waste as its shell” and “a California condor tagged with a green ‘26’” to identify the subsets of the larger image dataset that depict these specific, rare events.

Then, the researchers used the same search queries to see how well VLMs could retrieve iNaturalist images. The annotators’ labels revealed when the models struggled to understand scientists’ keywords, as their results included images previously tagged as irrelevant to the search. For example, VLMs’ results for “redwood trees with fire scars” sometimes included images of trees without any markings.
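Those annotator labels are what make scoring possible: a model’s ranked list can be compared directly against the set of images experts marked as relevant. Below is a minimal sketch of two standard retrieval metrics; the function names are mine, and the benchmark’s exact metric choices may differ.

```python
# Sketch: scoring a ranked list against annotator labels.
# Standard retrieval metrics; the benchmark's exact metrics may differ.

def precision_at_k(ranked_ids: list[str], relevant_ids: set[str], k: int = 100) -> float:
    """Fraction of the top-k retrieved images that annotators labeled relevant."""
    top_k = ranked_ids[:k]
    return sum(1 for image_id in top_k if image_id in relevant_ids) / k

def average_precision(ranked_ids: list[str], relevant_ids: set[str]) -> float:
    """Mean of the precision values at each rank where a relevant image appears."""
    hits, precision_sum = 0, 0.0
    for rank, image_id in enumerate(ranked_ids, start=1):
        if image_id in relevant_ids:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / max(len(relevant_ids), 1)
```

Metrics in this family, averaged over all 250 expert queries, yield the kind of precision numbers reported above.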
“This is careful curation of data, with a focus on capturing real examples of scientific inquiries across research areas in ecology and environmental science,” says Sara Beery, the Homer A. Burnell Career Development Assistant Professor at MIT, CSAIL principal investigator, and co-senior author of the work. “It’s proved vital to expanding our understanding of the current capabilities of VLMs in these potentially impactful scientific settings. It has also outlined gaps in current research that we can now work to address, particularly for complex compositional queries, technical terminology, and the fine-grained, subtle differences that delineate categories of interest for our collaborators.”

“Our findings imply that some vision models are already precise enough to aid wildlife scientists with retrieving some images, but many tasks are still too difficult for even the largest, best-performing models,” says Vendrow. “Although INQUIRE is focused on ecology and biodiversity monitoring, the wide variety of its queries means that VLMs that perform well on INQUIRE are likely to excel at analyzing large image collections in other observation-intensive fields.”

**Inquiring minds want to see**

Taking their project further, the researchers are working with iNaturalist to develop a query system to better help scientists and other curious minds find the images they actually want to see. Their working [demo](http://inquire-demo.csail.mit.edu/) allows users to filter searches by species, enabling quicker discovery of relevant results like, say, the diverse eye colors of cats. Vendrow and co-lead author Omiros Pantazis, who recently received his PhD from University College London, also aim to improve the re-ranking system by augmenting current models to provide better results.

University of Pittsburgh Associate Professor Justin Kitzes highlights INQUIRE’s ability to uncover secondary data. “Biodiversity datasets are rapidly becoming too large for any individual scientist to review,” says Kitzes, who wasn’t involved in the research. “This paper draws attention to a difficult and unsolved problem, which is how to effectively search through such data with questions that go beyond simply ‘who is here’ to ask instead about individual characteristics, behavior, and species interactions. Being able to efficiently and accurately uncover these more complex phenomena in biodiversity image data will be critical to fundamental science and real-world impacts in ecology and conservation.”

Vendrow, Pantazis, and Beery wrote the paper with iNaturalist software engineer Alexander Shepard, University College London professors Gabriel Brostow and Kate Jones, University of Edinburgh associate professor and co-senior author Oisin Mac Aodha, and University of Massachusetts at Amherst Assistant Professor Grant Van Horn, who served as co-senior author. Their work was supported, in part, by the Generative AI Laboratory at the University of Edinburgh, the U.S. National Science Foundation/Natural Sciences and Engineering Research Council of Canada Global Center on AI and Biodiversity Change, a Royal Society Research Grant, and the Biome Health Project funded by the World Wildlife Fund United Kingdom.

[Go to Source](https://news.mit.edu/2024/ecologists-find-computer-vision-models-blind-spots-retrieving-wildlife-images-1220)