{"id":6945,"date":"2023-11-20T14:00:00","date_gmt":"2023-11-20T14:00:00","guid":{"rendered":"https:\/\/www.aiproblog.com\/index.php\/2023\/11\/20\/synthetic-imagery-sets-new-bar-in-ai-training-efficiency\/"},"modified":"2023-11-20T14:00:00","modified_gmt":"2023-11-20T14:00:00","slug":"synthetic-imagery-sets-new-bar-in-ai-training-efficiency","status":"publish","type":"post","link":"https:\/\/www.aiproblog.com\/index.php\/2023\/11\/20\/synthetic-imagery-sets-new-bar-in-ai-training-efficiency\/","title":{"rendered":"Synthetic imagery sets new bar in AI training efficiency"},"content":{"rendered":"<p>Author: Rachel Gordon | MIT CSAIL<\/p>\n<div>\n<p>Data is the new soil, and in this fertile new ground, MIT researchers are planting more than just pixels. By using synthetic images to train machine learning models, a team of scientists recently surpassed results obtained from traditional \u201creal-image\u201d training methods.\u00a0<\/p>\n<p>At the core of the approach is a system called <a href=\"https:\/\/arxiv.org\/abs\/2306.00984\" target=\"_blank\" rel=\"noopener\">StableRep<\/a>, which doesn&#8217;t just use any synthetic images; it generates them through ultra-popular text-to-image models like Stable Diffusion. It\u2019s like creating worlds with words.\u00a0<\/p>\n<p>So what\u2019s in StableRep&#8217;s secret sauce? A strategy called \u201cmulti-positive contrastive learning.\u201d<\/p>\n<p>\u201cWe&#8217;re teaching the model to learn more about high-level concepts through context and variance, not just feeding it data,\u201d says Lijie Fan, MIT PhD student in electrical engineering, affiliate of the MIT Computer Science and Artificial Intelligence Laboratory (CSAIL), lead researcher on the work. 
\u201cWhen multiple images, all generated from the same text, are all treated as depictions of the same underlying thing, the model dives deeper into the concepts behind the images, say the object, not just their pixels.\u201d<\/p>\n<p>This approach treats multiple images spawned from identical text prompts as positive pairs, providing additional information during training: it not only adds more diversity but also tells the vision system which images are alike and which are different. Remarkably, StableRep outperformed top-tier models trained on real images, such as SimCLR and CLIP, across extensive datasets.<\/p>\n<p>\u201cWhile StableRep helps mitigate the challenges of data acquisition in machine learning, it also marks a stride toward a new era of AI training techniques. The capacity to produce high-caliber, diverse synthetic images on command could help curtail cumbersome expenses and resources,\u201d says Fan.\u00a0<\/p>\n<p>The process of data collection has never been straightforward. Back in the 1990s, researchers had to manually capture photographs to assemble datasets for objects and faces. The 2000s saw individuals scouring the internet for data. However, this raw, uncurated data often contained discrepancies when compared to real-world scenarios and reflected societal biases, presenting a distorted view of reality. The task of cleansing datasets through human intervention is not only expensive, but also exceedingly challenging. Imagine, though, if this arduous data collection could be distilled down to something as simple as issuing a command in natural language.\u00a0<\/p>\n<p>A pivotal aspect of StableRep\u2019s triumph is the adjustment of the \u201cguidance scale\u201d in the generative model, which ensures a delicate balance between the synthetic images\u2019 diversity and fidelity. 
When finely tuned, synthetic images used in training these self-supervised models were found to be as effective as, if not more effective than, real images.<\/p>\n<p>Taking it a step further, the researchers added language supervision to the mix, creating an enhanced variant: StableRep+. When trained with 20 million synthetic images, StableRep+ not only achieved superior accuracy but also displayed remarkable efficiency compared to CLIP models trained with a staggering 50 million real images.<\/p>\n<p>Yet, the path ahead isn&#8217;t without its potholes. The researchers candidly address several limitations, including the current slow pace of image generation, semantic mismatches between text prompts and the resultant images, potential amplification of biases, and complexities in image attribution, all of which are imperative to address for future advancements. Another issue is that StableRep requires first training the generative model on large-scale real data. The team acknowledges that starting with real data remains a necessity; however, once you have a good generative model, you can repurpose it for new tasks, such as training recognition models and visual representations.\u00a0<\/p>\n<p>While StableRep offers a good solution by diminishing the dependency on vast real-image collections, it brings to the fore concerns regarding hidden biases within the uncurated data used for these text-to-image models. 
The choice of text prompts, integral to the image synthesis process, is not entirely free from bias, \u201cindicating the essential role of meticulous text selection or possible human curation,\u201d says Fan.\u00a0<\/p>\n<p>\u201cUsing the latest text-to-image models, we&#8217;ve gained unprecedented control over image generation, allowing for a diverse range of visuals from a single text input. This surpasses real-world image collection in efficiency and versatility. It proves especially useful in specialized tasks, like balancing image variety in long-tail recognition, presenting a practical supplement to using real images for training,\u201d says Fan. \u201cOur work signifies a step forward in visual learning, towards the goal of offering cost-effective training alternatives while highlighting the need for ongoing improvements in data quality and synthesis.\u201d<\/p>\n<p>\u201cOne dream of generative model learning has long been to be able to generate data useful for discriminative model training,\u201d says Google DeepMind researcher and University of Toronto professor of computer science David Fleet, who was not involved in the paper. \u201cWhile we have seen some signs of life, the dream has been elusive, especially on large-scale complex domains like high-resolution images. This paper provides compelling evidence, for the first time to my knowledge, that the dream is becoming a reality. They show that contrastive learning from massive amounts of synthetic image data can produce representations that outperform those learned from real data at scale, with the potential to improve myriad downstream vision tasks.\u201d<\/p>\n<p>Fan is joined by Yonglong Tian PhD \u201922 as lead authors of the paper, as well as MIT associate professor of electrical engineering and computer science and CSAIL principal investigator Phillip Isola; Google researcher and OpenAI technical staff member Huiwen Chang; and Google staff research scientist Dilip Krishnan. 
The team will present StableRep at the 2023 Conference on Neural Information Processing Systems (NeurIPS) in New Orleans.<\/p>\n<\/div>\n<p><a href=\"https:\/\/news.mit.edu\/2023\/synthetic-imagery-sets-new-bar-ai-training-efficiency-1120\">Go to Source<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Author: Rachel Gordon | MIT CSAIL Data is the new soil, and in this fertile new ground, MIT researchers are planting more than just pixels. [&hellip;] <span class=\"read-more-link\"><a class=\"read-more\" href=\"https:\/\/www.aiproblog.com\/index.php\/2023\/11\/20\/synthetic-imagery-sets-new-bar-in-ai-training-efficiency\/\">Read More<\/a><\/span><\/p>\n","protected":false},"author":1,"featured_media":463,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_bbp_topic_count":0,"_bbp_reply_count":0,"_bbp_total_topic_count":0,"_bbp_total_reply_count":0,"_bbp_voice_count":0,"_bbp_anonymous_reply_count":0,"_bbp_topic_count_hidden":0,"_bbp_reply_count_hidden":0,"_bbp_forum_subforum_count":0,"footnotes":""},"categories":[24],"tags":[],"_links":{"self":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts\/6945"}],"collection":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/comments?post=6945"}],"version-history":[{"count":0,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts\/6945\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/media\/465"}],"wp:attachment":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/media?parent=6945"}],"wp:term":[{"taxonomy":"category","embeddable":true,"
href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/categories?post=6945"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/tags?post=6945"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}