{"id":5707,"date":"2022-06-22T17:00:00","date_gmt":"2022-06-22T17:00:00","guid":{"rendered":"https:\/\/www.aiproblog.com\/index.php\/2022\/06\/22\/how-ai-creates-photorealistic-images-from-text\/"},"modified":"2022-06-22T17:00:00","modified_gmt":"2022-06-22T17:00:00","slug":"how-ai-creates-photorealistic-images-from-text","status":"publish","type":"post","link":"https:\/\/www.aiproblog.com\/index.php\/2022\/06\/22\/how-ai-creates-photorealistic-images-from-text\/","title":{"rendered":"How AI creates photorealistic images from text"},"content":{"rendered":"<p>Author: <\/p>\n<div>\n<div class=\"block-image_full_width\">\n<div class=\"h-c-page\">\n<div class=\"article-image h-c-grid__col--10 h-c-grid__col--offset-1 h-c-grid__col-l--offset-2 h-c-grid__col-l--8\"><img decoding=\"async\" alt=\"Pictures of puppy in a nest emerging from a cracked egg. Photos overlooking a steampunk city with airships. Picture of two robots having a romantic evening at the movies.\" class=\"article-image--full\" src=\"https:\/\/storage.googleapis.com\/gweb-uniblog-publish-prod\/images\/Final_Hero_Image_Imagen_Parti.max-1000x1000.png\" tabindex=\"0\"><\/div>\n<\/div>\n<\/div>\n<div class=\"block-paragraph\">\n<div class=\"rich-text\">\n<p data-block-key=\"93kc5\">Have you ever seen a puppy in a nest emerging from a cracked egg? What about a photo that\u2019s overlooking a steampunk city with airships? Or a picture of two robots having a romantic evening at the movies? These might sound far-fetched, but a novel type of machine learning technology called text-to-image generation makes them possible. These models can generate high-quality, photorealistic images from a simple text prompt.<\/p>\n<p data-block-key=\"ci883\">Within Google Research, our scientists and engineers have been exploring text-to-image generation using a variety of AI techniques. 
After a lot of testing, we recently announced two new text-to-image models: <a href=\"https:\/\/imagen.research.google\/\">Imagen<\/a> and <a href=\"https:\/\/parti.research.google\/\">Parti<\/a>. Both can generate photorealistic images but use different approaches. We want to share a little more about how these models work and their potential.<\/p>\n<\/div>\n<\/div>\n<div class=\"block-paragraph\">\n<div class=\"rich-text\">\n<h3 data-block-key=\"p9eqh\">How text-to-image models work<\/h3>\n<p data-block-key=\"asd6s\">With text-to-image models, people provide a text description and the models produce images matching the description as closely as possible. Prompts can range from something as simple as \u201can apple\u201d or \u201ca cat sitting on a couch\u201d to more complex descriptions with details, interactions and descriptive indicators, like \u201ca cute sloth holding a small treasure chest. A bright golden glow is coming from the chest.\u201d<\/p>\n<\/div>\n<\/div>\n<div class=\"block-image_full_width\">\n<div class=\"h-c-page\">\n<div class=\"article-image h-c-grid__col-l--6 h-c-grid__col--8 h-c-grid__col-l--offset-3 h-c-grid__col--offset-2\"><img decoding=\"async\" alt=\"A picture of a cute sloth holding a small treasure chest. A bright golden glow is coming from the chest\" class=\"article-image--large\" src=\"https:\/\/storage.googleapis.com\/gweb-uniblog-publish-prod\/images\/Sloth_Image.max-1000x1000.png\" tabindex=\"0\"><\/div>\n<\/div>\n<\/div>\n<div class=\"block-paragraph\">\n<div class=\"rich-text\">\n<p data-block-key=\"nufok\">In the past few years, ML models have been trained on large image datasets with corresponding textual descriptions, resulting in higher-quality images and a broader range of descriptions. 
This has sparked major breakthroughs in this area, including OpenAI\u2019s <a href=\"https:\/\/openai.com\/dall-e-2\/\">DALL-E 2<\/a>.<\/p>\n<\/div>\n<\/div>\n<div class=\"block-paragraph\">\n<div class=\"rich-text\">\n<h3 data-block-key=\"v9atw\">How Imagen and Parti work<\/h3>\n<p data-block-key=\"e7l3b\">Imagen and Parti build on previous models. Transformer models can process words in relationship to one another in a sentence. They are foundational to how we represent text in our text-to-image models. Both models also use a new <a href=\"https:\/\/openreview.net\/forum?id=qw8AKxfYbI\">technique<\/a> that helps generate images that more closely match the text description. While Imagen and Parti use similar technology, they pursue different but complementary strategies.<\/p>\n<p data-block-key=\"f0ll9\">Imagen is a diffusion model, which learns to convert a pattern of random dots to images. These images start at low resolution and then progressively increase in resolution. Recently, diffusion models have seen success in both <a href=\"https:\/\/iterative-refinement.github.io\/palette\/\">image<\/a> and <a href=\"https:\/\/wavegrad.github.io\/\">audio<\/a> tasks like enhancing image resolution, recoloring black and white photos, editing regions of an image, uncropping images, and text-to-speech synthesis.<\/p>\n<p data-block-key=\"60p7s\">Parti\u2019s approach first <a href=\"https:\/\/ai.googleblog.com\/2022\/05\/vector-quantized-image-modeling-with.html\">converts<\/a> a collection of images into a sequence of code entries, similar to puzzle pieces. A given text prompt is then <a href=\"https:\/\/ai.googleblog.com\/2017\/08\/transformer-novel-neural-network.html\">translated<\/a> into these code entries and a new image is created. 
This approach takes advantage of existing research and infrastructure for large language models such as <a href=\"https:\/\/ai.googleblog.com\/2022\/04\/pathways-language-model-palm-scaling-to.html\">PaLM<\/a> and is critical for handling long, complex text prompts and producing high-quality images.<\/p>\n<p data-block-key=\"1sag7\">These models have many limitations. For example, neither can reliably produce specific counts of objects (e.g. \u201cten apples\u201d), nor place them correctly based on specific spatial descriptions (e.g. \u201ca red sphere to the left of a blue block with a yellow triangle on it\u201d). Also, as prompts become more complex, the models begin to falter, either missing details or introducing details that were not provided in the prompt. These behaviors are a result of several shortcomings, including lack of explicit training material, limited data representation, and lack of 3D awareness. We hope to address these gaps through broader representations and more effective integration into the text-to-image generation process.<\/p>\n<\/div>\n<\/div>\n<div class=\"block-paragraph\">\n<div class=\"rich-text\">\n<h3 data-block-key=\"164gm\">Taking a responsible approach to Imagen and Parti<\/h3>\n<p data-block-key=\"cq51v\">Text-to-image models are exciting tools for inspiration and creativity. They also come with risks related to disinformation, bias and safety. We\u2019re having discussions around Responsible AI practices and the necessary steps to safely pursue this technology. As an initial step, we\u2019re using easily identifiable watermarks to ensure people can always recognize an Imagen- or Parti-generated image. We\u2019re also conducting experiments to better understand biases of the models, like how they represent people and cultures, while exploring possible mitigations. 
The <a href=\"https:\/\/arxiv.org\/pdf\/2205.11487.pdf\">Imagen<\/a> and <a href=\"https:\/\/parti.research.google\/paper\">Parti<\/a> papers provide extensive discussion of these issues.<\/p>\n<\/div>\n<\/div>\n<div class=\"block-paragraph\">\n<div class=\"rich-text\">\n<h3 data-block-key=\"164gm\">What\u2019s next for text-to-image models at Google<\/h3>\n<p data-block-key=\"boadj\">We will push on new ideas that combine the best of both models, and expand to related tasks such as adding the ability to interactively generate and edit images through text. We\u2019re also continuing to conduct in-depth comparisons and evaluations to align with our <a href=\"https:\/\/ai.google\/principles\/\">Responsible AI Principles<\/a>. Our goal is to bring user experiences based on these models to the world in a safe, responsible way that will inspire creativity.<\/p>\n<\/div>\n<\/div>\n<\/div>\n<p><a href=\"https:\/\/blog.google\/technology\/research\/how-ai-creates-photorealistic-images-from-text\/\">Go to Source<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Author: Have you ever seen a puppy in a nest emerging from a cracked egg? 
What about a photo that\u2019s overlooking a steampunk city with [&hellip;] <span class=\"read-more-link\"><a class=\"read-more\" href=\"https:\/\/www.aiproblog.com\/index.php\/2022\/06\/22\/how-ai-creates-photorealistic-images-from-text\/\">Read More<\/a><\/span><\/p>\n","protected":false},"author":1,"featured_media":5708,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_bbp_topic_count":0,"_bbp_reply_count":0,"_bbp_total_topic_count":0,"_bbp_total_reply_count":0,"_bbp_voice_count":0,"_bbp_anonymous_reply_count":0,"_bbp_topic_count_hidden":0,"_bbp_reply_count_hidden":0,"_bbp_forum_subforum_count":0,"footnotes":""},"categories":[24],"tags":[],"_links":{"self":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts\/5707"}],"collection":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/comments?post=5707"}],"version-history":[{"count":0,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts\/5707\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/media\/5708"}],"wp:attachment":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/media?parent=5707"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/categories?post=5707"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/tags?post=5707"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}