{"id":6420,"date":"2023-04-12T20:55:00","date_gmt":"2023-04-12T20:55:00","guid":{"rendered":"https:\/\/www.aiproblog.com\/index.php\/2023\/04\/12\/mit-csail-researchers-discuss-frontiers-of-generative-ai\/"},"modified":"2023-04-12T20:55:00","modified_gmt":"2023-04-12T20:55:00","slug":"mit-csail-researchers-discuss-frontiers-of-generative-ai","status":"publish","type":"post","link":"https:\/\/www.aiproblog.com\/index.php\/2023\/04\/12\/mit-csail-researchers-discuss-frontiers-of-generative-ai\/","title":{"rendered":"MIT CSAIL researchers discuss frontiers of generative AI"},"content":{"rendered":"<p>Author: Rachel Gordon | MIT CSAIL<\/p>\n<div>\n<p>The emergence of generative artificial intelligence has ignited a deep philosophical exploration into the nature of consciousness, creativity, and authorship. As we bear witness to new advances in the field, it\u2019s increasingly apparent that these synthetic agents possess a remarkable capacity to create, iterate, and challenge our traditional notions of intelligence. But what does it really mean for an AI system to be \u201cgenerative,\u201d with newfound blurred boundaries of creative expression between humans and machines?\u00a0<\/p>\n<p>For those who feel as if \u201cgenerative artificial intelligence\u201d\u00a0\u2014 a type of AI that can cook up new and original data or content similar to what it&#8217;s been trained on \u2014 cascaded into existence like an overnight sensation, while indeed the new capabilities have surprised many, the underlying technology has been in the making for some time.\u00a0<\/p>\n<p>But understanding true capacity can be as indistinct as some of the generative content these models produce. To that end, researchers from MIT\u2019s Computer Science and Artificial Intelligence Laboratory (CSAIL) convened in discussions around the capabilities and limitations of generative AI, as well as its potential impacts on society and industries, with regard to language, images, and code.\u00a0<\/p>\n<p>There are various models of generative AI, each with their own unique approaches and techniques. These include generative adversarial networks (GANs), variational autoencoders (VAEs), and diffusion models, which have all shown off exceptional power in various industries and fields, from art to music and medicine. With that has also come a slew of ethical and social conundrums, such as the potential for generating fake news, deepfakes, and misinformation. Making these considerations is critical, the researchers say, to continue studying the capabilities and limitations of generative AI and ensure ethical use and responsibility.\u00a0<\/p>\n<p>During opening remarks, to illustrate visual prowess of these models, MIT professor of electrical engineering and computer science (EECS) and CSAIL Director Daniela Rus pulled out a special gift her students recently bestowed upon her: a collage of AI portraits ripe with smiling shots of Rus, running a spectrum of mirror-like reflections. Yet, there was no commissioned artist in sight.\u00a0<\/p>\n<p>The machine was to thank.\u00a0<\/p>\n<p>Generative models learn to make imagery by downloading many photos from the internet and trying to make the output image look like the sample training data. There are many ways to train a neural network generator, and diffusion models are just one popular way. These models, explained by MIT associate professor of EECS and CSAIL principal investigator Phillip Isola, map from random noise to imagery. Using a process called diffusion, the model will convert structured objects like images into random noise, and the process is inverted by training a neural net to remove noise step by step until that noiseless image is obtained. If you\u2019ve ever tried a hand at using DALL-E 2, where a sentence and random noise are input, and the noise congeals into images, you\u2019ve used a diffusion model.<\/p>\n<p>\u201cTo me, the most thrilling aspect of generative data is not its ability to create photorealistic images, but rather the unprecedented level of control it affords us. It offers us new knobs to turn and dials to adjust, giving rise to exciting possibilities. Language has emerged as a particularly powerful interface for image generation, allowing us to input a description such as \u2018Van Gogh style\u2019 and have the model produce an image that matches that description,\u201d says Isola. \u201cYet, language is not all-encompassing; some things are difficult to convey solely through words. For instance, it might be challenging to communicate the precise location of a mountain in the background of a portrait. In such cases, alternative techniques like sketching can be used to provide more specific input to the model and achieve the desired output.\u201d\u00a0<\/p>\n<p>Isola then used a bird&#8217;s image to show how different factors that control the various aspects of an image created by a computer are like \u201cdice rolls.\u201d By changing these factors, such as the color or shape of the bird, the computer can generate many different variations of the image.\u00a0<\/p>\n<p>And if you haven\u2019t used an image generator, there\u2019s a chance you might have used similar models for text. Jacob Andreas, MIT assistant professor of EECS and CSAIL principal investigator, brought the audience from images into the world of generated words, acknowledging the impressive nature of models that can write poetry, have conversations, and do targeted generation of specific documents all in the same hour.\u00a0<\/p>\n<p>How do these models seem to express things that look like desires and beliefs? They leverage the power of word embeddings, Andreas explains, where words with similar meanings are assigned numerical values (vectors) and are placed in a space with many different dimensions. When these values are plotted, words that have similar meanings end up close to each other in this space. The proximity of those values shows how closely related the words are in meaning. (For example, perhaps \u201cRomeo\u201d is usually close to \u201cJuliet\u201d, and so on). Transformer models, in particular, use something called an \u201cattention mechanism\u201d that selectively focuses on specific parts of the input sequence, allowing for multiple rounds of dynamic interactions between different elements. This iterative process can be likened to a series of &#8220;wiggles&#8221; or fluctuations between the different points, leading to the predicted next word in the sequence.\u00a0<\/p>\n<p>\u201cImagine being in your text editor and having a magical button in the top right corner that you could press to transform your sentences into beautiful and accurate English. We have had grammar and spell checking for a while, sure, but we can now explore many other ways to incorporate these magical features into our apps,\u201d says Andreas. \u201cFor instance, we can shorten a lengthy passage, just like how we shrink an image in our image editor, and have the words appear as we desire. We can even push the boundaries further by helping users find sources and citations as they&#8217;re developing an argument. However, we must keep in mind that even the best models today are far from being able to do this in a reliable or trustworthy way, and there&#8217;s a huge amount of work left to do to make these sources reliable and unbiased. Nonetheless, there\u2019s a massive space of possibilities where we can explore and create with this technology.\u201d\u00a0<\/p>\n<p>Another feat of large language models, which can at times feel quite \u201cmeta,\u201d was also explored: models that write code \u2014 sort of like little magic wands, except instead of spells, they conjure up lines of code, bringing (some) software developer dreams to life. MIT professor of EECS and CSAIL principal investigator Armando Solar-Lezama recalls some history from 2014, explaining how, at the time, there was a significant advancement in using \u201clong short-term memory (LSTM),\u201d a technology for language translation that could be used to correct programming assignments for predictable text with a well-defined task. Two years later, everyone\u2019s favorite basic human need came on the scene: attention, ushered in by the 2017 Google paper introducing the mechanism, \u201cAttention is All You Need.\u201d Shortly thereafter, a former CSAILer, Rishabh Singh, was part of a team that used attention to construct whole programs for relatively simple tasks in an automated way. Soon after, transformers emerged, leading to an explosion of research on using text-to-text mapping to generate code.\u00a0<\/p>\n<p>\u201cCode can be run, tested, and analyzed for vulnerabilities, making it very powerful. However, code is also very brittle and small errors can have a significant impact on its functionality or security,\u201d says Solar-Lezema. \u201cAnother challenge is the sheer size and complexity of commercial software, which can be difficult for even the largest models to handle. Additionally, the diversity of coding styles and libraries used by different companies means that the bar for accuracy when working with code can be very high.\u201d<\/p>\n<p>In the ensuing question-and-answer-based discussion, Rus opened with one on content: How can we make the output of generative AI more powerful, by incorporating domain-specific knowledge and constraints into the models? \u201cModels for processing complex visual data such as 3-D models, videos, and light fields, which resemble the holodeck in Star Trek, still heavily rely on domain knowledge to function efficiently,\u201d says Isola. \u201cThese models incorporate equations of projection and optics into their objective functions and optimization routines. However, with the increasing availability of data, it\u2019s possible that some of the domain knowledge could be replaced by the data itself, which will provide sufficient constraints for learning. While we cannot predict the future, it\u2019s plausible that as we move forward, we might need less structured data. Even so, for now, domain knowledge remains a crucial aspect of working with structured data.&#8221;\u00a0<\/p>\n<p>The panel also discussed the crucial nature of assessing the validity of generative content. Many benchmarks have been constructed to show that models are capable of achieving human-level accuracy in certain tests or tasks that require advanced linguistic abilities. However, upon closer inspection, simply paraphrasing the examples can cause the models to fail completely. Identifying modes of failure has become just as crucial, if not more so, than training the models themselves.\u00a0<\/p>\n<p>Acknowledging the stage for the conversation \u2014 academia \u2014 Solar-Lezama talked about progress in developing large language models against the deep and mighty pockets of industry. Models in academia, he says, \u201cneed really big computers\u201d to create desired technologies that don\u2019t rely too heavily on industry support.\u00a0<\/p>\n<p>Beyond technical capabilities, limitations, and how it\u2019s all evolving, Rus also brought up the moral stakes around living in an AI-generated world, in relation to deepfakes, misinformation, and bias. Isola mentioned newer technical solutions focused on watermarking, which could help users subtly tell whether an image or a piece of text was generated by a machine. \u201cOne of the things to watch out for here, is that this is a problem that\u2019s not going to be solved purely with technical solutions. We can provide the space of solutions and also raise awareness about the capabilities of these models, but it is very important for the broader public to be aware of what these models can actually do,\u201d says Solar-Lezama. \u201cAt the end of the day, this has to be a broader conversation. This should not be limited to technologists, because it is a pretty big social problem that goes beyond the technology itself.\u201d\u00a0<\/p>\n<p>Another inclination around chatbots, robots, and a favored trope in many dystopian pop culture settings was discussed: the seduction of anthropomorphization. Why, for many, is there a natural tendency to project human-like qualities onto nonhuman entities? Andreas explained the opposing schools of thought around these large language models and their seemingly superhuman capabilities.\u00a0<\/p>\n<p>&#8220;Some believe that models like ChatGPT have already achieved human-level intelligence and may even be conscious,&#8221;\u00a0Andreas said, &#8220;but in reality these models still lack the true human-like capabilities to comprehend not only nuance, but sometimes they behave in extremely conspicuous, weird, nonhuman-like ways. On the other hand, some argue that these models are just shallow pattern recognition tools that can\u2019t learn the true meaning of language. But this view also underestimates the level of understanding they can acquire from text. While we should be cautious of overstating their capabilities, we should also not overlook the potential harms of underestimating their impact. In the end, we should approach these models with humility and recognize that there is still much to learn about what they can and can\u2019t do.\u201d\u00a0<\/p>\n<\/div>\n<p><a href=\"https:\/\/news.mit.edu\/2023\/mit-csail-researchers-discuss-frontiers-generative-ai-0412\">Go to Source<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Author: Rachel Gordon | MIT CSAIL The emergence of generative artificial intelligence has ignited a deep philosophical exploration into the nature of consciousness, creativity, and [&hellip;] <span class=\"read-more-link\"><a class=\"read-more\" href=\"https:\/\/www.aiproblog.com\/index.php\/2023\/04\/12\/mit-csail-researchers-discuss-frontiers-of-generative-ai\/\">Read More<\/a><\/span><\/p>\n","protected":false},"author":1,"featured_media":469,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_bbp_topic_count":0,"_bbp_reply_count":0,"_bbp_total_topic_count":0,"_bbp_total_reply_count":0,"_bbp_voice_count":0,"_bbp_anonymous_reply_count":0,"_bbp_topic_count_hidden":0,"_bbp_reply_count_hidden":0,"_bbp_forum_subforum_count":0,"footnotes":""},"categories":[24],"tags":[],"_links":{"self":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts\/6420"}],"collection":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/comments?post=6420"}],"version-history":[{"count":0,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts\/6420\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/media\/469"}],"wp:attachment":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/media?parent=6420"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/categories?post=6420"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/tags?post=6420"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}