{"id":6211,"date":"2023-01-08T05:00:00","date_gmt":"2023-01-08T05:00:00","guid":{"rendered":"https:\/\/www.aiproblog.com\/index.php\/2023\/01\/08\/unpacking-the-black-box-to-build-better-ai-models\/"},"modified":"2023-01-08T05:00:00","modified_gmt":"2023-01-08T05:00:00","slug":"unpacking-the-black-box-to-build-better-ai-models","status":"publish","type":"post","link":"https:\/\/www.aiproblog.com\/index.php\/2023\/01\/08\/unpacking-the-black-box-to-build-better-ai-models\/","title":{"rendered":"Unpacking the \u201cblack box\u201d to build better AI models"},"content":{"rendered":"<p>Author: Adam Zewe | MIT News Office<\/p>\n<div>\n<p>When deep learning models are deployed in the real world, perhaps to detect financial fraud from credit card activity or identify cancer in medical images, they are often able to outperform humans.<\/p>\n<\/p>\n<p>But what exactly are these deep learning models learning? Does a model trained to spot skin cancer in clinical images, for example, actually learn the colors and textures of cancerous tissue, or is it flagging some other features or patterns?<\/p>\n<\/p>\n<p>These powerful machine-learning models are typically based on <a href=\"https:\/\/news.mit.edu\/2017\/explained-neural-networks-deep-learning-0414\" target=\"_blank\" rel=\"noopener\">artificial neural networks<\/a> that can have millions of nodes that process data to make predictions. Due to their complexity, researchers often call these models \u201cblack boxes\u201d because even the scientists who build them don\u2019t understand everything that is going on under the hood.<\/p>\n<\/p>\n<p>Stefanie Jegelka isn\u2019t satisfied with that \u201cblack box\u201d explanation. 
A newly tenured associate professor in the MIT Department of Electrical Engineering and Computer Science, Jegelka is digging deep into deep learning to understand what these models can learn, how they behave, and how to build certain prior information into them.

"At the end of the day, what a deep-learning model will learn depends on so many factors. But building an understanding that is relevant in practice will help us design better models, and also help us understand what is going on inside them so we know when we can deploy a model and when we can't. That is critically important," says Jegelka, who is also a member of the Computer Science and Artificial Intelligence Laboratory (CSAIL) and the Institute for Data, Systems, and Society (IDSS).

Jegelka is particularly interested in optimizing machine-learning models when the input data are graphs. Graph data pose specific challenges: a graph carries information about its individual nodes and edges, as well as about the structure (what is connected to what). In addition, graphs have mathematical symmetries that the machine-learning model needs to respect so that, for instance, the same graph always leads to the same prediction. Building such symmetries into a machine-learning model is usually not easy.

Take molecules, for instance. Molecules can be represented as graphs, with vertices that correspond to atoms and edges that correspond to chemical bonds between them. Drug companies may want to use deep learning to rapidly predict the properties of many molecules, narrowing down the number they must physically test in the lab.

Jegelka studies methods to build mathematical machine-learning models that can effectively take graph data as input and output something else, in this case a prediction of a molecule's chemical properties.
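To make the symmetry point concrete, here is a minimal, hypothetical sketch (not code from the article or from Jegelka's work; the function names, toy features, and tiny "molecule" are all invented for illustration) of one round of sum-based message passing over a molecular graph. Because each node aggregates its neighbors with a sum, renumbering the atoms never changes an order-independent graph-level prediction:

```python
# Hypothetical sketch: one round of sum-based message passing on a graph.
# Each "atom" (node) keeps its own feature and adds up its neighbors'
# features; summing ignores neighbor order, which is the kind of
# symmetry the article says graph models must respect.

def message_pass(features, edges):
    """One step: each node's new feature = its own + the sum of its neighbors'."""
    return {
        node: feat + sum(features[m] for n, m in edges if n == node)
        for node, feat in features.items()
    }

def readout(features):
    """Graph-level prediction: a symmetric (order-independent) sum over nodes."""
    return sum(features.values())

# A tiny 3-atom chain 0-1-2, each undirected bond listed in both directions.
edges = [(0, 1), (1, 0), (1, 2), (2, 1)]
pred = readout(message_pass({0: 1.0, 1: 2.0, 2: 4.0}, edges))

# Relabeling the atoms (swapping labels 0 and 2) describes the same
# molecule, so the prediction must not change.
pred_relabeled = readout(message_pass({0: 4.0, 1: 2.0, 2: 1.0}, edges))
assert pred == pred_relabeled == 16.0
```

Real graph neural networks replace the raw sums with learned update functions, but the invariance argument is the same: as long as every aggregation step is order-independent, the same graph always yields the same prediction.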
The prediction task is particularly challenging since a molecule's properties are determined not only by the atoms within it, but also by the connections between them.

Other examples of machine learning on graphs include traffic routing, chip design, and recommender systems.

Designing these models is made even more difficult by the fact that the data used to train them are often different from the data the models see in practice. Perhaps the model was trained using small molecular graphs or traffic networks, but the graphs it sees once deployed are larger or more complex.

In this case, what can researchers expect the model to learn, and will it still work in practice if the real-world data are different?

"Your model is not going to be able to learn everything because of some hardness problems in computer science, but what you can learn and what you can't learn depends on how you set the model up," Jegelka says.

She approaches this question by combining her passion for algorithms and discrete mathematics with her excitement for machine learning.

**From butterflies to bioinformatics**

Jegelka grew up in a small town in Germany and became interested in science as a high school student; a supportive teacher encouraged her to participate in an international science competition. She and her teammates from the U.S. and Singapore won an award for a website they created about butterflies, in three languages.

"For our project, we took images of wings with a scanning electron microscope at a local university of applied sciences. I also got the opportunity to use a high-speed camera at Mercedes Benz (this camera usually filmed combustion engines), which I used to capture a slow-motion video of the movement of a butterfly's wings.
That was the first time I really got in touch with science and exploration," she recalls.

Intrigued by both biology and mathematics, Jegelka decided to study bioinformatics at the University of Tübingen and the University of Texas at Austin. She had a few opportunities to conduct research as an undergraduate, including an internship in computational neuroscience at Georgetown University, but wasn't sure what career to follow.

When she returned for her final year of college, Jegelka moved in with two roommates who were working as research assistants at the Max Planck Institute in Tübingen.

"They were working on machine learning, and that sounded really cool to me. I had to write my bachelor's thesis, so I asked at the institute if they had a project for me. I started working on machine learning at the Max Planck Institute and I loved it. I learned so much there, and it was a great place for research," she says.

She stayed on at the Max Planck Institute to complete a master's thesis, and then embarked on a PhD in machine learning at the Max Planck Institute and the Swiss Federal Institute of Technology.

During her PhD, she explored how concepts from discrete mathematics can help improve machine-learning techniques.

**Teaching models to learn**

The more Jegelka learned about machine learning, the more intrigued she became by the challenges of understanding how models behave, and how to steer this behavior.

"You can do so much with machine learning, but only if you have the right model and data. It is not just a black-box thing where you throw it at the data and it works.
You actually have to think about it, its properties, and what you want the model to learn and do," she says.

After completing a postdoc at the University of California at Berkeley, Jegelka was hooked on research and decided to pursue a career in academia. She joined the faculty at MIT in 2015 as an assistant professor.

"What I really loved about MIT, from the very beginning, was that the people really care deeply about research and creativity. That is what I appreciate the most about MIT. The people here really value originality and depth in research," she says.

That focus on creativity has enabled Jegelka to explore a broad range of topics.

In collaboration with other faculty at MIT, she studies machine-learning applications in biology, imaging, computer vision, and materials science.

But what really drives Jegelka is probing the fundamentals of machine learning, and most recently, the issue of robustness. Often, a model performs well on training data, but its performance deteriorates when it is deployed on slightly different data. Building prior knowledge into a model can make it more reliable, but understanding what information the model needs to be successful, and how to build it in, is not so simple, she says.

She is also exploring methods to improve the performance of machine-learning models for image classification.

Image classification models are everywhere, from the facial recognition systems on mobile phones to tools that identify fake accounts on social media.
These models need massive amounts of data for training, but since it is expensive for humans to hand-label millions of images, researchers often use unlabeled datasets to pretrain models instead.

These models then reuse the representations they have learned when they are fine-tuned later for a specific task.

Ideally, researchers want the model to learn as much as it can during pretraining, so it can apply that knowledge to its downstream task. But in practice, these models often learn only a few simple correlations (like that one image has sunshine and one has shade) and use these "shortcuts" to classify images.

"We showed, both theoretically and empirically, that this is a problem in 'contrastive learning,' which is a standard technique for pretraining. But we also show that you can influence the kinds of information the model will learn to represent by modifying the types of data you show the model. This is one step toward understanding what models are actually going to do in practice," she says.

Researchers still don't understand everything that goes on inside a deep-learning model, or the details of how they can influence what a model learns and how it behaves, but Jegelka looks forward to continuing to explore these topics.

"Often in machine learning, we see something happen in practice and we try to understand it theoretically. This is a huge challenge. You want to build an understanding that matches what you see in practice, so that you can do better. We are still just at the beginning of understanding this," she says.

Outside the lab, Jegelka is a fan of music, art, traveling, and cycling.
But these days, she enjoys spending most of her free time with her preschool-aged daughter.

[Go to Source](https://news.mit.edu/2023/stefanie-jegelka-machine-learning-0108)