{"id":7174,"date":"2024-03-05T05:00:00","date_gmt":"2024-03-05T05:00:00","guid":{"rendered":"https:\/\/www.aiproblog.com\/index.php\/2024\/03\/05\/using-generative-ai-to-improve-software-testing\/"},"modified":"2024-03-05T05:00:00","modified_gmt":"2024-03-05T05:00:00","slug":"using-generative-ai-to-improve-software-testing","status":"publish","type":"post","link":"https:\/\/www.aiproblog.com\/index.php\/2024\/03\/05\/using-generative-ai-to-improve-software-testing\/","title":{"rendered":"Using generative AI to improve software testing"},"content":{"rendered":"<p>Author: Zach Winn | MIT News<\/p>\n<div>\n<p>Generative AI is getting plenty of attention for its ability to create text and images. But those media represent only a fraction of the data that proliferate in our society today. Data are generated every time a patient goes through a medical system, a storm impacts a flight, or a person interacts with a software application.<\/p>\n<\/p>\n<p>Using generative AI to create realistic synthetic data around those scenarios can help organizations more effectively treat patients, reroute planes, or improve software platforms \u2014 especially in scenarios where real-world data are limited or sensitive.<\/p>\n<\/p>\n<p>For the last three years, the MIT spinout DataCebo has offered a generative software system called the Synthetic Data Vault to help organizations create synthetic data to do things like test software applications and train machine learning models.<\/p>\n<\/p>\n<p>The Synthetic Data Vault, or SDV, has been downloaded more than 1 million times, with more than 10,000 data scientists using the open-source library for generating synthetic tabular data. 
The founders \u2014 Principal Research Scientist Kalyan Veeramachaneni and alumna Neha Patki \u201915, SM \u201916 \u2014 believe the company\u2019s success is due to SDV\u2019s ability to revolutionize software testing.<\/p>\n<\/p>\n<p><strong>SDV goes viral<\/strong><\/p>\n<p>In 2016, Veeramachaneni\u2019s group in the Data to AI Lab unveiled a suite of open-source generative AI tools to help organizations create synthetic data that matched the statistical properties of real data.<\/p>\n<p>Companies can use synthetic data instead of sensitive information in programs while still preserving the statistical relationships between datapoints. Companies can also use synthetic data to run new software through simulations to see how it performs before releasing it to the public.<\/p>\n<p>Veeramachaneni\u2019s group came across the problem because it was working with companies that wanted to share their data for research.<\/p>\n<p>\u201cMIT helps you see all these different use cases,\u201d Patki explains. \u201cYou work with finance companies and health care companies, and all those projects are useful to formulate solutions across industries.\u201d<\/p>\n<p>In 2020, the researchers founded DataCebo to build more SDV features for larger organizations. Since then, the use cases have been as impressive as they\u2019ve been varied.<\/p>\n<p>With DataCebo&#8217;s new flight simulator, for instance, airlines can plan for rare weather events in a way that would be impossible using only historic data. In another application, SDV users synthesized medical records to predict health outcomes for patients with cystic fibrosis. A team from Norway recently used SDV to create synthetic student data to evaluate whether various admissions policies were meritocratic and free from bias.<\/p>\n<p>In 2021, the data science platform Kaggle hosted a competition for data scientists that used SDV to create synthetic data sets to avoid using proprietary data. 
Roughly 30,000 data scientists participated, building solutions and predicting outcomes based on the company\u2019s realistic data.<\/p>\n<p>And as DataCebo has grown, it\u2019s stayed true to its MIT roots: All of the company\u2019s current employees are MIT alumni.<\/p>\n<p><strong>Supercharging software testing<\/strong><\/p>\n<p>Although its open-source tools are being used for a variety of use cases, the company is focused on growing its traction in software testing.<\/p>\n<p>\u201cYou need data to test these software applications,\u201d Veeramachaneni says. \u201cTraditionally, developers manually write scripts to create synthetic data. With generative models, created using SDV, you can learn from a sample of data collected and then sample a large volume of synthetic data (which has the same properties as real data), or create specific scenarios and edge cases, and use the data to test your application.\u201d<\/p>\n<p>For example, if a bank wanted to test a program designed to reject transfers from accounts with no money in them, it would have to simulate many accounts simultaneously transacting. Doing that with data created manually would take a lot of time. With DataCebo\u2019s generative models, customers can create any edge case they want to test.<\/p>\n<p>\u201cIt\u2019s common for industries to have data that is sensitive in some capacity,\u201d Patki says. \u201cOften when you\u2019re in a domain with sensitive data you\u2019re dealing with regulations, and even if there aren\u2019t legal regulations, it\u2019s in companies\u2019 best interest to be diligent about who gets access to what at which time. 
So, synthetic data is always better from a privacy perspective.\u201d<\/p>\n<p><strong>Scaling synthetic data<\/strong><\/p>\n<p>Veeramachaneni believes DataCebo is advancing the field of what it calls synthetic enterprise data, or data generated from user behavior on large companies\u2019 software applications.<\/p>\n<p>\u201cEnterprise data of this kind is complex, and there is no universal availability of it, unlike language data,\u201d Veeramachaneni says. \u201cWhen folks use our publicly available software and report back if [it] works on a certain pattern, we learn a lot of these unique patterns, and it allows us to improve our algorithms. From one perspective, we are building a corpus of these complex patterns, which for language and images is readily available.\u201d<\/p>\n<p>DataCebo also recently released features to improve SDV\u2019s usefulness, including tools to assess the \u201crealism\u201d of the generated data, called the <a href=\"https:\/\/docs.sdv.dev\/sdmetrics\/\" target=\"_blank\" rel=\"noopener\">SDMetrics library<\/a>, as well as a way to compare models\u2019 performance, called <a href=\"https:\/\/docs.sdv.dev\/sdgym\" target=\"_blank\" rel=\"noopener\">SDGym<\/a>.<\/p>\n<p>\u201cIt\u2019s about ensuring organizations trust this new data,\u201d Veeramachaneni says. \u201c[Our tools offer] programmable synthetic data, which means we allow enterprises to insert their specific insight and intuition to build more transparent models.\u201d<\/p>\n<p>As companies in every industry rush to adopt AI and other data science tools, DataCebo is ultimately helping them do so in a way that is more transparent and responsible.<\/p>\n<p>\u201cIn the next few years, synthetic data from generative models will transform all data work,\u201d Veeramachaneni says. 
\u201cWe believe 90 percent of enterprise operations can be done with synthetic data.\u201d<\/p>\n<\/div>\n<p><a href=\"https:\/\/news.mit.edu\/2024\/using-generative-ai-improve-software-testing-datacebo-0305\">Go to Source<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Author: Zach Winn | MIT News Generative AI is getting plenty of attention for its ability to create text and images. But those media represent [&hellip;] <span class=\"read-more-link\"><a class=\"read-more\" href=\"https:\/\/www.aiproblog.com\/index.php\/2024\/03\/05\/using-generative-ai-to-improve-software-testing\/\">Read More<\/a><\/span><\/p>\n","protected":false},"author":1,"featured_media":468,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_bbp_topic_count":0,"_bbp_reply_count":0,"_bbp_total_topic_count":0,"_bbp_total_reply_count":0,"_bbp_voice_count":0,"_bbp_anonymous_reply_count":0,"_bbp_topic_count_hidden":0,"_bbp_reply_count_hidden":0,"_bbp_forum_subforum_count":0,"footnotes":""},"categories":[24],"tags":[],"_links":{"self":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts\/7174"}],"collection":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/comments?post=7174"}],"version-history":[{"count":0,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts\/7174\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/media\/458"}],"wp:attachment":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/media?parent=7174"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.aiproblog.com\/
index.php\/wp-json\/wp\/v2\/categories?post=7174"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/tags?post=7174"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}