{"id":718,"date":"2018-06-24T06:38:55","date_gmt":"2018-06-24T06:38:55","guid":{"rendered":"https:\/\/www.aiproblog.com\/index.php\/2018\/06\/24\/the-next-big-thing-in-data-science-is-biology\/"},"modified":"2018-06-24T06:38:55","modified_gmt":"2018-06-24T06:38:55","slug":"the-next-big-thing-in-data-science-is-biology","status":"publish","type":"post","link":"https:\/\/www.aiproblog.com\/index.php\/2018\/06\/24\/the-next-big-thing-in-data-science-is-biology\/","title":{"rendered":"The Next Big Thing in Data Science is \u2026. Biology"},"content":{"rendered":"<p>Author: William Vorhies<\/p>\n<div>\n<p><strong><em>Summary:<\/em><\/strong> <em>Computational Synthetic Biology (CSB) is likely to be both the next big thing and perhaps most important field to exploit data science.\u00a0 As the name implies, this lies at the intersection of data science and biological research.\u00a0 Big advancements and big investments are already starting to occur here.\u00a0 Data scientists with deep learning skills will want to check this out.<\/em><\/p>\n<p>\u00a0<\/p>\n<p><a href=\"http:\/\/api.ning.com\/files\/lHb5gfQBUPcfN71uZ6zCm696PMgSBr5Y5E5a6aynptV-3J6cp2fz-*N6t8mBZv9tCys-l5p8qejJB4AAPIlGzHfXojkeWM9d\/comp_biology_460.jpg\" target=\"_self\"><img decoding=\"async\" src=\"http:\/\/api.ning.com\/files\/lHb5gfQBUPcfN71uZ6zCm696PMgSBr5Y5E5a6aynptV-3J6cp2fz-*N6t8mBZv9tCys-l5p8qejJB4AAPIlGzHfXojkeWM9d\/comp_biology_460.jpg?width=350\" width=\"350\" class=\"align-right\"><\/a>And the next big thing in data science is <em>(wait for it)<\/em> \u2013 biology!\u00a0 Actually <strong>Computational Synthetic Biology (CSB)<\/strong> sometimes referred to as \u2018computational systems biology\u2019 or simply \u2018synthetic biology\u2019.<\/p>\n<p>From the biological researcher\u2019s perspective CSB broadly refers to the design and fabrication of biological components and systems that don\u2019t already exist in the natural world or to the redesign and fabrication of existing biological 
systems.<\/p>\n<p>To the data scientist and particularly the start-up world, CSB is a newly emerging field that will capitalize on advances in deep learning.\u00a0<\/p>\n<p>Depending on your personal sense of priority, CSB will remarkably accelerate cures for some of mankind\u2019s most intractable diseases or be the foundation for the next generation of unicorns in the time frame of 5 to 7 years.<\/p>\n<p>Perhaps the better way to frame this is to ask which you would rather be working on: facial recognition to label your friends\u2019 faces in Facebook, chatbots for that travel platform, or curing cancer and extending quality human lifetimes.<\/p>\n<p>\u00a0<\/p>\n<p><span style=\"font-size: 12pt;\"><strong>Isn\u2019t This Just Bioinformatics?<\/strong><\/span><\/p>\n<p>Like most important innovations, CSB wasn\u2019t born yesterday.\u00a0 The discovery and use of restriction enzymes in 1978 is sometimes cited as the first use of engineering concepts in biology.<\/p>\n<p>Just as deep learning had to wait for MPP and the use of GPUs to sufficiently accelerate compute, CSB remained mostly a concept through the decoding of the human genome in 2003 and the explosion of genomic data in the ensuing 15 years.<\/p>\n<p>Early bioinformatics attempted to solve problems appropriate for the beginning stages of our understanding of genomics.\u00a0 For example, how to assemble a full genome model or mark specific areas of DNA using SNPs (single nucleotide polymorphisms), of which there are about 10 million in the human genome.<\/p>\n<p>\u00a0<\/p>\n<p><span style=\"font-size: 12pt;\"><strong>CSB is Not Bioinformatics Business as Usual.\u00a0<\/strong><\/span><\/p>\n<p><a href=\"http:\/\/api.ning.com\/files\/lHb5gfQBUPfx6pU4iOl*4vDjp0bcviUUq6*qeO9oTIB02rKSLRF5*5PEfSY426Oi3mfAfw1UzDnZuoTpqnyQp9h7qX28BefK\/diseasegenenetwork.jpg\" target=\"_self\"><img decoding=\"async\" 
src=\"http:\/\/api.ning.com\/files\/lHb5gfQBUPfx6pU4iOl*4vDjp0bcviUUq6*qeO9oTIB02rKSLRF5*5PEfSY426Oi3mfAfw1UzDnZuoTpqnyQp9h7qX28BefK\/diseasegenenetwork.jpg?width=350\" width=\"350\" class=\"align-right\"><\/a>Starting with the explosion in deep learning capabilities just two or three years ago, the first visionary biologist\/data scientist teams began to explore how to exploit these new synergies in seemingly unrelated disciplines.<\/p>\n<p>To give you a sense of how new and wide open this field is, the website Angel.co, which tracks the formation of and investment in startups, lists a little over 4 million startups, the great majority of which are related to tech.\u00a0 A little over 5,000 are targeting \u2018Big Data\u2019 and another 5,000 are categorized as \u2018Analytics\u2019.\u00a0 Only 222 are identified as bioinformatics and only a portion of these are pursuing CSB.<\/p>\n<p>This feels like the age of deep learning in about 2010, still three years out from having image classification or speech recognition hit the 95% accuracy rate that ushered in 10,000 new AI startups and applications.<\/p>\n<p>\u00a0<\/p>\n<p><span style=\"font-size: 12pt;\"><strong>Some Examples<\/strong><\/span><\/p>\n<p>Needless to say, in materials published so far the innovators in this field have been shy about saying much about their proprietary algorithms other than that they are based on deep learning.\u00a0 Here are a few snapshots of what\u2019s underway.<\/p>\n<p><strong>Hexagon Bio:<\/strong>\u00a0 Some three-quarters of antibiotics and half of anticancer compounds, including penicillin and statins, came from naturally occurring fungi (you know, mushrooms and molds). 
\u00a0But discovery of new compounds has been largely haphazard and based on researchers\u2019 intuition.<\/p>\n<p>Hexagon mines the fungal genomes of over 2,000 species of mushrooms and molds to predict which gene clusters are most likely to produce useful compounds.\u00a0 They then fit their test microorganisms with custom-printed DNA parts to produce likely compounds that might, for example, attack cancer cells.\u00a0 They currently have roughly 22 compounds that show clinical promise.<\/p>\n<p>In addition to their proprietary algorithms, Hexagon has moved to utilize the most efficient tools of the trade like DNA sequencing and automated workstations.\u00a0 It\u2019s also using a technology that makes it much faster to synthesize DNA by essentially downloading and printing copies of gene clusters.\u00a0 These can be used to redesign a yeast with the press of a button.<\/p>\n<p>In the last 18 months they\u2019ve raised $8 million from private investors.<\/p>\n<p>The fungal drug discovery field is particularly hot, with competitors differentiating on how quickly and accurately their algorithms can spot potentially useful sections of DNA.\u00a0 Others playing in this field include:<\/p>\n<p><strong>LifeMine Therapeutics:<\/strong> A startup co-founded by a Harvard University chemical biologist landed a $55 million Series A round from a large group of investors including WuXi Healthcare Ventures, Google and Merck\u2019s venture arm.<\/p>\n<p><strong>Lodo Therapeutics Corp.<\/strong> signed a genome-mining deal with a unit of Roche for $969 million in May.<\/p>\n<p><strong>Adapsyn Bioscience Inc.<\/strong> received $162 million from Pfizer in January for microbe mining.<\/p>\n<p>\u00a0<\/p>\n<p><span style=\"font-size: 12pt;\"><strong>Not All CSB Involves Wet Lab Work<\/strong><\/span><\/p>\n<p><strong>BenevolentAI<\/strong> is pursuing the discovery of new solutions to the diseases of inflammation, neurodegeneration, orphan diseases, and rare cancers.\u00a0 As a group these 
don\u2019t offer the blockbuster market size necessary to attract research dollars from the major pharma companies.\u00a0 BenevolentAI believes the answers to many of these may already exist in the untapped research created by pharma R&#038;D organizations.<\/p>\n<p>Their approach is to develop an advanced artificial intelligence platform which they label a deep judgement system.\u00a0 This platform, a kind of advanced Watson QAM, learns and reasons from the interaction between human judgement and data.<\/p>\n<p>Using vast amounts of unstructured data in scientific papers, patents, and clinical trial information, along with a large number of structured data sets, the platform attempts to identify previously hidden scientific knowledge and deduce what \u2018should\u2019 be known based on what \u2018is\u2019 already known.<\/p>\n<p>\u00a0<\/p>\n<p><span style=\"font-size: 12pt;\"><strong>Generative Models May Be the Cutting Edge<\/strong><\/span><\/p>\n<p>Harvard chemistry professor Alan Aspuru-Guzik has harnessed a generative DNN architecture to suggest molecular architectures that might replicate the combined properties of two different drugs, for example aspirin with ibuprofen.\u00a0 Combinations of effective drugs and combinations of effective protocols would greatly accelerate our ability to cure more diseases effectively and cost efficiently.<\/p>\n<p>We more often think of using generative DNNs (RNNs, LSTMs) in applications like Google\u2019s Smart Reply feature that suggests responses to emails.\u00a0 But use potential molecular architectures as the input and the AI is able to suggest potential combinations that would both physically fit together and potentially have the combined therapeutic effect.\u00a0<\/p>\n<p>In December 2017 Aspuru-Guzik and colleagues at Harvard, the University of Toronto, and Cambridge <a href=\"https:\/\/arxiv.org\/abs\/1610.02415\"><em><u>published promising results<\/u><\/em><\/a> of the generative model trained on 250,000 drug-like 
molecules.<\/p>\n<p>\u00a0<\/p>\n<p><span style=\"font-size: 12pt;\"><strong>What Sorts of Data Scientists are These Companies Looking For?<\/strong><\/span><\/p>\n<p>For those of you who might be interested in making the switch, your deep learning skills in CNNs, RNNs, LSTMs, and Watson-style QAMs will serve you well, depending on the company.\u00a0 The job descriptions we looked at called out Python and R but not much else specific to bioinformatics.<\/p>\n<p>The exception is that the descriptions we saw asked for more than a passing familiarity with biological research.\u00a0 Our guess is that there aren\u2019t enough data scientists with parallel degrees in biology to go around and that these companies will begin to favor strong data science over biology.<\/p>\n<p>On the other hand, if we were advising our children what to study as they approach high school and college, the combination of data science and biology looks strong.\u00a0<\/p>\n<p>Our guess is that this field is just getting underway and that to become as mature as tech AI is today will take another 7 to 10 years.\u00a0 That could be a good long career run for young data scientists today, or a good entry point for new data scientists graduating from school in 10 years.<\/p>\n<p>Where we are today with CSB is roughly equivalent to Henry Ford\u2019s hand-built Model A.\u00a0 Between advances in data science and automation in this field, we could be routinely designing or editing genomes on computer screens in the not too distant future.<\/p>\n<p>George Church, a genome scientist at Harvard Medical School, says, \u201cI think this could be bigger than the space revolution or the computer revolution\u201d.\u00a0 We think so too.<\/p>\n<p>\u00a0<\/p>\n<p><a href=\"https:\/\/www.datasciencecentral.com\/profiles\/blog\/list?user=0h5qapp2gbuf8\"><em><u>Other articles by Bill Vorhies.<\/u><\/em><\/a><\/p>\n<p>\u00a0<\/p>\n<p>About the author:\u00a0 Bill Vorhies is Editorial Director for Data Science Central and 
has practiced as a data scientist since 2001.\u00a0 He can be reached at:<\/p>\n<p><a href=\"mailto:Bill@DataScienceCentral.com\">Bill@DataScienceCentral.com<\/a><\/p>\n<\/div>\n<p><a href=\"https:\/\/www.datasciencecentral.com\/xn\/detail\/6448529:BlogPost:734542\">Go to Source<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Author: William Vorhies Summary: Computational Synthetic Biology (CSB) is likely to be both the next big thing and perhaps most important field to exploit data [&hellip;] <span class=\"read-more-link\"><a class=\"read-more\" href=\"https:\/\/www.aiproblog.com\/index.php\/2018\/06\/24\/the-next-big-thing-in-data-science-is-biology\/\">Read More<\/a><\/span><\/p>\n","protected":false},"author":1,"featured_media":468,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_bbp_topic_count":0,"_bbp_reply_count":0,"_bbp_total_topic_count":0,"_bbp_total_reply_count":0,"_bbp_voice_count":0,"_bbp_anonymous_reply_count":0,"_bbp_topic_count_hidden":0,"_bbp_reply_count_hidden":0,"_bbp_forum_subforum_count":0,"footnotes":""},"categories":[26],"tags":[],"_links":{"self":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts\/718"}],"collection":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/comments?post=718"}],"version-history":[{"count":0,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts\/718\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/media\/472"}],"wp:attachment":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/media?parent=718"}],"wp:term":[{"taxonom
y":"category","embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/categories?post=718"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/tags?post=718"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}