{"id":1142,"date":"2018-10-10T06:37:12","date_gmt":"2018-10-10T06:37:12","guid":{"rendered":"https:\/\/www.aiproblog.com\/index.php\/2018\/10\/10\/practicing-no-code-data-science\/"},"modified":"2018-10-10T06:37:12","modified_gmt":"2018-10-10T06:37:12","slug":"practicing-no-code-data-science","status":"publish","type":"post","link":"https:\/\/www.aiproblog.com\/index.php\/2018\/10\/10\/practicing-no-code-data-science\/","title":{"rendered":"Practicing \u2018No Code\u2019 Data Science"},"content":{"rendered":"<p>Author: William Vorhies<\/p>\n<div>\n<p><strong><em>Summary:<\/em><\/strong><em>\u00a0 We are entering a new phase in the practice of data science, the \u2018Code-Free\u2019 era.\u00a0 Like all major changes this one has not sprung fully grown but the movement is now large enough that its momentum is clear.\u00a0 Here\u2019s what you need to know.<\/em><\/p>\n<p>\u00a0<\/p>\n<p><a href=\"http:\/\/api.ning.com\/files\/BTuxSudnJV32GqRTp77qxIRjyusYYZLOIUOdTmB7NYoKp8CNEhemMq8FLzqdnmlqkStC9T3cb5KTyw-k*NSk1DGUhwbwb-ja\/nocodingrequired1.png\" target=\"_self\"><img decoding=\"async\" src=\"http:\/\/api.ning.com\/files\/BTuxSudnJV32GqRTp77qxIRjyusYYZLOIUOdTmB7NYoKp8CNEhemMq8FLzqdnmlqkStC9T3cb5KTyw-k*NSk1DGUhwbwb-ja\/nocodingrequired1.png?width=300\" width=\"300\" class=\"align-right\"><\/a>We are entering a new phase in the practice of data science, the \u2018Code-Free\u2019 era.\u00a0 Like all major changes this one has not sprung fully grown but the movement is now large enough that its momentum is clear.<\/p>\n<p>Barely a week goes by that we don\u2019t learn about some new automated \/ no-code capability being introduced.\u00a0 Sometimes these are new startups with integrated offerings.\u00a0 More frequently they\u2019re features or modules being added by existing analytic platform vendors.<\/p>\n<p>I\u2019ve been following these automated machine learning (AML) platforms since they emerged.\u00a0 I wrote first about them in the spring of 2016 under the somewhat 
scary title \u201c<a href=\"https:\/\/www.datasciencecentral.com\/profiles\/blogs\/data-scientists-automated-and-unemployed-by-2025\"><em>Data Scientists Automated and Unemployed by 2025!<\/em><\/a><em>\u201d.<\/em><\/p>\n<p><em>Of course this was never my prediction, but in the last 2 \u00bd years the spread of automated features in our profession has been striking.<\/em><\/p>\n<p><em>\u00a0<\/em><\/p>\n<p><span style=\"font-size: 12pt;\"><strong>No Code Data Science<\/strong><\/span><\/p>\n<p>No-Code data science, or automated machine learning, or as Gartner has tried to brand it, \u2018augmented\u2019 data science, offers a continuum of ease of use.\u00a0 The options range from:<\/p>\n<ul>\n<li><strong><a href=\"http:\/\/api.ning.com\/files\/BTuxSudnJV139oS4qyEUx2-LLoqAlgY9StOwDuVq6Z2WENIz7W2adNfVxLDAX2d-bEGhF4zI-BpSpxOWdl6ocsWbo32ypRA9\/nohands2.jpg\" target=\"_self\"><img decoding=\"async\" src=\"http:\/\/api.ning.com\/files\/BTuxSudnJV139oS4qyEUx2-LLoqAlgY9StOwDuVq6Z2WENIz7W2adNfVxLDAX2d-bEGhF4zI-BpSpxOWdl6ocsWbo32ypRA9\/nohands2.jpg\" width=\"218\" class=\"align-right\"><\/a>Guided Platforms:<\/strong> Platforms with highly guided modeling procedures that still require the user to move through the steps (e.g. BigML, SAS, Alteryx). Classic drag-and-drop platforms are the basis for this generation.<\/li>\n<li><strong>Automated Machine Learning (AML):<\/strong> Fully automated machine learning platforms (e.g. 
DataRobot).<\/li>\n<li><strong>Conversational Analytics:<\/strong> In this last version, the user merely poses the question to be solved in common English and the platform presents the best answer, selecting data, features, modeling technique, and presumably even best data visualization.<\/li>\n<\/ul>\n<p>This list also pretty well describes the developmental timeline.\u00a0 Guided Platforms are now old hat.\u00a0 AML platforms are becoming numerous and mature.\u00a0 Conversational analytics is just beginning.<\/p>\n<p>\u00a0<\/p>\n<p><span style=\"font-size: 12pt;\"><strong>Not Just for Advanced Analytics<\/strong><\/span><\/p>\n<p>This smart augmentation of our tools extends beyond predictive \/ prescriptive modeling into the realm of data blending and prep, and even into data viz.\u00a0 What this means is that code-free smart features are being made available to classical BI business analysts, and of course to power user LOB managers (aka Citizen Data Scientists).<\/p>\n<p>The market drivers for this evolution are well known.\u00a0 In advanced analytics and AI it\u2019s about the shortage, cost, and acquisition of sufficient skilled data scientists.\u00a0 In this realm it\u2019s about time to insight, efficiency, and consistency.\u00a0 Essentially doing more with less and faster.<\/p>\n<p>However in the data prep, blending, feature identification world which is also important to data scientists, the real draw is the much larger data analyst \/ BI practitioner world.\u00a0 In this world the ETL of classic static data is still a huge burden and time delay that is moving rapidly from an IT specialist function to self-service.<\/p>\n<p>\u00a0<\/p>\n<p><span style=\"font-size: 12pt;\"><strong>Everything Old is New Again<\/strong><\/span><\/p>\n<p>When I started in data science in about 2001 SAS and SPSS were the dominant players and were already moving away from their proprietary code toward drag-and-drop, the earliest form of this automation. 
\u00a0<\/p>\n<p>The transition in academia 7 or 8 years later to teaching in R seems to have been driven financially by the fact that although SAS and SPSS gave essentially free access to students, they still charged instructors, albeit at a large academic discount. \u00a0R however was free.<\/p>\n<p>We then regressed to an age, continuing to this day, in which being a data scientist means working in code.\u00a0 That\u2019s the way this current generation of data scientists has been taught and, as expected, that\u2019s how they practice.\u00a0<\/p>\n<p>There has also been an incorrect bias that working in a drag-and-drop system did not allow the fine-grained hyperparameter tuning that code allows.\u00a0 If you\u2019ve ever worked in SAS Enterprise Miner or its competitors you know this is incorrect, and in fact that fine tuning is made all the easier.<\/p>\n<p>In my mind this was always an unnecessary digression back to the bad old days of coding-only, which tended to take the new practitioner\u2019s eye off the ball of the fundamentals and make data science look like just another programming language to master.\u00a0 So I for one both welcomed and expected this return to procedures that are both speedy and consistent among practitioners.<\/p>\n<p>\u00a0<\/p>\n<p><span style=\"font-size: 12pt;\"><strong>What About Model Quality<\/strong><\/span><\/p>\n<p><a href=\"http:\/\/api.ning.com\/files\/BTuxSudnJV04PIISrtuNLCaMXH7G3ZdH*MxQ-D6NEbHJVNFavhBSjRc2OT*aWOu8-F7oq5999uw0AAWQNoiTn4r0wKbnTdGR\/Bulls_Eye_with_Split_Arrow.jpg\" target=\"_self\"><img decoding=\"async\" src=\"http:\/\/api.ning.com\/files\/BTuxSudnJV04PIISrtuNLCaMXH7G3ZdH*MxQ-D6NEbHJVNFavhBSjRc2OT*aWOu8-F7oq5999uw0AAWQNoiTn4r0wKbnTdGR\/Bulls_Eye_with_Split_Arrow.jpg?width=300\" width=\"300\" class=\"align-right\"><\/a>We tend to think of a \u2018win\u2019 in advanced analytics as improving the accuracy of a model.\u00a0 There\u2019s a perception that relying on automated No-Code solutions gives up some of this 
accuracy.\u00a0 This isn\u2019t true.<\/p>\n<p>AutoML platforms like DataRobot, Tazi.ai, and OneClick.ai (among many others) not only run hundreds of model types in parallel, including variations on hyperparameters, but they also perform transforms, feature selection, and even some feature engineering.\u00a0 It\u2019s unlikely that you\u2019re going to beat one of these platforms on pure accuracy.\u00a0<\/p>\n<p>A caveat here is that domain expertise applied to feature engineering is still a human advantage.<\/p>\n<p>Perhaps more importantly, when we\u2019re talking about variations in accuracy at the second or third decimal place, are the many weeks you spent on development a good cost tradeoff compared to the few days or even hours these AutoML platforms offer?<\/p>\n<p>\u00a0<\/p>\n<p><span style=\"font-size: 12pt;\"><strong>The Broader Impact of No Code<\/strong><\/span><\/p>\n<p>It seems to me that the biggest beneficiaries of no-code are actually classic data analysts and LOB managers who continue to be most focused on BI static data.\u00a0 The standalone data blending and prep platforms are a huge benefit to this group (and to IT, whose workload is significantly lightened).<\/p>\n<p>These no-code data prep platforms like ClearStory Data, Paxata, and Trifacta are moving rapidly to incorporate ML features into their processes that help users select which data sources are appropriate to blend and understand what the data items actually mean (using more ad hoc sources in the absence of good data dictionaries), and they are even extending into feature engineering and feature selection.\u00a0<\/p>\n<p>Modern data prep platforms are using embedded ML, for example, for smart automated cleaning or treatment of outliers.<\/p>\n<p>Others like Octopai, just reviewed by Gartner as one of \u201c5 Cool Companies\u201d, focus on enabling users to quickly find trusted data through automation by using machine learning and pattern analysis to determine the relationships among different data elements, the 
context in which the data was created, and the data\u2019s prior uses and transformations.<\/p>\n<p>These platforms also enable secure self-service by enforcing permissions and protecting PID and other similarly sensitive data.<\/p>\n<p>Even data viz leader Tableau is rolling out conversational analytic features using NLP and other ML tools to allow users to pose queries in plain English and get back optimum visualizations.<\/p>\n<p>\u00a0<\/p>\n<p><span style=\"font-size: 12pt;\"><strong>What Does This Actually Mean for Data Scientists<\/strong><\/span><\/p>\n<p>Gartner believes that within two years, by 2020, citizen data scientists will surpass data scientists in the quantity and value of the advanced analytics they produce.\u00a0 They propose that data scientists will instead focus on specialized problems and embedding enterprise-grade models into applications.<\/p>\n<p>I disagree.\u00a0 This would seem to relegate data scientists to the role of QA and implementation.\u00a0 That\u2019s not what we signed on for.<\/p>\n<p>My take is that this will rapidly expand the use of advanced analytics deeper and deeper into organizations thanks to smaller groups of data scientists being able to handle more and more projects.<\/p>\n<p>We\u2019re only a year or two past the point where the data scientist\u2019s most important skills included blending and cleaning the data, and selecting the right predictive algorithms for the task.\u00a0 These are specifically the areas that augmented\/automatic no-code tools are taking over.<\/p>\n<p>Companies that must create, monitor, and manage hundreds or thousands of models have been the earliest adopters, specifically insurance and financial services.<\/p>\n<p>What does that leave?\u00a0 It leaves the senior role of <a href=\"https:\/\/www.datasciencecentral.com\/profiles\/blogs\/analytics-translator-the-most-important-new-role-in-analytics\"><em>Analytics Translator<\/em><\/a>.\u00a0 That\u2019s the role McKinsey recently 
identified as the most important in any data science initiative.\u00a0 In short, the job of Analytics Translator is to:<\/p>\n<ol>\n<li>Lead the identification of opportunities where advanced analytics can make a difference.<\/li>\n<li>Facilitate the process of prioritizing these opportunities.<\/li>\n<li>Frequently serve as project manager on the projects.<\/li>\n<li>Actively champion adoption of the solutions across the business and promote cost effective scaling.<\/li>\n<\/ol>\n<p>In other words, translate business problems into data science projects and lead in quantifying the various types of risk and rewards that allow these projects to be prioritized.<\/p>\n<p>\u00a0<\/p>\n<p><span style=\"font-size: 12pt;\"><strong>What About AI?<\/strong><\/span><\/p>\n<p>Yes even our most recent advancements into image, text, and speech with CNNs and RNNs are rapidly being rolled out as automated no-code solutions.\u00a0 And it couldn\u2019t come fast enough because the shortage of data scientists with deep learning skills is even greater than with our more general practitioners.<\/p>\n<p>Both Microsoft and Google rolled out automated deep learning platforms within the last year.\u00a0 These started with transfer learning but are headed toward full AutoDL.\u00a0 See Microsoft Custom Vision Services (<a href=\"https:\/\/www.customvision.ai\/\"><em>https:\/\/www.customvision.ai\/<\/em><\/a><span>) and<\/span> Google\u2019s similar entry <a href=\"https:\/\/cloud.google.com\/automl\/\"><em>Cloud AutoML<\/em><\/a><span>.<\/span><\/p>\n<p>There are also a number of startup integrated AutoDL platforms.\u00a0 We reviewed <a href=\"https:\/\/www.oneclick.ai\/\"><em>OneClick.AI<\/em><\/a> earlier this year.\u00a0 They include both a full AutoML and AutoDL platform.\u00a0 Gartner recently nominated <a name=\"_Toc526321075\"><\/a><a href=\"https:\/\/dimensionalmechanics.com\/\"><em>DimensionalMechanics<\/em><\/a> as one of its \u201c5 Cool Companies\u201d with an AutoDL 
platform.<\/p>\n<p><em>For a while I tried to personally keep up with the list of vendors of both No-Code AutoML and AutoDL and offer updates on their capabilities.\u00a0 This rapidly became too much.\u00a0<\/em><\/p>\n<p><em>I was hoping Gartner or some other worthy group would step up with a comprehensive review, and in 2017 Gartner did a fairly lengthy report \u201c<\/em><a href=\"https:\/\/www.datarobot.com\/resource\/complimentary-gartner-report-augmented-analytics-is-the-future-of-data-and-analytics\/\">Augmented Analytics Is the Future of Data and Analytics<\/a><em>\u201d.\u00a0 The report was a good broad-brush overview but failed to capture many of the vendors I was personally aware of.<\/em><\/p>\n<p>To the best of my knowledge there\u2019s still no comprehensive listing of all the platforms that offer either complete automation or significantly automated features.\u00a0 They do however run from IBM and SAS all the way down to small startups, all worthy of your consideration.<\/p>\n<p>Many of these are mentioned or reviewed in the articles linked below.\u00a0 If you\u2019re using advanced analytics in any form, or simply want to make your traditional business analysis function better, look at the solutions mentioned in them.<\/p>\n<p>\u00a0<\/p>\n<p><strong>Additional articles on Automated Machine Learning, Automated Deep Learning, and Other No-Code Solutions<\/strong><\/p>\n<p><a href=\"https:\/\/www.datasciencecentral.com\/profiles\/blogs\/what-s-new-in-data-prep\"><em>What\u2019s New in Data Prep<\/em><\/a><em><u>\u00a0\u00a0 (September 2018)<\/u><\/em><\/p>\n<p><a href=\"https:\/\/www.datasciencecentral.com\/profiles\/blogs\/democratizing-deep-learning-the-stanford-dawn-project\"><em>Democratizing Deep Learning \u2013 The Stanford Dawn Project<\/em><\/a> <em><u>(September 2018)<\/u><\/em><\/p>\n<p><a href=\"https:\/\/www.datasciencecentral.com\/profiles\/blogs\/transfer-learning-deep-learning-for-everyone\"><em>Transfer Learning \u2013 Deep Learning for 
Everyone<\/em><\/a> <em><u>(April 2018)<\/u><\/em><\/p>\n<p><a href=\"https:\/\/www.datasciencecentral.com\/profiles\/blogs\/automated-deep-learning-so-simple-anyone-can-do-it\"><em>Automated Deep Learning \u2013 So Simple Anyone Can Do It<\/em><\/a> <em><u>(April 2018)<\/u><\/em><\/p>\n<p><a href=\"https:\/\/www.datasciencecentral.com\/profiles\/blogs\/next-generation-automated-machine-learning-aml\"><em>Next Generation Automated Machine Learning (AML)<\/em><\/a> <em><u>(April 2018)<\/u><\/em><\/p>\n<p><a href=\"https:\/\/www.datasciencecentral.com\/profiles\/blogs\/more-on-fully-automated-machine-learning\"><em>More on Fully Automated Machine Learning<\/em><\/a> <em><u>(August 2017)<\/u><\/em><\/p>\n<p><a href=\"https:\/\/www.datasciencecentral.com\/profiles\/blogs\/automated-machine-learning-for-professionals\"><em><u>Automated Machine Learning for Professionals<\/u><\/em><\/a>\u00a0 (July 2017)<\/p>\n<p><a href=\"https:\/\/www.datasciencecentral.com\/profiles\/blogs\/data-scientists-automated-and-unemployed-by-2025-update\"><em><u>Data Scientists Automated and Unemployed by 2025 &#8211; Update!<\/u><\/em><\/a>\u00a0 (July 2017)<\/p>\n<p><a href=\"https:\/\/www.datasciencecentral.com\/profiles\/blogs\/data-scientists-automated-and-unemployed-by-2025\"><em>Data Scientists Automated and Unemployed by 2025!<\/em><\/a> <u>\u00a0(April 2016)<\/u><\/p>\n<p>\u00a0<\/p>\n<p><a href=\"https:\/\/www.datasciencecentral.com\/profiles\/blog\/list?user=0h5qapp2gbuf8\"><em>Other articles by Bill Vorhies.<\/em><\/a><\/p>\n<p>\u00a0<\/p>\n<p>About the author:\u00a0 Bill Vorhies is Editorial Director for Data Science Central and has practiced as a data scientist since 2001.\u00a0 He can be reached at:<\/p>\n<p><a href=\"mailto:Bill@Data-Magnum.com\">Bill@Data-Magnum.com<\/a> <span>or<\/span> <a href=\"mailto:Bill@DataScienceCentral.com\">Bill@DataScienceCentral.com<\/a><\/p>\n<p><span>\u00a0<\/span><\/p>\n<\/div>\n<p><a 
href=\"https:\/\/www.datasciencecentral.com\/xn\/detail\/6448529:BlogPost:766729\">Go to Source<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Author: William Vorhies Summary:\u00a0 We are entering a new phase in the practice of data science, the \u2018Code-Free\u2019 era.\u00a0 Like all major changes this one [&hellip;] <span class=\"read-more-link\"><a class=\"read-more\" href=\"https:\/\/www.aiproblog.com\/index.php\/2018\/10\/10\/practicing-no-code-data-science\/\">Read More<\/a><\/span><\/p>\n","protected":false},"author":1,"featured_media":462,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_bbp_topic_count":0,"_bbp_reply_count":0,"_bbp_total_topic_count":0,"_bbp_total_reply_count":0,"_bbp_voice_count":0,"_bbp_anonymous_reply_count":0,"_bbp_topic_count_hidden":0,"_bbp_reply_count_hidden":0,"_bbp_forum_subforum_count":0,"footnotes":""},"categories":[26],"tags":[],"_links":{"self":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts\/1142"}],"collection":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/comments?post=1142"}],"version-history":[{"count":0,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts\/1142\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/media\/456"}],"wp:attachment":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/media?parent=1142"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/categories?post=1142"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.ai
problog.com\/index.php\/wp-json\/wp\/v2\/tags?post=1142"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}