{"id":775,"date":"2018-07-10T06:33:26","date_gmt":"2018-07-10T06:33:26","guid":{"rendered":"https:\/\/www.aiproblog.com\/index.php\/2018\/07\/10\/data-science-is-changing-and-data-scientists-will-need-to-change-too-heres-why-and-how\/"},"modified":"2018-07-10T06:33:26","modified_gmt":"2018-07-10T06:33:26","slug":"data-science-is-changing-and-data-scientists-will-need-to-change-too-heres-why-and-how","status":"publish","type":"post","link":"https:\/\/www.aiproblog.com\/index.php\/2018\/07\/10\/data-science-is-changing-and-data-scientists-will-need-to-change-too-heres-why-and-how\/","title":{"rendered":"Data Science is Changing and Data Scientists will Need to Change Too \u2013 Here\u2019s Why and How"},"content":{"rendered":"<p>Author: William Vorhies<\/p>\n<div>\n<p><strong><em>Summary:<\/em><\/strong><em>\u00a0 Deep changes are underway in how data science is practiced and successfully deployed to solve business problems and create strategic advantage.\u00a0 These same shifts will also reshape how data scientists do their work.\u00a0 Here\u2019s why and how.<\/em><\/p>\n<p>\u00a0<\/p>\n<p><a href=\"http:\/\/api.ning.com\/files\/75PaEIOOgHOtSxfXlCA*OHpDyGCQ3p9WNmTPRpm99EcTxYpQRby2x36JQ0i8CqTMsIV87xhUtYCK8Q-SaDTcGvu3RlGGOd9L\/bigchangesahead.jpg\" target=\"_self\"><img decoding=\"async\" src=\"http:\/\/api.ning.com\/files\/75PaEIOOgHOtSxfXlCA*OHpDyGCQ3p9WNmTPRpm99EcTxYpQRby2x36JQ0i8CqTMsIV87xhUtYCK8Q-SaDTcGvu3RlGGOd9L\/bigchangesahead.jpg?width=250\" width=\"250\" class=\"align-right\"><\/a>There\u2019s a sea change underway in data science.\u00a0 It\u2019s changing how companies embrace data science, and it\u2019s changing the way data scientists do their jobs.\u00a0 The backdrop is the increasing adoption and strategic importance of advanced analytics of all types.\u00a0 There are two parts to this change.\u00a0<\/p>\n<p>The first is happening right now as analytic platforms build out to become one-stop shops for data scientists.\u00a0 But the second 
and more important change is just beginning and will take hold rapidly.\u00a0 Advanced analytics will become the hidden <strong>Systems of Intelligence<\/strong> (SOI) layer in the new enterprise application stack.\u00a0<\/p>\n<p>Both of these movements are changing how data scientists need to do their jobs and how we create value.<\/p>\n<p>\u00a0<\/p>\n<p><span style=\"font-size: 12pt;\"><strong>What\u2019s Happening Now<\/strong><\/span><\/p>\n<p>Advanced analytic platforms are undergoing several evolutionary steps at once.\u00a0 This is the final buildout of the competitive strategy these platforms currently use to capture as many data science users as possible.\u00a0 These last steps include:<\/p>\n<ol>\n<li>Fully integrating everything from data blending through prep, modeling, deployment, and maintenance.<\/li>\n<li>Moving to the cloud so they can expand and contract their MPP (massively parallel processing) resources as required.<\/li>\n<li>Expanding capabilities to include deep learning for text, speech, and image analysis.<\/li>\n<li>Adopting ever higher levels of automation in both modeling and prep, reducing data science labor and increasing speed to solution.\u00a0 Gartner says that within two years 40% of our tasks will be automated.<\/li>\n<\/ol>\n<p>Here are a few examples I\u2019m sure you\u2019ll recognize.\u00a0<\/p>\n<ul>\n<li>Alteryx, with roots in data blending, is continuously upgrading its onboard analytic tools and expanding access to third-party GIS and consumer data from providers such as Experian.<\/li>\n<li>SAS and SPSS have increased blending capability, incorporated MPP, and most recently added enhanced one-click model building and data prep options.<\/li>\n<li>New entrants like DataRobot emphasize labor savings and speed to solution through MPP and maximum one-click automation.<\/li>\n<li>The major cloud providers are introducing complete analytic platforms of their own to capture the maximum number of data science users. 
These include Google\u2019s Cloud Datalab, Microsoft\u2019s Azure Machine Learning, and Amazon SageMaker.<\/li>\n<\/ul>\n<p>\u00a0<\/p>\n<p><span style=\"font-size: 12pt;\"><strong>The Whole Strategic Focus of Advanced Analytic Platforms Is About to Change<\/strong><\/span><\/p>\n<p>We are in the final stages of an era in which large analytics users assemble different packages in a best-of-breed strategy.\u00a0 Gartner says users, starting with the largest, will increasingly consolidate around a single platform.<\/p>\n<p>These same consolidation forces were at work in ERP systems in the 1990s and in DW\/BI and CRM systems in the 2000s.\u00a0 The playbook is the same: give the customer greater efficiency and ease of use with a single-vendor solution, creating a wide moat of good user experience combined with painfully high switching costs.<\/p>\n<p>But this is only the end of the last phase, not where advanced analytic platforms are headed over the next two to five years.\u00a0 So far the emphasis has been on internal completeness and self-sufficiency.\u00a0 According to both strategists and venture capitalists, the next movement will see the advanced analytic platform disappear into an integrated enterprise stack as the critical middle <strong>System of Intelligence<\/strong>.<\/p>\n<p>\u00a0<a href=\"http:\/\/api.ning.com\/files\/75PaEIOOgHN655axnFDcnQr4VR0fAaWX7-3GxXEvutU2Ile10ifnKxp*2-GczmB6z1YaZ3Us1EVl9w4qbEq7TpVNpmjIB8W6\/systemofintelligence.png\" target=\"_self\"><img decoding=\"async\" src=\"http:\/\/api.ning.com\/files\/75PaEIOOgHN655axnFDcnQr4VR0fAaWX7-3GxXEvutU2Ile10ifnKxp*2-GczmB6z1YaZ3Us1EVl9w4qbEq7TpVNpmjIB8W6\/systemofintelligence.png?width=550\" width=\"550\" class=\"align-center\"><\/a><\/p>\n<p><span style=\"font-size: 12pt;\"><strong>Why the Change in Strategy \u2013 and When?<\/strong><\/span><\/p>\n<p>The phrase Systems of Intelligence (SOI) was first used by Microsoft CEO Satya Nadella in early 2015.\u00a0 However, it wasn\u2019t until 2017 that <a
href=\"https:\/\/news.greylock.com\/the-new-moats-53f61aeac2d9\"><em><u>the strategy of creating wide moats using SOI<\/u><\/em><\/a> was articulated by venture capitalist Jerry Chen at Greylock Partners.<\/p>\n<p>Suddenly, Systems of Intelligence is on everyone\u2019s lips as the next great generational shift in enterprise infrastructure, the great pivot in the ML platform revolution.<\/p>\n<p>Current advanced analytic platform strategies rely on being the one-stop, general-purpose data science platform of choice, but those investing in and developing the next generation of platforms say that is about to change.\u00a0 Their argument is that the needs of each industry, or of each major business process such as finance, HR, ITSM, supply chain, and ecommerce, have become so specialized in their data science content that wide moats are best constructed by making the data science disappear into the middle layer between systems of record and systems of engagement.<\/p>\n<p>As Chen states, \u201c<span>Companies that focus too much on technology without putting it in context of a customer problem will be caught between a rock and a hard place\u201d.\u00a0 As an investor, he would say he is unwilling to back a general-purpose DS platform for that very reason.\u00a0<\/span><\/p>\n<p><span>Chen and many others are investing directly on the premise that the future of data science, machine learning, and AI lies in being the invisible, secret-sauce middle layer.\u00a0 No one cares exactly how the magic is done, so long as the package arrives on time, the campaign succeeds, or whatever insight the data science provided proves valuable.
It\u2019s all about the end user.\u00a0<\/span><\/p>\n<p><span>From the developer\u2019s and investor\u2019s point of view, this strategy is also the only path forward to measurable and lasting competitive differentiation: the treasured wide moat.<\/span><\/p>\n<p><span>So in the marketplace, the emphasis is on the system of engagement.\u00a0 Look at Slack, Amazon Alexa, and every other speech, text, or conversational UI startup that uses ML as the basis for its interaction with the end user.\u00a0 In China, Tencent and Alibaba have come to dominate ecommerce, gaming, chat, and mobile payments almost completely by focusing on improving their systems of engagement through advanced ML.<\/span><\/p>\n<p><span>It\u2019s also true that systems of engagement evolve and turn over more rapidly than either the underlying ML or the systems of record.\u00a0 So in this new enterprise stack, it\u2019s important that the ML be able to work with a variety of existing and new systems of engagement as well as systems of record.\u00a0<\/span><\/p>\n<p><span>The old methods of engagement don\u2019t disappear, but new ones are added.\u00a0 In fact, owning the relationship with the end user and being compatible with multiple systems of record provides access to the flow of data that lets the ML SOI improve constantly, enhancing your dominant position.<\/span><\/p>\n<p><span>Here\u2019s how Chen and other SOI enthusiasts see the market today.<\/span><\/p>\n<p><span>\u00a0<a href=\"http:\/\/api.ning.com\/files\/75PaEIOOgHO9tImcPAkt01PaNvfNuB9qkqDMNwNA35ju2L5gjA1QcgYXoJGuIzbGBo4qkCQSiz3qrXSGvMyCbCIgU3V6Q8kq\/systemintelligencecompetitors.png\" target=\"_self\"><img decoding=\"async\" src=\"http:\/\/api.ning.com\/files\/75PaEIOOgHO9tImcPAkt01PaNvfNuB9qkqDMNwNA35ju2L5gjA1QcgYXoJGuIzbGBo4qkCQSiz3qrXSGvMyCbCIgU3V6Q8kq\/systemintelligencecompetitors.png?width=550\" width=\"550\" class=\"align-center\"><\/a><\/span><\/p>\n<p><span>\u00a0<\/span><\/p>\n<p><span style=\"font-size:
12pt;\"><strong>How Does This Change the Way Data Scientists Work?<\/strong><\/span><\/p>\n<p><span>So why does this matter to data scientists, and how will it change the way we perform our tasks?\u00a0 Gartner says that by 2020 more than 40% of data science tasks will be automated.\u00a0 There are two direct results:<\/span><\/p>\n<p><span>\u00a0<\/span><\/p>\n<p><span style=\"font-size: 12pt;\"><strong>Algorithm Selection and Tuning Will No Longer Matter<\/strong>\u00a0<\/span><\/p>\n<p><span>They will be automated.\u00a0 They will no longer be among the data scientist\u2019s primary tasks.\u00a0 We see the move toward automated model construction all around us, from the automated modeling features in SPSS to fully automated modeling platforms like DataRobot.\u00a0<\/span><\/p>\n<p><span>Our hands-on work of trying various algorithms and tuning hyperparameters will very rapidly be replaced by smart automation.\u00a0 The time we need to spend on this part of the project will be dramatically reduced, and it will no longer be the most effective use of our expertise.<\/span><\/p>\n<p><span>\u00a0<\/span><\/p>\n<p><span style=\"font-size: 12pt;\"><strong>Data Prep Will Be Mostly Automated\u00a0<\/strong><\/span><\/p>\n<p><span>Data prep will for the most part be automated, and in some narrowly defined instances it can be completely automated.\u00a0 This problem is actually much harder to automate fully than model creation.\u00a0 However, you can already use automated data prep in tools as diverse as SPSS and Xpanse Analytics.\u00a0 Right now, of the many steps in prep, at least the following can be reliably automated:<\/span><\/p>\n<ul>\n<li><span>Blend data sources.<\/span><\/li>\n<li><span>Profile the data for initial discovery.<\/span><\/li>\n<li><span>Recode missing and mislabeled values.<\/span><\/li>\n<li><span>Normalize the data distribution.<\/span><\/li>\n<li><span>Run univariate analyses.<\/span><\/li>\n<li><span>Bin
categoricals.<\/span><\/li>\n<li><span>Create N-grams from text fields.<\/span><\/li>\n<li><span>Detect and resolve outliers.<\/span><\/li>\n<\/ul>\n<p><span>If you\u2019ve used any of these automated prep tools, you know they\u2019re not perfect today.\u00a0 Give them a little time.\u00a0 This step alone eliminates the unpleasant grunt work and much of the lower-level time and labor in ML.<\/span><\/p>\n<p><span>\u00a0<\/span><\/p>\n<p><span style=\"font-size: 12pt;\"><strong>Who You Want to Work For<\/strong><\/span><\/p>\n<p><span>The Systems of Intelligence strategy shift raises another interesting point: it probably affects who you want to work for.\u00a0 One of the great imbalances in the shortage of top data scientists is that such a high percentage of them work for tech companies building one-size-fits-all platforms.\u00a0 One implication is that we may want to seek out developers of industry- or process-specific vertical solutions, who will be the primary beneficiaries of this major change.<\/span><\/p>\n<p><span>\u00a0<\/span><\/p>\n<p><span style=\"font-size: 12pt;\"><strong>What\u2019s Left for the Data Scientist to Do?<\/strong><\/span><\/p>\n<p><span>Whether you\u2019ve been in the industry for years or are fresh out of school, you\u2019ve been intently focused on data prep, model selection, and tuning.\u00a0 For many of us, these are the tasks that define our core skill sets.\u00a0 So what\u2019s left?<\/span><\/p>\n<p><span>This isn\u2019t as dark as it seems.\u00a0 We shift to the higher-value tasks that were always there but made up a much smaller share of our work.<\/span><\/p>\n<p><span>\u00a0<\/span><\/p>\n<p><span style=\"font-size: 12pt;\"><strong>Feature Engineering and Model Validation Become a Focus<\/strong><\/span><\/p>\n<p><span>In prep automation so far, there have been some attempts to automate feature engineering (feature creation), for example by taking the differences between all possible pairs of date fields,
creating all possible ratios among variables, looking at trends in values, and similar techniques.\u00a0 These approaches have been brute force and tend to create lots of meaningless engineered features.<\/span><\/p>\n<p><span>It is your knowledge of data science, and particularly your industry-specific domain knowledge, that will keep the creation and selection of important new predictive features a major part of our future work.\u00a0<\/span><\/p>\n<p><span>Your expertise will also be required at the earliest stages of data examination to ensure the automation hasn\u2019t gone off the rails.\u00a0 It\u2019s pretty easy to fool today\u2019s automated prep tools into treating data as linear when it may in fact be curvilinear, or not meaningfully related at all (I\u2019m thinking of<\/span> <a href=\"https:\/\/www.datasciencecentral.com\/profiles\/blogs\/when-data-viz-trumps-statistics\"><em><u>Anscombe\u2019s Quartet<\/u><\/em><\/a> <span>here).\u00a0 It still takes an expert to validate that the automation is heading in the right direction.<\/span><\/p>\n<p><span>\u00a0<\/span><\/p>\n<p><span style=\"font-size: 12pt;\"><strong>Your Understanding of the Business Problem to Be Solved<\/strong><\/span><\/p>\n<p><span>If you are working inside a large corporation as part of the advanced analytics team, then your ability to correctly understand the business problem and translate it into a data science problem will be key.<\/span><\/p>\n<p><span>If you are working under the SOI strategy to solve cross-industry process problems (HR, finance, supply chain, ITSM), or even if you are working in a more narrowly defined industry vertical (e.g.
ecommerce customer engagement), it will be your knowledge and understanding of the end user\u2019s experience that will be valued.<\/span><\/p>\n<p><span>Even today, progress as a data scientist requires deep domain knowledge of your specialty process or industry.\u00a0 Knowing the data science required to implement a solution is not sufficient without that domain knowledge.<\/span><\/p>\n<p><span>\u00a0<\/span><\/p>\n<p><span style=\"font-size: 12pt;\"><strong>Machine Learning Will Increasingly Be a Team Sport<\/strong><\/span><\/p>\n<p><span>With all this talk of automation, it is easy to be misled into thinking that professional data scientists will no longer be necessary.\u00a0 Nothing could be further from the truth.\u00a0 True, fewer of us will be required to solve each problem, and solutions will be implemented much more quickly.<\/span><\/p>\n<p><span>Where does this leave the Citizen Data Scientist?\u00a0 This movement has a lot of momentum, and it\u2019s easy to understand why reasonably smart and motivated line-of-business (LOB) managers and analysts may want not only to consume more data science but also to have a hands-on seat at the table.<\/span><\/p>\n<p><span>And indeed, they should have a major role in defining the problem and implementing the solution.\u00a0 However, even with all the new automated features, the underlying data science still requires an expert\u2019s eye.\u00a0<\/span><\/p>\n<p><span>Your new role will increasingly be that of a team leader, one with deep knowledge of both the data science and the business domain.<\/span><\/p>\n<p><span>\u00a0<\/span><\/p>\n<p><span style=\"font-size: 12pt;\"><strong>How Fast Will All This Happen?<\/strong><\/span><\/p>\n<p><span>The buildout of advanced analytic platforms and automated features has been underway for about the past two years.\u00a0 I\u2019m with Gartner on this one: I think roughly half of our tasks will be automated within two years.\u00a0 Beyond that, it\u2019s about how fast this trickles down from the largest companies to
the smaller ones.\u00a0 The speed and reduced cost that automation offers will be impossible to resist.<\/span><\/p>\n<p><span>As for the absorption of the data science platform into the hidden middle layer of the stack as the System of Intelligence, you can already see this underway in many of the thousands of VC-funded startups.\u00a0 This is fairly new, and it will take time for these startups to scale and mature.\u00a0 However, don\u2019t overlook the role that M&#038;A will play in bringing these new platform concepts inside large existing players.\u00a0 Such acquisitions are likely and will only accelerate the trend.<\/span><\/p>\n<p><span>Is hiding the data science from the end user in any way a bad thing?\u00a0 Not at all.\u00a0 Our contribution to the end user\u2019s experience was never meant to be on direct display.\u00a0 This means more opportunities to apply our data science skills to more tightly focused groups of end users and to create more delight in their experience.<\/span><\/p>\n<p>\u00a0<\/p>\n<p>\u00a0<\/p>\n<p>About the author:\u00a0 Bill Vorhies is Editorial Director for Data Science Central and has practiced as a data scientist since 2001.\u00a0 He can be reached at:<\/p>\n<p><a href=\"mailto:Bill@DataScienceCentral.com\">Bill@DataScienceCentral.com<\/a><\/p>\n<\/div>\n<p><a href=\"https:\/\/www.datasciencecentral.com\/xn\/detail\/6448529:BlogPost:682549\">Go to Source<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Author: William Vorhies Summary:\u00a0 Deep changes are underway in how data science is practiced and successfully deployed to solve business problems and create strategic advantage.\u00a0 [&hellip;] <span class=\"read-more-link\"><a class=\"read-more\" href=\"https:\/\/www.aiproblog.com\/index.php\/2018\/07\/10\/data-science-is-changing-and-data-scientists-will-need-to-change-too-heres-why-and-how\/\">Read
More<\/a><\/span><\/p>\n","protected":false},"author":1,"featured_media":474,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_bbp_topic_count":0,"_bbp_reply_count":0,"_bbp_total_topic_count":0,"_bbp_total_reply_count":0,"_bbp_voice_count":0,"_bbp_anonymous_reply_count":0,"_bbp_topic_count_hidden":0,"_bbp_reply_count_hidden":0,"_bbp_forum_subforum_count":0,"footnotes":""},"categories":[26],"tags":[],"_links":{"self":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts\/775"}],"collection":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/comments?post=775"}],"version-history":[{"count":0,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts\/775\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/media\/472"}],"wp:attachment":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/media?parent=775"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/categories?post=775"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/tags?post=775"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}