{"id":853,"date":"2018-07-31T06:32:59","date_gmt":"2018-07-31T06:32:59","guid":{"rendered":"https:\/\/www.aiproblog.com\/index.php\/2018\/07\/31\/cliff-notes-for-managing-the-data-science-function\/"},"modified":"2018-07-31T06:32:59","modified_gmt":"2018-07-31T06:32:59","slug":"cliff-notes-for-managing-the-data-science-function","status":"publish","type":"post","link":"https:\/\/www.aiproblog.com\/index.php\/2018\/07\/31\/cliff-notes-for-managing-the-data-science-function\/","title":{"rendered":"Cliff Notes for Managing the Data Science Function"},"content":{"rendered":"<p>Author: William Vorhies<\/p>\n<div>\n<p><strong><em>Summary:<\/em><\/strong><em>\u00a0 There are an increasing number of larger companies that have truly embraced advanced analytics and deploy fairly large numbers of data scientists.\u00a0 Many of these same companies are the ones beginning to ask about using AI.\u00a0 Here are some observations and tips on the problems and opportunities associated with managing a larger data science function.<\/em><\/p>\n<p>\u00a0<\/p>\n<p><a href=\"http:\/\/api.ning.com\/files\/wg4s-qXIYshX8ACPvY5qUwwQ51athVYF2RVSZ5tojahofX4DJRAmEWIjAQAdYFsUnFHnFXtw7dElC4h3Pn8ev*CeaUxpfoI8\/DSteam.jpg\" target=\"_self\"><img decoding=\"async\" src=\"http:\/\/api.ning.com\/files\/wg4s-qXIYshX8ACPvY5qUwwQ51athVYF2RVSZ5tojahofX4DJRAmEWIjAQAdYFsUnFHnFXtw7dElC4h3Pn8ev*CeaUxpfoI8\/DSteam.jpg?width=350\" width=\"350\" class=\"align-right\"><\/a>We spend a lot of time looking inward at our profession of data science, studying new developments, looking for <a href=\"https:\/\/www.datasciencecentral.com\/profiles\/blogs\/when-variable-reduction-doesn-t-work\"><em><u>anomalies in our own practices<\/u><\/em><\/a>, and spreading the word to other practitioners.\u00a0 But when we look outward to communicate about data science to others, it\u2019s different.\u00a0 Maybe you have had the same experience, but when I talk to new clients it\u2019s as often as not to educate them at a fairly basic 
level about what\u2019s possible and what\u2019s not.<\/p>\n<p>The good news is that there\u2019s now a third group:\u00a0 execs and managers in larger companies who have embraced advanced analytics and who try to keep up by reading, but are not formally trained.\u00a0 If these analytics managers are data scientists, all well and good.\u00a0 But as you move up the chain of command just a little bit you\u2019ll soon find yourself talking to someone who may be an enthusiastic supporter but whose well intentioned self-education still leaves them short on some basic knowledge.<\/p>\n<p>Who are these folks?\u00a0 Well Gartner says you\u2019re a mid-size user if you have 6 to 12 data scientists and it takes more than 12 to be a larger user.\u00a0 And that\u2019s not counting the dedicated data engineers, IT, and analysts also assigned to the task.\u00a0 So it\u2019s certainly the large users and probably many of the mid-size users we\u2019re addressing.<\/p>\n<p>For a while I\u2019ve been collecting what I call <strong>Cliff Notes for Managing Data Science<\/strong> to address this group.\u00a0 Here\u2019s the first installment.<\/p>\n<p>\u00a0<\/p>\n<p><span style=\"font-size: 12pt;\"><strong>Do you really want an AI strategy?<\/strong><\/span><\/p>\n<p><a href=\"http:\/\/api.ning.com\/files\/wg4s-qXIYsiAYYaI47I3AgvZERv1YKU90PHYd60OYO0grxlbAzwGiylMwznW*gOOLFIZeT0yQTRA6SmmSuQ0H02GYP11QNxI\/AIstrategysquare.jpg\" target=\"_self\"><img decoding=\"async\" src=\"http:\/\/api.ning.com\/files\/wg4s-qXIYsiAYYaI47I3AgvZERv1YKU90PHYd60OYO0grxlbAzwGiylMwznW*gOOLFIZeT0yQTRA6SmmSuQ0H02GYP11QNxI\/AIstrategysquare.jpg?width=300\" width=\"300\" class=\"align-right\"><\/a>I\u2019ll try to keep this short because this topic tends to set me off.\u00a0 The popular press and many of the platform and application vendors have started just recently calling everything in advanced analytics \u201c<a 
href=\"https:\/\/www.datasciencecentral.com\/profiles\/blogs\/what-exactly-is-artificial-intelligence-and-why-is-it-driving-me-\"><em><u>Artificial Intelligence<\/u><\/em><\/a>\u201d.\u00a0 Not only is this inaccurate, it makes the conversation much more difficult.<\/p>\n<p>First of all, if you\u2019ve already got a dozen data scientists then you are firmly in the camp of <a href=\"https:\/\/www.datasciencecentral.com\/profiles\/blogs\/machine-learning-can-we-please-just-agree-what-this-means\"><em><u>machine learning \/ predictive analytics<\/u><\/em><\/a>.\u00a0 Machine learning is much more mature and more broadly useful than just AI <em>(which also uses a narrow group of machine learning techniques)<\/em>.\u00a0 So good for you.\u00a0 Keep up the good work.\u00a0 But the fact that it helps humans make decisions does not make what you have been doing so far AI.<\/p>\n<p><a href=\"https:\/\/www.datasciencecentral.com\/profiles\/blogs\/the-data-science-behind-ai\"><em><u>Modern AI<\/u><\/em><\/a> is the outcome of deep neural nets and reinforcement learning.\u00a0 AI involves recognition and response to text, voice, image, and video.\u00a0 It also encompasses automated and autonomous vehicles, game play, and the examination of ultra-large data sets to identify very rare events.\u00a0 This area is quite new and only the text, voice, image, and video capabilities are ready for commercial deployment.<\/p>\n<p>Technically if you have deployed a chatbot anywhere in your organization you are utilizing AI.\u00a0 <a href=\"https:\/\/www.datasciencecentral.com\/profiles\/blogs\/beginners-guide-to-chatbots\"><em><u>Chatbot input and sometimes output<\/u><\/em><\/a> is based on NLU (natural language understanding), one of the good applications of deep learning.\u00a0 Chances are that if you do not have at least one chatbot today, you will have one within a year.\u00a0 Chatbots are a great way to engage with customers and save money.\u00a0 They are not whiz-bang solutions.\u00a0 This is 
what most of AI is going to look like when deployed.<\/p>\n<p>By all means begin the conversation about <a href=\"https:\/\/www.datasciencecentral.com\/profiles\/blogs\/how-to-put-ai-to-work\"><em><u>where modern AI may be of value in your strategy<\/u><\/em><\/a> but don\u2019t oversell this yet.\u00a0 The real money is in what you\u2019ve been doing right along with predictive and prescriptive analytics, IoT, and the other well developed machine learning technologies.<\/p>\n<p>\u00a0<\/p>\n<p><strong><span style=\"font-size: 12pt;\"><span>Should<\/span> your data scientists be centralized or decentralized?<\/span><\/strong><\/p>\n<p><a href=\"http:\/\/api.ning.com\/files\/wg4s-qXIYsjEzQpC2V4FlHLUXaTKPggP82TA*gmrdpLZT1KesFySBVql89177EJfMoKyYcjuJ2d60ZIHaXp3lA9wYvfrecfp\/Centralizevsdecentralize.png\" target=\"_self\"><img decoding=\"async\" src=\"http:\/\/api.ning.com\/files\/wg4s-qXIYsjEzQpC2V4FlHLUXaTKPggP82TA*gmrdpLZT1KesFySBVql89177EJfMoKyYcjuJ2d60ZIHaXp3lA9wYvfrecfp\/Centralizevsdecentralize.png?width=300\" width=\"300\" class=\"align-right\"><\/a>There are two schools of thought here and the deciding factor is probably how many data scientists you have.\u00a0 One school of thought is that you should embed them closest to where the action is, in marketing, sales, finance, manufacture.\u00a0 You name it, every process can benefit from advanced analytics.\u00a0 They will learn the unique perspective, language, problems, and data of that process which will make them more effective.<\/p>\n<p>On the other hand, the average data scientist has had that title only 2 \u00bd years.\u00a0 That just shows you how fast we\u2019re starting to graduate new ones and how rapidly they\u2019re getting snapped up.\u00a0 What it means is that you probably have a few relatively experienced data scientists who have been around the block and a larger number of juniors who are just getting started.<\/p>\n<p>The juniors should have come with a very impressive set of technical skills 
and in theory can contribute to any data science problem.\u00a0 The reality is that the juniors, and the seniors as well, need to keep learning by experience, not to mention having time to catch up with the new techniques that are being introduced all the time.\u00a0 So the goal will be to have enough contact time between the seniors and the juniors so that everyone continues to develop.\u00a0<\/p>\n<p>If you\u2019ve got a half-dozen with various experience levels working together, that\u2019s probably OK.\u00a0 However, one interesting model is a hybrid that brings all your data scientists together on a fairly regular schedule so they can share experiences and learning.\u00a0<\/p>\n<p>Another possible implementation would be to have a few seniors deployed out in each end user organization with the juniors on rotating assignment to assist.<\/p>\n<p>Spread them too thin and you won\u2019t benefit from their growth.\u00a0 It may also cause them to leave for greener pastures.<\/p>\n<p>\u00a0<\/p>\n<p><span style=\"font-size: 12pt;\"><strong>Should every data science project have an ROI?<\/strong><\/span><\/p>\n<p>The further you go up the chain of command, the more senior management will say \u2018of course, this is our most basic concept\u2019.\u00a0 And that\u2019s not necessarily bad but it needs to come with some balance.<\/p>\n<p>It will be many years, and perhaps never, before you fully exploit all the data and all the analytics that will create competitive advantage.\u00a0 And many of those applications haven\u2019t even been conceived yet.<\/p>\n<p>One type of financial discipline that is completely appropriate is pre-establishing time budgets or measures that tell you when the solution is good enough.\u00a0 This is particularly true in all types of customer behavior modeling.\u00a0<\/p>\n<p>If your data science team has a built-in bias that is a potential weakness, it is that they will always want to keep working to make those models better, even if the time 
would be better spent on other data science projects.\u00a0 This scheduling discipline requires an understanding of exactly how the work is done, and that most likely belongs to your Chief Data Scientist.<\/p>\n<p>Incidentally, that Chief Data Scientist should also be regularly evaluating and recommending platforms and techniques to make the group more efficient, particularly in the fast emerging area of <a href=\"https:\/\/www.datasciencecentral.com\/profiles\/blogs\/data-scientists-automated-and-unemployed-by-2025-update\"><em><u>automated machine learning<\/u><\/em><\/a>.<\/p>\n<p>HOWEVER, there needs to be an opportunity for discovery.\u00a0 This is a little like an engineering lab where your data scientists need a little formally allocated time to \u2018go in there and find something interesting\u2019.\u00a0 Give them a little unstructured time to explore.\u00a0<\/p>\n<p>The most interesting phrase in science is not \u2018Eureka\u2019, but \u2018that\u2019s funny\u2019 <em>(Isaac Asimov).<\/em><\/p>\n<p>\u00a0<\/p>\n<p><span style=\"font-size: 12pt;\"><strong>Should you keep all that data or only what you need?<\/strong><\/span><\/p>\n<p>This question is very closely related to the one above about ROI.\u00a0 Our ordinary instinct is to keep only what we need.\u00a0 However, there is a strong school of thought among data scientists that data is now so inexpensive to store that we should keep it all and figure out how to benefit from it later.<\/p>\n<p>The opposite school starts with \u2018what is the problem we are trying to solve\u2019 and works backwards to the data necessary to achieve that.\u00a0 This is also the school that says that once we have achieved X% accuracy on this question, that is sufficient, and any data not necessary to support that should be discarded.<\/p>\n<p>Well, the problem is that all that data that doesn\u2019t appear to be predictive most assuredly contains pockets of outliers and pockets of really interesting new opportunities.\u00a0 You may 
not have the manpower to dig into it today, but that\u2019s also why we argued for giving your data science team some self-directed time to \u2018find something interesting\u2019.<\/p>\n<p>Storing new data in a cloud data lake where data scientists can explore it is ridiculously cheap (but not free).\u00a0 Where you need discipline is when you operationalize your new insight and it becomes mission critical.\u00a0 Then you need the full weight of good data management, provenance control, bias elimination, and the proverbial single definition of the truth.<\/p>\n<p>\u00a0<\/p>\n<p><span style=\"font-size: 12pt;\"><strong>What about Citizen Data Scientists and the democratization of analytics?<\/strong><\/span><\/p>\n<p>\u00a0<a href=\"http:\/\/api.ning.com\/files\/wg4s-qXIYsgf05BptSpH6zRaUKT-gi8H8uWm1B6HdYm1uUHkc1E*xlWARn5D8tluk4Gl55mDv1kEN*jalaqJsDPeyMrSoyfR\/datascienceteamsportanacondawide.jpg\" target=\"_self\"><img decoding=\"async\" src=\"http:\/\/api.ning.com\/files\/wg4s-qXIYsgf05BptSpH6zRaUKT-gi8H8uWm1B6HdYm1uUHkc1E*xlWARn5D8tluk4Gl55mDv1kEN*jalaqJsDPeyMrSoyfR\/datascienceteamsportanacondawide.jpg?width=550\" width=\"550\" class=\"align-center\"><\/a><\/p>\n<p>All data science projects are team efforts and those teams consist of data scientists, line-of-business (LOB) subject matter experts (SMEs), and probably some analysts and folks from IT.\u00a0 As these non-DS team members gain experience, they of course become more valuable.\u00a0 In particular, they are increasingly able to restate business problems as data science problems that they can bring to the table.<\/p>\n<p>What really scares me, and should scare you too, are statements in the press to the effect that AI and machine learning have become so user friendly that they are \u201conly a little more complex than word processors or spreadsheets\u201d, or \u201cusers no longer need to code\u201d.<\/p>\n<p>That may be very narrowly true but that does not mean you should ever begin a data science project without including an adequate number of 
formally trained data scientists.<\/p>\n<p>It is true that our advanced analytic platforms are becoming easier to use.\u00a0 The benefit is that fewer data scientists can do the work that used to require many more.\u00a0 It does not mean that citizen data scientists, no matter how well intentioned, should be given control over these projects.\u00a0<\/p>\n<p>The slick visual user interfaces in many analytic applications hide many critical considerations that a data scientist (DS) will know and a citizen data scientist (CDS) will not.\u00a0 The issues are too many to list here, but examples include false positive\/false negative threshold cost tradeoffs, best algorithm selection, the creation of new features, hyperparameter adjustment of those algorithms, and bias detection.\u00a0 This is no self-driving car.<\/p>\n<p>What we recommend is indeed getting those analysts and LOB managers deeply involved in the team process.\u00a0 That will allow them to spot new opportunities.\u00a0 If you want to empower your organizations, start with actively educating for data literacy.\u00a0 Leave actually driving the data science to the data scientists.<\/p>\n<p>\u00a0<\/p>\n<p><span style=\"font-size: 12pt;\"><strong>Cybersecurity may be what forces your hand into true AI<\/strong><\/span><\/p>\n<p>Whether you are dealing with cybersecurity in-house or contracting out, this is the first place you should be sure that there is real AI at work.\u00a0 It turns out that the deep learning techniques at the core of modern AI are particularly good at spotting anomalies and threats. 
If you want to front load your AI strategy this is the best place to start.\u00a0 As the Marines say, don\u2019t bring a knife to a gun fight.<\/p>\n<p>\u00a0<\/p>\n<p>\u00a0<\/p>\n<p>About the author:\u00a0 Bill Vorhies is Editorial Director for Data Science Central and has practiced as a data scientist since 2001.\u00a0 He can be reached at:<\/p>\n<p><a href=\"mailto:Bill@DataScienceCentral.com\">Bill@DataScienceCentral.com<\/a><\/p>\n<\/div>\n<p><a href=\"https:\/\/www.datasciencecentral.com\/xn\/detail\/6448529:BlogPost:690839\">Go to Source<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Author: William Vorhies Summary:\u00a0 There are an increasing number of larger companies that have truly embraced advanced analytics and deploy fairly large numbers of data [&hellip;] <span class=\"read-more-link\"><a class=\"read-more\" href=\"https:\/\/www.aiproblog.com\/index.php\/2018\/07\/31\/cliff-notes-for-managing-the-data-science-function\/\">Read More<\/a><\/span><\/p>\n","protected":false},"author":1,"featured_media":473,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_bbp_topic_count":0,"_bbp_reply_count":0,"_bbp_total_topic_count":0,"_bbp_total_reply_count":0,"_bbp_voice_count":0,"_bbp_anonymous_reply_count":0,"_bbp_topic_count_hidden":0,"_bbp_reply_count_hidden":0,"_bbp_forum_subforum_count":0,"footnotes":""},"categories":[26],"tags":[],"_links":{"self":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts\/853"}],"collection":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/comments?post=853"}],"version-history":[{"count":0,"href":"https:\/\/www.aiproblog.com\/ind
ex.php\/wp-json\/wp\/v2\/posts\/853\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/media\/467"}],"wp:attachment":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/media?parent=853"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/categories?post=853"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/tags?post=853"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}