{"id":3786,"date":"2020-08-20T06:30:02","date_gmt":"2020-08-20T06:30:02","guid":{"rendered":"https:\/\/www.aiproblog.com\/index.php\/2020\/08\/20\/data-scientists-think-data-is-their-1-problem-heres-why-theyre-wrong\/"},"modified":"2020-08-20T06:30:02","modified_gmt":"2020-08-20T06:30:02","slug":"data-scientists-think-data-is-their-1-problem-heres-why-theyre-wrong","status":"publish","type":"post","link":"https:\/\/www.aiproblog.com\/index.php\/2020\/08\/20\/data-scientists-think-data-is-their-1-problem-heres-why-theyre-wrong\/","title":{"rendered":"Data Scientists think data is their #1 problem. Here&#8217;s why they&#8217;re wrong."},"content":{"rendered":"<p>Author: James Taylor<\/p>\n<div>\n<p>I often see articles or posts that identify data integration or preparation as the key issues facing data science projects. This always puzzles me as this is not our lived experience &#8211; not what we see when we work with Fortune 500 companies adopting predictive analytics, machine learning or AI. But I think I have figured it out. The problem is as follows:<\/p>\n<p style=\"text-align: center;\"><span style=\"font-size: 14pt;\"><em><strong>What data scientists think counts as a &#8220;data science project&#8221; <br \/>is not, in fact, a data science project.<\/strong><\/em><\/span><\/p>\n<p>Let me illustrate this with some data from a great study. 
Back in 2016, The Economist Intelligence Unit conducted a survey on &#8220;<a href=\"https:\/\/eiuperspectives.economist.com\/marketing\/broken-links-why-analytics-investments-have-yet-pay\" target=\"_blank\" rel=\"noopener noreferrer\">Broken links: Why analytics investments have yet to pay off<\/a>&#8221;, and below you can see how this data appears to support the argument that data problems are #1.<\/p>\n<p><a href=\"https:\/\/storage.ning.com\/topology\/rest\/1.0\/file\/get\/7531338281?profile=original\" target=\"_blank\" rel=\"noopener noreferrer\"><img decoding=\"async\" src=\"https:\/\/storage.ning.com\/topology\/rest\/1.0\/file\/get\/7531338281?profile=RESIZE_710x\" class=\"align-center\" style=\"padding: 5px;\"><\/a><\/p>\n<p>Wow &#8211; it looks pretty clear that data integration\/preparation is the biggest problem, reported by nearly twice as many projects as the next one.<\/p>\n<p>In fact, though, this is only a subset of the data from the survey. Here&#8217;s the full data set:<\/p>\n<p><a href=\"https:\/\/storage.ning.com\/topology\/rest\/1.0\/file\/get\/7531342085?profile=original\" target=\"_blank\" rel=\"noopener noreferrer\"><img decoding=\"async\" src=\"https:\/\/storage.ning.com\/topology\/rest\/1.0\/file\/get\/7531342085?profile=RESIZE_710x\" class=\"align-center\" style=\"padding: 5px;\"><\/a><\/p>\n<p>Data integration and preparation ranks only #4. Problem definition\/framing, Solution approach\/design and Action\/change management all rank higher. This matches our experience.<\/p>\n<p>In large, established &#8220;grown-up&#8221; companies, data science projects fail for one or both of two reasons:<\/p>\n<ul>\n<li>They are solving the wrong problem. They are building an analytic that is not what the business needs, that will not solve a true business problem, or that is poorly designed to fit into the business context.<\/li>\n<li>They cannot action the model they build. 
They can&#8217;t change how the business makes decisions and takes action so that it actually takes advantage of the analytic.<\/li>\n<\/ul>\n<p>And this illustrates the problem.<\/p>\n<p>The problem is that data scientists THINK their project starts with data and ends with the communication of their analysis. If that&#8217;s your focus, then data is your #1 problem.<\/p>\n<p>But this is neither where data science projects start nor where they end. They have to start and end with the <strong>business<\/strong>. That means starting with a <strong>business<\/strong> problem &#8211; a business decision that the business wants to improve &#8211; and ending with that problem being solved &#8211; the <strong>business<\/strong> behaves differently (better). If that&#8217;s your focus, then your problem is not data but problem definition and operationalization &#8211; making the analytic work IRL.<\/p>\n<p>Here&#8217;s the difference, mapped onto those same phases: on the left, what many data scientists think their projects involve; on the right, what they really involve.<\/p>\n<p><a href=\"https:\/\/storage.ning.com\/topology\/rest\/1.0\/file\/get\/7531362091?profile=original\" target=\"_blank\" rel=\"noopener noreferrer\"><img decoding=\"async\" src=\"https:\/\/storage.ning.com\/topology\/rest\/1.0\/file\/get\/7531362091?profile=RESIZE_710x\" class=\"align-center\" style=\"padding: 5px;\"><\/a><\/p>\n<p style=\"text-align: center;\"><span style=\"font-size: 14pt;\"><em><strong>Bottom line: If your data science team is telling you that <br \/>data is their #1 problem, then they&#8217;re doing it wrong.<\/strong><\/em><\/span><\/p>\n<p>I&#8217;ve written about this before &#8211; check out this LinkedIn <a href=\"https:\/\/www.linkedin.com\/pulse\/fixing-broken-links-analytics-value-chain-james-taylor\/\" target=\"_blank\" rel=\"noopener noreferrer\">article on the study itself<\/a> and this one on <a 
href=\"https:\/\/www.linkedin.com\/pulse\/adopt-decision-modeling-decisionsfirst-analytic-success-james-taylor\/\" target=\"_blank\" rel=\"noopener noreferrer\">adopting decision modeling<\/a> as a better way to define the problems your data science team is trying to solve. You might also like our recent white paper and videos on <a href=\"https:\/\/www.decisionmanagementsolutions.com\/analytic-enterprise\/\" target=\"_blank\" rel=\"noopener noreferrer\">Building an Analytic Enterprise<\/a>.<\/p>\n<p>Feel free to <a href=\"https:\/\/www.linkedin.com\/in\/jamestaylor\/\" target=\"_blank\" rel=\"noopener noreferrer\">connect with me on LinkedIn<\/a> to message me with questions and comments. And if we can help your data science team start working on a better definition of a project, <a href=\"https:\/\/www.decisionmanagementsolutions.com\/about-decision-management-solutions-2\/contact-us\/\" target=\"_blank\" rel=\"noopener noreferrer\">we&#8217;d love to<\/a>.<\/p>\n<p>This article was originally posted to <a href=\"https:\/\/www.linkedin.com\/pulse\/data-scientists-think-1-problem-heres-why-theyre-wrong-james-taylor\/\" target=\"_blank\" rel=\"noopener noreferrer\">LinkedIn<\/a>&nbsp;where it has 60+ comments, 140+ reactions and 1,000+ views. Check out the discussion.<\/p>\n<\/div>\n<p><a href=\"https:\/\/www.datasciencecentral.com\/xn\/detail\/6448529:BlogPost:977742\">Go to Source<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Author: James Taylor I often see articles or posts that identify data integration or preparation as the key issues facing data science projects. 
This always [&hellip;] <span class=\"read-more-link\"><a class=\"read-more\" href=\"https:\/\/www.aiproblog.com\/index.php\/2020\/08\/20\/data-scientists-think-data-is-their-1-problem-heres-why-theyre-wrong\/\">Read More<\/a><\/span><\/p>\n","protected":false},"author":1,"featured_media":462,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_bbp_topic_count":0,"_bbp_reply_count":0,"_bbp_total_topic_count":0,"_bbp_total_reply_count":0,"_bbp_voice_count":0,"_bbp_anonymous_reply_count":0,"_bbp_topic_count_hidden":0,"_bbp_reply_count_hidden":0,"_bbp_forum_subforum_count":0,"footnotes":""},"categories":[26],"tags":[],"_links":{"self":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts\/3786"}],"collection":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/comments?post=3786"}],"version-history":[{"count":0,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts\/3786\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/media\/466"}],"wp:attachment":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/media?parent=3786"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/categories?post=3786"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/tags?post=3786"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}