{"id":1668,"date":"2019-02-02T06:33:27","date_gmt":"2019-02-02T06:33:27","guid":{"rendered":"https:\/\/www.aiproblog.com\/index.php\/2019\/02\/02\/the-difference-between-managing-large-and-small-data-science-teams\/"},"modified":"2019-02-02T06:33:27","modified_gmt":"2019-02-02T06:33:27","slug":"the-difference-between-managing-large-and-small-data-science-teams","status":"publish","type":"post","link":"https:\/\/www.aiproblog.com\/index.php\/2019\/02\/02\/the-difference-between-managing-large-and-small-data-science-teams\/","title":{"rendered":"The Difference Between Managing Large and Small Data Science Teams"},"content":{"rendered":"<p>Author: William Vorhies<\/p>\n<div>\n<p><strong><em>Summary:<\/em><\/strong><em>\u00a0 As advanced analytics and data science have matured into must-have skills, data science groups within large companies have themselves become much larger.\u00a0 This has led to some unique problems and solutions that you\u2019ll want to consider as your own DS group grows larger.<\/em><em>\u00a0<\/em><\/p>\n<p>\u00a0<a href=\"http:\/\/api.ning.com\/files\/Ux7BHEMo9GrbFwL7JjYYaLINhBWxilI2YZaVkMjTLL5NDYrmhKltX*5QfBP6l72kdTujstmAy300QvwOfG2h5wdjgMwLsn-G\/largegroupsDavidParkinscredited.jpg\" target=\"_self\"><img decoding=\"async\" src=\"http:\/\/api.ning.com\/files\/Ux7BHEMo9GrbFwL7JjYYaLINhBWxilI2YZaVkMjTLL5NDYrmhKltX*5QfBP6l72kdTujstmAy300QvwOfG2h5wdjgMwLsn-G\/largegroupsDavidParkinscredited.jpg?width=500\" width=\"500\" class=\"align-center\"><\/a><\/p>\n<p>It seems like only two or three years ago you wouldn\u2019t have had to ask this question.\u00a0 Unless you were Google, Amazon, or an equally big player your data science teams were small, maybe in the range of 3 to 12, and were still trying to find their place in your organization.<\/p>\n<p>Fast forward to today and it\u2019s not unusual to find teams of 20 or 40 or even more and that\u2019s a game changer.\u00a0 It\u2019s no longer like Cheers where everyone knows your name.\u00a0 In larger 
organizations more order, organization, and process become necessary.<\/p>\n<p>The large analytic platform providers clearly understand this.\u00a0 I\u2019m thinking IBM, Microsoft, SAS, Alteryx and similar.\u00a0 Over the last year or so there\u2019s been an increasing focus on elephant hunting, slang I\u2019m sure you\u2019ll recognize for trying to get a foothold where the big teams live.<\/p>\n<p>Here are some topics and questions that seem to arise as common ground from trying to manage larger DS orgs.<\/p>\n<p>\u00a0<\/p>\n<p><span style=\"font-size: 12pt;\"><strong>You\u2019re Going to Want to Standardize on a Process Which Probably Means Standardizing on a Platform<\/strong><\/span><\/p>\n<p>If you\u2019ve got 40 people in a data science team that implies a large number of projects and as a result a large number of models or product features to keep track of and maintain.\u00a0 You can\u2019t have everyone freelancing in tools and project structure or you\u2019ll never keep up.<\/p>\n<p>As you\u2019ve approached this scale you probably tried to adhere to a common process such as <a href=\"https:\/\/www.datasciencecentral.com\/profiles\/blogs\/crisp-dm-a-standard-methodology-to-ensure-a-good-outcome\"><em><u>CRISP-DM<\/u><\/em><\/a> and may even have written some internal standards about how that\u2019s implemented.\u00a0 Another common situation is for a DS group to have coalesced around a comprehensive platform.\u00a0<\/p>\n<p>Take Alteryx, for example, which enables the process from data blending through modeling.\u00a0 You\u2019re all using the platform so it\u2019s easy to communicate where you are in a modeling project and there can be project-by-project discussion of when and if, for example, you\u2019re going to use custom code as opposed to the built-in tools.<\/p>\n<p>That will get you part of the way there and some will be happy with this semi-formal level of organization.\u00a0 However, the lessons we take from the project management process and 
application development disciplines like Agile are that more organization can be better and doesn\u2019t necessarily bog things down.<\/p>\n<p>Recently both IBM and Microsoft have created offerings to template this level of organization for you in hopes of getting you to focus on their DS and cloud offerings.\u00a0 IBM has the Data Science Experience and Watson Studio.\u00a0 Microsoft introduced the Team Data Science Process.<\/p>\n<p>\u00a0<\/p>\n<p><span style=\"font-size: 12pt;\"><strong>Some Examples from the Microsoft Team Data Science Process (TDSP)<\/strong><\/span><\/p>\n<p>Of the \u2018systems\u2019 we were able to identify, the Microsoft TDSP seems to be the most comprehensive, literally defining project steps and individual roles and responsibilities.\u00a0 TDSP includes:<\/p>\n<ul>\n<li>A data science lifecycle definition (high-level project plan description).<\/li>\n<li>A standardized project structure (including even sample templates for things like project charters and reports).<\/li>\n<li>A list and description of the required infrastructure and resources.<\/li>\n<li>Tools and utilities for project execution (the Microsoft offerings this platform is intended to promote).<\/li>\n<\/ul>\n<p>The high-level process diagram is not all that different from CRISP-DM but is easy to use when describing the steps.\u00a0<a href=\"http:\/\/api.ning.com\/files\/Ux7BHEMo9GqoFBrPpLPVPZ9ZRme9*cXu8AIa4P8X2vTWrg*WIVSApmJpCgHK4bnSqTOFhlygdOfatfQQciKyAAjFmolqznMo\/MSTDSPLifecyclediagram.png\" target=\"_self\"><img decoding=\"async\" src=\"http:\/\/api.ning.com\/files\/Ux7BHEMo9GqoFBrPpLPVPZ9ZRme9*cXu8AIa4P8X2vTWrg*WIVSApmJpCgHK4bnSqTOFhlygdOfatfQQciKyAAjFmolqznMo\/MSTDSPLifecyclediagram.png?width=450\" width=\"450\" class=\"align-center\"><\/a><\/p>\n<p>Going down one more level, the TDSP even lays out specific roles and responsibilities for each common DS role including Solution Architect, Project Manager, Project Lead, and Data Scientists.\u00a0<a 
href=\"http:\/\/api.ning.com\/files\/Ux7BHEMo9Gp-6y0450UOfyyLZ0FYqFVyuL6mJRMTXgr2hmNd4Vhc4RT2ybCAUG*uF2TbCslgL2YAY2dyiTk*kKeYjn0fEALM\/MSTDSPtasks.png\" target=\"_self\"><img decoding=\"async\" src=\"http:\/\/api.ning.com\/files\/Ux7BHEMo9Gp-6y0450UOfyyLZ0FYqFVyuL6mJRMTXgr2hmNd4Vhc4RT2ybCAUG*uF2TbCslgL2YAY2dyiTk*kKeYjn0fEALM\/MSTDSPtasks.png?width=450\" width=\"450\" class=\"align-center\"><\/a><\/p>\n<p>The entire package is quite comprehensive and detailed.\u00a0 If you haven\u2019t addressed this level of detail for your organization this would be a good starting point.<\/p>\n<p>\u00a0<\/p>\n<p><span style=\"font-size: 12pt;\"><strong>Maintaining Data Pipelines<\/strong><\/span><\/p>\n<p>Another common pain point in larger DS organizations is acquiring and maintaining data pipelines.\u00a0 Probably as your organization grew this was originally assigned to junior data scientists.\u00a0 As you grew you realized this wasn\u2019t a good use of resources.<\/p>\n<p>Over the last two or three years the <a href=\"https:\/\/www.datasciencecentral.com\/profiles\/blogs\/understanding-the-changing-position-roles-in-data-science\"><em><u>separate discipline of Data Engineer<\/u><\/em><\/a> has emerged as a formal and separate supporting role for the data science process.\u00a0 There are a great many specialized skills in maintaining data pipelines ranging from MDM to more technical skills like creating data lakes or creating instances of Spark.<\/p>\n<p>It\u2019s an open conversation in each organization whether this task falls under the data science group or IT.\u00a0 It will depend on how many of these folks you need and whether they are dedicated to these data science tasks, and of course some internal politics.<\/p>\n<p>However, as data science increasingly becomes a team sport played by specialists this function is your major connection to IT and needs to be clearly defined and agreed.<\/p>\n<p>\u00a0<\/p>\n<p><span style=\"font-size: 12pt;\"><strong>Managing the 
Incoming Project Pipeline<\/strong><\/span><\/p>\n<p>Looking from the data science team upward to your internal customers, many data scientists find that more than a third of their time is spent meeting with LOB managers to define and vet new projects and then to report progress.<\/p>\n<p>This is also a process that will benefit from some formalization starting with an agreement at the executive level of how the benefit of various projects is to be calculated.\u00a0<\/p>\n<p>Some organizations are requiring an initial estimate of dollar benefit before undertaking new projects.\u00a0 This no doubt leads to some pretty optimistic projections from LOB managers but it\u2019s important in establishing a project charter to quantify the goals.<\/p>\n<p>Like any good project, this step leading up to the project charter should also include the measures and threshold values that will be used for determining customer acceptance.\u00a0 Yes these may be subject to some revision once you get underway, but it\u2019s important to put a stake in the ground so you can manage the priority of incoming projects.<\/p>\n<p>\u00a0<\/p>\n<p><span style=\"font-size: 12pt;\"><strong>Educating Internal Customers<\/strong><\/span><\/p>\n<p>When we think of the training needs of a DS group we often think just of what skills our data scientists need to do their work.\u00a0 However, there is also a requirement to educate your customers in what they can and can\u2019t expect, and what the process of engaging with you will look like.<\/p>\n<p>Some organizations take a minimalist approach to this providing say scheduled quarterly one-hour briefings and perhaps some written material for managers wanting to be educated.\u00a0<\/p>\n<p>Some take a more comprehensive approach of using each project win as a briefing opportunity to tell their story not only to their specific internal customer but in shortened form to other potential LOB customers.\u00a0 This serves as basic education for those that 
haven\u2019t yet engaged with you, and continuing education about your expanding capabilities as your group becomes more successful.<\/p>\n<p>Internal education of your data science team is just as important and much more under your direct control.\u00a0 Aside from the obvious opportunities for juniors to learn from seniors by participating directly in projects, some of the comprehensive platforms like the IBM Watson Studio have very specifically designed-in instructional materials and resources across a wide variety of DS topics.<\/p>\n<p>\u00a0<\/p>\n<p><span style=\"font-size: 12pt;\"><strong>Keeping the Creative Momentum<\/strong><\/span><\/p>\n<p>As organizations grow, there is a temptation to view the DS group as just another service provider.\u00a0 LOB managers define problems to be solved and the DS group delivers.\u00a0 Begins to sound a lot like order taking.<\/p>\n<p>The truth will continue to be that as your data scientists keep abreast of what\u2019s evolving in our profession, your group is best positioned to suggest creative solutions to problems that may not yet have even been identified.<\/p>\n<p>There\u2019s a temptation to divide every organization into just two levels of strategic development.\u00a0 Either you are in that exciting and less well defined period of figuring out how data and analytics will create competitive advantage, or you have passed over into the refinement phase where improvements are more incremental.<\/p>\n<p>As the lead or influencer in a large DS organization it\u2019s important to maintain that creative input to the organization.\u00a0 No organization is permanently in the initial stage or permanently in the incremental improvement phase.\u00a0 It would be good practice to carve out a specific amount of time and effort to gather the best knowledge and ideas from your DS team, and have a method of floating these up to the LOB and executive levels.<\/p>\n<p>No one in the company will ever know more about data science and 
what it can do than your group.\u00a0 Don\u2019t become order takers.\u00a0<\/p>\n<p>\u00a0<\/p>\n<p><a href=\"https:\/\/www.datasciencecentral.com\/profiles\/blog\/list?user=0h5qapp2gbuf8\"><em><u>Other articles by Bill Vorhies.<\/u><\/em><\/a><\/p>\n<p>\u00a0<\/p>\n<p>About the author:\u00a0 Bill Vorhies is Editorial Director for Data Science Central and has practiced as a data scientist since 2001.\u00a0 He can be reached at:<\/p>\n<p><a href=\"mailto:Bill@DataScienceCentral.com\">Bill@DataScienceCentral.com<\/a><\/p>\n<\/div>\n<p><a href=\"https:\/\/www.datasciencecentral.com\/xn\/detail\/6448529:BlogPost:738467\">Go to Source<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Author: William Vorhies Summary:\u00a0 As advanced analytics and data science have matured into must-have skills, data science groups within large companies have themselves become much [&hellip;] <span class=\"read-more-link\"><a class=\"read-more\" href=\"https:\/\/www.aiproblog.com\/index.php\/2019\/02\/02\/the-difference-between-managing-large-and-small-data-science-teams\/\">Read 
More<\/a><\/span><\/p>\n","protected":false},"author":1,"featured_media":473,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_bbp_topic_count":0,"_bbp_reply_count":0,"_bbp_total_topic_count":0,"_bbp_total_reply_count":0,"_bbp_voice_count":0,"_bbp_anonymous_reply_count":0,"_bbp_topic_count_hidden":0,"_bbp_reply_count_hidden":0,"_bbp_forum_subforum_count":0,"footnotes":""},"categories":[26],"tags":[],"_links":{"self":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts\/1668"}],"collection":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/comments?post=1668"}],"version-history":[{"count":0,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts\/1668\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/media\/474"}],"wp:attachment":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/media?parent=1668"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/categories?post=1668"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/tags?post=1668"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}