{"id":2368,"date":"2019-07-16T06:36:05","date_gmt":"2019-07-16T06:36:05","guid":{"rendered":"https:\/\/www.aiproblog.com\/index.php\/2019\/07\/16\/thinking-about-moving-up-to-automated-machine-learning-aml\/"},"modified":"2019-07-16T06:36:05","modified_gmt":"2019-07-16T06:36:05","slug":"thinking-about-moving-up-to-automated-machine-learning-aml","status":"publish","type":"post","link":"https:\/\/www.aiproblog.com\/index.php\/2019\/07\/16\/thinking-about-moving-up-to-automated-machine-learning-aml\/","title":{"rendered":"Thinking about Moving Up to Automated Machine Learning (AML)"},"content":{"rendered":"<p>Author: William Vorhies<\/p>\n<div>\n<p><strong><em>Summary:<\/em><\/strong> <em>Are you wondering about moving up to Automated Machine Learning (AML)?\u00a0 Here are some considerations to help guide you.<\/em><\/p>\n<p>\u00a0<\/p>\n<p><a href=\"https:\/\/storage.ning.com\/topology\/rest\/1.0\/file\/get\/3283178047?profile=original\" target=\"_blank\" rel=\"noopener noreferrer\"><img decoding=\"async\" src=\"https:\/\/storage.ning.com\/topology\/rest\/1.0\/file\/get\/3283178047?profile=RESIZE_710x\" width=\"400\" class=\"align-right\"><\/a>Are you wondering about moving up to <a href=\"https:\/\/www.datasciencecentral.com\/profiles\/blogs\/automated-machine-learning-aml-comes-of-age-almost\"><em><u>Automated Machine Learning<\/u><\/em><\/a> (AML)?\u00a0 Or perhaps you\u2019ve already made the decision but are wondering about the capabilities of individual platforms, their strengths and limitations, and how to choose.\u00a0 Here are some considerations to help guide you.<\/p>\n<p><span style=\"font-size: 12pt;\"><strong>What\u2019s Your Motivation?<\/strong><\/span><\/p>\n<p>This is intended to be a little broader than a business case and requirements.\u00a0 Chances are your motives fall into one or more of the following buckets.<\/p>\n<ol>\n<li><strong>Efficiency<\/strong><\/li>\n<\/ol>\n<p>So far 
the greatest motivation behind AML adoption has come from companies that are already deploying large numbers of ML models.\u00a0 If you\u2019re creating and managing dozens or even hundreds of models, as is frequently the case in insurance, banking, and e-commerce, then the ability to create more models and keep them refreshed is an obvious issue.\u00a0<\/p>\n<p>Cost savings are a top motivation, since fewer data scientists can now do the work of many.\u00a0 Speed, that is, time to benefit, also improves greatly, especially in the model refresh and deploy cycle.<\/p>\n<ol start=\"2\">\n<li><strong>Broader Participation<\/strong><\/li>\n<\/ol>\n<p>Be aware that many of the up-and-coming AML platforms differentiate themselves based on audience.\u00a0 Those that appeal to your existing data science team offer easier and more complete access to choices in data prep, feature selection, model selection, and model tuning via hyperparameters.<\/p>\n<p>The larger emerging camp seeks to make the process much easier for less experienced modelers.\u00a0 On the one hand, these may be your first-year data science hires, who will likely rely more on the automated features than your more experienced team members do.\u00a0 On the other hand, some platforms are so completely automated that they encourage LOB Managers, analysts, and other citizen data scientists to participate directly in model building.<\/p>\n<p>Having more people directly participating in model building can seem like a very desirable objective, but be sure you have sufficient controls to prevent putting models into operation that haven\u2019t been fully vetted by your experienced data scientists.\u00a0 It\u2019s still possible for the operator of a fully automated tool to create a model that\u2019s not sufficiently accurate, won\u2019t generalize, or, worse, predicts exactly the wrong thing.<\/p>\n<ol start=\"3\">\n<li><strong>Just Getting Started<\/strong><\/li>\n<\/ol>\n<p>If you\u2019re just getting started on your 
digital journey and don\u2019t yet have a dedicated data scientist or two, you might be tempted to sign up for an AML platform and give your LOB Managers and analysts enough training to get started.\u00a0 Don\u2019t go there.<\/p>\n<p>As noted in the last section, it\u2019s still possible for an inexperienced modeler to create a model that leaves you worse off than having no model at all.\u00a0 You\u2019re going to need some quality control before you turn new models loose on your customers or processes.<\/p>\n<p>\u00a0<\/p>\n<p><span style=\"font-size: 12pt;\"><strong>How Much Does Accuracy Count?<\/strong><\/span><\/p>\n<p>In machine learning there is always a practical tradeoff between model accuracy and development time.\u00a0 Your data scientists will no doubt be happy to keep delivering incremental gains in model accuracy for days or weeks.\u00a0<\/p>\n<p>Still, it\u2019s important to understand the tradeoff between model accuracy and revenue or margin.\u00a0 It\u2019s not unusual for <a href=\"https:\/\/www.datasciencecentral.com\/profiles\/blogs\/the-value-of-accuracy-in-predictive-analytics\"><em><u>small gains in accuracy to create proportionately much larger gains<\/u><\/em><\/a> in campaign results.<\/p>\n<p>Your data science team lead no doubt understands this and has already put some controls in place.\u00a0 The real issue is whether the automated output of the AML platform meets your minimum requirements.<\/p>\n<p>Determining this will require some benchmarking during the selection process so that you have side-by-side comparisons.\u00a0 Almost all AML platforms run multiple algorithms, and teams of algorithms, in competition with one another to select the winners.\u00a0<\/p>\n<p>Accuracy within the AML may be less than optimal if the number of candidate algorithms is restricted to just a few.\u00a0 It\u2019s just as likely, however, that any shortfall in accuracy occurred in the automated data prep, cleansing, feature engineering, and 
feature selection.\u00a0 You\u2019ll need experienced members of your data science team to help you evaluate this issue.<\/p>\n<p>\u00a0<\/p>\n<p><span style=\"font-size: 12pt;\"><strong>Basic Feature Set<\/strong><\/span><\/p>\n<p>At this stage of market maturity, any AML you consider should offer all of the following automated capabilities:<\/p>\n<p><strong>Data Blending:<\/strong>\u00a0 The combination of data from different sources into a single file.\u00a0 This still requires the operator to specify things like inner or outer joins of data sets.\u00a0 The most advanced platforms may also be able to detect whether data from two different sources with the same name (e.g. \u2018sales\u2019) has the same meaning.\u00a0 At this point, however, it\u2019s best either to have really robust data governance (and not many do) or to have modelers sufficiently intimate with the data that they can detect this sort of mismatch.<\/p>\n<p><strong>Data Prep and Cleansing:<\/strong>\u00a0 This category includes the automated correction of data in incompatible formats (dates, values with embedded commas, etc.).\u00a0 Most AML data prep platforms do a good job at this.\u00a0 Cleansing is more complex.\u00a0 It involves, for example, identifying outliers and deciding how to treat them, correcting badly skewed distributions, converting categoricals into independent features, or even compressing data ranges (typically to -1 to 1) as required by some specific types of algorithms, such as neural nets.<\/p>\n<p><strong>Feature Engineering:<\/strong>\u00a0 In concept, feature engineering is simple: for example, converting related variables into ratios (e.g. 
debt to income) or dates into the number of days since other events occurred (age of the account, days since last purchase, etc.).\u00a0 In automated form, this frequently requires the AML to create all possible combinations of these artificial features without regard for whether they are logical, and then let the algorithms figure out which are predictive (typically only a small fraction).\u00a0 Depending on how the AML handles this, it can add a very large amount of compute overhead.\u00a0 You\u2019ll want to examine whether this step creates any unforeseen requirements in time or compute cost.<\/p>\n<p><strong>Feature Selection and Modeling:<\/strong>\u00a0 These are traditionally thought of separately, but I\u2019ve combined them here as AML platforms might.\u00a0 In traditional modeling, feature selection can be a separate step that precedes model creation to make the modeling process more accurate and efficient.\u00a0 However, it\u2019s also possible to have the models consider all possible features and automatically eliminate those that are least predictive.\u00a0<\/p>\n<p>Automated modeling typically involves running parallel contests on the data with different algorithms.\u00a0 During the contests the AML should also vary the hyperparameters of the different models to try to achieve an optimal result.\u00a0 How the platform handles feature selection, modeling, and hyperparameter tuning will require your detailed attention during trials.<\/p>\n<p><strong>Model Deployment:<\/strong>\u00a0 Your AML should be able to automatically generate production code in your choice of language, compatible with your operating systems (typically Python, C++, Java, or other popular production languages).\u00a0<\/p>\n<p><strong>Model Management and Refresh:<\/strong>\u00a0 The first time you deploy a model in your operating systems, you will need to define exactly where it goes.\u00a0 Thereafter, a complete AML should be able to monitor the model, determine 
when a refresh is appropriate, and, with minimal human intervention, refresh the model and automatically redeploy it.\u00a0 There are still human quality control checks in this process, but once the model has been developed, refresh and redeploy should require only a small fraction of the labor that original development required.<\/p>\n<p>\u00a0<\/p>\n<p><span style=\"font-size: 12pt;\"><strong>Some Advanced Considerations<\/strong><\/span><\/p>\n<p><strong>Automation of the Entire Process:<\/strong>\u00a0 In a fully automated system, particularly one focused on maintaining and refreshing existing models, it\u2019s important that the entire process can be defined programmatically.\u00a0 That way the entire process, from data capture through deployment and all the customized steps in between, can be captured and repeated, making the end-to-end process truly automated.<\/p>\n<p><strong>Data Types:<\/strong> Depending on your business, you may have a variety of data inputs with special needs, including unstructured or semi-structured text or image data, or streaming data.\u00a0 A few AML platforms can handle these more advanced requirements, and a few can already create deep learning CNN and RNN models, though this type of modeling is not yet common in business.<\/p>\n<p><strong>Prepackaged Automation Libraries:<\/strong>\u00a0 During initial model development, your data science team will have identified specific steps in the process that need particular attention.\u00a0 These might include data prep, feature selection, or hyperparameter optimization.\u00a0 Ideally, your AML platform will include libraries or APIs of callable solutions that can shortcut data scientist labor on these tasks.<\/p>\n<p><strong>Training Data Requirements:<\/strong>\u00a0 Some algorithms that might be considered during the competition for best model may be particularly data-hungry.\u00a0 You will want to understand the tradeoffs between including these 
algorithm types and the availability or cost of acquiring sufficient training data.<\/p>\n<p><strong>On-Premises Solution:<\/strong>\u00a0 Some AML platforms that are particularly compute-intensive (as many are) are optimized for SaaS cloud delivery.\u00a0 If your business requires an on-prem or private cloud solution for data security, you\u2019ll need to identify the cost and complexity of this option.<\/p>\n<p>While AMLs are marketed on their simplicity, there are many factors to consider before jumping in.\u00a0 You\u2019ll want help from your data science pros in selecting the right one.<\/p>\n<p>\u00a0<\/p>\n<p>\u00a0<\/p>\n<p><a href=\"https:\/\/www.datasciencecentral.com\/profiles\/blog\/list?user=0h5qapp2gbuf8\"><em><u>Other articles by Bill Vorhies<\/u><\/em><\/a><\/p>\n<p>\u00a0<\/p>\n<p>About the author:\u00a0 Bill is Contributing Editor for Data Science Central.\u00a0 Bill is also President &#038; Chief Data Scientist at Data-Magnum and has practiced as a data scientist since 2001.\u00a0 His articles have been read more than 2 million times.<\/p>\n<p>He can be reached at:<\/p>\n<p><a href=\"mailto:Bill@DataScienceCentral.com\">Bill@DataScienceCentral.com<\/a> <span>or<\/span> <a href=\"mailto:Bill@Data-Magnum.com\">Bill@Data-Magnum.com<\/a><\/p>\n<p><span>\u00a0<\/span><\/p>\n<\/div>\n<p><a href=\"https:\/\/www.datasciencecentral.com\/xn\/detail\/6448529:BlogPost:856978\">Go to Source<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Author: William Vorhies Summary: Are you wondering about moving up to Automated Machine Learning (AML)?\u00a0 Here are some considerations to help guide you. 
\u00a0 Are [&hellip;] <span class=\"read-more-link\"><a class=\"read-more\" href=\"https:\/\/www.aiproblog.com\/index.php\/2019\/07\/16\/thinking-about-moving-up-to-automated-machine-learning-aml\/\">Read More<\/a><\/span><\/p>\n","protected":false},"author":1,"featured_media":457,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_bbp_topic_count":0,"_bbp_reply_count":0,"_bbp_total_topic_count":0,"_bbp_total_reply_count":0,"_bbp_voice_count":0,"_bbp_anonymous_reply_count":0,"_bbp_topic_count_hidden":0,"_bbp_reply_count_hidden":0,"_bbp_forum_subforum_count":0,"footnotes":""},"categories":[26],"tags":[],"_links":{"self":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts\/2368"}],"collection":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/comments?post=2368"}],"version-history":[{"count":0,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts\/2368\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/media\/461"}],"wp:attachment":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/media?parent=2368"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/categories?post=2368"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/tags?post=2368"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}