{"id":2437,"date":"2019-08-06T06:30:11","date_gmt":"2019-08-06T06:30:11","guid":{"rendered":"https:\/\/www.aiproblog.com\/index.php\/2019\/08\/06\/automated-machine-learning-for-professionals-updated\/"},"modified":"2019-08-06T06:30:11","modified_gmt":"2019-08-06T06:30:11","slug":"automated-machine-learning-for-professionals-updated","status":"publish","type":"post","link":"https:\/\/www.aiproblog.com\/index.php\/2019\/08\/06\/automated-machine-learning-for-professionals-updated\/","title":{"rendered":"Automated Machine Learning for Professionals &#8211; Updated"},"content":{"rendered":"<p>Author: William Vorhies<\/p>\n<div>\n<p><strong><em>Summary:<\/em><\/strong> <em>As the Automated Machine Learning (AML) movement got underway a few years back there was an early branch between proprietary platforms and open source platforms.\u00a0 In this article we\u2019ll update you on leading open source AML tools.\u00a0 Since they continue to require fluency in Python or R we label them \u201cprofessional\u201d.<\/em><\/p>\n<p>\u00a0<\/p>\n<p><a href=\"https:\/\/storage.ning.com\/topology\/rest\/1.0\/file\/get\/3408614269?profile=original\" target=\"_blank\" rel=\"noopener noreferrer\"><img decoding=\"async\" src=\"https:\/\/storage.ning.com\/topology\/rest\/1.0\/file\/get\/3408614269?profile=RESIZE_710x\" width=\"300\" class=\"align-right\"><\/a>As the Automated Machine Learning (AML) movement got underway a few years back there was an early branch between proprietary platforms and open source platforms.\u00a0 Today, the primary difference between these is that the proprietary entries are largely code-free so that citizen data scientists \/ business analysts can use them in addition to data scientists.\u00a0 The open source versions are still reliant on your ability to code, or at least to copy code.\u00a0 And oh yes, open source is free.<\/p>\n<p>There is a kind of philosophical disharmony with AML that relies on code.\u00a0 AML after all was intended to make things simple 
and uniform.\u00a0 All the same, there remains a solid core of data scientists who prefer to hand-code in Python or R, and these open source apps will appeal mostly to them.\u00a0 Hence the appellation \u201cprofessional\u201d, since it\u2019s pretty unlikely that any of your business analysts or other citizen data scientists are going to try to compete based on their competence with code.<\/p>\n<p>So although the user interface is not as \u2018slick\u2019, if your organization is an R or Python shop then these packages offer consistency, speed, and economy while eliminating much of the repetitive work entailed in model building.\u00a0 A few of these packages even bridge over into Automated Deep Learning (ADL) if your needs are that complex.<\/p>\n<p>A fully featured <strong>proprietary<\/strong> app can generally provide all of the following features.<\/p>\n<ul>\n<li>Data Blending<\/li>\n<li>Data Prep and Cleansing<\/li>\n<li>Feature Engineering<\/li>\n<li>Feature Selection and Modeling<\/li>\n<li>Model Deployment<\/li>\n<li>Model Management and Refresh<\/li>\n<\/ul>\n<p>Advanced features might also include programmatic automation of the entire process for model refresh and update.\u00a0 They may also accommodate unstructured and semi-structured data.\u00a0 To see more about who\u2019s leading among the proprietary platforms try <a href=\"https:\/\/www.datasciencecentral.com\/profiles\/blogs\/automated-machine-learning-aml-comes-of-age-almost\"><em><u>our previous article here<\/u><\/em><\/a>.\u00a0<\/p>\n<p>It\u2019s unlikely you\u2019ll find all these capabilities in a single open source app, but you may be able to string several together to come close.\u00a0 Most focus on model selection and hyperparameter tuning and not (yet) the earlier and later tasks in the process.\u00a0 Here are a few of the leaders, based on positive reviews from \u2018professional\u2019 users.<\/p>\n<p>\u00a0<\/p>\n<p><span style=\"font-size: 
12pt;\"><strong>Most Complete Solutions<\/strong><\/span><\/p>\n<p><strong>\u00a0<\/strong><\/p>\n<p><strong>MLBox (Machine Learning Box)<\/strong><\/p>\n<p>MLBox is a powerful Python library that performs data cleaning, model selection, and hyperparameter tuning.\u00a0 Some users have done quite well in Kaggle competitions, scoring in the top 5%.\u00a0 It\u2019s reported to work best on Linux and somewhat less well on Mac and PC.<\/p>\n<p>\u00a0<\/p>\n<p><strong>Auto-sklearn<\/strong><\/p>\n<p>Auto-sklearn is built around the well-known scikit-learn Python library and is a relatively complete solution for supervised machine learning.\u00a0 It has placed well in a variety of recent AML contests and is a drop-in replacement for a scikit-learn estimator.<\/p>\n<p>It will handle missing values, categoricals, sparse data, and rescaling with 14 preprocessing methods.\u00a0 It passes off from the preprocessing module to the classifier \/ regressor, which includes 15 ML algorithms and Bayesian hyperparameter tuning for a total of 110 hyperparameters.\u00a0 It will also automatically construct ensembles.<\/p>\n<p>\u00a0<a href=\"https:\/\/storage.ning.com\/topology\/rest\/1.0\/file\/get\/2808335360?profile=original\" target=\"_blank\" rel=\"noopener noreferrer\"><img decoding=\"async\" src=\"https:\/\/storage.ning.com\/topology\/rest\/1.0\/file\/get\/2808335360?profile=RESIZE_710x\" width=\"500\" class=\"align-center\"><\/a><\/p>\n<p>Interestingly, Auto-sklearn has been expanded to handle deep neural nets with an add-in package, Auto-Net, which adds them as a 16<sup>th<\/sup> ML algorithm.<\/p>\n<p>\u00a0<\/p>\n<p><strong>TPOT (Tree-based Pipeline Optimization Tool)<\/strong><\/p>\n<p>TPOT is an open source tool built on top of the scikit-learn Python library.\u00a0 It\u2019s billed as \u201cyour Data Science Assistant\u201d to automate the most tedious portions of model development.\u00a0 As its name suggests, TPOT represents candidate pipelines as trees; automation starts after data cleaning and runs through the 
delivery of production-ready Python code.\u00a0<\/p>\n<p>\u00a0<a href=\"https:\/\/storage.ning.com\/topology\/rest\/1.0\/file\/get\/3408617478?profile=original\" target=\"_blank\" rel=\"noopener noreferrer\"><img decoding=\"async\" src=\"https:\/\/storage.ning.com\/topology\/rest\/1.0\/file\/get\/3408617478?profile=RESIZE_710x\" width=\"500\" class=\"align-center\"><\/a><\/p>\n<p>Perhaps TPOT\u2019s most interesting feature is that hyperparameter optimization is based on genetic programming.\u00a0 For the expert user, all the assumptions are fully exposed so that you can continue to work the hyperparameters to test the optimization.\u00a0 In preprocessing, TPOT can\u2019t yet handle categoricals, which must first be encoded as numeric values.<\/p>\n<p>\u00a0<\/p>\n<p><span style=\"font-size: 12pt;\"><strong>The Component Supermarket<\/strong><\/span><\/p>\n<p>If you want to extend the capability of these packages or try to piece together your own open-source platform, there is a variety of components to pick from.<\/p>\n<p>In Feature Selection and Engineering you might try:<\/p>\n<ul>\n<li>Boruta.py<\/li>\n<li>Categorical-Encoding<\/li>\n<li>Featuretools<\/li>\n<li>FeatureHub<\/li>\n<\/ul>\n<p>For Hyperparameter Optimization you might try:<\/p>\n<ul>\n<li>ENAS<\/li>\n<li>FAR-HO<\/li>\n<li>GPFlowOpt<\/li>\n<li>HORD<\/li>\n<li>Hyperopt<\/li>\n<li>Ray.tune<\/li>\n<li>Skopt<\/li>\n<\/ul>\n<p><strong>\u00a0<\/strong><\/p>\n<p><span style=\"font-size: 12pt;\"><strong>Open Source ADL (Automated Deep Learning)<\/strong><\/span><\/p>\n<p>Yes, even deep learning now has a few open source packages to make your exploration easier.<\/p>\n<p>\u00a0<\/p>\n<p><strong>Auto-Keras<\/strong><\/p>\n<p>Auto-Keras provides functions to automatically search for the architecture and hyperparameters of deep learning models.\u00a0 Keras is designed to simplify access to deep learning models by reducing code and will run on top of TensorFlow, CNTK, or Theano.<\/p>\n<p>\u00a0<\/p>\n<p><strong>Uber 
Ludwig<\/strong><\/p>\n<p>Ludwig is a TensorFlow-based tool designed to let non-experts create DL models by providing only two files: a CSV with the training data and a YAML file defining inputs and outputs.\u00a0 In theory no code is required.<\/p>\n<p>Ludwig automatically runs a series of DL models, which are compared in order to rapidly arrive at a suitable final architecture, a task that is usually both time-consuming and dependent on a highly skilled data scientist.<\/p>\n<p>According to the Uber development team, Ludwig is equally valuable for experienced users, who have access to all the under-the-hood controls, and it provides visualizations that make model performance and predictions easy to understand.<\/p>\n<p>For both these ADL packages, feature engineering and selection remain a manual process.<\/p>\n<p>\u00a0<\/p>\n<p><span style=\"font-size: 12pt;\"><strong>Additional articles on Automated Machine Learning, Automated Deep Learning, and Other No-Code Solutions<\/strong><\/span><\/p>\n<p><a href=\"https:\/\/www.datasciencecentral.com\/profiles\/blogs\/thinking-about-moving-up-to-automated-machine-learning-aml\"><em><u>Thinking about Moving Up to Automated Machine Learning (AML)<\/u><\/em><\/a> <em>(July 2019)<\/em><\/p>\n<p><a href=\"https:\/\/www.datasciencecentral.com\/profiles\/blogs\/automated-machine-learning-aml-comes-of-age-almost\"><em><u>Automated Machine Learning (AML) Comes of Age \u2013 Almost<\/u><\/em><\/a> <em>(July 2019)<\/em><\/p>\n<p><a href=\"https:\/\/www.datasciencecentral.com\/profiles\/blogs\/practicing-no-code-data-science\"><em><u>Practicing \u2018No Code\u2019 Data Science<\/u><\/em><\/a><em>\u00a0 (October 2018)<\/em><\/p>\n<p><a href=\"https:\/\/www.datasciencecentral.com\/profiles\/blogs\/what-s-new-in-data-prep\"><em><u>What\u2019s New in Data Prep<\/u><\/em><\/a><em><u>\u00a0<\/u> (September 2018)<\/em><\/p>\n<p><a 
href=\"https:\/\/www.datasciencecentral.com\/profiles\/blogs\/democratizing-deep-learning-the-stanford-dawn-project\"><em><u>Democratizing Deep Learning \u2013 The Stanford Dawn Project<\/u><\/em><\/a><em><u>\u00a0<\/u> (September 2018)<\/em><\/p>\n<p><a href=\"https:\/\/www.datasciencecentral.com\/profiles\/blogs\/transfer-learning-deep-learning-for-everyone\"><em><u>Transfer Learning \u2013Deep Learning for Everyone<\/u><\/em><\/a><em><u>\u00a0<\/u> (April 2018)<\/em><\/p>\n<p><a href=\"https:\/\/www.datasciencecentral.com\/profiles\/blogs\/automated-deep-learning-so-simple-anyone-can-do-it\"><em><u>Automated Deep Learning \u2013 So Simple Anyone Can Do It<\/u><\/em><\/a><em><u>\u00a0<\/u> (April 2018)<\/em><\/p>\n<p><a href=\"https:\/\/www.datasciencecentral.com\/profiles\/blogs\/next-generation-automated-machine-learning-aml\"><em><u>Next Generation Automated Machine Learning (AML)<\/u><\/em><\/a> <em>(April 2018)<\/em><\/p>\n<p><a href=\"https:\/\/www.datasciencecentral.com\/profiles\/blogs\/more-on-fully-automated-machine-learning\"><em><u>More on Fully Automated Machine Learning<\/u><\/em><\/a><em>\u00a0 (August 2017)<\/em><\/p>\n<p><a href=\"https:\/\/www.datasciencecentral.com\/profiles\/blogs\/automated-machine-learning-for-professionals\"><em><u>Automated Machine Learning for Professionals<\/u><\/em><\/a><em>\u00a0 (July 2017)<\/em><\/p>\n<p><a href=\"https:\/\/www.datasciencecentral.com\/profiles\/blogs\/data-scientists-automated-and-unemployed-by-2025-update\"><em><u>Data Scientists Automated and Unemployed by 2025 &#8211; Update!<\/u><\/em><\/a><em>\u00a0 (July 2017)<\/em><\/p>\n<p><a href=\"https:\/\/www.datasciencecentral.com\/profiles\/blogs\/data-scientists-automated-and-unemployed-by-2025\"><em><u>Data Scientists Automated and Unemployed by 2025!<\/u><\/em><\/a><em>\u00a0 (April 2016)<\/em><\/p>\n<p>\u00a0<\/p>\n<p><a href=\"https:\/\/www.datasciencecentral.com\/profiles\/blog\/list?user=0h5qapp2gbuf8\"><em><u>Other articles by Bill 
Vorhies<\/u><\/em><\/a><\/p>\n<p>\u00a0<\/p>\n<p>About the author:\u00a0 Bill is Contributing Editor for Data Science Central.\u00a0 Bill is also President &#038; Chief Data Scientist at Data-Magnum and has practiced as a data scientist since 2001.\u00a0 His articles have been read more than 2 million times.<\/p>\n<p>He can be reached at:<\/p>\n<p><a href=\"mailto:Bill@DataScienceCentral.com\">Bill@DataScienceCentral.com<\/a> <span>or<\/span> <a href=\"mailto:Bill@Data-Magnum.com\">Bill@Data-Magnum.com<\/a><\/p>\n<\/div>\n<p><a href=\"https:\/\/www.datasciencecentral.com\/xn\/detail\/6448529:BlogPost:865145\">Go to Source<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Author: William Vorhies Summary: As the Automated Machine Learning (AML) movement got underway a few years back there was an early branch between proprietary platforms [&hellip;] <span class=\"read-more-link\"><a class=\"read-more\" href=\"https:\/\/www.aiproblog.com\/index.php\/2019\/08\/06\/automated-machine-learning-for-professionals-updated\/\">Read 
More<\/a><\/span><\/p>\n","protected":false},"author":1,"featured_media":460,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_bbp_topic_count":0,"_bbp_reply_count":0,"_bbp_total_topic_count":0,"_bbp_total_reply_count":0,"_bbp_voice_count":0,"_bbp_anonymous_reply_count":0,"_bbp_topic_count_hidden":0,"_bbp_reply_count_hidden":0,"_bbp_forum_subforum_count":0,"footnotes":""},"categories":[26],"tags":[],"_links":{"self":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts\/2437"}],"collection":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/comments?post=2437"}],"version-history":[{"count":0,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts\/2437\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/media\/470"}],"wp:attachment":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/media?parent=2437"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/categories?post=2437"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/tags?post=2437"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}