{"id":1045,"date":"2018-09-13T19:00:31","date_gmt":"2018-09-13T19:00:31","guid":{"rendered":"https:\/\/www.aiproblog.com\/index.php\/2018\/09\/13\/how-to-develop-a-reusable-framework-to-spot-check-algorithms-in-python\/"},"modified":"2018-09-13T19:00:31","modified_gmt":"2018-09-13T19:00:31","slug":"how-to-develop-a-reusable-framework-to-spot-check-algorithms-in-python","status":"publish","type":"post","link":"https:\/\/www.aiproblog.com\/index.php\/2018\/09\/13\/how-to-develop-a-reusable-framework-to-spot-check-algorithms-in-python\/","title":{"rendered":"How to Develop a Reusable Framework to Spot-Check Algorithms in Python"},"content":{"rendered":"<p>Author: Jason Brownlee<\/p>\n<div>\n<p><a href=\"https:\/\/machinelearningmastery.com\/spot-check-classification-machine-learning-algorithms-python-scikit-learn\/\">Spot-checking algorithms<\/a> is a technique in applied machine learning designed to quickly and objectively provide a first set of results on a new predictive modeling problem.<\/p>\n<p>Unlike grid searching and other types of algorithm tuning that seek the optimal algorithm or optimal configuration for an algorithm, spot-checking is intended to evaluate a diverse set of algorithms rapidly and provide a rough first-cut result. 
This first-cut result may be used to get an idea of whether a problem or problem representation is indeed predictable and, if so, the types of algorithms that may be worth investigating further for the problem.<\/p>\n<p>Spot-checking is an approach to help overcome the \u201c<a href=\"https:\/\/machinelearningmastery.com\/applied-machine-learning-is-hard\/\">hard problem<\/a>\u201d of applied machine learning and encourage you to clearly think about the <a href=\"https:\/\/machinelearningmastery.com\/applied-machine-learning-as-a-search-problem\/\">higher-order search problem<\/a> being performed in any machine learning project.<\/p>\n<p>In this tutorial, you will discover the usefulness of spot-checking algorithms on a new predictive modeling problem and how to develop a standard framework for spot-checking algorithms in Python for classification and regression problems.<\/p>\n<p>After completing this tutorial, you will know:<\/p>\n<ul>\n<li>Spot-checking provides a way to quickly discover the types of algorithms that perform well on your predictive modeling problem.<\/li>\n<li>How to develop a generic framework for loading data, defining models, evaluating models, and summarizing results.<\/li>\n<li>How to apply the framework to classification and regression problems.<\/li>\n<\/ul>\n<p>Let\u2019s get started.<\/p>\n<div id=\"attachment_6195\" style=\"width: 650px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" class=\"size-full wp-image-6195\" src=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2018\/09\/How-to-Develop-a-Reusable-Framework-for-Spot-Check-Algorithms-in-Python.jpg\" alt=\"How to Develop a Reusable Framework for Spot-Check Algorithms in Python\" width=\"640\" height=\"360\" srcset=\"http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2018\/09\/How-to-Develop-a-Reusable-Framework-for-Spot-Check-Algorithms-in-Python.jpg 640w, 
http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2018\/09\/How-to-Develop-a-Reusable-Framework-for-Spot-Check-Algorithms-in-Python-300x169.jpg 300w\" sizes=\"(max-width: 640px) 100vw, 640px\"><\/p>\n<p class=\"wp-caption-text\">How to Develop a Reusable Framework for Spot-Check Algorithms in Python<br \/>Photo by <a href=\"https:\/\/www.flickr.com\/photos\/respres\/16216077206\/\">Jeff Turner<\/a>, some rights reserved.<\/p>\n<\/div>\n<h2>Tutorial Overview<\/h2>\n<p>This tutorial is divided into five parts; they are:<\/p>\n<ol>\n<li>Spot-Check Algorithms<\/li>\n<li>Spot-Checking Framework in Python<\/li>\n<li>Spot-Checking for Classification<\/li>\n<li>Spot-Checking for Regression<\/li>\n<li>Framework Extension<\/li>\n<\/ol>\n<h2>1. Spot-Check Algorithms<\/h2>\n<p>We cannot know beforehand what algorithms will perform well on a given predictive modeling problem.<\/p>\n<p>This is the <a href=\"https:\/\/machinelearningmastery.com\/applied-machine-learning-is-hard\/\">hard part of applied machine learning<\/a> that can only be resolved via systematic experimentation.<\/p>\n<p><a href=\"https:\/\/machinelearningmastery.com\/why-you-should-be-spot-checking-algorithms-on-your-machine-learning-problems\/\">Spot-checking<\/a> is an approach to this problem.<\/p>\n<p>It involves rapidly testing a large suite of diverse machine learning algorithms on a problem in order to quickly discover what algorithms might work and where to focus attention.<\/p>\n<ul>\n<li><strong>It is fast<\/strong>; it by-passes the days or weeks of preparation and analysis and playing with algorithms that may not ever lead to a result.<\/li>\n<li><strong>It is objective<\/strong>, allowing you to discover what might work well for a problem rather than going with what you used last time.<\/li>\n<li><strong>It gets results<\/strong>; you will actually fit models, make predictions and know if your problem can be predicted and what baseline skill may look 
like.<\/li>\n<\/ul>\n<p>Spot-checking may require that you work with a small sample of your dataset in order to turn around results quickly.<\/p>\n<p>Finally, the results from spot checking are a jumping-off point. A starting point. They suggest where to focus attention on the problem, not what the best algorithm might be. The process is designed to shake you out of typical thinking and analysis and instead focus on results.<\/p>\n<p>You can learn more about spot-checking in the post:<\/p>\n<ul>\n<li><a href=\"https:\/\/machinelearningmastery.com\/why-you-should-be-spot-checking-algorithms-on-your-machine-learning-problems\/\">Why You Should be Spot-Checking Algorithms on your Machine Learning Problems<\/a><\/li>\n<\/ul>\n<p>Now that we know what spot-checking is, let\u2019s look at how we can systematically perform spot-checking in Python.<\/p>\n<h2>2. Spot-Checking Framework in Python<\/h2>\n<p>In this section we will build a framework for a script that can be used for spot-checking machine learning algorithms on a classification or regression problem.<\/p>\n<p>There are four parts to the framework that we need to develop; they are:<\/p>\n<ul>\n<li>Load Dataset<\/li>\n<li>Define Models<\/li>\n<li>Evaluate Models<\/li>\n<li>Summarize Results<\/li>\n<\/ul>\n<p>Let\u2019s take a look at each in turn.<\/p>\n<h3>Load Dataset<\/h3>\n<p>The first step of the framework is to load the data.<\/p>\n<p>The function must be implemented for a given problem and be specialized to that problem. 
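A hypothetical sketch of such a problem-specific loader (the pandas dependency, the CSV filename, the assumption that the target is the last column, and the optional subsample for fast turnaround are all illustrative assumptions, not part of the framework itself):

```python
# illustrative only: a CSV-backed load_dataset() with an optional
# random subsample so spot-check results can be turned around quickly
from pandas import read_csv

def load_dataset(path='data.csv', sample_frac=None):
	# load the file; header=None assumes the CSV has no column names
	data = read_csv(path, header=None)
	# optionally work with a small random sample of the rows
	if sample_frac is not None:
		data = data.sample(frac=sample_frac, random_state=1)
	values = data.values
	# assume the target variable is the final column
	X, y = values[:, :-1], values[:, -1]
	return X, y
```

Any such loader only needs to honor the contract of returning (X, y) for the rest of the framework to work unchanged.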
It will likely involve loading data from one or more CSV files.<\/p>\n<p>We will call this function <em>load_dataset()<\/em>; it will take no arguments and return the inputs (<em>X<\/em>) and outputs (<em>y<\/em>) for the prediction problem.<\/p>\n<pre class=\"crayon-plain-tag\"># load the dataset, returns X and y elements\r\ndef load_dataset():\r\n\tX, y = None, None\r\n\treturn X, y<\/pre>\n<h3>Define Models<\/h3>\n<p>The next step is to define the models to evaluate on the predictive modeling problem.<\/p>\n<p>The models defined will be specific to the type of predictive modeling problem, e.g. classification or regression.<\/p>\n<p>The defined models should be diverse, including a mixture of:<\/p>\n<ul>\n<li>Linear Models.<\/li>\n<li>Nonlinear Models.<\/li>\n<li>Ensemble Models.<\/li>\n<\/ul>\n<p>Each model should be given a good chance to perform well on the problem. This might mean providing a few variations of the model with different common or well-known configurations that perform well on average.<\/p>\n<p>We will call this function <em>define_models()<\/em>. It will return a dictionary of model names mapped to scikit-learn model objects. The name should be short, like \u2018<em>svm<\/em>\u2019, and may include a configuration detail, e.g. \u2018knn-7\u2019.<\/p>\n<p>The function will also take a dictionary as an optional argument; if not provided, a new dictionary is created and populated. If a dictionary is provided, models are added to it.<\/p>\n<p>This adds flexibility if you would like to have multiple functions for defining models, or to add a large number of models of a specific type with different configurations.<\/p>\n<pre class=\"crayon-plain-tag\"># create a dict of standard models to evaluate {name:object}\r\ndef define_models(models=dict()):\r\n\t# ...\r\n\treturn models<\/pre>\n<p>The idea is not to grid search model parameters; that can come later.<\/p>\n<p>Instead, each model should be given an opportunity to perform well (i.e. 
not necessarily optimally). This might mean trying many combinations of parameters in some cases, e.g. in the case of gradient boosting.<\/p>\n<h3>Evaluate Models<\/h3>\n<p>The next step is the evaluation of the defined models on the loaded dataset.<\/p>\n<p>The scikit-learn library provides the ability to pipeline models during evaluation. This allows the data to be transformed prior to being used to fit a model, and this is done in a correct way such that the transforms are prepared on the training data and applied to the test data.<\/p>\n<p>We can define a function that prepares a given model prior to evaluation to allow specific transforms to be used during the spot-checking process. These transforms will be applied in a blanket way to all models. This can be useful to perform operations such as standardization, normalization, and feature selection.<\/p>\n<p>We will define a function named <em>make_pipeline()<\/em> that takes a defined model and returns a pipeline. Below is an example of preparing a pipeline that will first standardize the input data, then normalize it prior to fitting the model.<\/p>\n<pre class=\"crayon-plain-tag\"># create a feature preparation pipeline for a model\r\ndef make_pipeline(model):\r\n\tsteps = list()\r\n\t# standardization\r\n\tsteps.append(('standardize', StandardScaler()))\r\n\t# normalization\r\n\tsteps.append(('normalize', MinMaxScaler()))\r\n\t# the model\r\n\tsteps.append(('model', model))\r\n\t# create pipeline\r\n\tpipeline = Pipeline(steps=steps)\r\n\treturn pipeline<\/pre>\n<p>This function can be expanded to add other transforms, or simplified to return the provided model with no transforms.<\/p>\n<p>Now we need to evaluate a prepared model.<\/p>\n<p>We will use the standard approach of evaluating models with k-fold cross-validation. The evaluation of each defined model will result in a list of results. 
This is because k different versions of the model will have been fit and evaluated, resulting in a list of k scores.<\/p>\n<p>We will define a function named <em>evaluate_model()<\/em> that will take the data, a defined model, a number of folds, and a performance metric used to evaluate the results. It will return the list of scores.<\/p>\n<p>The function calls <em>make_pipeline()<\/em> for the defined model to prepare any data transforms required, then calls the <a href=\"http:\/\/scikit-learn.org\/stable\/modules\/generated\/sklearn.model_selection.cross_val_score.html\">cross_val_score()<\/a> scikit-learn function. Importantly, the <em>n_jobs<\/em>\u00a0argument is set to -1 to allow the model evaluations to occur in parallel, harnessing as many cores as you have available on your hardware.<\/p>\n<pre class=\"crayon-plain-tag\"># evaluate a single model\r\ndef evaluate_model(X, y, model, folds, metric):\r\n\t# create the pipeline\r\n\tpipeline = make_pipeline(model)\r\n\t# evaluate model\r\n\tscores = cross_val_score(pipeline, X, y, scoring=metric, cv=folds, n_jobs=-1)\r\n\treturn scores<\/pre>\n<p>It is possible for the evaluation of a model to fail with an exception. I have seen this especially in the case of some\u00a0models from the statsmodels library.<\/p>\n<p>It is also possible for the evaluation of a model to result in a lot of warning messages. I have seen this especially in the case of using XGBoost models.<\/p>\n<p>We do not care about exceptions or warnings when spot-checking. We only want to know what does work and what works well. Therefore, we can trap exceptions and ignore all warnings when evaluating each model.<\/p>\n<p>The function named <em>robust_evaluate_model()<\/em> implements this behavior. The <em>evaluate_model()<\/em> function is called in a way that traps exceptions and ignores warnings. 
If an exception occurs and no result was possible for a given model, a <em>None<\/em> result is returned.<\/p>\n<pre class=\"crayon-plain-tag\"># evaluate a model and try to trap errors and hide warnings\r\ndef robust_evaluate_model(X, y, model, folds, metric):\r\n\tscores = None\r\n\ttry:\r\n\t\twith warnings.catch_warnings():\r\n\t\t\twarnings.filterwarnings(\"ignore\")\r\n\t\t\tscores = evaluate_model(X, y, model, folds, metric)\r\n\texcept:\r\n\t\tscores = None\r\n\treturn scores<\/pre>\n<p>Finally, we can define the top-level function for evaluating the list of defined models.<\/p>\n<p>We will define a function named <em>evaluate_models()<\/em> that takes the dictionary of models as an argument and returns a dictionary of model names to lists of results.<\/p>\n<p>The number of folds in the cross-validation process can be specified by an optional argument that defaults to 10. The metric calculated on the predictions from the model can also be specified by an optional argument and defaults to classification accuracy.<\/p>\n<p>For a full list of supported metrics, see this list:<\/p>\n<ul>\n<li><a href=\"http:\/\/scikit-learn.org\/stable\/modules\/model_evaluation.html#scoring-parameter\">The scoring parameter: defining model evaluation rules, scikit-learn<\/a>.<\/li>\n<\/ul>\n<p>Any <em>None<\/em> results are skipped and not added to the dictionary of results.<\/p>\n<p>Importantly, we provide some verbose output, summarizing the mean and standard deviation of each model after it was evaluated. 
This is helpful if the spot-checking process on your dataset takes minutes to hours.<\/p>\n<pre class=\"crayon-plain-tag\"># evaluate a dict of models {name:object}, returns {name:score}\r\ndef evaluate_models(X, y, models, folds=10, metric='accuracy'):\r\n\tresults = dict()\r\n\tfor name, model in models.items():\r\n\t\t# evaluate the model\r\n\t\tscores = robust_evaluate_model(X, y, model, folds, metric)\r\n\t\t# show progress\r\n\t\tif scores is not None:\r\n\t\t\t# store a result\r\n\t\t\tresults[name] = scores\r\n\t\t\tmean_score, std_score = mean(scores), std(scores)\r\n\t\t\tprint('>%s: %.3f (+\/-%.3f)' % (name, mean_score, std_score))\r\n\t\telse:\r\n\t\t\tprint('>%s: error' % name)\r\n\treturn results<\/pre>\n<p>Note that if for some reason you want to see warnings and errors, you can update the <em>evaluate_models()<\/em> function to call <em>evaluate_model()<\/em> directly, by-passing the robust error handling. I find this useful when testing out new methods or method configurations that fail silently.<\/p>\n<h3>Summarize Results<\/h3>\n<p>Finally, we can evaluate the results.<\/p>\n<p>Really, we only want to know what algorithms performed well.<\/p>\n<p>Two useful ways to summarize the results are:<\/p>\n<ol>\n<li>Line summaries of the mean and standard deviation of the top 10 performing algorithms.<\/li>\n<li>Box and whisker plots of the top 10 performing algorithms.<\/li>\n<\/ol>\n<p>The line summaries are quick and precise, although they assume a well-behaved Gaussian distribution of scores, which may not be reasonable.<\/p>\n<p>The box and whisker plots assume no distribution and provide a visual way to directly compare the distribution of scores across models in terms of median performance and spread of scores.<\/p>\n<p>We will define a function named <em>summarize_results()<\/em> that takes the dictionary of results, prints the summary of results, and creates a boxplot image that is saved to file. 
The function takes an argument to specify if the evaluation score is maximizing, which by default is <em>True<\/em>. The number of results to summarize can also be provided as an optional parameter, which defaults to 10.<\/p>\n<p>The function first orders the scores before printing the summary and creating the box and whisker plot.<\/p>\n<pre class=\"crayon-plain-tag\"># print and plot the top n results\r\ndef summarize_results(results, maximize=True, top_n=10):\r\n\t# check for no results\r\n\tif len(results) == 0:\r\n\t\tprint('no results')\r\n\t\treturn\r\n\t# determine how many results to summarize\r\n\tn = min(top_n, len(results))\r\n\t# create a list of (name, mean(scores)) tuples\r\n\tmean_scores = [(k,mean(v)) for k,v in results.items()]\r\n\t# sort tuples by mean score\r\n\tmean_scores = sorted(mean_scores, key=lambda x: x[1])\r\n\t# reverse for descending order (e.g. for accuracy)\r\n\tif maximize:\r\n\t\tmean_scores = list(reversed(mean_scores))\r\n\t# retrieve the top n for summarization\r\n\tnames = [x[0] for x in mean_scores[:n]]\r\n\tscores = [results[x[0]] for x in mean_scores[:n]]\r\n\t# print the top n\r\n\tprint()\r\n\tfor i in range(n):\r\n\t\tname = names[i]\r\n\t\tmean_score, std_score = mean(results[name]), std(results[name])\r\n\t\tprint('Rank=%d, Name=%s, Score=%.3f (+\/- %.3f)' % (i+1, name, mean_score, std_score))\r\n\t# boxplot for the top n\r\n\tpyplot.boxplot(scores, labels=names)\r\n\t_, labels = pyplot.xticks()\r\n\tpyplot.setp(labels, rotation=90)\r\n\tpyplot.savefig('spotcheck.png')<\/pre>\n<p>Now that we have defined a framework for spot-checking algorithms in Python, let\u2019s look at how we can apply it to a classification problem.<\/p>\n<h2>3. 
Spot-Checking for Classification<\/h2>\n<p>We will generate a binary classification problem using the <a href=\"http:\/\/scikit-learn.org\/stable\/modules\/generated\/sklearn.datasets.make_classification.html\">make_classification() function<\/a>.<\/p>\n<p>The function will generate 1,000 samples with 20 variables, with some redundant variables and two classes.<\/p>\n<pre class=\"crayon-plain-tag\"># load the dataset, returns X and y elements\r\ndef load_dataset():\r\n\treturn make_classification(n_samples=1000, n_classes=2, random_state=1)<\/pre>\n<p>As a classification problem, we will try a suite of classification algorithms, specifically:<\/p>\n<h3>Linear Algorithms<\/h3>\n<ul>\n<li>Logistic Regression<\/li>\n<li>Ridge Regression<\/li>\n<li>Stochastic Gradient Descent Classifier<\/li>\n<li>Passive Aggressive Classifier<\/li>\n<\/ul>\n<p>I tried LDA and QDA, but they sadly crashed down in the C-code somewhere.<\/p>\n<h3>Nonlinear Algorithms<\/h3>\n<ul>\n<li>k-Nearest Neighbors<\/li>\n<li>Classification and Regression Trees<\/li>\n<li>Extra Tree<\/li>\n<li>Support Vector Machine<\/li>\n<li>Naive Bayes<\/li>\n<\/ul>\n<h3>Ensemble Algorithms<\/h3>\n<ul>\n<li>AdaBoost<\/li>\n<li>Bagged Decision Trees<\/li>\n<li>Random Forest<\/li>\n<li>Extra Trees<\/li>\n<li>Gradient Boosting Machine<\/li>\n<\/ul>\n<p>Further, I added multiple configurations for a few of the algorithms like Ridge, kNN, and SVM in order to give them a good chance on the problem.<\/p>\n<p>The full <em>define_models()<\/em> function is listed below.<\/p>\n<pre class=\"crayon-plain-tag\"># create a dict of standard models to evaluate {name:object}\r\ndef define_models(models=dict()):\r\n\t# linear models\r\n\tmodels['logistic'] = LogisticRegression()\r\n\talpha = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]\r\n\tfor a in alpha:\r\n\t\tmodels['ridge-'+str(a)] = RidgeClassifier(alpha=a)\r\n\tmodels['sgd'] = SGDClassifier(max_iter=1000, tol=1e-3)\r\n\tmodels['pa'] = 
PassiveAggressiveClassifier(max_iter=1000, tol=1e-3)\r\n\t# non-linear models\r\n\tn_neighbors = range(1, 21)\r\n\tfor k in n_neighbors:\r\n\t\tmodels['knn-'+str(k)] = KNeighborsClassifier(n_neighbors=k)\r\n\tmodels['cart'] = DecisionTreeClassifier()\r\n\tmodels['extra'] = ExtraTreeClassifier()\r\n\tmodels['svml'] = SVC(kernel='linear')\r\n\tmodels['svmp'] = SVC(kernel='poly')\r\n\tc_values = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]\r\n\tfor c in c_values:\r\n\t\tmodels['svmr'+str(c)] = SVC(C=c)\r\n\tmodels['bayes'] = GaussianNB()\r\n\t# ensemble models\r\n\tn_trees = 100\r\n\tmodels['ada'] = AdaBoostClassifier(n_estimators=n_trees)\r\n\tmodels['bag'] = BaggingClassifier(n_estimators=n_trees)\r\n\tmodels['rf'] = RandomForestClassifier(n_estimators=n_trees)\r\n\tmodels['et'] = ExtraTreesClassifier(n_estimators=n_trees)\r\n\tmodels['gbm'] = GradientBoostingClassifier(n_estimators=n_trees)\r\n\tprint('Defined %d models' % len(models))\r\n\treturn models<\/pre>\n<p>That\u2019s it; we are now ready to spot check algorithms on the problem.<\/p>\n<p>The complete example is listed below.<\/p>\n<pre class=\"crayon-plain-tag\"># binary classification spot check script\r\nimport warnings\r\nfrom numpy import mean\r\nfrom numpy import std\r\nfrom matplotlib import pyplot\r\nfrom sklearn.datasets import make_classification\r\nfrom sklearn.model_selection import cross_val_score\r\nfrom sklearn.preprocessing import StandardScaler\r\nfrom sklearn.preprocessing import MinMaxScaler\r\nfrom sklearn.pipeline import Pipeline\r\nfrom sklearn.linear_model import LogisticRegression\r\nfrom sklearn.linear_model import RidgeClassifier\r\nfrom sklearn.linear_model import SGDClassifier\r\nfrom sklearn.linear_model import PassiveAggressiveClassifier\r\nfrom sklearn.neighbors import KNeighborsClassifier\r\nfrom sklearn.tree import DecisionTreeClassifier\r\nfrom sklearn.tree import ExtraTreeClassifier\r\nfrom sklearn.svm import SVC\r\nfrom sklearn.naive_bayes import GaussianNB\r\nfrom 
sklearn.ensemble import AdaBoostClassifier\r\nfrom sklearn.ensemble import BaggingClassifier\r\nfrom sklearn.ensemble import RandomForestClassifier\r\nfrom sklearn.ensemble import ExtraTreesClassifier\r\nfrom sklearn.ensemble import GradientBoostingClassifier\r\n\r\n# load the dataset, returns X and y elements\r\ndef load_dataset():\r\n\treturn make_classification(n_samples=1000, n_classes=2, random_state=1)\r\n\r\n# create a dict of standard models to evaluate {name:object}\r\ndef define_models(models=dict()):\r\n\t# linear models\r\n\tmodels['logistic'] = LogisticRegression()\r\n\talpha = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]\r\n\tfor a in alpha:\r\n\t\tmodels['ridge-'+str(a)] = RidgeClassifier(alpha=a)\r\n\tmodels['sgd'] = SGDClassifier(max_iter=1000, tol=1e-3)\r\n\tmodels['pa'] = PassiveAggressiveClassifier(max_iter=1000, tol=1e-3)\r\n\t# non-linear models\r\n\tn_neighbors = range(1, 21)\r\n\tfor k in n_neighbors:\r\n\t\tmodels['knn-'+str(k)] = KNeighborsClassifier(n_neighbors=k)\r\n\tmodels['cart'] = DecisionTreeClassifier()\r\n\tmodels['extra'] = ExtraTreeClassifier()\r\n\tmodels['svml'] = SVC(kernel='linear')\r\n\tmodels['svmp'] = SVC(kernel='poly')\r\n\tc_values = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]\r\n\tfor c in c_values:\r\n\t\tmodels['svmr'+str(c)] = SVC(C=c)\r\n\tmodels['bayes'] = GaussianNB()\r\n\t# ensemble models\r\n\tn_trees = 100\r\n\tmodels['ada'] = AdaBoostClassifier(n_estimators=n_trees)\r\n\tmodels['bag'] = BaggingClassifier(n_estimators=n_trees)\r\n\tmodels['rf'] = RandomForestClassifier(n_estimators=n_trees)\r\n\tmodels['et'] = ExtraTreesClassifier(n_estimators=n_trees)\r\n\tmodels['gbm'] = GradientBoostingClassifier(n_estimators=n_trees)\r\n\tprint('Defined %d models' % len(models))\r\n\treturn models\r\n\r\n# create a feature preparation pipeline for a model\r\ndef make_pipeline(model):\r\n\tsteps = list()\r\n\t# standardization\r\n\tsteps.append(('standardize', StandardScaler()))\r\n\t# 
normalization\r\n\tsteps.append(('normalize', MinMaxScaler()))\r\n\t# the model\r\n\tsteps.append(('model', model))\r\n\t# create pipeline\r\n\tpipeline = Pipeline(steps=steps)\r\n\treturn pipeline\r\n\r\n# evaluate a single model\r\ndef evaluate_model(X, y, model, folds, metric):\r\n\t# create the pipeline\r\n\tpipeline = make_pipeline(model)\r\n\t# evaluate model\r\n\tscores = cross_val_score(pipeline, X, y, scoring=metric, cv=folds, n_jobs=-1)\r\n\treturn scores\r\n\r\n# evaluate a model and try to trap errors and hide warnings\r\ndef robust_evaluate_model(X, y, model, folds, metric):\r\n\tscores = None\r\n\ttry:\r\n\t\twith warnings.catch_warnings():\r\n\t\t\twarnings.filterwarnings(\"ignore\")\r\n\t\t\tscores = evaluate_model(X, y, model, folds, metric)\r\n\texcept:\r\n\t\tscores = None\r\n\treturn scores\r\n\r\n# evaluate a dict of models {name:object}, returns {name:score}\r\ndef evaluate_models(X, y, models, folds=10, metric='accuracy'):\r\n\tresults = dict()\r\n\tfor name, model in models.items():\r\n\t\t# evaluate the model\r\n\t\tscores = robust_evaluate_model(X, y, model, folds, metric)\r\n\t\t# show progress\r\n\t\tif scores is not None:\r\n\t\t\t# store a result\r\n\t\t\tresults[name] = scores\r\n\t\t\tmean_score, std_score = mean(scores), std(scores)\r\n\t\t\tprint('>%s: %.3f (+\/-%.3f)' % (name, mean_score, std_score))\r\n\t\telse:\r\n\t\t\tprint('>%s: error' % name)\r\n\treturn results\r\n\r\n# print and plot the top n results\r\ndef summarize_results(results, maximize=True, top_n=10):\r\n\t# check for no results\r\n\tif len(results) == 0:\r\n\t\tprint('no results')\r\n\t\treturn\r\n\t# determine how many results to summarize\r\n\tn = min(top_n, len(results))\r\n\t# create a list of (name, mean(scores)) tuples\r\n\tmean_scores = [(k,mean(v)) for k,v in results.items()]\r\n\t# sort tuples by mean score\r\n\tmean_scores = sorted(mean_scores, key=lambda x: x[1])\r\n\t# reverse for descending order (e.g. 
for accuracy)\r\n\tif maximize:\r\n\t\tmean_scores = list(reversed(mean_scores))\r\n\t# retrieve the top n for summarization\r\n\tnames = [x[0] for x in mean_scores[:n]]\r\n\tscores = [results[x[0]] for x in mean_scores[:n]]\r\n\t# print the top n\r\n\tprint()\r\n\tfor i in range(n):\r\n\t\tname = names[i]\r\n\t\tmean_score, std_score = mean(results[name]), std(results[name])\r\n\t\tprint('Rank=%d, Name=%s, Score=%.3f (+\/- %.3f)' % (i+1, name, mean_score, std_score))\r\n\t# boxplot for the top n\r\n\tpyplot.boxplot(scores, labels=names)\r\n\t_, labels = pyplot.xticks()\r\n\tpyplot.setp(labels, rotation=90)\r\n\tpyplot.savefig('spotcheck.png')\r\n\r\n# load dataset\r\nX, y = load_dataset()\r\n# get model list\r\nmodels = define_models()\r\n# evaluate models\r\nresults = evaluate_models(X, y, models)\r\n# summarize results\r\nsummarize_results(results)<\/pre>\n<p>Running the example prints one line per evaluated model, ending with a summary of the top 10 performing algorithms on the problem.<\/p>\n<p>We can see that ensembles of decision trees performed the best for this problem. 
This suggests a few things:<\/p>\n<ul>\n<li>Ensembles of decision trees might be a good place to focus attention.<\/li>\n<li>Gradient boosting will likely do well if further tuned.<\/li>\n<li>A \u201cgood\u201d performance on the problem is about 86% accuracy.<\/li>\n<li>The relatively high performance of ridge regression suggests the need for feature selection.<\/li>\n<\/ul>\n<pre class=\"crayon-plain-tag\">...\r\n>bag: 0.862 (+\/-0.034)\r\n>rf: 0.865 (+\/-0.033)\r\n>et: 0.858 (+\/-0.035)\r\n>gbm: 0.867 (+\/-0.044)\r\n\r\nRank=1, Name=gbm, Score=0.867 (+\/- 0.044)\r\nRank=2, Name=rf, Score=0.865 (+\/- 0.033)\r\nRank=3, Name=bag, Score=0.862 (+\/- 0.034)\r\nRank=4, Name=et, Score=0.858 (+\/- 0.035)\r\nRank=5, Name=ada, Score=0.850 (+\/- 0.035)\r\nRank=6, Name=ridge-0.9, Score=0.848 (+\/- 0.038)\r\nRank=7, Name=ridge-0.8, Score=0.848 (+\/- 0.038)\r\nRank=8, Name=ridge-0.7, Score=0.848 (+\/- 0.038)\r\nRank=9, Name=ridge-0.6, Score=0.848 (+\/- 0.038)\r\nRank=10, Name=ridge-0.5, Score=0.848 (+\/- 0.038)<\/pre>\n<p>A box and whisker plot is also created to summarize the results of the top 10 well performing algorithms.<\/p>\n<p>The plot shows the elevation of the methods comprised of ensembles of decision trees. 
The plot reinforces the notion that further attention on these methods would be a good idea.<\/p>\n<div id=\"attachment_6192\" style=\"width: 650px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" class=\"size-full wp-image-6192\" src=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2018\/07\/Boxplot-of-top-10-Spot-Checking-Algorithms-on-a-Classification-Problem.png\" alt=\"Boxplot of top 10 Spot-Checking Algorithms on a Classification Problem\" width=\"640\" height=\"480\" srcset=\"http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2018\/07\/Boxplot-of-top-10-Spot-Checking-Algorithms-on-a-Classification-Problem.png 640w, http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2018\/07\/Boxplot-of-top-10-Spot-Checking-Algorithms-on-a-Classification-Problem-300x225.png 300w\" sizes=\"(max-width: 640px) 100vw, 640px\"><\/p>\n<p class=\"wp-caption-text\">Boxplot of top 10 Spot-Checking Algorithms on a Classification Problem<\/p>\n<\/div>\n<p>If this were a real classification problem, I would follow up with further spot checks, such as:<\/p>\n<ul>\n<li>Spot check with various feature selection methods.<\/li>\n<li>Spot check without data scaling methods.<\/li>\n<li>Spot check with a coarse grid of configurations for gradient boosting in sklearn or XGBoost.<\/li>\n<\/ul>\n<p>Next, we will see how we can apply the framework to a regression problem.<\/p>\n<h2>4. 
Spot-Checking for Regression<\/h2>\n<p>We can explore the same framework for regression predictive modeling problems with only very minor changes.<\/p>\n<p>We can use the <a href=\"http:\/\/scikit-learn.org\/stable\/modules\/generated\/sklearn.datasets.make_regression.html#sklearn.datasets.make_regression\">make_regression() function<\/a> to generate a contrived regression problem with 1,000 examples and 50 features, some of them redundant.<\/p>\n<p>The defined <em>load_dataset()<\/em> function is listed below.<\/p>\n<pre class=\"crayon-plain-tag\"># load the dataset, returns X and y elements\r\ndef load_dataset():\r\n\treturn make_regression(n_samples=1000, n_features=50, noise=0.1, random_state=1)<\/pre>\n<p>We can then specify a <em>get_models()<\/em> function that defines a suite of regression methods.<\/p>\n<p>Scikit-learn does offer a wide range of linear regression methods, which is excellent. Not all of them may be required on your problem. I would recommend a minimum of linear regression and elastic net, the latter with a good suite of alpha and lambda parameters.<\/p>\n<p>Nevertheless, we will test the full suite of methods on this problem, including:<\/p>\n<h3>Linear Algorithms<\/h3>\n<ul>\n<li>Linear Regression<\/li>\n<li>Lasso Regression<\/li>\n<li>Ridge Regression<\/li>\n<li>Elastic Net Regression<\/li>\n<li>Huber Regression<\/li>\n<li>LARS Regression<\/li>\n<li>Lasso LARS Regression<\/li>\n<li>Passive Aggressive Regression<\/li>\n<li>RANSAC Regressor<\/li>\n<li>Stochastic Gradient Descent Regression<\/li>\n<li>Theil Regression<\/li>\n<\/ul>\n<h3>Nonlinear Algorithms<\/h3>\n<ul>\n<li>k-Nearest Neighbors<\/li>\n<li>Classification and Regression Tree<\/li>\n<li>Extra Tree<\/li>\n<li>Support Vector Regression<\/li>\n<\/ul>\n<h3>Ensemble Algorithms<\/h3>\n<ul>\n<li>AdaBoost<\/li>\n<li>Bagged Decision Trees<\/li>\n<li>Random Forest<\/li>\n<li>Extra Trees<\/li>\n<li>Gradient Boosting Machines<\/li>\n<\/ul>\n<p>The full <em>get_models()<\/em> function is 
listed below.<\/p>\n<pre class=\"crayon-plain-tag\"># create a dict of standard models to evaluate {name:object}\r\ndef get_models(models=dict()):\r\n\t# linear models\r\n\tmodels['lr'] = LinearRegression()\r\n\talpha = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]\r\n\tfor a in alpha:\r\n\t\tmodels['lasso-'+str(a)] = Lasso(alpha=a)\r\n\tfor a in alpha:\r\n\t\tmodels['ridge-'+str(a)] = Ridge(alpha=a)\r\n\tfor a1 in alpha:\r\n\t\tfor a2 in alpha:\r\n\t\t\tname = 'en-' + str(a1) + '-' + str(a2)\r\n\t\t\tmodels[name] = ElasticNet(a1, a2)\r\n\tmodels['huber'] = HuberRegressor()\r\n\tmodels['lars'] = Lars()\r\n\tmodels['llars'] = LassoLars()\r\n\tmodels['pa'] = PassiveAggressiveRegressor(max_iter=1000, tol=1e-3)\r\n\tmodels['ransac'] = RANSACRegressor()\r\n\tmodels['sgd'] = SGDRegressor(max_iter=1000, tol=1e-3)\r\n\tmodels['theil'] = TheilSenRegressor()\r\n\t# non-linear models\r\n\tn_neighbors = range(1, 21)\r\n\tfor k in n_neighbors:\r\n\t\tmodels['knn-'+str(k)] = KNeighborsRegressor(n_neighbors=k)\r\n\tmodels['cart'] = DecisionTreeRegressor()\r\n\tmodels['extra'] = ExtraTreeRegressor()\r\n\tmodels['svml'] = SVR(kernel='linear')\r\n\tmodels['svmp'] = SVR(kernel='poly')\r\n\tc_values = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]\r\n\tfor c in c_values:\r\n\t\tmodels['svmr'+str(c)] = SVR(C=c)\r\n\t# ensemble models\r\n\tn_trees = 100\r\n\tmodels['ada'] = AdaBoostRegressor(n_estimators=n_trees)\r\n\tmodels['bag'] = BaggingRegressor(n_estimators=n_trees)\r\n\tmodels['rf'] = RandomForestRegressor(n_estimators=n_trees)\r\n\tmodels['et'] = ExtraTreesRegressor(n_estimators=n_trees)\r\n\tmodels['gbm'] = GradientBoostingRegressor(n_estimators=n_trees)\r\n\tprint('Defined %d models' % len(models))\r\n\treturn models<\/pre>\n<p>By default, the framework uses classification accuracy as the method for evaluating model predictions.<\/p>\n<p>This does not make sense for regression, and we can change this to something more meaningful for regression, such as mean squared 
error. We can do this by passing the <em>metric=\u2019neg_mean_squared_error\u2019<\/em> argument when calling the <em>evaluate_models()<\/em> function.<\/p>\n<pre class=\"crayon-plain-tag\"># evaluate models\r\nresults = evaluate_models(X, y, models, metric='neg_mean_squared_error')<\/pre>\n<p>Note that by default scikit-learn inverts error scores so that they are maximized instead of minimized. This is why the mean squared error is negative and will have a negative sign when summarized. Because the score is inverted, we can continue to assume that we are maximizing scores in the <em>summarize_results()<\/em> function and do not need to specify <em>maximize=False<\/em> as we might expect when using an error metric.<\/p>\n<p>The complete code example is listed below.<\/p>\n<pre class=\"crayon-plain-tag\"># regression spot check script\r\nimport warnings\r\nfrom numpy import mean\r\nfrom numpy import std\r\nfrom matplotlib import pyplot\r\nfrom sklearn.datasets import make_regression\r\nfrom sklearn.model_selection import cross_val_score\r\nfrom sklearn.preprocessing import StandardScaler\r\nfrom sklearn.preprocessing import MinMaxScaler\r\nfrom sklearn.pipeline import Pipeline\r\nfrom sklearn.linear_model import LinearRegression\r\nfrom sklearn.linear_model import Lasso\r\nfrom sklearn.linear_model import Ridge\r\nfrom sklearn.linear_model import ElasticNet\r\nfrom sklearn.linear_model import HuberRegressor\r\nfrom sklearn.linear_model import Lars\r\nfrom sklearn.linear_model import LassoLars\r\nfrom sklearn.linear_model import PassiveAggressiveRegressor\r\nfrom sklearn.linear_model import RANSACRegressor\r\nfrom sklearn.linear_model import SGDRegressor\r\nfrom sklearn.linear_model import TheilSenRegressor\r\nfrom sklearn.neighbors import KNeighborsRegressor\r\nfrom sklearn.tree import DecisionTreeRegressor\r\nfrom sklearn.tree import ExtraTreeRegressor\r\nfrom sklearn.svm import SVR\r\nfrom sklearn.ensemble import AdaBoostRegressor\r\nfrom sklearn.ensemble import 
BaggingRegressor\r\nfrom sklearn.ensemble import RandomForestRegressor\r\nfrom sklearn.ensemble import ExtraTreesRegressor\r\nfrom sklearn.ensemble import GradientBoostingRegressor\r\n\r\n# load the dataset, returns X and y elements\r\ndef load_dataset():\r\n\treturn make_regression(n_samples=1000, n_features=50, noise=0.1, random_state=1)\r\n\r\n# create a dict of standard models to evaluate {name:object}\r\ndef get_models(models=dict()):\r\n\t# linear models\r\n\tmodels['lr'] = LinearRegression()\r\n\talpha = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]\r\n\tfor a in alpha:\r\n\t\tmodels['lasso-'+str(a)] = Lasso(alpha=a)\r\n\tfor a in alpha:\r\n\t\tmodels['ridge-'+str(a)] = Ridge(alpha=a)\r\n\tfor a1 in alpha:\r\n\t\tfor a2 in alpha:\r\n\t\t\tname = 'en-' + str(a1) + '-' + str(a2)\r\n\t\t\tmodels[name] = ElasticNet(a1, a2)\r\n\tmodels['huber'] = HuberRegressor()\r\n\tmodels['lars'] = Lars()\r\n\tmodels['llars'] = LassoLars()\r\n\tmodels['pa'] = PassiveAggressiveRegressor(max_iter=1000, tol=1e-3)\r\n\tmodels['ranscac'] = RANSACRegressor()\r\n\tmodels['sgd'] = SGDRegressor(max_iter=1000, tol=1e-3)\r\n\tmodels['theil'] = TheilSenRegressor()\r\n\t# non-linear models\r\n\tn_neighbors = range(1, 21)\r\n\tfor k in n_neighbors:\r\n\t\tmodels['knn-'+str(k)] = KNeighborsRegressor(n_neighbors=k)\r\n\tmodels['cart'] = DecisionTreeRegressor()\r\n\tmodels['extra'] = ExtraTreeRegressor()\r\n\tmodels['svml'] = SVR(kernel='linear')\r\n\tmodels['svmp'] = SVR(kernel='poly')\r\n\tc_values = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]\r\n\tfor c in c_values:\r\n\t\tmodels['svmr'+str(c)] = SVR(C=c)\r\n\t# ensemble models\r\n\tn_trees = 100\r\n\tmodels['ada'] = AdaBoostRegressor(n_estimators=n_trees)\r\n\tmodels['bag'] = BaggingRegressor(n_estimators=n_trees)\r\n\tmodels['rf'] = RandomForestRegressor(n_estimators=n_trees)\r\n\tmodels['et'] = ExtraTreesRegressor(n_estimators=n_trees)\r\n\tmodels['gbm'] = 
GradientBoostingRegressor(n_estimators=n_trees)\r\n\tprint('Defined %d models' % len(models))\r\n\treturn models\r\n\r\n# create a feature preparation pipeline for a model\r\ndef make_pipeline(model):\r\n\tsteps = list()\r\n\t# standardization\r\n\tsteps.append(('standardize', StandardScaler()))\r\n\t# normalization\r\n\tsteps.append(('normalize', MinMaxScaler()))\r\n\t# the model\r\n\tsteps.append(('model', model))\r\n\t# create pipeline\r\n\tpipeline = Pipeline(steps=steps)\r\n\treturn pipeline\r\n\r\n# evaluate a single model\r\ndef evaluate_model(X, y, model, folds, metric):\r\n\t# create the pipeline\r\n\tpipeline = make_pipeline(model)\r\n\t# evaluate model\r\n\tscores = cross_val_score(pipeline, X, y, scoring=metric, cv=folds, n_jobs=-1)\r\n\treturn scores\r\n\r\n# evaluate a model and try to trap errors and and hide warnings\r\ndef robust_evaluate_model(X, y, model, folds, metric):\r\n\tscores = None\r\n\ttry:\r\n\t\twith warnings.catch_warnings():\r\n\t\t\twarnings.filterwarnings(\"ignore\")\r\n\t\t\tscores = evaluate_model(X, y, model, folds, metric)\r\n\texcept:\r\n\t\tscores = None\r\n\treturn scores\r\n\r\n# evaluate a dict of models {name:object}, returns {name:score}\r\ndef evaluate_models(X, y, models, folds=10, metric='accuracy'):\r\n\tresults = dict()\r\n\tfor name, model in models.items():\r\n\t\t# evaluate the model\r\n\t\tscores = robust_evaluate_model(X, y, model, folds, metric)\r\n\t\t# show process\r\n\t\tif scores is not None:\r\n\t\t\t# store a result\r\n\t\t\tresults[name] = scores\r\n\t\t\tmean_score, std_score = mean(scores), std(scores)\r\n\t\t\tprint('>%s: %.3f (+\/-%.3f)' % (name, mean_score, std_score))\r\n\t\telse:\r\n\t\t\tprint('>%s: error' % name)\r\n\treturn results\r\n\r\n# print and plot the top n results\r\ndef summarize_results(results, maximize=True, top_n=10):\r\n\t# check for no results\r\n\tif len(results) == 0:\r\n\t\tprint('no results')\r\n\t\treturn\r\n\t# determine how many results to summarize\r\n\tn = min(top_n, 
len(results))\r\n\t# create a list of (name, mean(scores)) tuples\r\n\tmean_scores = [(k,mean(v)) for k,v in results.items()]\r\n\t# sort tuples by mean score\r\n\tmean_scores = sorted(mean_scores, key=lambda x: x[1])\r\n\t# reverse for descending order (e.g. for accuracy)\r\n\tif maximize:\r\n\t\tmean_scores = list(reversed(mean_scores))\r\n\t# retrieve the top n for summarization\r\n\tnames = [x[0] for x in mean_scores[:n]]\r\n\tscores = [results[x[0]] for x in mean_scores[:n]]\r\n\t# print the top n\r\n\tprint()\r\n\tfor i in range(n):\r\n\t\tname = names[i]\r\n\t\tmean_score, std_score = mean(results[name]), std(results[name])\r\n\t\tprint('Rank=%d, Name=%s, Score=%.3f (+\/- %.3f)' % (i+1, name, mean_score, std_score))\r\n\t# boxplot for the top n\r\n\tpyplot.boxplot(scores, labels=names)\r\n\t_, labels = pyplot.xticks()\r\n\tpyplot.setp(labels, rotation=90)\r\n\tpyplot.savefig('spotcheck.png')\r\n\r\n# load dataset\r\nX, y = load_dataset()\r\n# get model list\r\nmodels = get_models()\r\n# evaluate models\r\nresults = evaluate_models(X, y, models, metric='neg_mean_squared_error')\r\n# summarize results\r\nsummarize_results(results)<\/pre>\n<p>Running the example summarizes the performance of each model evaluated, then prints the performance of the top 10 best-performing algorithms.<\/p>\n<p>We can see that many of the linear algorithms perhaps found the same optimal solution on this problem. 
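<\/p>
<p>As an aside, we can sanity-check the winning scores of this run. The top-ranked models report a score of about -0.011, which is the inverted mean squared error; undoing the inversion and taking the square root gives an RMSE close to the <em>noise=0.1<\/em> used when generating the dataset, suggesting the linear models have essentially recovered the underlying function. A minimal sketch (the hard-coded score is taken from the run output below):<\/p>

```python
# relate the best reported score (an inverted MSE) back to the dataset noise
from math import sqrt

best_score = -0.011        # best 'neg_mean_squared_error' from the run below
rmse = sqrt(-best_score)   # undo the sign inversion, then take the square root
print(round(rmse, 3))      # close to the noise=0.1 used by make_regression
```
<p>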
Notably, those methods that performed well use regularization as a type of feature selection, allowing them to zoom in on the optimal solution.<\/p>\n<p>This would suggest the importance of feature selection when modeling this problem, and that linear methods would be the area to focus on, at least for now.<\/p>\n<p>Reviewing the printed scores of evaluated models also shows how poorly nonlinear and ensemble algorithms performed on this problem.<\/p>\n<pre class=\"crayon-plain-tag\">...\r\n>bag: -6118.084 (+\/-1558.433)\r\n>rf: -6127.169 (+\/-1594.392)\r\n>et: -5017.062 (+\/-1037.673)\r\n>gbm: -2347.807 (+\/-500.364)\r\n\r\nRank=1, Name=lars, Score=-0.011 (+\/- 0.001)\r\nRank=2, Name=ranscac, Score=-0.011 (+\/- 0.001)\r\nRank=3, Name=lr, Score=-0.011 (+\/- 0.001)\r\nRank=4, Name=ridge-0.0, Score=-0.011 (+\/- 0.001)\r\nRank=5, Name=en-0.0-0.1, Score=-0.011 (+\/- 0.001)\r\nRank=6, Name=en-0.0-0.8, Score=-0.011 (+\/- 0.001)\r\nRank=7, Name=en-0.0-0.2, Score=-0.011 (+\/- 0.001)\r\nRank=8, Name=en-0.0-0.7, Score=-0.011 (+\/- 0.001)\r\nRank=9, Name=en-0.0-0.0, Score=-0.011 (+\/- 0.001)\r\nRank=10, Name=en-0.0-0.3, Score=-0.011 (+\/- 0.001)<\/pre>\n<p>A box-and-whisker plot is created, although it adds little to the analysis of results in this case.<\/p>\n<div id=\"attachment_6193\" style=\"width: 650px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" class=\"size-full wp-image-6193\" src=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2018\/07\/Boxplot-of-top-10-Spot-Checking-Algorithms-on-a-Regression-Problem.png\" alt=\"Boxplot of top 10 Spot-Checking Algorithms on a Regression Problem\" width=\"640\" height=\"480\" srcset=\"http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2018\/07\/Boxplot-of-top-10-Spot-Checking-Algorithms-on-a-Regression-Problem.png 640w, 
http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2018\/07\/Boxplot-of-top-10-Spot-Checking-Algorithms-on-a-Regression-Problem-300x225.png 300w\" sizes=\"(max-width: 640px) 100vw, 640px\"><\/p>\n<p class=\"wp-caption-text\">Boxplot of top 10 Spot-Checking Algorithms on a Regression Problem<\/p>\n<\/div>\n<h2>5. Framework Extension<\/h2>\n<p>In this section, we explore some handy extensions of the spot check framework.<\/p>\n<h3>Coarse Grid Search for Gradient Boosting<\/h3>\n<p>I find myself using XGBoost and gradient boosting a lot for straightforward classification and regression problems.<\/p>\n<p>As such, I like to use a coarse grid across standard configuration parameters of the method when spot checking.<\/p>\n<p>Below is a function that does this and can be used directly in the spot-checking framework.<\/p>\n<pre class=\"crayon-plain-tag\"># define gradient boosting models\r\ndef define_gbm_models(models=dict(), use_xgb=True):\r\n\t# define config ranges\r\n\trates = [0.001, 0.01, 0.1]\r\n\ttrees = [50, 100]\r\n\tss = [0.5, 0.7, 1.0]\r\n\tdepth = [3, 7, 9]\r\n\t# add configurations\r\n\tfor l in rates:\r\n\t\tfor e in trees:\r\n\t\t\tfor s in ss:\r\n\t\t\t\tfor d in depth:\r\n\t\t\t\t\tcfg = [l, e, s, d]\r\n\t\t\t\t\tif use_xgb:\r\n\t\t\t\t\t\tname = 'xgb-' + str(cfg)\r\n\t\t\t\t\t\tmodels[name] = XGBClassifier(learning_rate=l, n_estimators=e, subsample=s, max_depth=d)\r\n\t\t\t\t\telse:\r\n\t\t\t\t\t\tname = 'gbm-' + str(cfg)\r\n\t\t\t\t\t\tmodels[name] = GradientBoostingClassifier(learning_rate=l, n_estimators=e, subsample=s, max_depth=d)\r\n\tprint('Defined %d models' % len(models))\r\n\treturn models<\/pre>\n<p>By default, the function will use XGBoost models, but it can use the sklearn gradient boosting model if the <em>use_xgb<\/em> argument to the function is set to <em>False<\/em>.<\/p>\n<p>Again, we are not trying to optimally tune GBM on the problem, only to very quickly find an area in the configuration space that may be 
worth investigating further.<\/p>\n<p>This function can be used directly on classification and regression problems with only a minor change from \u201c<em>XGBClassifier<\/em>\u201d to \u201c<em>XGBRegressor<\/em>\u201d and \u201c<em>GradientBoostingClassifier<\/em>\u201d to \u201c<em>GradientBoostingRegressor<\/em>\u201d. For example:<\/p>\n<pre class=\"crayon-plain-tag\"># define gradient boosting models\r\ndef get_gbm_models(models=dict(), use_xgb=True):\r\n\t# define config ranges\r\n\trates = [0.001, 0.01, 0.1]\r\n\ttrees = [50, 100]\r\n\tss = [0.5, 0.7, 1.0]\r\n\tdepth = [3, 7, 9]\r\n\t# add configurations\r\n\tfor l in rates:\r\n\t\tfor e in trees:\r\n\t\t\tfor s in ss:\r\n\t\t\t\tfor d in depth:\r\n\t\t\t\t\tcfg = [l, e, s, d]\r\n\t\t\t\t\tif use_xgb:\r\n\t\t\t\t\t\tname = 'xgb-' + str(cfg)\r\n\t\t\t\t\t\tmodels[name] = XGBRegressor(learning_rate=l, n_estimators=e, subsample=s, max_depth=d)\r\n\t\t\t\t\telse:\r\n\t\t\t\t\t\tname = 'gbm-' + str(cfg)\r\n\t\t\t\t\t\tmodels[name] = GradientBoostingRegressor(learning_rate=l, n_estimators=e, subsample=s, max_depth=d)\r\n\tprint('Defined %d models' % len(models))\r\n\treturn models<\/pre>\n<p>To make this concrete, below is the binary classification example updated to also define XGBoost models.<\/p>\n<pre class=\"crayon-plain-tag\"># binary classification spot check script\r\nimport warnings\r\nfrom numpy import mean\r\nfrom numpy import std\r\nfrom matplotlib import pyplot\r\nfrom sklearn.datasets import make_classification\r\nfrom sklearn.model_selection import cross_val_score\r\nfrom sklearn.preprocessing import StandardScaler\r\nfrom sklearn.preprocessing import MinMaxScaler\r\nfrom sklearn.pipeline import Pipeline\r\nfrom sklearn.linear_model import LogisticRegression\r\nfrom sklearn.linear_model import RidgeClassifier\r\nfrom sklearn.linear_model import SGDClassifier\r\nfrom sklearn.linear_model import PassiveAggressiveClassifier\r\nfrom sklearn.neighbors import KNeighborsClassifier\r\nfrom sklearn.tree 
import DecisionTreeClassifier\r\nfrom sklearn.tree import ExtraTreeClassifier\r\nfrom sklearn.svm import SVC\r\nfrom sklearn.naive_bayes import GaussianNB\r\nfrom sklearn.ensemble import AdaBoostClassifier\r\nfrom sklearn.ensemble import BaggingClassifier\r\nfrom sklearn.ensemble import RandomForestClassifier\r\nfrom sklearn.ensemble import ExtraTreesClassifier\r\nfrom sklearn.ensemble import GradientBoostingClassifier\r\nfrom xgboost import XGBClassifier\r\n\r\n# load the dataset, returns X and y elements\r\ndef load_dataset():\r\n\treturn make_classification(n_samples=1000, n_classes=2, random_state=1)\r\n\r\n# create a dict of standard models to evaluate {name:object}\r\ndef define_models(models=dict()):\r\n\t# linear models\r\n\tmodels['logistic'] = LogisticRegression()\r\n\talpha = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]\r\n\tfor a in alpha:\r\n\t\tmodels['ridge-'+str(a)] = RidgeClassifier(alpha=a)\r\n\tmodels['sgd'] = SGDClassifier(max_iter=1000, tol=1e-3)\r\n\tmodels['pa'] = PassiveAggressiveClassifier(max_iter=1000, tol=1e-3)\r\n\t# non-linear models\r\n\tn_neighbors = range(1, 21)\r\n\tfor k in n_neighbors:\r\n\t\tmodels['knn-'+str(k)] = KNeighborsClassifier(n_neighbors=k)\r\n\tmodels['cart'] = DecisionTreeClassifier()\r\n\tmodels['extra'] = ExtraTreeClassifier()\r\n\tmodels['svml'] = SVC(kernel='linear')\r\n\tmodels['svmp'] = SVC(kernel='poly')\r\n\tc_values = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]\r\n\tfor c in c_values:\r\n\t\tmodels['svmr'+str(c)] = SVC(C=c)\r\n\tmodels['bayes'] = GaussianNB()\r\n\t# ensemble models\r\n\tn_trees = 100\r\n\tmodels['ada'] = AdaBoostClassifier(n_estimators=n_trees)\r\n\tmodels['bag'] = BaggingClassifier(n_estimators=n_trees)\r\n\tmodels['rf'] = RandomForestClassifier(n_estimators=n_trees)\r\n\tmodels['et'] = ExtraTreesClassifier(n_estimators=n_trees)\r\n\tmodels['gbm'] = GradientBoostingClassifier(n_estimators=n_trees)\r\n\tprint('Defined %d models' % len(models))\r\n\treturn models\r\n\r\n# define 
gradient boosting models\r\ndef define_gbm_models(models=dict(), use_xgb=True):\r\n\t# define config ranges\r\n\trates = [0.001, 0.01, 0.1]\r\n\ttrees = [50, 100]\r\n\tss = [0.5, 0.7, 1.0]\r\n\tdepth = [3, 7, 9]\r\n\t# add configurations\r\n\tfor l in rates:\r\n\t\tfor e in trees:\r\n\t\t\tfor s in ss:\r\n\t\t\t\tfor d in depth:\r\n\t\t\t\t\tcfg = [l, e, s, d]\r\n\t\t\t\t\tif use_xgb:\r\n\t\t\t\t\t\tname = 'xgb-' + str(cfg)\r\n\t\t\t\t\t\tmodels[name] = XGBClassifier(learning_rate=l, n_estimators=e, subsample=s, max_depth=d)\r\n\t\t\t\t\telse:\r\n\t\t\t\t\t\tname = 'gbm-' + str(cfg)\r\n\t\t\t\t\t\tmodels[name] = GradientBoostingClassifier(learning_rate=l, n_estimators=e, subsample=s, max_depth=d)\r\n\tprint('Defined %d models' % len(models))\r\n\treturn models\r\n\r\n# create a feature preparation pipeline for a model\r\ndef make_pipeline(model):\r\n\tsteps = list()\r\n\t# standardization\r\n\tsteps.append(('standardize', StandardScaler()))\r\n\t# normalization\r\n\tsteps.append(('normalize', MinMaxScaler()))\r\n\t# the model\r\n\tsteps.append(('model', model))\r\n\t# create pipeline\r\n\tpipeline = Pipeline(steps=steps)\r\n\treturn pipeline\r\n\r\n# evaluate a single model\r\ndef evaluate_model(X, y, model, folds, metric):\r\n\t# create the pipeline\r\n\tpipeline = make_pipeline(model)\r\n\t# evaluate model\r\n\tscores = cross_val_score(pipeline, X, y, scoring=metric, cv=folds, n_jobs=-1)\r\n\treturn scores\r\n\r\n# evaluate a model and try to trap errors and and hide warnings\r\ndef robust_evaluate_model(X, y, model, folds, metric):\r\n\tscores = None\r\n\ttry:\r\n\t\twith warnings.catch_warnings():\r\n\t\t\twarnings.filterwarnings(\"ignore\")\r\n\t\t\tscores = evaluate_model(X, y, model, folds, metric)\r\n\texcept:\r\n\t\tscores = None\r\n\treturn scores\r\n\r\n# evaluate a dict of models {name:object}, returns {name:score}\r\ndef evaluate_models(X, y, models, folds=10, metric='accuracy'):\r\n\tresults = dict()\r\n\tfor name, model in models.items():\r\n\t\t# 
evaluate the model\r\n\t\tscores = robust_evaluate_model(X, y, model, folds, metric)\r\n\t\t# show process\r\n\t\tif scores is not None:\r\n\t\t\t# store a result\r\n\t\t\tresults[name] = scores\r\n\t\t\tmean_score, std_score = mean(scores), std(scores)\r\n\t\t\tprint('>%s: %.3f (+\/-%.3f)' % (name, mean_score, std_score))\r\n\t\telse:\r\n\t\t\tprint('>%s: error' % name)\r\n\treturn results\r\n\r\n# print and plot the top n results\r\ndef summarize_results(results, maximize=True, top_n=10):\r\n\t# check for no results\r\n\tif len(results) == 0:\r\n\t\tprint('no results')\r\n\t\treturn\r\n\t# determine how many results to summarize\r\n\tn = min(top_n, len(results))\r\n\t# create a list of (name, mean(scores)) tuples\r\n\tmean_scores = [(k,mean(v)) for k,v in results.items()]\r\n\t# sort tuples by mean score\r\n\tmean_scores = sorted(mean_scores, key=lambda x: x[1])\r\n\t# reverse for descending order (e.g. for accuracy)\r\n\tif maximize:\r\n\t\tmean_scores = list(reversed(mean_scores))\r\n\t# retrieve the top n for summarization\r\n\tnames = [x[0] for x in mean_scores[:n]]\r\n\tscores = [results[x[0]] for x in mean_scores[:n]]\r\n\t# print the top n\r\n\tprint()\r\n\tfor i in range(n):\r\n\t\tname = names[i]\r\n\t\tmean_score, std_score = mean(results[name]), std(results[name])\r\n\t\tprint('Rank=%d, Name=%s, Score=%.3f (+\/- %.3f)' % (i+1, name, mean_score, std_score))\r\n\t# boxplot for the top n\r\n\tpyplot.boxplot(scores, labels=names)\r\n\t_, labels = pyplot.xticks()\r\n\tpyplot.setp(labels, rotation=90)\r\n\tpyplot.savefig('spotcheck.png')\r\n\r\n# load dataset\r\nX, y = load_dataset()\r\n# get model list\r\nmodels = define_models()\r\n# add gbm models\r\nmodels = define_gbm_models(models)\r\n# evaluate models\r\nresults = evaluate_models(X, y, models)\r\n# summarize results\r\nsummarize_results(results)<\/pre>\n<p>Running the example shows that indeed some XGBoost models perform well on the problem.<\/p>\n<pre class=\"crayon-plain-tag\">...\r\n>xgb-[0.1, 100, 
1.0, 3]: 0.864 (+\/-0.044)\r\n>xgb-[0.1, 100, 1.0, 7]: 0.865 (+\/-0.036)\r\n>xgb-[0.1, 100, 1.0, 9]: 0.867 (+\/-0.039)\r\n\r\nRank=1, Name=xgb-[0.1, 50, 1.0, 3], Score=0.872 (+\/- 0.039)\r\nRank=2, Name=et, Score=0.869 (+\/- 0.033)\r\nRank=3, Name=xgb-[0.1, 50, 1.0, 9], Score=0.868 (+\/- 0.038)\r\nRank=4, Name=xgb-[0.1, 100, 1.0, 9], Score=0.867 (+\/- 0.039)\r\nRank=5, Name=xgb-[0.01, 50, 1.0, 3], Score=0.867 (+\/- 0.035)\r\nRank=6, Name=xgb-[0.1, 50, 1.0, 7], Score=0.867 (+\/- 0.037)\r\nRank=7, Name=xgb-[0.001, 100, 0.7, 9], Score=0.866 (+\/- 0.040)\r\nRank=8, Name=xgb-[0.01, 100, 1.0, 3], Score=0.866 (+\/- 0.037)\r\nRank=9, Name=xgb-[0.001, 100, 0.7, 3], Score=0.866 (+\/- 0.034)\r\nRank=10, Name=xgb-[0.01, 50, 0.7, 3], Score=0.866 (+\/- 0.034)<\/pre>\n<\/p>\n<div id=\"attachment_6194\" style=\"width: 650px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" class=\"size-full wp-image-6194\" src=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2018\/07\/Boxplot-of-top-10-Spot-Checking-Algorithms-on-a-Classification-Problem-with-XGBoost.png\" alt=\"Boxplot of top 10 Spot-Checking Algorithms on a Classification Problem with XGBoost\" width=\"640\" height=\"480\" srcset=\"http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2018\/07\/Boxplot-of-top-10-Spot-Checking-Algorithms-on-a-Classification-Problem-with-XGBoost.png 640w, http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2018\/07\/Boxplot-of-top-10-Spot-Checking-Algorithms-on-a-Classification-Problem-with-XGBoost-300x225.png 300w\" sizes=\"(max-width: 640px) 100vw, 640px\"><\/p>\n<p class=\"wp-caption-text\">Boxplot of top 10 Spot-Checking Algorithms on a Classification Problem with XGBoost<\/p>\n<\/div>\n<h3>Repeated Evaluations<\/h3>\n<p>The above results also highlight the noisy nature of the evaluations, e.g. 
the results of extra trees in this run are different from the run above (0.858 vs 0.869).<\/p>\n<p>We are using k-fold cross-validation to produce a population of scores, but the population is small and the calculated mean will be noisy.<\/p>\n<p>This is fine as long as we take the spot-check results as a starting point rather than as definitive results of an algorithm on the problem. This is hard to do; it takes discipline from the practitioner.<\/p>\n<p>Alternatively, you may want to adapt the framework such that the model evaluation scheme better matches the one you intend to use for your specific problem.<\/p>\n<p>For example, when evaluating stochastic algorithms like bagged or boosted decision trees, it is a good idea to run each experiment multiple times on the same train\/test sets (called repeats) in order to account for the stochastic nature of the learning algorithm.<\/p>\n<p>We can update the <em>evaluate_model()<\/em> function to repeat the evaluation of a given model n-times, with a different split of the data each time, then return all scores. 
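<\/p>
<p>As an alternative to looping manually, scikit-learn also provides the <em>RepeatedKFold<\/em> class in <em>sklearn.model_selection<\/em>, which can be passed directly as the <em>cv<\/em> argument of <em>cross_val_score<\/em>. A minimal standalone sketch, separate from the framework above (for classification, <em>RepeatedStratifiedKFold<\/em> may be preferred):<\/p>

```python
# RepeatedKFold repeats k-fold cross-validation with different randomized
# splits, yielding n_splits * n_repeats scores in a single call
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RepeatedKFold, cross_val_score

X, y = make_classification(n_samples=200, n_classes=2, random_state=1)
cv = RepeatedKFold(n_splits=10, n_repeats=3, random_state=1)
scores = cross_val_score(LogisticRegression(), X, y, scoring='accuracy', cv=cv)
print(len(scores))  # 10 folds x 3 repeats = 30 scores
```
<p>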
For example, three repeats of 10-fold cross-validation will result in 30 scores from which to calculate the mean performance of a model.<\/p>\n<pre class=\"crayon-plain-tag\"># evaluate a single model\r\ndef evaluate_model(X, y, model, folds, repeats, metric):\r\n\t# create the pipeline\r\n\tpipeline = make_pipeline(model)\r\n\t# evaluate model\r\n\tscores = list()\r\n\t# repeat model evaluation n times\r\n\tfor _ in range(repeats):\r\n\t\t# perform run\r\n\t\tscores_r = cross_val_score(pipeline, X, y, scoring=metric, cv=folds, n_jobs=-1)\r\n\t\t# add scores to list\r\n\t\tscores += scores_r.tolist()\r\n\treturn scores<\/pre>\n<p>Alternatively, you may prefer to calculate a mean score from each k-fold cross-validation run, then calculate a grand mean of all runs, as described in:<\/p>\n<ul>\n<li><a href=\"https:\/\/machinelearningmastery.com\/evaluate-skill-deep-learning-models\/\">How to Evaluate the Skill of Deep Learning Models<\/a><\/li>\n<\/ul>\n<p>We can then update the <em>robust_evaluate_model()<\/em> function to pass down the repeats argument and the <em>evaluate_models()<\/em> function to define a default, such as 3.<\/p>\n<p>A complete example of the binary classification case with three repeats of model evaluation is listed below.<\/p>\n<pre class=\"crayon-plain-tag\"># binary classification spot check script\r\nimport warnings\r\nfrom numpy import mean\r\nfrom numpy import std\r\nfrom matplotlib import pyplot\r\nfrom sklearn.datasets import make_classification\r\nfrom sklearn.model_selection import cross_val_score\r\nfrom sklearn.preprocessing import StandardScaler\r\nfrom sklearn.preprocessing import MinMaxScaler\r\nfrom sklearn.pipeline import Pipeline\r\nfrom sklearn.linear_model import LogisticRegression\r\nfrom sklearn.linear_model import RidgeClassifier\r\nfrom sklearn.linear_model import SGDClassifier\r\nfrom sklearn.linear_model import PassiveAggressiveClassifier\r\nfrom sklearn.neighbors import KNeighborsClassifier\r\nfrom sklearn.tree import 
DecisionTreeClassifier\r\nfrom sklearn.tree import ExtraTreeClassifier\r\nfrom sklearn.svm import SVC\r\nfrom sklearn.naive_bayes import GaussianNB\r\nfrom sklearn.ensemble import AdaBoostClassifier\r\nfrom sklearn.ensemble import BaggingClassifier\r\nfrom sklearn.ensemble import RandomForestClassifier\r\nfrom sklearn.ensemble import ExtraTreesClassifier\r\nfrom sklearn.ensemble import GradientBoostingClassifier\r\n\r\n# load the dataset, returns X and y elements\r\ndef load_dataset():\r\n\treturn make_classification(n_samples=1000, n_classes=2, random_state=1)\r\n\r\n# create a dict of standard models to evaluate {name:object}\r\ndef define_models(models=dict()):\r\n\t# linear models\r\n\tmodels['logistic'] = LogisticRegression()\r\n\talpha = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]\r\n\tfor a in alpha:\r\n\t\tmodels['ridge-'+str(a)] = RidgeClassifier(alpha=a)\r\n\tmodels['sgd'] = SGDClassifier(max_iter=1000, tol=1e-3)\r\n\tmodels['pa'] = PassiveAggressiveClassifier(max_iter=1000, tol=1e-3)\r\n\t# non-linear models\r\n\tn_neighbors = range(1, 21)\r\n\tfor k in n_neighbors:\r\n\t\tmodels['knn-'+str(k)] = KNeighborsClassifier(n_neighbors=k)\r\n\tmodels['cart'] = DecisionTreeClassifier()\r\n\tmodels['extra'] = ExtraTreeClassifier()\r\n\tmodels['svml'] = SVC(kernel='linear')\r\n\tmodels['svmp'] = SVC(kernel='poly')\r\n\tc_values = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]\r\n\tfor c in c_values:\r\n\t\tmodels['svmr'+str(c)] = SVC(C=c)\r\n\tmodels['bayes'] = GaussianNB()\r\n\t# ensemble models\r\n\tn_trees = 100\r\n\tmodels['ada'] = AdaBoostClassifier(n_estimators=n_trees)\r\n\tmodels['bag'] = BaggingClassifier(n_estimators=n_trees)\r\n\tmodels['rf'] = RandomForestClassifier(n_estimators=n_trees)\r\n\tmodels['et'] = ExtraTreesClassifier(n_estimators=n_trees)\r\n\tmodels['gbm'] = GradientBoostingClassifier(n_estimators=n_trees)\r\n\tprint('Defined %d models' % len(models))\r\n\treturn models\r\n\r\n# create a feature preparation pipeline for a 
model\r\ndef make_pipeline(model):\r\n\tsteps = list()\r\n\t# standardization\r\n\tsteps.append(('standardize', StandardScaler()))\r\n\t# normalization\r\n\tsteps.append(('normalize', MinMaxScaler()))\r\n\t# the model\r\n\tsteps.append(('model', model))\r\n\t# create pipeline\r\n\tpipeline = Pipeline(steps=steps)\r\n\treturn pipeline\r\n\r\n# evaluate a single model\r\ndef evaluate_model(X, y, model, folds, repeats, metric):\r\n\t# create the pipeline\r\n\tpipeline = make_pipeline(model)\r\n\t# evaluate model\r\n\tscores = list()\r\n\t# repeat model evaluation n times\r\n\tfor _ in range(repeats):\r\n\t\t# perform run\r\n\t\tscores_r = cross_val_score(pipeline, X, y, scoring=metric, cv=folds, n_jobs=-1)\r\n\t\t# add scores to list\r\n\t\tscores += scores_r.tolist()\r\n\treturn scores\r\n\r\n# evaluate a model and try to trap errors and hide warnings\r\ndef robust_evaluate_model(X, y, model, folds, repeats, metric):\r\n\tscores = None\r\n\ttry:\r\n\t\twith warnings.catch_warnings():\r\n\t\t\twarnings.filterwarnings(\"ignore\")\r\n\t\t\tscores = evaluate_model(X, y, model, folds, repeats, metric)\r\n\texcept:\r\n\t\tscores = None\r\n\treturn scores\r\n\r\n# evaluate a dict of models {name:object}, returns {name:score}\r\ndef evaluate_models(X, y, models, folds=10, repeats=3, metric='accuracy'):\r\n\tresults = dict()\r\n\tfor name, model in models.items():\r\n\t\t# evaluate the model\r\n\t\tscores = robust_evaluate_model(X, y, model, folds, repeats, metric)\r\n\t\t# show process\r\n\t\tif scores is not None:\r\n\t\t\t# store a result\r\n\t\t\tresults[name] = scores\r\n\t\t\tmean_score, std_score = mean(scores), std(scores)\r\n\t\t\tprint('>%s: %.3f (+\/-%.3f)' % (name, mean_score, std_score))\r\n\t\telse:\r\n\t\t\tprint('>%s: error' % name)\r\n\treturn results\r\n\r\n# print and plot the top n results\r\ndef summarize_results(results, maximize=True, top_n=10):\r\n\t# check for no results\r\n\tif len(results) == 0:\r\n\t\tprint('no results')\r\n\t\treturn\r\n\t# 
determine how many results to summarize\r\n\tn = min(top_n, len(results))\r\n\t# create a list of (name, mean(scores)) tuples\r\n\tmean_scores = [(k,mean(v)) for k,v in results.items()]\r\n\t# sort tuples by mean score\r\n\tmean_scores = sorted(mean_scores, key=lambda x: x[1])\r\n\t# reverse for descending order (e.g. for accuracy)\r\n\tif maximize:\r\n\t\tmean_scores = list(reversed(mean_scores))\r\n\t# retrieve the top n for summarization\r\n\tnames = [x[0] for x in mean_scores[:n]]\r\n\tscores = [results[x[0]] for x in mean_scores[:n]]\r\n\t# print the top n\r\n\tprint()\r\n\tfor i in range(n):\r\n\t\tname = names[i]\r\n\t\tmean_score, std_score = mean(results[name]), std(results[name])\r\n\t\tprint('Rank=%d, Name=%s, Score=%.3f (+\/- %.3f)' % (i+1, name, mean_score, std_score))\r\n\t# boxplot for the top n\r\n\tpyplot.boxplot(scores, labels=names)\r\n\t_, labels = pyplot.xticks()\r\n\tpyplot.setp(labels, rotation=90)\r\n\tpyplot.savefig('spotcheck.png')\r\n\r\n# load dataset\r\nX, y = load_dataset()\r\n# get model list\r\nmodels = define_models()\r\n# evaluate models\r\nresults = evaluate_models(X, y, models)\r\n# summarize results\r\nsummarize_results(results)<\/pre>\n<p>Running the example produces a more robust estimate of the scores.<\/p>\n<pre class=\"crayon-plain-tag\">...\r\n>bag: 0.861 (+\/-0.037)\r\n>rf: 0.859 (+\/-0.036)\r\n>et: 0.869 (+\/-0.035)\r\n>gbm: 0.867 (+\/-0.044)\r\n\r\nRank=1, Name=et, Score=0.869 (+\/- 0.035)\r\nRank=2, Name=gbm, Score=0.867 (+\/- 0.044)\r\nRank=3, Name=bag, Score=0.861 (+\/- 0.037)\r\nRank=4, Name=rf, Score=0.859 (+\/- 0.036)\r\nRank=5, Name=ada, Score=0.850 (+\/- 0.035)\r\nRank=6, Name=ridge-0.9, Score=0.848 (+\/- 0.038)\r\nRank=7, Name=ridge-0.8, Score=0.848 (+\/- 0.038)\r\nRank=8, Name=ridge-0.7, Score=0.848 (+\/- 0.038)\r\nRank=9, Name=ridge-0.6, Score=0.848 (+\/- 0.038)\r\nRank=10, Name=ridge-0.5, Score=0.848 (+\/- 0.038)<\/pre>\n<p>There will still be some variance in the reported means, but less than a single run 
of k-fold cross-validation.<\/p>\n<p>The number of repeats may be increased to further reduce this variance, at the cost of longer run times, and perhaps against the intent of spot checking.<\/p>\n<h3>Varied Input Representations<\/h3>\n<p>I am a big fan of avoiding assumptions and recommendations for data representations prior to fitting models.<\/p>\n<p>Instead, I like to also spot-check multiple representations and transforms of input data, which I refer to as views. I explain this more in the post:<\/p>\n<ul>\n<li><a href=\"https:\/\/machinelearningmastery.com\/how-to-get-the-most-from-your-machine-learning-data\/\">How to Get the Most From Your Machine Learning Data<\/a><\/li>\n<\/ul>\n<p>We can update the framework to spot-check multiple different representations for each model.<\/p>\n<p>One way to do this is to update the <em>evaluate_models()<\/em> function so that we can provide a list of <em>make_pipeline()<\/em> functions that can be used for each defined model.<\/p>\n<pre class=\"crayon-plain-tag\"># evaluate a dict of models {name:object}, returns {name:score}\r\ndef evaluate_models(X, y, models, pipe_funcs, folds=10, metric='accuracy'):\r\n\tresults = dict()\r\n\tfor name, model in models.items():\r\n\t\t# evaluate model under each preparation function\r\n\t\tfor i in range(len(pipe_funcs)):\r\n\t\t\t# evaluate the model\r\n\t\t\tscores = robust_evaluate_model(X, y, model, folds, metric, pipe_funcs[i])\r\n\t\t\t# update name\r\n\t\t\trun_name = str(i) + name\r\n\t\t\t# show process\r\n\t\t\tif scores is not None:\r\n\t\t\t\t# store a result\r\n\t\t\t\tresults[run_name] = scores\r\n\t\t\t\tmean_score, std_score = mean(scores), std(scores)\r\n\t\t\t\tprint('>%s: %.3f (+\/-%.3f)' % (run_name, mean_score, std_score))\r\n\t\t\telse:\r\n\t\t\t\tprint('>%s: error' % run_name)\r\n\treturn results<\/pre>\n<p>The chosen pipeline function can then be passed along down to the <em>robust_evaluate_model()<\/em> function and to the <em>evaluate_model()<\/em> 
function where it can be used.<\/p>\n<p>We can then define a number of different pipeline functions; for example:<\/p>\n<pre class=\"crayon-plain-tag\"># no transforms pipeline\r\ndef pipeline_none(model):\r\n\treturn model\r\n\r\n# standardize transform pipeline\r\ndef pipeline_standardize(model):\r\n\tsteps = list()\r\n\t# standardization\r\n\tsteps.append(('standardize', StandardScaler()))\r\n\t# the model\r\n\tsteps.append(('model', model))\r\n\t# create pipeline\r\n\tpipeline = Pipeline(steps=steps)\r\n\treturn pipeline\r\n\r\n# normalize transform pipeline\r\ndef pipeline_normalize(model):\r\n\tsteps = list()\r\n\t# normalization\r\n\tsteps.append(('normalize', MinMaxScaler()))\r\n\t# the model\r\n\tsteps.append(('model', model))\r\n\t# create pipeline\r\n\tpipeline = Pipeline(steps=steps)\r\n\treturn pipeline\r\n\r\n# standardize and normalize pipeline\r\ndef pipeline_std_norm(model):\r\n\tsteps = list()\r\n\t# standardization\r\n\tsteps.append(('standardize', StandardScaler()))\r\n\t# normalization\r\n\tsteps.append(('normalize', MinMaxScaler()))\r\n\t# the model\r\n\tsteps.append(('model', model))\r\n\t# create pipeline\r\n\tpipeline = Pipeline(steps=steps)\r\n\treturn pipeline<\/pre>\n<p>Then we can create a list of these functions that can be provided to the <em>evaluate_models()<\/em> function.<\/p>\n<pre class=\"crayon-plain-tag\"># define transform pipelines\r\npipelines = [pipeline_none, pipeline_standardize, pipeline_normalize, pipeline_std_norm]<\/pre>\n<p>The complete example of the classification case, updated to spot-check pipeline transforms, is listed below.<\/p>\n<pre class=\"crayon-plain-tag\"># binary classification spot check script\r\nimport warnings\r\nfrom numpy import mean\r\nfrom numpy import std\r\nfrom matplotlib import pyplot\r\nfrom sklearn.datasets import make_classification\r\nfrom sklearn.model_selection import cross_val_score\r\nfrom sklearn.preprocessing import StandardScaler\r\nfrom sklearn.preprocessing import 
MinMaxScaler\r\nfrom sklearn.pipeline import Pipeline\r\nfrom sklearn.linear_model import LogisticRegression\r\nfrom sklearn.linear_model import RidgeClassifier\r\nfrom sklearn.linear_model import SGDClassifier\r\nfrom sklearn.linear_model import PassiveAggressiveClassifier\r\nfrom sklearn.neighbors import KNeighborsClassifier\r\nfrom sklearn.tree import DecisionTreeClassifier\r\nfrom sklearn.tree import ExtraTreeClassifier\r\nfrom sklearn.svm import SVC\r\nfrom sklearn.naive_bayes import GaussianNB\r\nfrom sklearn.ensemble import AdaBoostClassifier\r\nfrom sklearn.ensemble import BaggingClassifier\r\nfrom sklearn.ensemble import RandomForestClassifier\r\nfrom sklearn.ensemble import ExtraTreesClassifier\r\nfrom sklearn.ensemble import GradientBoostingClassifier\r\n\r\n# load the dataset, returns X and y elements\r\ndef load_dataset():\r\n\treturn make_classification(n_samples=1000, n_classes=2, random_state=1)\r\n\r\n# create a dict of standard models to evaluate {name:object}\r\ndef define_models(models=dict()):\r\n\t# linear models\r\n\tmodels['logistic'] = LogisticRegression()\r\n\talpha = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]\r\n\tfor a in alpha:\r\n\t\tmodels['ridge-'+str(a)] = RidgeClassifier(alpha=a)\r\n\tmodels['sgd'] = SGDClassifier(max_iter=1000, tol=1e-3)\r\n\tmodels['pa'] = PassiveAggressiveClassifier(max_iter=1000, tol=1e-3)\r\n\t# non-linear models\r\n\tn_neighbors = range(1, 21)\r\n\tfor k in n_neighbors:\r\n\t\tmodels['knn-'+str(k)] = KNeighborsClassifier(n_neighbors=k)\r\n\tmodels['cart'] = DecisionTreeClassifier()\r\n\tmodels['extra'] = ExtraTreeClassifier()\r\n\tmodels['svml'] = SVC(kernel='linear')\r\n\tmodels['svmp'] = SVC(kernel='poly')\r\n\tc_values = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]\r\n\tfor c in c_values:\r\n\t\tmodels['svmr'+str(c)] = SVC(C=c)\r\n\tmodels['bayes'] = GaussianNB()\r\n\t# ensemble models\r\n\tn_trees = 100\r\n\tmodels['ada'] = AdaBoostClassifier(n_estimators=n_trees)\r\n\tmodels['bag'] = 
BaggingClassifier(n_estimators=n_trees)\r\n\tmodels['rf'] = RandomForestClassifier(n_estimators=n_trees)\r\n\tmodels['et'] = ExtraTreesClassifier(n_estimators=n_trees)\r\n\tmodels['gbm'] = GradientBoostingClassifier(n_estimators=n_trees)\r\n\tprint('Defined %d models' % len(models))\r\n\treturn models\r\n\r\n# no transforms pipeline\r\ndef pipeline_none(model):\r\n\treturn model\r\n\r\n# standardize transform pipeline\r\ndef pipeline_standardize(model):\r\n\tsteps = list()\r\n\t# standardization\r\n\tsteps.append(('standardize', StandardScaler()))\r\n\t# the model\r\n\tsteps.append(('model', model))\r\n\t# create pipeline\r\n\tpipeline = Pipeline(steps=steps)\r\n\treturn pipeline\r\n\r\n# normalize transform pipeline\r\ndef pipeline_normalize(model):\r\n\tsteps = list()\r\n\t# normalization\r\n\tsteps.append(('normalize', MinMaxScaler()))\r\n\t# the model\r\n\tsteps.append(('model', model))\r\n\t# create pipeline\r\n\tpipeline = Pipeline(steps=steps)\r\n\treturn pipeline\r\n\r\n# standardize and normalize pipeline\r\ndef pipeline_std_norm(model):\r\n\tsteps = list()\r\n\t# standardization\r\n\tsteps.append(('standardize', StandardScaler()))\r\n\t# normalization\r\n\tsteps.append(('normalize', MinMaxScaler()))\r\n\t# the model\r\n\tsteps.append(('model', model))\r\n\t# create pipeline\r\n\tpipeline = Pipeline(steps=steps)\r\n\treturn pipeline\r\n\r\n# evaluate a single model\r\ndef evaluate_model(X, y, model, folds, metric, pipe_func):\r\n\t# create the pipeline\r\n\tpipeline = pipe_func(model)\r\n\t# evaluate model\r\n\tscores = cross_val_score(pipeline, X, y, scoring=metric, cv=folds, n_jobs=-1)\r\n\treturn scores\r\n\r\n# evaluate a model and try to trap errors and hide warnings\r\ndef robust_evaluate_model(X, y, model, folds, metric, pipe_func):\r\n\tscores = None\r\n\ttry:\r\n\t\twith warnings.catch_warnings():\r\n\t\t\twarnings.filterwarnings(\"ignore\")\r\n\t\t\tscores = evaluate_model(X, y, model, folds, metric, pipe_func)\r\n\texcept:\r\n\t\tscores = 
None\r\n\treturn scores\r\n\r\n# evaluate a dict of models {name:object}, returns {name:score}\r\ndef evaluate_models(X, y, models, pipe_funcs, folds=10, metric='accuracy'):\r\n\tresults = dict()\r\n\tfor name, model in models.items():\r\n\t\t# evaluate model under each preparation function\r\n\t\tfor i in range(len(pipe_funcs)):\r\n\t\t\t# evaluate the model\r\n\t\t\tscores = robust_evaluate_model(X, y, model, folds, metric, pipe_funcs[i])\r\n\t\t\t# update name\r\n\t\t\trun_name = str(i) + name\r\n\t\t\t# show process\r\n\t\t\tif scores is not None:\r\n\t\t\t\t# store a result\r\n\t\t\t\tresults[run_name] = scores\r\n\t\t\t\tmean_score, std_score = mean(scores), std(scores)\r\n\t\t\t\tprint('>%s: %.3f (+\/-%.3f)' % (run_name, mean_score, std_score))\r\n\t\t\telse:\r\n\t\t\t\tprint('>%s: error' % run_name)\r\n\treturn results\r\n\r\n# print and plot the top n results\r\ndef summarize_results(results, maximize=True, top_n=10):\r\n\t# check for no results\r\n\tif len(results) == 0:\r\n\t\tprint('no results')\r\n\t\treturn\r\n\t# determine how many results to summarize\r\n\tn = min(top_n, len(results))\r\n\t# create a list of (name, mean(scores)) tuples\r\n\tmean_scores = [(k,mean(v)) for k,v in results.items()]\r\n\t# sort tuples by mean score\r\n\tmean_scores = sorted(mean_scores, key=lambda x: x[1])\r\n\t# reverse for descending order (e.g. 
for accuracy)\r\n\tif maximize:\r\n\t\tmean_scores = list(reversed(mean_scores))\r\n\t# retrieve the top n for summarization\r\n\tnames = [x[0] for x in mean_scores[:n]]\r\n\tscores = [results[x[0]] for x in mean_scores[:n]]\r\n\t# print the top n\r\n\tprint()\r\n\tfor i in range(n):\r\n\t\tname = names[i]\r\n\t\tmean_score, std_score = mean(results[name]), std(results[name])\r\n\t\tprint('Rank=%d, Name=%s, Score=%.3f (+\/- %.3f)' % (i+1, name, mean_score, std_score))\r\n\t# boxplot for the top n\r\n\tpyplot.boxplot(scores, labels=names)\r\n\t_, labels = pyplot.xticks()\r\n\tpyplot.setp(labels, rotation=90)\r\n\tpyplot.savefig('spotcheck.png')\r\n\r\n# load dataset\r\nX, y = load_dataset()\r\n# get model list\r\nmodels = define_models()\r\n# define transform pipelines\r\npipelines = [pipeline_none, pipeline_standardize, pipeline_normalize, pipeline_std_norm]\r\n# evaluate models\r\nresults = evaluate_models(X, y, models, pipelines)\r\n# summarize results\r\nsummarize_results(results)<\/pre>\n<p>Running the example shows that the results for each pipeline are differentiated by adding the pipeline index to the beginning of the algorithm name, e.g. \u2018<em>0rf<\/em>\u2018 means RF with the first pipeline, which applies no transforms.<\/p>\n<p>The tree-ensemble algorithms perform well on this problem, and these algorithms are invariant to data scaling. 
This means that their results under each pipeline will be similar (or identical), and in turn they crowd out other algorithms in the top-10 list.<\/p>\n<pre class=\"crayon-plain-tag\">...\r\n>0gbm: 0.865 (+\/-0.044)\r\n>1gbm: 0.865 (+\/-0.044)\r\n>2gbm: 0.865 (+\/-0.044)\r\n>3gbm: 0.865 (+\/-0.044)\r\n\r\nRank=1, Name=3rf, Score=0.870 (+\/- 0.034)\r\nRank=2, Name=2rf, Score=0.870 (+\/- 0.034)\r\nRank=3, Name=1rf, Score=0.870 (+\/- 0.034)\r\nRank=4, Name=0rf, Score=0.870 (+\/- 0.034)\r\nRank=5, Name=3bag, Score=0.866 (+\/- 0.039)\r\nRank=6, Name=2bag, Score=0.866 (+\/- 0.039)\r\nRank=7, Name=1bag, Score=0.866 (+\/- 0.039)\r\nRank=8, Name=0bag, Score=0.866 (+\/- 0.039)\r\nRank=9, Name=3gbm, Score=0.865 (+\/- 0.044)\r\nRank=10, Name=2gbm, Score=0.865 (+\/- 0.044)<\/pre>\n<h2>Further Reading<\/h2>\n<p>This section provides more resources on the topic if you are looking to go deeper.<\/p>\n<ul>\n<li><a href=\"https:\/\/machinelearningmastery.com\/why-you-should-be-spot-checking-algorithms-on-your-machine-learning-problems\/\">Why you should be Spot-Checking Algorithms on your Machine Learning Problems<\/a><\/li>\n<li><a href=\"https:\/\/machinelearningmastery.com\/spot-check-classification-machine-learning-algorithms-python-scikit-learn\/\">Spot-Check Classification Machine Learning Algorithms in Python with scikit-learn<\/a><\/li>\n<li><a href=\"https:\/\/machinelearningmastery.com\/spot-check-regression-machine-learning-algorithms-python-scikit-learn\/\">Spot-Check Regression Machine Learning Algorithms in Python with scikit-learn<\/a><\/li>\n<li><a href=\"https:\/\/machinelearningmastery.com\/evaluate-skill-deep-learning-models\/\">How to Evaluate the Skill of Deep Learning Models<\/a><\/li>\n<li><a href=\"https:\/\/machinelearningmastery.com\/applied-machine-learning-is-hard\/\">Why Applied Machine Learning Is Hard<\/a><\/li>\n<li><a href=\"https:\/\/machinelearningmastery.com\/applied-machine-learning-as-a-search-problem\/\">A Gentle Introduction to Applied 
Machine Learning as a Search Problem<\/a><\/li>\n<li><a href=\"https:\/\/machinelearningmastery.com\/how-to-get-the-most-from-your-machine-learning-data\/\">How to Get the Most From Your Machine Learning Data<\/a><\/li>\n<\/ul>\n<h2>Summary<\/h2>\n<p>In this tutorial, you discovered the usefulness of spot-checking algorithms on a new predictive modeling problem and how to develop a standard framework for spot-checking algorithms in Python for classification and regression problems.<\/p>\n<p>Specifically, you learned:<\/p>\n<ul>\n<li>Spot-checking provides a way to quickly discover the types of algorithms that perform well on your predictive modeling problem.<\/li>\n<li>How to develop a generic framework for loading data, defining models, evaluating models, and summarizing results.<\/li>\n<li>How to apply the framework to classification and regression problems.<\/li>\n<\/ul>\n<p>Have you used this framework or do you have some further suggestions to improve it?<br \/>\nLet me know in the comments.<\/p>\n<p>Do you have any questions?<br \/>\nAsk your questions in the comments below and I will do my best to answer.<\/p>\n<p>The post <a rel=\"nofollow\" href=\"https:\/\/machinelearningmastery.com\/spot-check-machine-learning-algorithms-in-python\/\">How to Develop a Reusable Framework to Spot-Check Algorithms in Python<\/a> appeared first on <a rel=\"nofollow\" href=\"https:\/\/machinelearningmastery.com\/\">Machine Learning Mastery<\/a>.<\/p>\n<\/div>\n<p><a href=\"https:\/\/machinelearningmastery.com\/spot-check-machine-learning-algorithms-in-python\/\">Go to Source<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Author: Jason Brownlee Spot-checking algorithms is a technique in applied machine learning designed to quickly and objectively provide a first set of results on a [&hellip;] <span class=\"read-more-link\"><a class=\"read-more\" 
href=\"https:\/\/www.aiproblog.com\/index.php\/2018\/09\/13\/how-to-develop-a-reusable-framework-to-spot-check-algorithms-in-python\/\">Read More<\/a><\/span><\/p>\n","protected":false},"author":1,"featured_media":1046,"comment_status":"registered_only","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_bbp_topic_count":0,"_bbp_reply_count":0,"_bbp_total_topic_count":0,"_bbp_total_reply_count":0,"_bbp_voice_count":0,"_bbp_anonymous_reply_count":0,"_bbp_topic_count_hidden":0,"_bbp_reply_count_hidden":0,"_bbp_forum_subforum_count":0,"footnotes":""},"categories":[24],"tags":[],"_links":{"self":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts\/1045"}],"collection":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/comments?post=1045"}],"version-history":[{"count":0,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts\/1045\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/media\/1046"}],"wp:attachment":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/media?parent=1045"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/categories?post=1045"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/tags?post=1045"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}