{"id":3926,"date":"2020-10-01T19:00:05","date_gmt":"2020-10-01T19:00:05","guid":{"rendered":"https:\/\/www.aiproblog.com\/index.php\/2020\/10\/01\/gaussian-processes-for-classification-with-python\/"},"modified":"2020-10-01T19:00:05","modified_gmt":"2020-10-01T19:00:05","slug":"gaussian-processes-for-classification-with-python","status":"publish","type":"post","link":"https:\/\/www.aiproblog.com\/index.php\/2020\/10\/01\/gaussian-processes-for-classification-with-python\/","title":{"rendered":"Gaussian Processes for Classification With Python"},"content":{"rendered":"<p>Author: Jason Brownlee<\/p>\n<div>\n<p>The <strong>Gaussian Processes Classifier<\/strong> is a classification machine learning algorithm.<\/p>\n<p>Gaussian Processes are a generalization of the Gaussian probability distribution and can be used as the basis for sophisticated non-parametric machine learning algorithms for classification and regression.<\/p>\n<p>They are a type of kernel model, like SVMs, and unlike SVMs, they are capable of predicting highly calibrated class membership probabilities, although the choice and configuration of the kernel used at the heart of the method can be challenging.<\/p>\n<p>In this tutorial, you will discover the Gaussian Processes Classifier classification machine learning algorithm.<\/p>\n<p>After completing this tutorial, you will know:<\/p>\n<ul>\n<li>The Gaussian Processes Classifier is a non-parametric algorithm that can be applied to binary classification tasks.<\/li>\n<li>How to fit, evaluate, and make predictions with the Gaussian Processes Classifier model with Scikit-Learn.<\/li>\n<li>How to tune the hyperparameters of the Gaussian Processes Classifier algorithm on a given dataset.<\/li>\n<\/ul>\n<p>Let&rsquo;s get started.<\/p>\n<div id=\"attachment_10690\" style=\"width: 810px\" class=\"wp-caption aligncenter\"><img decoding=\"async\" aria-describedby=\"caption-attachment-10690\" loading=\"lazy\" class=\"size-full wp-image-10690\" 
src=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2020\/08\/Gaussian-Processes-for-Classification-With-Python.jpg\" alt=\"Gaussian Processes for Classification With Python\" width=\"800\" height=\"533\" srcset=\"http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2020\/08\/Gaussian-Processes-for-Classification-With-Python.jpg 800w, http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2020\/08\/Gaussian-Processes-for-Classification-With-Python-300x200.jpg 300w, http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2020\/08\/Gaussian-Processes-for-Classification-With-Python-768x512.jpg 768w\" sizes=\"(max-width: 800px) 100vw, 800px\"><\/p>\n<p id=\"caption-attachment-10690\" class=\"wp-caption-text\">Gaussian Processes for Classification With Python<br \/>Photo by <a href=\"https:\/\/flickr.com\/photos\/67415843@N05\/37583657721\/\">Mark Kao<\/a>, some rights reserved.<\/p>\n<\/div>\n<h2>Tutorial Overview<\/h2>\n<p>This tutorial is divided into three parts; they are:<\/p>\n<ol>\n<li>Gaussian Processes for Classification<\/li>\n<li>Gaussian Processes With Scikit-Learn<\/li>\n<li>Tune Gaussian Processes Hyperparameters<\/li>\n<\/ol>\n<h2>Gaussian Processes for Classification<\/h2>\n<p>Gaussian Processes, or GP for short, are a generalization of the <a href=\"https:\/\/machinelearningmastery.com\/continuous-probability-distributions-for-machine-learning\/\">Gaussian probability distribution<\/a> (e.g. the bell-shaped function).<\/p>\n<p>Gaussian probability distribution functions summarize the distribution of random variables, whereas Gaussian processes summarize distributions over functions. As such, you can think of Gaussian processes as one level of abstraction or indirection above Gaussian distributions.<\/p>\n<blockquote>\n<p>A Gaussian process is a generalization of the Gaussian probability distribution. 
Whereas a probability distribution describes random variables which are scalars or vectors (for multivariate distributions), a stochastic process governs the properties of functions.<\/p>\n<\/blockquote>\n<p>&mdash; Page 2, <a href=\"https:\/\/amzn.to\/3aY1nsu\">Gaussian Processes for Machine Learning<\/a>, 2006.<\/p>\n<p>Gaussian processes can be used as a machine learning algorithm for classification predictive modeling.<\/p>\n<p>Gaussian processes are a type of kernel method, like SVMs, although they are able to predict highly calibrated probabilities, unlike SVMs.<\/p>\n<p>Gaussian processes require specifying a kernel that controls how examples relate to each other; specifically, it defines the covariance function of the latent function underlying the data. This latent function is also called the &ldquo;<em>nuisance<\/em>&rdquo; function, because its values are never observed directly.<\/p>\n<blockquote>\n<p>The latent function f plays the role of a nuisance function: we do not observe values of f itself (we observe only the inputs X and the class labels y) and we are not particularly interested in the values of f &hellip;<\/p>\n<\/blockquote>\n<p>&mdash; Page 40, <a href=\"https:\/\/amzn.to\/3aY1nsu\">Gaussian Processes for Machine Learning<\/a>, 2006.<\/p>\n<p>The way that examples are grouped using the kernel controls how the model &ldquo;<em>perceives<\/em>&rdquo; the examples, given the assumption that examples that are &ldquo;<em>close<\/em>&rdquo; to each other have the same class label.<\/p>\n<p>Therefore, it is important to test both different kernel functions for the model and different configurations of each kernel.<\/p>\n<blockquote>\n<p>&hellip; a covariance function is the crucial ingredient in a Gaussian process predictor, as it encodes our assumptions about the function which we wish to learn.<\/p>\n<\/blockquote>\n<p>&mdash; Page 79, <a href=\"https:\/\/amzn.to\/3aY1nsu\">Gaussian Processes for Machine Learning<\/a>, 2006.<\/p>\n<p>It also requires a link function that interprets the internal 
representation and predicts the probability of class membership. The logistic function can be used, allowing the modeling of a <a href=\"https:\/\/machinelearningmastery.com\/discrete-probability-distributions-for-machine-learning\/\">Binomial probability distribution<\/a> for binary classification.<\/p>\n<blockquote>\n<p>For the binary discriminative case one simple idea is to turn the output of a regression model into a class probability using a response function (the inverse of a link function), which &ldquo;squashes&rdquo; its argument, which can lie in the domain (&minus;inf, inf), into the range [0, 1], guaranteeing a valid probabilistic interpretation.<\/p>\n<\/blockquote>\n<p>&mdash; Page 35, <a href=\"https:\/\/amzn.to\/3aY1nsu\">Gaussian Processes for Machine Learning<\/a>, 2006.<\/p>\n<p>Gaussian processes in general, and Gaussian processes for classification in particular, are a complex topic.<\/p>\n<p>To learn more, see the text:<\/p>\n<ul>\n<li><a href=\"https:\/\/amzn.to\/3aY1nsu\">Gaussian Processes for Machine Learning<\/a>, 2006.<\/li>\n<\/ul>\n<h2>Gaussian Processes With Scikit-Learn<\/h2>\n<p>The Gaussian Processes Classifier is available in the scikit-learn Python machine learning library via the <a href=\"https:\/\/scikit-learn.org\/stable\/modules\/generated\/sklearn.gaussian_process.GaussianProcessClassifier.html\">GaussianProcessClassifier class<\/a>.<\/p>\n<p>The class allows you to specify the kernel to use via the &ldquo;<em>kernel<\/em>&rdquo; argument and defaults to 1 * RBF(1.0), e.g. 
an RBF kernel.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\">...\r\n# define model\r\nmodel = GaussianProcessClassifier(kernel=1*RBF(1.0))<\/pre>\n<p>Given that a kernel is specified, the model will attempt to best configure the kernel for the training dataset.<\/p>\n<p>This is controlled via setting an &ldquo;<em>optimizer<\/em>&rdquo;, the maximum number of iterations used to approximate the posterior during prediction via &ldquo;<em>max_iter_predict<\/em>&rdquo;, and the number of restarts of the optimization process performed in an attempt to overcome local optima via &ldquo;<em>n_restarts_optimizer<\/em>&rdquo;.<\/p>\n<p>By default, a single optimization run is performed, and this can be turned off by setting &ldquo;<em>optimizer<\/em>&rdquo; to <em>None<\/em>.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\">...\r\n# define model\r\nmodel = GaussianProcessClassifier(optimizer=None)<\/pre>\n<p>We can demonstrate the Gaussian Processes Classifier with a worked example.<\/p>\n<p>First, let&rsquo;s define a synthetic classification dataset.<\/p>\n<p>We will use the <a href=\"https:\/\/scikit-learn.org\/stable\/modules\/generated\/sklearn.datasets.make_classification.html\">make_classification() function<\/a> to create a dataset with 100 examples, each with 20 input variables.<\/p>\n<p>The example below creates and summarizes the dataset.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\"># test classification dataset\r\nfrom sklearn.datasets import make_classification\r\n# define dataset\r\nX, y = make_classification(n_samples=100, n_features=20, n_informative=15, n_redundant=5, random_state=1)\r\n# summarize the dataset\r\nprint(X.shape, y.shape)<\/pre>\n<p>Running the example creates the dataset and confirms the number of rows and columns of the dataset.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\">(100, 20) (100,)<\/pre>\n<p>We can fit and evaluate a Gaussian Processes Classifier model using repeated stratified k-fold cross-validation via the <a 
href=\"https:\/\/scikit-learn.org\/stable\/modules\/generated\/sklearn.model_selection.RepeatedStratifiedKFold.html\">RepeatedStratifiedKFold class<\/a>. We will use 10 folds and three repeats in the test harness.<\/p>\n<p>We will use the default configuration.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\">...\r\n# create the model\r\nmodel = GaussianProcessClassifier()<\/pre>\n<p>The complete example of evaluating the Gaussian Processes Classifier model for the synthetic binary classification task is listed below.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\"># evaluate a gaussian process classifier model on the dataset\r\nfrom numpy import mean\r\nfrom numpy import std\r\nfrom sklearn.datasets import make_classification\r\nfrom sklearn.model_selection import cross_val_score\r\nfrom sklearn.model_selection import RepeatedStratifiedKFold\r\nfrom sklearn.gaussian_process import GaussianProcessClassifier\r\n# define dataset\r\nX, y = make_classification(n_samples=100, n_features=20, n_informative=15, n_redundant=5, random_state=1)\r\n# define model\r\nmodel = GaussianProcessClassifier()\r\n# define model evaluation method\r\ncv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)\r\n# evaluate model\r\nscores = cross_val_score(model, X, y, scoring='accuracy', cv=cv, n_jobs=-1)\r\n# summarize result\r\nprint('Mean Accuracy: %.3f (%.3f)' % (mean(scores), std(scores)))<\/pre>\n<p>Running the example evaluates the Gaussian Processes Classifier algorithm on the synthetic dataset and reports the average accuracy across the three repeats of 10-fold cross-validation.<\/p>\n<p>Your specific results may vary given the stochastic nature of the learning algorithm. 
Consider running the example a few times.<\/p>\n<p>In this case, we can see that the model achieved a mean accuracy of about 79.0 percent.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\">Mean Accuracy: 0.790 (0.101)<\/pre>\n<p>We may decide to use the Gaussian Processes Classifier as our final model and make predictions on new data.<\/p>\n<p>This can be achieved by fitting the model on all available data and calling the <em>predict()<\/em> function, passing in a new row of data.<\/p>\n<p>We can demonstrate this with a complete example listed below.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\"># make a prediction with a gaussian process classifier model on the dataset\r\nfrom sklearn.datasets import make_classification\r\nfrom sklearn.gaussian_process import GaussianProcessClassifier\r\n# define dataset\r\nX, y = make_classification(n_samples=100, n_features=20, n_informative=15, n_redundant=5, random_state=1)\r\n# define model\r\nmodel = GaussianProcessClassifier()\r\n# fit model\r\nmodel.fit(X, y)\r\n# define new data\r\nrow = [2.47475454,0.40165523,1.68081787,2.88940715,0.91704519,-3.07950644,4.39961206,0.72464273,-4.86563631,-6.06338084,-1.22209949,-0.4699618,1.01222748,-0.6899355,-0.53000581,6.86966784,-3.27211075,-6.59044146,-2.21290585,-3.139579]\r\n# make a prediction\r\nyhat = model.predict([row])\r\n# summarize prediction\r\nprint('Predicted Class: %d' % yhat)<\/pre>\n<p>Running the example fits the model and makes a class label prediction for a new row of data.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\">Predicted Class: 0<\/pre>\n<p>Next, we can look at configuring the model hyperparameters.<\/p>\n<h2>Tune Gaussian Processes Hyperparameters<\/h2>\n<p>The hyperparameters for the Gaussian Processes Classifier method must be configured for your specific dataset.<\/p>\n<p>Perhaps the most important hyperparameter is the kernel, controlled via the &ldquo;<em>kernel<\/em>&rdquo; argument. 
The scikit-learn library provides many built-in kernels that can be used.<\/p>\n<p>Perhaps some of the more common examples include:<\/p>\n<ul>\n<li>RBF<\/li>\n<li>DotProduct<\/li>\n<li>Matern<\/li>\n<li>RationalQuadratic<\/li>\n<li>WhiteKernel<\/li>\n<\/ul>\n<p>You can learn more about the kernels offered by the library here:<\/p>\n<ul>\n<li><a href=\"https:\/\/scikit-learn.org\/stable\/modules\/gaussian_process.html#kernels-for-gaussian-processes\">Kernels for Gaussian Processes, Scikit-Learn User Guide<\/a>.<\/li>\n<\/ul>\n<p>We will evaluate the performance of the Gaussian Processes Classifier with each of these common kernels, using default arguments.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\">...\r\n# define grid\r\ngrid = dict()\r\ngrid['kernel'] = [1*RBF(), 1*DotProduct(), 1*Matern(), 1*RationalQuadratic(), 1*WhiteKernel()]<\/pre>\n<p>The example below demonstrates this using the <a href=\"https:\/\/scikit-learn.org\/stable\/modules\/generated\/sklearn.model_selection.GridSearchCV.html\">GridSearchCV class<\/a> with a grid of values we have defined.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\"># grid search kernel for gaussian process classifier\r\nfrom sklearn.datasets import make_classification\r\nfrom sklearn.model_selection import GridSearchCV\r\nfrom sklearn.model_selection import RepeatedStratifiedKFold\r\nfrom sklearn.gaussian_process import GaussianProcessClassifier\r\nfrom sklearn.gaussian_process.kernels import RBF\r\nfrom sklearn.gaussian_process.kernels import DotProduct\r\nfrom sklearn.gaussian_process.kernels import Matern\r\nfrom sklearn.gaussian_process.kernels import RationalQuadratic\r\nfrom sklearn.gaussian_process.kernels import WhiteKernel\r\n# define dataset\r\nX, y = make_classification(n_samples=100, n_features=20, n_informative=15, n_redundant=5, random_state=1)\r\n# define model\r\nmodel = GaussianProcessClassifier()\r\n# define model evaluation method\r\ncv = RepeatedStratifiedKFold(n_splits=10, 
n_repeats=3, random_state=1)\r\n# define grid\r\ngrid = dict()\r\ngrid['kernel'] = [1*RBF(), 1*DotProduct(), 1*Matern(), 1*RationalQuadratic(), 1*WhiteKernel()]\r\n# define search\r\nsearch = GridSearchCV(model, grid, scoring='accuracy', cv=cv, n_jobs=-1)\r\n# perform the search\r\nresults = search.fit(X, y)\r\n# summarize best\r\nprint('Best Mean Accuracy: %.3f' % results.best_score_)\r\nprint('Best Config: %s' % results.best_params_)\r\n# summarize all\r\nmeans = results.cv_results_['mean_test_score']\r\nparams = results.cv_results_['params']\r\nfor mean, param in zip(means, params):\r\n    print(\"&gt;%.3f with: %r\" % (mean, param))<\/pre>\n<p>Running the example will evaluate each combination of configurations using repeated cross-validation.<\/p>\n<p>Your specific results may vary given the stochastic nature of the learning algorithm. Try running the example a few times.<\/p>\n<p>In this case, we can see that the <em>RationalQuadratic<\/em> kernel achieved a lift in performance with an accuracy of about 91.3 percent as compared to 79.0 percent achieved with the RBF kernel in the previous section.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\">Best Mean Accuracy: 0.913\r\nBest Config: {'kernel': 1**2 * RationalQuadratic(alpha=1, length_scale=1)}\r\n&gt;0.790 with: {'kernel': 1**2 * RBF(length_scale=1)}\r\n&gt;0.800 with: {'kernel': 1**2 * DotProduct(sigma_0=1)}\r\n&gt;0.830 with: {'kernel': 1**2 * Matern(length_scale=1, nu=1.5)}\r\n&gt;0.913 with: {'kernel': 1**2 * RationalQuadratic(alpha=1, length_scale=1)}\r\n&gt;0.510 with: {'kernel': 1**2 * WhiteKernel(noise_level=1)}<\/pre>\n<h2>Further Reading<\/h2>\n<p>This section provides more resources on the topic if you are looking to go deeper.<\/p>\n<h3>Books<\/h3>\n<ul>\n<li><a href=\"https:\/\/amzn.to\/3aY1nsu\">Gaussian Processes for Machine Learning<\/a>, 2006.<\/li>\n<li><a href=\"http:\/\/www.gaussianprocess.org\/gpml\/\">Gaussian Processes for Machine Learning, 
Homepage<\/a>.<\/li>\n<li><a href=\"https:\/\/amzn.to\/2V8wc6Y\">Machine Learning: A Probabilistic Perspective<\/a>, 2012.<\/li>\n<li><a href=\"https:\/\/amzn.to\/34qHQOW\">Pattern Recognition and Machine Learning<\/a>, 2006.<\/li>\n<\/ul>\n<h3>APIs<\/h3>\n<ul>\n<li><a href=\"https:\/\/scikit-learn.org\/stable\/modules\/generated\/sklearn.gaussian_process.GaussianProcessClassifier.html\">sklearn.gaussian_process.GaussianProcessClassifier API<\/a>.<\/li>\n<li><a href=\"https:\/\/scikit-learn.org\/stable\/modules\/generated\/sklearn.gaussian_process.GaussianProcessRegressor.html\">sklearn.gaussian_process.GaussianProcessRegressor API<\/a>.<\/li>\n<li><a href=\"https:\/\/scikit-learn.org\/stable\/modules\/gaussian_process.html\">Gaussian Processes, Scikit-Learn User Guide<\/a>.<\/li>\n<li><a href=\"https:\/\/scikit-learn.org\/stable\/modules\/classes.html#module-sklearn.gaussian_process\">Gaussian Process Kernels API<\/a>.<\/li>\n<\/ul>\n<h3>Articles<\/h3>\n<ul>\n<li><a href=\"https:\/\/en.wikipedia.org\/wiki\/Gaussian_process\">Gaussian process, Wikipedia<\/a>.<\/li>\n<\/ul>\n<h2>Summary<\/h2>\n<p>In this tutorial, you discovered the Gaussian Processes Classifier classification machine learning algorithm.<\/p>\n<p>Specifically, you learned:<\/p>\n<ul>\n<li>The Gaussian Processes Classifier is a non-parametric algorithm that can be applied to binary classification tasks.<\/li>\n<li>How to fit, evaluate, and make predictions with the Gaussian Processes Classifier model with Scikit-Learn.<\/li>\n<li>How to tune the hyperparameters of the Gaussian Processes Classifier algorithm on a given dataset.<\/li>\n<\/ul>\n<p><strong>Do you have any questions?<\/strong><br \/>\nAsk your questions in the comments below and I will do my best to answer.<\/p>\n<p>The post <a rel=\"nofollow\" href=\"https:\/\/machinelearningmastery.com\/gaussian-processes-for-classification-with-python\/\">Gaussian Processes for Classification With Python<\/a> appeared first on <a rel=\"nofollow\" 
href=\"https:\/\/machinelearningmastery.com\/\">Machine Learning Mastery<\/a>.<\/p>\n<\/div>\n<p><a href=\"https:\/\/machinelearningmastery.com\/gaussian-processes-for-classification-with-python\/\">Go to Source<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Author: Jason Brownlee The Gaussian Processes Classifier is a classification machine learning algorithm. Gaussian Processes are a generalization of the Gaussian probability distribution and can [&hellip;] <span class=\"read-more-link\"><a class=\"read-more\" href=\"https:\/\/www.aiproblog.com\/index.php\/2020\/10\/01\/gaussian-processes-for-classification-with-python\/\">Read More<\/a><\/span><\/p>\n","protected":false},"author":1,"featured_media":3927,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_bbp_topic_count":0,"_bbp_reply_count":0,"_bbp_total_topic_count":0,"_bbp_total_reply_count":0,"_bbp_voice_count":0,"_bbp_anonymous_reply_count":0,"_bbp_topic_count_hidden":0,"_bbp_reply_count_hidden":0,"_bbp_forum_subforum_count":0,"footnotes":""},"categories":[24],"tags":[],"_links":{"self":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts\/3926"}],"collection":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/comments?post=3926"}],"version-history":[{"count":0,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts\/3926\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/media\/3927"}],"wp:attachment":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/media?parent=3926"}],"wp:term":[{"taxonomy":"category","embeddable"
:true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/categories?post=3926"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/tags?post=3926"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}
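The tutorial says twice that, unlike SVMs, the Gaussian Processes Classifier can predict highly calibrated class membership probabilities, but none of the listings show how to obtain them. The sketch below (not part of the original post) uses the scikit-learn `predict_proba()` method on the same synthetic dataset to retrieve them; one column per class, rows summing to 1.

```python
# Sketch: retrieving calibrated class membership probabilities from a
# fitted GaussianProcessClassifier via predict_proba().
from sklearn.datasets import make_classification
from sklearn.gaussian_process import GaussianProcessClassifier

# define dataset (same settings as the tutorial)
X, y = make_classification(n_samples=100, n_features=20, n_informative=15,
                           n_redundant=5, random_state=1)
# fit the model on all available data
model = GaussianProcessClassifier()
model.fit(X, y)
# probabilities for the first five rows: shape (5, 2), one column per class
probs = model.predict_proba(X[:5])
print(probs.shape)  # → (5, 2)
```

To predict hard labels from these probabilities, take the column with the larger value per row, which is what `predict()` does internally.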