{"id":2816,"date":"2019-11-14T18:00:21","date_gmt":"2019-11-14T18:00:21","guid":{"rendered":"https:\/\/www.aiproblog.com\/index.php\/2019\/11\/14\/how-to-connect-model-input-data-with-predictions-for-machine-learning\/"},"modified":"2019-11-14T18:00:21","modified_gmt":"2019-11-14T18:00:21","slug":"how-to-connect-model-input-data-with-predictions-for-machine-learning","status":"publish","type":"post","link":"https:\/\/www.aiproblog.com\/index.php\/2019\/11\/14\/how-to-connect-model-input-data-with-predictions-for-machine-learning\/","title":{"rendered":"How to Connect Model Input Data With Predictions for Machine Learning"},"content":{"rendered":"<p>Author: Jason Brownlee<\/p>\n<div>\n<p>Fitting a model to a training dataset is so easy today with libraries like scikit-learn.<\/p>\n<p>A model can be fit and evaluated on a dataset in just a few lines of code. It is so easy that it has become a problem.<\/p>\n<p>The same few lines of code are repeated again and again and it may not be obvious how to actually use the model to make a prediction. 
Or, if a prediction is made, it may be unclear how to relate the predicted values to the actual input values.<\/p>\n<p>I know that this is the case because I get many emails with the question:<\/p>\n<blockquote>\n<p><strong>How do I connect the predicted values with the input data?<\/strong><\/p>\n<\/blockquote>\n<p>This is a common problem.<\/p>\n<p>In this tutorial, you will discover how to relate the predicted values with the inputs to a machine learning model.<\/p>\n<p>After completing this tutorial, you will know:<\/p>\n<ul>\n<li>How to fit and evaluate the model on a training dataset.<\/li>\n<li>How to use the fit model to make predictions one at a time and in batches.<\/li>\n<li>How to connect the predicted values with the inputs to the model.<\/li>\n<\/ul>\n<p>Let\u2019s get started.<\/p>\n<div id=\"attachment_9012\" style=\"width: 650px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-9012\" class=\"size-full wp-image-9012\" src=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2019\/11\/How-to-Connect-Model-Input-Data-With-Predictions-for-Machine-Learning.jpg\" alt=\"How to Connect Model Input Data With Predictions for Machine Learning\" width=\"640\" height=\"360\" srcset=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2019\/11\/How-to-Connect-Model-Input-Data-With-Predictions-for-Machine-Learning.jpg 640w, https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2019\/11\/How-to-Connect-Model-Input-Data-With-Predictions-for-Machine-Learning-300x169.jpg 300w\" sizes=\"(max-width: 640px) 100vw, 640px\"><\/p>\n<p id=\"caption-attachment-9012\" class=\"wp-caption-text\">How to Connect Model Input Data With Predictions for Machine Learning<br \/>Photo by <a href=\"https:\/\/www.flickr.com\/photos\/ian-arlett\/30798942798\/\">Ian D. 
Keating<\/a>, some rights reserved.<\/p>\n<\/div>\n<h2>Tutorial Overview<\/h2>\n<p>This tutorial is divided into three parts; they are:<\/p>\n<ol>\n<li>Prepare a Training Dataset<\/li>\n<li>How to Fit a Model on the Training Dataset<\/li>\n<li>How to Connect Predictions With Inputs to the Model<\/li>\n<\/ol>\n<h2>Prepare a Training Dataset<\/h2>\n<p>Let\u2019s start off by defining a dataset that we can use with our model.<\/p>\n<p>You may have your own dataset in a CSV file or in a NumPy array in memory.<\/p>\n<p>In this case, we will use a simple two-class or binary classification problem with two numerical input variables.<\/p>\n<ul>\n<li><strong>Inputs<\/strong>: Two numerical input variables.<\/li>\n<li><strong>Outputs<\/strong>: A class label as either a 0 or 1.<\/li>\n<\/ul>\n<p>We can use the <a href=\"https:\/\/scikit-learn.org\/stable\/modules\/generated\/sklearn.datasets.make_blobs.html\">make_blobs() scikit-learn function<\/a> to create this dataset with 1,000 examples.<\/p>\n<p>The example below creates the dataset with separate arrays for the inputs (<em>X<\/em>) and outputs (<em>y<\/em>).<\/p>\n<pre class=\"crayon-plain-tag\"># example of creating a synthetic dataset\r\nfrom sklearn.datasets import make_blobs\r\n# create the inputs and outputs\r\nX, y = make_blobs(n_samples=1000, centers=2, n_features=2, random_state=2)\r\n# summarize the shape of the arrays\r\nprint(X.shape, y.shape)<\/pre>\n<p>Running the example creates the dataset and prints the shape of each of the arrays.<\/p>\n<p>We can see that there are 1,000 rows for the 1,000 samples in the dataset. 
We can also see that the input data has two columns for the two input variables and that the output is a one-dimensional array with one class label for each row of the input data.<\/p>\n<pre class=\"crayon-plain-tag\">(1000, 2) (1000,)<\/pre>\n<p>Next, we will fit a model on this training dataset.<\/p>\n<h2>How to Fit a Model on the Training Dataset<\/h2>\n<p>Now that we have a training dataset, we can fit a model on the data.<\/p>\n<p>This means that we will provide all of the training data to a learning algorithm and let it discover the mapping between the inputs and the output class label that minimizes the prediction error.<\/p>\n<p>In this case, because it is a two-class problem, we will try the logistic regression classification algorithm.<\/p>\n<p>This can be achieved using the <a href=\"https:\/\/scikit-learn.org\/stable\/modules\/generated\/sklearn.linear_model.LogisticRegression.html\">LogisticRegression class<\/a> from scikit-learn.<\/p>\n<p>First, the model must be defined with any specific configuration we require. 
In this case, we will use the efficient \u2018<em>lbfgs<\/em>\u2019 solver.<\/p>\n<p>Next, the model is fit on the training dataset by calling the <em>fit()<\/em> function and passing in the training dataset.<\/p>\n<p>Finally, we can evaluate the model by calling <em>predict()<\/em> to make predictions on the training dataset, then comparing the predictions to the expected class labels and calculating the accuracy.<\/p>\n<p>The complete example is listed below.<\/p>\n<pre class=\"crayon-plain-tag\"># fit a logistic regression on the training dataset\r\nfrom sklearn.linear_model import LogisticRegression\r\nfrom sklearn.datasets import make_blobs\r\nfrom sklearn.metrics import accuracy_score\r\n# create the inputs and outputs\r\nX, y = make_blobs(n_samples=1000, centers=2, n_features=2, random_state=2)\r\n# define model\r\nmodel = LogisticRegression(solver='lbfgs')\r\n# fit model\r\nmodel.fit(X, y)\r\n# make predictions\r\nyhat = model.predict(X)\r\n# evaluate predictions\r\nacc = accuracy_score(y, yhat)\r\nprint(acc)<\/pre>\n<p>Running the example fits the model on the training dataset and then prints the classification accuracy.<\/p>\n<p>In this case, we can see that the model has a 100% classification accuracy on the training dataset.<\/p>\n<pre class=\"crayon-plain-tag\">1.0<\/pre>\n<p>Now that we know how to fit and evaluate a model on the training dataset, let\u2019s get to the root of the question.<\/p>\n<p><em>How do you connect inputs of the model to the outputs?<\/em><\/p>\n<h2>How to Connect Predictions With Inputs to the Model<\/h2>\n<p>A fit machine learning model takes inputs and makes a prediction.<\/p>\n<p>This could be one row of data at a time; for example:<\/p>\n<ul>\n<li><strong>Input<\/strong>: 2.12309797 -1.41131072<\/li>\n<li><strong>Output<\/strong>: 1<\/li>\n<\/ul>\n<p>This is straightforward with our model.<\/p>\n<p>For example, we can make a prediction with an array input and get one output and we 
know that the two are directly connected.<\/p>\n<p>The input must be defined as a two-dimensional array of numbers, specifically 1 row with 2 columns. We can achieve this by defining the example as a list of rows, where each row is a list of column values; for example:<\/p>\n<pre class=\"crayon-plain-tag\">...\r\n# define input\r\nnew_input = [[2.12309797, -1.41131072]]<\/pre>\n<p>We can then provide this as input to the model and make a prediction.<\/p>\n<pre class=\"crayon-plain-tag\">...\r\n# get prediction for new input\r\nnew_output = model.predict(new_input)<\/pre>\n<p>Tying this together with fitting the model from the previous section, the complete example is listed below.<\/p>\n<pre class=\"crayon-plain-tag\"># make a single prediction with the model\r\nfrom sklearn.linear_model import LogisticRegression\r\nfrom sklearn.datasets import make_blobs\r\n# create the inputs and outputs\r\nX, y = make_blobs(n_samples=1000, centers=2, n_features=2, random_state=2)\r\n# define model\r\nmodel = LogisticRegression(solver='lbfgs')\r\n# fit model\r\nmodel.fit(X, y)\r\n# define input\r\nnew_input = [[2.12309797, -1.41131072]]\r\n# get prediction for new input\r\nnew_output = model.predict(new_input)\r\n# summarize input and output\r\nprint(new_input, new_output)<\/pre>\n<p>Running the example defines the new input and makes a prediction, then prints both the input and the output.<\/p>\n<p>We can see that in this case, the model predicts class label 1 for the inputs.<\/p>\n<pre class=\"crayon-plain-tag\">[[2.12309797, -1.41131072]] [1]<\/pre>\n<p>If we were using the model in our own application, this usage of the model would allow us to directly relate the inputs and outputs for each prediction made.<\/p>\n<p>If we needed to replace the labels 0 and 1 with something meaningful like \u201c<em>spam<\/em>\u201d and \u201c<em>not spam<\/em>\u201d, we could do that with a simple if-statement.<\/p>\n<p>So far so good.<\/p>\n<p><em><strong>What happens when the model is used to 
make multiple predictions at once?<\/strong><\/em><\/p>\n<p>That is, how do we relate the predictions to the inputs when multiple rows or multiple samples are provided to the model at once?<\/p>\n<p>For example, we could make a prediction for each of the 1,000 examples in the training dataset as we did in the previous section when evaluating the model. In this case, the model would make 1,000 distinct predictions and return an array of 1,000 integer values, one prediction for each of the 1,000 input rows of data.<\/p>\n<p>Importantly, the order of the predictions in the output array matches the order of rows provided as input to the model when making a prediction. This means that the input row at index 0 matches the prediction at index 0; the same is true for index 1, index 2, all the way to index 999.<\/p>\n<p>Therefore, we can relate the inputs and outputs directly based on their index, with the knowledge that the order is preserved when making a prediction on many rows of inputs.<\/p>\n<p>Let\u2019s make this concrete with an example.<\/p>\n<p>First, we can make a prediction for each row of input in the training dataset:<\/p>\n<pre class=\"crayon-plain-tag\">...\r\n# make predictions on the entire training dataset\r\nyhat = model.predict(X)<\/pre>\n<p>We can then step through the indexes and access the input and the predicted output for each.<\/p>\n<p>This shows precisely how to connect the predictions with the input rows. 
For example, the input at index 0 and the prediction at index 0:<\/p>\n<pre class=\"crayon-plain-tag\">...\r\nprint(X[0], yhat[0])<\/pre>\n<p>In this case, we will just look at the first 10 rows and their predictions.<\/p>\n<pre class=\"crayon-plain-tag\">...\r\n# connect predictions with inputs\r\nfor i in range(10):\r\n\tprint(X[i], yhat[i])<\/pre>\n<p>Tying this together, the complete example of making a prediction for each row in the training data and connecting the predictions with the inputs is listed below.<\/p>\n<pre class=\"crayon-plain-tag\"># make predictions for many rows and connect them with the inputs\r\nfrom sklearn.linear_model import LogisticRegression\r\nfrom sklearn.datasets import make_blobs\r\n# create the inputs and outputs\r\nX, y = make_blobs(n_samples=1000, centers=2, n_features=2, random_state=2)\r\n# define model\r\nmodel = LogisticRegression(solver='lbfgs')\r\n# fit model\r\nmodel.fit(X, y)\r\n# make predictions on the entire training dataset\r\nyhat = model.predict(X)\r\n# connect predictions with inputs\r\nfor i in range(10):\r\n\tprint(X[i], yhat[i])<\/pre>\n<p>Running the example makes 1,000 predictions for the 1,000 rows in the training dataset, then prints the inputs alongside the predicted values for the first 10 examples.<\/p>\n<p>This provides a template that you can use and adapt for your own predictive modeling projects to connect predictions to the input rows via their row index.<\/p>\n<pre class=\"crayon-plain-tag\">[ 1.23839154 -2.8475005 ] 1\r\n[-1.25884111 -8.57055785] 0\r\n[ -0.86599821 -10.50446358] 0\r\n[ 0.59831673 -1.06451727] 1\r\n[ 2.12309797 -1.41131072] 1\r\n[-1.53722693 -9.61845366] 0\r\n[ 0.92194131 -0.68709327] 1\r\n[-1.31478732 -8.78528161] 0\r\n[ 1.57989896 -1.462412  ] 1\r\n[ 1.36989667 -1.3964704 ] 1<\/pre>\n<\/p>\n<h2>Further Reading<\/h2>\n<p>This section provides more resources on the topic if you are looking to go deeper.<\/p>\n<h3>Posts<\/h3>\n<ul>\n<li><a 
href=\"https:\/\/machinelearningmastery.com\/machine-learning-in-python-step-by-step\/\">Your First Machine Learning Project in Python Step-By-Step<\/a><\/li>\n<li><a href=\"https:\/\/machinelearningmastery.com\/make-predictions-scikit-learn\/\">How to Make Predictions with scikit-learn<\/a><\/li>\n<\/ul>\n<h3>APIs<\/h3>\n<ul>\n<li><a href=\"https:\/\/scikit-learn.org\/stable\/modules\/generated\/sklearn.datasets.make_blobs.html\">sklearn.datasets.make_blobs API<\/a><\/li>\n<li><a href=\"https:\/\/scikit-learn.org\/stable\/modules\/generated\/sklearn.metrics.accuracy_score.html\">sklearn.metrics.accuracy_score API<\/a><\/li>\n<li><a href=\"https:\/\/scikit-learn.org\/stable\/modules\/generated\/sklearn.linear_model.LogisticRegression.html\">sklearn.linear_model.LogisticRegression API<\/a><\/li>\n<\/ul>\n<h2>Summary<\/h2>\n<p>In this tutorial, you discovered how to relate the predicted values with the inputs to a machine learning model.<\/p>\n<p>Specifically, you learned:<\/p>\n<ul>\n<li>How to fit and evaluate the model on a training dataset.<\/li>\n<li>How to use the fit model to make predictions one at a time and in batches.<\/li>\n<li>How to connect the predicted values with the inputs to the model.<\/li>\n<\/ul>\n<p>Do you have any questions?<br \/>\nAsk your questions in the comments below and I will do my best to answer.<\/p>\n<p>The post <a rel=\"nofollow\" href=\"https:\/\/machinelearningmastery.com\/how-to-connect-model-input-data-with-predictions-for-machine-learning\/\">How to Connect Model Input Data With Predictions for Machine Learning<\/a> appeared first on <a rel=\"nofollow\" href=\"https:\/\/machinelearningmastery.com\/\">Machine Learning Mastery<\/a>.<\/p>\n<\/div>\n<p><a href=\"https:\/\/machinelearningmastery.com\/how-to-connect-model-input-data-with-predictions-for-machine-learning\/\">Go to Source<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Author: Jason Brownlee Fitting a model to a training dataset is so easy today with 
libraries like scikit-learn. A model can be fit and evaluated [&hellip;] <span class=\"read-more-link\"><a class=\"read-more\" href=\"https:\/\/www.aiproblog.com\/index.php\/2019\/11\/14\/how-to-connect-model-input-data-with-predictions-for-machine-learning\/\">Read More<\/a><\/span><\/p>\n","protected":false},"author":1,"featured_media":2817,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_bbp_topic_count":0,"_bbp_reply_count":0,"_bbp_total_topic_count":0,"_bbp_total_reply_count":0,"_bbp_voice_count":0,"_bbp_anonymous_reply_count":0,"_bbp_topic_count_hidden":0,"_bbp_reply_count_hidden":0,"_bbp_forum_subforum_count":0,"footnotes":""},"categories":[24],"tags":[],"_links":{"self":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts\/2816"}],"collection":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/comments?post=2816"}],"version-history":[{"count":0,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts\/2816\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/media\/2817"}],"wp:attachment":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/media?parent=2816"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/categories?post=2816"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/tags?post=2816"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}