{"id":4257,"date":"2021-01-03T18:00:59","date_gmt":"2021-01-03T18:00:59","guid":{"rendered":"https:\/\/www.aiproblog.com\/index.php\/2021\/01\/03\/semi-supervised-learning-with-label-spreading\/"},"modified":"2021-01-03T18:00:59","modified_gmt":"2021-01-03T18:00:59","slug":"semi-supervised-learning-with-label-spreading","status":"publish","type":"post","link":"https:\/\/www.aiproblog.com\/index.php\/2021\/01\/03\/semi-supervised-learning-with-label-spreading\/","title":{"rendered":"Semi-Supervised Learning With Label Spreading"},"content":{"rendered":"<p>Author: Jason Brownlee<\/p>\n<div>\n<p><strong>Semi-supervised learning<\/strong> refers to algorithms that attempt to make use of both labeled and unlabeled training data.<\/p>\n<p>Semi-supervised learning algorithms are unlike supervised learning algorithms that are only able to learn from labeled training data.<\/p>\n<p>A popular approach to semi-supervised learning is to create a graph that connects examples in the training dataset and propagates known labels through the edges of the graph to label unlabeled examples. 
An example of this approach to semi-supervised learning is the <strong>label spreading algorithm<\/strong> for classification predictive modeling.<\/p>\n<p>In this tutorial, you will discover how to apply the label spreading algorithm to a semi-supervised learning classification dataset.<\/p>\n<p>After completing this tutorial, you will know:<\/p>\n<ul>\n<li>An intuition for how the label spreading semi-supervised learning algorithm works.<\/li>\n<li>How to develop a semi-supervised classification dataset and establish a baseline in performance with a supervised learning algorithm.<\/li>\n<li>How to develop and evaluate a label spreading algorithm and use the model output to train a supervised learning algorithm.<\/li>\n<\/ul>\n<p>Let&rsquo;s get started.<\/p>\n<div id=\"attachment_12026\" style=\"width: 809px\" class=\"wp-caption aligncenter\"><img decoding=\"async\" aria-describedby=\"caption-attachment-12026\" loading=\"lazy\" class=\"size-full wp-image-12026\" src=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2021\/04\/Semi-Supervised-Learning-With-Label-Spreading.jpg\" alt=\"Semi-Supervised Learning With Label Spreading\" width=\"799\" height=\"533\" srcset=\"http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2021\/04\/Semi-Supervised-Learning-With-Label-Spreading.jpg 799w, http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2021\/04\/Semi-Supervised-Learning-With-Label-Spreading-300x200.jpg 300w, http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2021\/04\/Semi-Supervised-Learning-With-Label-Spreading-768x512.jpg 768w\" sizes=\"(max-width: 799px) 100vw, 799px\"><\/p>\n<p id=\"caption-attachment-12026\" class=\"wp-caption-text\">Semi-Supervised Learning With Label Spreading<br \/>Photo by <a href=\"https:\/\/www.flickr.com\/photos\/91261194@N06\/44897768224\/\">Jernej Furman<\/a>, some rights reserved.<\/p>\n<\/div>\n<h2>Tutorial 
Overview<\/h2>\n<p>This tutorial is divided into three parts; they are:<\/p>\n<ol>\n<li>Label Spreading Algorithm<\/li>\n<li>Semi-Supervised Classification Dataset<\/li>\n<li>Label Spreading for Semi-Supervised Learning<\/li>\n<\/ol>\n<h2>Label Spreading Algorithm<\/h2>\n<p>Label Spreading is a semi-supervised learning algorithm.<\/p>\n<p>The algorithm was introduced by Dengyong Zhou, et al. in their 2003 paper titled &ldquo;<a href=\"https:\/\/proceedings.neurips.cc\/paper\/2003\/file\/87682805257e619d49b8e0dfdc14affa-Paper.pdf\">Learning With Local And Global Consistency<\/a>.&rdquo;<\/p>\n<p>The intuition for the broader approach of semi-supervised learning is that nearby points in the input space are likely to have the same label, and points on the same structure or manifold in the input space are likely to have the same label.<\/p>\n<blockquote>\n<p>The key to semi-supervised learning problems is the prior assumption of consistency, which means: (1) nearby points are likely to have the same label; and (2) points on the same structure (typically referred to as a cluster or a manifold) are likely to have the same label.<\/p>\n<\/blockquote>\n<p>&mdash; <a href=\"https:\/\/proceedings.neurips.cc\/paper\/2003\/file\/87682805257e619d49b8e0dfdc14affa-Paper.pdf\">Learning With Local And Global Consistency<\/a>, 2003.<\/p>\n<p>Label spreading is inspired by an idea from experimental psychology called spreading activation networks.<\/p>\n<blockquote>\n<p>This algorithm can be understood intuitively in terms of spreading activation networks from experimental psychology.<\/p>\n<\/blockquote>\n<p>&mdash; <a href=\"https:\/\/proceedings.neurips.cc\/paper\/2003\/file\/87682805257e619d49b8e0dfdc14affa-Paper.pdf\">Learning With Local And Global Consistency<\/a>, 2003.<\/p>\n<p>Points in the dataset are connected in a graph based on their relative distances in the input space. 
The weight matrix of the graph is normalized symmetrically, much like <a href=\"https:\/\/machinelearningmastery.com\/clustering-algorithms-with-python\/\">spectral clustering<\/a>. Information is passed through the graph, which is adapted to capture the structure in the input space.<\/p>\n<p>The approach is very similar to the label propagation algorithm for semi-supervised learning.<\/p>\n<blockquote>\n<p>Another similar label propagation algorithm was given by Zhou et al.: at each step a node i receives a contribution from its neighbors j (weighted by the normalized weight of the edge (i,j)), and an additional small contribution given by its initial value<\/p>\n<\/blockquote>\n<p>&mdash; Page 196, <a href=\"https:\/\/amzn.to\/3fVfO3O\">Semi-Supervised Learning<\/a>, 2006.<\/p>\n<p>After convergence, labels are applied based on nodes that passed on the most information.<\/p>\n<blockquote>\n<p>Finally, the label of each unlabeled point is set to be the class of which it has received most information during the iteration process.<\/p>\n<\/blockquote>\n<p>&mdash; <a href=\"https:\/\/proceedings.neurips.cc\/paper\/2003\/file\/87682805257e619d49b8e0dfdc14affa-Paper.pdf\">Learning With Local And Global Consistency<\/a>, 2003.<\/p>\n<p>Now that we are familiar with the label spreading algorithm, let&rsquo;s look at how we might use it on a project. 
First, we must define a semi-supervised classification dataset.<\/p>\n<h2>Semi-Supervised Classification Dataset<\/h2>\n<p>In this section, we will define a dataset for semi-supervised learning and establish a baseline in performance on the dataset.<\/p>\n<p>First, we can define a synthetic classification dataset using the <a href=\"https:\/\/scikit-learn.org\/stable\/modules\/generated\/sklearn.datasets.make_classification.html\">make_classification() function<\/a>.<\/p>\n<p>We will define the dataset with two classes (binary classification), two input variables, and 1,000 examples.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\">...\r\n# define dataset\r\nX, y = make_classification(n_samples=1000, n_features=2, n_informative=2, n_redundant=0, random_state=1)<\/pre>\n<p>Next, we will split the dataset into train and test datasets with an equal 50-50 split (i.e. 500 rows in each).<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\">...\r\n# split into train and test\r\nX_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.50, random_state=1, stratify=y)<\/pre>\n<p>Finally, we will split the training dataset in half again into a portion that will have labels and a portion that we will pretend is unlabeled.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\">...\r\n# split train into labeled and unlabeled\r\nX_train_lab, X_test_unlab, y_train_lab, y_test_unlab = train_test_split(X_train, y_train, test_size=0.50, random_state=1, stratify=y_train)<\/pre>\n<p>Tying this together, the complete example of preparing the semi-supervised learning dataset is listed below.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\"># prepare semi-supervised learning dataset\r\nfrom sklearn.datasets import make_classification\r\nfrom sklearn.model_selection import train_test_split\r\n# define dataset\r\nX, y = make_classification(n_samples=1000, n_features=2, n_informative=2, n_redundant=0, random_state=1)\r\n# split into train and 
test\r\nX_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.50, random_state=1, stratify=y)\r\n# split train into labeled and unlabeled\r\nX_train_lab, X_test_unlab, y_train_lab, y_test_unlab = train_test_split(X_train, y_train, test_size=0.50, random_state=1, stratify=y_train)\r\n# summarize training set size\r\nprint('Labeled Train Set:', X_train_lab.shape, y_train_lab.shape)\r\nprint('Unlabeled Train Set:', X_test_unlab.shape, y_test_unlab.shape)\r\n# summarize test set size\r\nprint('Test Set:', X_test.shape, y_test.shape)<\/pre>\n<p>Running the example prepares the dataset and then summarizes the shape of each of the three portions.<\/p>\n<p>The results confirm that we have a test dataset of 500 rows, a labeled training dataset of 250 rows, and 250 rows of unlabeled data.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\">Labeled Train Set: (250, 2) (250,)\r\nUnlabeled Train Set: (250, 2) (250,)\r\nTest Set: (500, 2) (500,)<\/pre>\n<p>A supervised learning algorithm will only have 250 rows from which to train a model.<\/p>\n<p>A semi-supervised learning algorithm will have the 250 labeled rows as well as the 250 unlabeled rows that could be used in numerous ways to improve the labeled training dataset.<\/p>\n<p>Next, we can establish a baseline in performance on the semi-supervised learning dataset using a supervised learning algorithm fit only on the labeled training data.<\/p>\n<p>This is important because we would expect a semi-supervised learning algorithm to outperform a supervised learning algorithm fit on the labeled data alone. 
If this is not the case, then the semi-supervised learning algorithm does not have skill.<\/p>\n<p>In this case, we will use a logistic regression algorithm fit on the labeled portion of the training dataset.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\">...\r\n# define model\r\nmodel = LogisticRegression()\r\n# fit model on labeled dataset\r\nmodel.fit(X_train_lab, y_train_lab)<\/pre>\n<p>The model can then be used to make predictions on the entire holdout test dataset and evaluated using classification accuracy.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\">...\r\n# make predictions on hold out test set\r\nyhat = model.predict(X_test)\r\n# calculate score for test set\r\nscore = accuracy_score(y_test, yhat)\r\n# summarize score\r\nprint('Accuracy: %.3f' % (score*100))<\/pre>\n<p>Tying this together, the complete example of evaluating a supervised learning algorithm on the semi-supervised learning dataset is listed below.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\"># baseline performance on the semi-supervised learning dataset\r\nfrom sklearn.datasets import make_classification\r\nfrom sklearn.model_selection import train_test_split\r\nfrom sklearn.metrics import accuracy_score\r\nfrom sklearn.linear_model import LogisticRegression\r\n# define dataset\r\nX, y = make_classification(n_samples=1000, n_features=2, n_informative=2, n_redundant=0, random_state=1)\r\n# split into train and test\r\nX_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.50, random_state=1, stratify=y)\r\n# split train into labeled and unlabeled\r\nX_train_lab, X_test_unlab, y_train_lab, y_test_unlab = train_test_split(X_train, y_train, test_size=0.50, random_state=1, stratify=y_train)\r\n# define model\r\nmodel = LogisticRegression()\r\n# fit model on labeled dataset\r\nmodel.fit(X_train_lab, y_train_lab)\r\n# make predictions on hold out test set\r\nyhat = model.predict(X_test)\r\n# calculate score for test set\r\nscore = 
accuracy_score(y_test, yhat)\r\n# summarize score\r\nprint('Accuracy: %.3f' % (score*100))<\/pre>\n<p>Running the algorithm fits the model on the labeled training dataset and evaluates it on the holdout dataset and prints the classification accuracy.<\/p>\n<p><strong>Note<\/strong>: Your <a href=\"https:\/\/machinelearningmastery.com\/different-results-each-time-in-machine-learning\/\">results may vary<\/a> given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and compare the average outcome.<\/p>\n<p>In this case, we can see that the algorithm achieved a classification accuracy of about 84.8 percent.<\/p>\n<p>We would expect an effective semi-supervised learning algorithm to achieve a better accuracy than this.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\">Accuracy: 84.800<\/pre>\n<p>Next, let&rsquo;s explore how to apply the label spreading algorithm to the dataset.<\/p>\n<h2>Label Spreading for Semi-Supervised Learning<\/h2>\n<p>The label spreading algorithm is available in the scikit-learn Python machine learning library via the <a href=\"https:\/\/scikit-learn.org\/stable\/modules\/generated\/sklearn.semi_supervised.LabelSpreading.html\">LabelSpreading class<\/a>.<\/p>\n<p>The model can be fit just like any other classification model by calling the <em>fit()<\/em> function and used to make predictions for new data via the <em>predict()<\/em> function.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\">...\r\n# define model\r\nmodel = LabelSpreading()\r\n# fit model on training dataset\r\nmodel.fit(..., ...)\r\n# make predictions on hold out test set\r\nyhat = model.predict(...)<\/pre>\n<p>Importantly, the training dataset provided to the <em>fit()<\/em> function must include labeled examples that are ordinal encoded (as per normal) and unlabeled examples marked with a label of -1.<\/p>\n<p>The model will then determine a label for the unlabeled 
examples as part of fitting the model.<\/p>\n<p>After the model is fit, the estimated labels for the labeled and unlabeled data in the training dataset are available via the &ldquo;<em>transduction_<\/em>&rdquo; attribute of the fitted <em>LabelSpreading<\/em> model.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\">...\r\n# get labels for entire training dataset data\r\ntran_labels = model.transduction_<\/pre>\n<p>Now that we are familiar with how to use the label spreading algorithm in scikit-learn, let&rsquo;s look at how we might apply it to our semi-supervised learning dataset.<\/p>\n<p>First, we must prepare the training dataset.<\/p>\n<p>We can concatenate the input data of the training dataset into a single array.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\">...\r\n# create the training dataset input\r\nX_train_mixed = concatenate((X_train_lab, X_test_unlab))<\/pre>\n<p>We can then create a list of -1 values (meaning unlabeled), one for each row in the unlabeled portion of the training dataset.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\">...\r\n# create \"no label\" for unlabeled data\r\nnolabel = [-1 for _ in range(len(y_test_unlab))]<\/pre>\n<p>This list can then be concatenated with the labels from the labeled portion of the training dataset so that the labels line up with the rows of the input array for the training dataset.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\">...\r\n# recombine training dataset labels\r\ny_train_mixed = concatenate((y_train_lab, nolabel))<\/pre>\n<p>We can now train the <em>LabelSpreading<\/em> model on the entire training dataset.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\">...\r\n# define model\r\nmodel = LabelSpreading()\r\n# fit model on training dataset\r\nmodel.fit(X_train_mixed, y_train_mixed)<\/pre>\n<p>Next, we can use the model to make predictions on the holdout dataset and evaluate the model using classification accuracy.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\">...\r\n# make 
predictions on hold out test set\r\nyhat = model.predict(X_test)\r\n# calculate score for test set\r\nscore = accuracy_score(y_test, yhat)\r\n# summarize score\r\nprint('Accuracy: %.3f' % (score*100))<\/pre>\n<p>Tying this together, the complete example of evaluating label spreading on the semi-supervised learning dataset is listed below.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\"># evaluate label spreading on the semi-supervised learning dataset\r\nfrom numpy import concatenate\r\nfrom sklearn.datasets import make_classification\r\nfrom sklearn.model_selection import train_test_split\r\nfrom sklearn.metrics import accuracy_score\r\nfrom sklearn.semi_supervised import LabelSpreading\r\n# define dataset\r\nX, y = make_classification(n_samples=1000, n_features=2, n_informative=2, n_redundant=0, random_state=1)\r\n# split into train and test\r\nX_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.50, random_state=1, stratify=y)\r\n# split train into labeled and unlabeled\r\nX_train_lab, X_test_unlab, y_train_lab, y_test_unlab = train_test_split(X_train, y_train, test_size=0.50, random_state=1, stratify=y_train)\r\n# create the training dataset input\r\nX_train_mixed = concatenate((X_train_lab, X_test_unlab))\r\n# create \"no label\" for unlabeled data\r\nnolabel = [-1 for _ in range(len(y_test_unlab))]\r\n# recombine training dataset labels\r\ny_train_mixed = concatenate((y_train_lab, nolabel))\r\n# define model\r\nmodel = LabelSpreading()\r\n# fit model on training dataset\r\nmodel.fit(X_train_mixed, y_train_mixed)\r\n# make predictions on hold out test set\r\nyhat = model.predict(X_test)\r\n# calculate score for test set\r\nscore = accuracy_score(y_test, yhat)\r\n# summarize score\r\nprint('Accuracy: %.3f' % (score*100))<\/pre>\n<p>Running the algorithm fits the model on the entire training dataset and evaluates it on the holdout dataset and prints the classification accuracy.<\/p>\n<p><strong>Note<\/strong>: Your <a 
href=\"https:\/\/machinelearningmastery.com\/different-results-each-time-in-machine-learning\/\">results may vary<\/a> given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and compare the average outcome.<\/p>\n<p>In this case, we can see that the label spreading model achieves a classification accuracy of about 85.4 percent, which is slightly higher than a logistic regression fit only on the labeled training dataset that achieved an accuracy of about 84.8 percent.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\">Accuracy: 85.400<\/pre>\n<p>So far so good.<\/p>\n<p>Another approach we can use with the semi-supervised model is to take the estimated labels for the training dataset and fit a supervised learning model.<\/p>\n<p>Recall that we can retrieve the labels for the entire training dataset from the label spreading model as follows:<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\">...\r\n# get labels for entire training dataset data\r\ntran_labels = model.transduction_<\/pre>\n<p>We can then use these labels, along with all of the input data, to train and evaluate a supervised learning algorithm, such as a logistic regression model.<\/p>\n<p>The hope is that the supervised learning model fit on the entire training dataset would achieve even better performance than the semi-supervised learning model alone.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\">...\r\n# define supervised learning model\r\nmodel2 = LogisticRegression()\r\n# fit supervised learning model on entire training dataset\r\nmodel2.fit(X_train_mixed, tran_labels)\r\n# make predictions on hold out test set\r\nyhat = model2.predict(X_test)\r\n# calculate score for test set\r\nscore = accuracy_score(y_test, yhat)\r\n# summarize score\r\nprint('Accuracy: %.3f' % (score*100))<\/pre>\n<p>Tying this together, the complete example of using the estimated training set labels to train 
and evaluate a supervised learning model is listed below.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\"># evaluate logistic regression fit on label spreading for semi-supervised learning\r\nfrom numpy import concatenate\r\nfrom sklearn.datasets import make_classification\r\nfrom sklearn.model_selection import train_test_split\r\nfrom sklearn.metrics import accuracy_score\r\nfrom sklearn.semi_supervised import LabelSpreading\r\nfrom sklearn.linear_model import LogisticRegression\r\n# define dataset\r\nX, y = make_classification(n_samples=1000, n_features=2, n_informative=2, n_redundant=0, random_state=1)\r\n# split into train and test\r\nX_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.50, random_state=1, stratify=y)\r\n# split train into labeled and unlabeled\r\nX_train_lab, X_test_unlab, y_train_lab, y_test_unlab = train_test_split(X_train, y_train, test_size=0.50, random_state=1, stratify=y_train)\r\n# create the training dataset input\r\nX_train_mixed = concatenate((X_train_lab, X_test_unlab))\r\n# create \"no label\" for unlabeled data\r\nnolabel = [-1 for _ in range(len(y_test_unlab))]\r\n# recombine training dataset labels\r\ny_train_mixed = concatenate((y_train_lab, nolabel))\r\n# define model\r\nmodel = LabelSpreading()\r\n# fit model on training dataset\r\nmodel.fit(X_train_mixed, y_train_mixed)\r\n# get labels for entire training dataset data\r\ntran_labels = model.transduction_\r\n# define supervised learning model\r\nmodel2 = LogisticRegression()\r\n# fit supervised learning model on entire training dataset\r\nmodel2.fit(X_train_mixed, tran_labels)\r\n# make predictions on hold out test set\r\nyhat = model2.predict(X_test)\r\n# calculate score for test set\r\nscore = accuracy_score(y_test, yhat)\r\n# summarize score\r\nprint('Accuracy: %.3f' % (score*100))<\/pre>\n<p>Running the algorithm fits the semi-supervised model on the entire training dataset, then fits a supervised learning model on the entire training dataset 
with inferred labels and evaluates it on the holdout dataset, printing the classification accuracy.<\/p>\n<p><strong>Note<\/strong>: Your <a href=\"https:\/\/machinelearningmastery.com\/different-results-each-time-in-machine-learning\/\">results may vary<\/a> given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and compare the average outcome.<\/p>\n<p>In this case, we can see that this hierarchical approach of a semi-supervised model followed by a supervised model achieves a classification accuracy of about 85.8 percent on the holdout dataset, slightly better than the semi-supervised learning algorithm used alone, which achieved an accuracy of about 85.4 percent.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\">Accuracy: 85.800<\/pre>\n<p><strong>Can you achieve better results by tuning the hyperparameters of the LabelSpreading model?<\/strong><br \/>\nLet me know what you discover in the comments below.<\/p>\n<h2>Further Reading<\/h2>\n<p>This section provides more resources on the topic if you are looking to go deeper.<\/p>\n<h3>Books<\/h3>\n<ul>\n<li><a href=\"https:\/\/amzn.to\/37niYJw\">Introduction to Semi-Supervised Learning<\/a>, 2009.<\/li>\n<li>Chapter 11: Label Propagation and Quadratic Criterion, <a href=\"https:\/\/amzn.to\/3fVfO3O\">Semi-Supervised Learning<\/a>, 2006.<\/li>\n<\/ul>\n<h3>Papers<\/h3>\n<ul>\n<li><a href=\"https:\/\/proceedings.neurips.cc\/paper\/2003\/file\/87682805257e619d49b8e0dfdc14affa-Paper.pdf\">Learning With Local And Global Consistency<\/a>, 2003.<\/li>\n<\/ul>\n<h3>APIs<\/h3>\n<ul>\n<li><a href=\"https:\/\/scikit-learn.org\/stable\/modules\/generated\/sklearn.semi_supervised.LabelSpreading.html\">sklearn.semi_supervised.LabelSpreading API<\/a>.<\/li>\n<li><a href=\"https:\/\/scikit-learn.org\/stable\/modules\/label_propagation.html\">Section 1.14. 
Semi-Supervised, Scikit-Learn User Guide<\/a>.<\/li>\n<li><a href=\"https:\/\/scikit-learn.org\/stable\/modules\/generated\/sklearn.model_selection.train_test_split.html\">sklearn.model_selection.train_test_split API<\/a>.<\/li>\n<li><a href=\"https:\/\/scikit-learn.org\/stable\/modules\/generated\/sklearn.linear_model.LogisticRegression.html\">sklearn.linear_model.LogisticRegression API<\/a>.<\/li>\n<li><a href=\"https:\/\/scikit-learn.org\/stable\/modules\/generated\/sklearn.datasets.make_classification.html\">sklearn.datasets.make_classification API<\/a>.<\/li>\n<\/ul>\n<h3>Articles<\/h3>\n<ul>\n<li><a href=\"https:\/\/en.wikipedia.org\/wiki\/Semi-supervised_learning\">Semi-supervised learning, Wikipedia<\/a>.<\/li>\n<\/ul>\n<h2>Summary<\/h2>\n<p>In this tutorial, you discovered how to apply the label spreading algorithm to a semi-supervised learning classification dataset.<\/p>\n<p>Specifically, you learned:<\/p>\n<ul>\n<li>An intuition for how the label spreading semi-supervised learning algorithm works.<\/li>\n<li>How to develop a semi-supervised classification dataset and establish a baseline in performance with a supervised learning algorithm.<\/li>\n<li>How to develop and evaluate a label spreading algorithm and use the model output to train a supervised learning algorithm.<\/li>\n<\/ul>\n<p><strong>Do you have any questions?<\/strong><br \/>\nAsk your questions in the comments below and I will do my best to answer.<\/p>\n<p>The post <a rel=\"nofollow\" href=\"https:\/\/machinelearningmastery.com\/semi-supervised-learning-with-label-spreading\/\">Semi-Supervised Learning With Label Spreading<\/a> appeared first on <a rel=\"nofollow\" href=\"https:\/\/machinelearningmastery.com\/\">Machine Learning Mastery<\/a>.<\/p>\n<\/div>\n<p><a href=\"https:\/\/machinelearningmastery.com\/semi-supervised-learning-with-label-spreading\/\">Go to Source<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Author: Jason Brownlee Semi-supervised learning refers to 
algorithms that attempt to make use of both labeled and unlabeled training data. Semi-supervised learning algorithms are unlike [&hellip;] <span class=\"read-more-link\"><a class=\"read-more\" href=\"https:\/\/www.aiproblog.com\/index.php\/2021\/01\/03\/semi-supervised-learning-with-label-spreading\/\">Read More<\/a><\/span><\/p>\n","protected":false},"author":1,"featured_media":4258,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_bbp_topic_count":0,"_bbp_reply_count":0,"_bbp_total_topic_count":0,"_bbp_total_reply_count":0,"_bbp_voice_count":0,"_bbp_anonymous_reply_count":0,"_bbp_topic_count_hidden":0,"_bbp_reply_count_hidden":0,"_bbp_forum_subforum_count":0,"footnotes":""},"categories":[24],"tags":[],"_links":{"self":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts\/4257"}],"collection":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/comments?post=4257"}],"version-history":[{"count":0,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts\/4257\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/media\/4258"}],"wp:attachment":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/media?parent=4257"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/categories?post=4257"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/tags?post=4257"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}