{"id":2985,"date":"2019-12-31T18:00:11","date_gmt":"2019-12-31T18:00:11","guid":{"rendered":"https:\/\/www.aiproblog.com\/index.php\/2019\/12\/31\/failure-of-classification-accuracy-for-imbalanced-class-distributions\/"},"modified":"2019-12-31T18:00:11","modified_gmt":"2019-12-31T18:00:11","slug":"failure-of-classification-accuracy-for-imbalanced-class-distributions","status":"publish","type":"post","link":"https:\/\/www.aiproblog.com\/index.php\/2019\/12\/31\/failure-of-classification-accuracy-for-imbalanced-class-distributions\/","title":{"rendered":"Failure of Classification Accuracy for Imbalanced Class Distributions"},"content":{"rendered":"<p>Author: Jason Brownlee<\/p>\n<div>\n<p>Classification accuracy is a metric that summarizes the performance of a classification model as the number of correct predictions divided by the total number of predictions.<\/p>\n<p>It is easy to calculate and intuitive to understand, making it the most common metric used for evaluating classifier models. 
This intuition breaks down when the distribution of examples to classes is severely skewed.<\/p>\n<p>Intuitions developed by practitioners on balanced datasets, such as 99 percent representing a skillful model, can be incorrect and dangerously misleading on imbalanced classification predictive modeling problems.<\/p>\n<p>In this tutorial, you will discover the failure of classification accuracy for imbalanced classification problems.<\/p>\n<p>After completing this tutorial, you will know:<\/p>\n<ul>\n<li>Accuracy and error rate are the de facto standard metrics for summarizing the performance of classification models.<\/li>\n<li>Classification accuracy fails on classification problems with a skewed class distribution because of the intuitions developed by practitioners on datasets with an equal class distribution.<\/li>\n<li>Intuition for the failure of accuracy for skewed class distributions with a worked example.<\/li>\n<\/ul>\n<p>Let\u2019s get started.<\/p>\n<div id=\"attachment_9326\" style=\"width: 810px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-9326\" class=\"size-full wp-image-9326\" src=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2020\/01\/Classification-Accuracy-Is-Misleading-for-Skewed-Class-Distributions.jpg\" alt=\"Classification Accuracy Is Misleading for Skewed Class Distributions\" width=\"800\" height=\"600\" srcset=\"http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2020\/01\/Classification-Accuracy-Is-Misleading-for-Skewed-Class-Distributions.jpg 800w, http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2020\/01\/Classification-Accuracy-Is-Misleading-for-Skewed-Class-Distributions-300x225.jpg 300w, http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2020\/01\/Classification-Accuracy-Is-Misleading-for-Skewed-Class-Distributions-768x576.jpg 768w\" sizes=\"(max-width: 
800px) 100vw, 800px\"><\/p>\n<p id=\"caption-attachment-9326\" class=\"wp-caption-text\">Classification Accuracy Is Misleading for Skewed Class Distributions<br \/>Photo by <a href=\"https:\/\/flickr.com\/photos\/esqui-ando-con-tonho\/41295716874\/\">Esqui-Ando con T\u00f2nho<\/a>, some rights reserved.<\/p>\n<\/div>\n<h2>Tutorial Overview<\/h2>\n<p>This tutorial is divided into three parts; they are:<\/p>\n<ol>\n<li>What Is Classification Accuracy?<\/li>\n<li>Accuracy Fails for Imbalanced Classification<\/li>\n<li>Example of Accuracy for Imbalanced Classification<\/li>\n<\/ol>\n<h2>What Is Classification Accuracy?<\/h2>\n<p>Classification predictive modeling involves predicting a class label given examples in a problem domain.<\/p>\n<p>The most common metric used to evaluate the performance of a classification predictive model is classification accuracy. Typically, the accuracy of a predictive model is good (above 90% accuracy), therefore it is also very common to summarize the performance of a model in terms of the error rate of the model.<\/p>\n<blockquote>\n<p>Accuracy and its complement error rate are the most frequently used metrics for estimating the performance of learning systems in classification problems.<\/p>\n<\/blockquote>\n<p>\u2014 <a href=\"https:\/\/arxiv.org\/abs\/1505.01658\">A Survey of Predictive Modelling under Imbalanced Distributions<\/a>, 2015.<\/p>\n<p><a href=\"https:\/\/en.wikipedia.org\/wiki\/Accuracy_and_precision\">Classification accuracy<\/a> involves first using a classification model to make a prediction for each example in a test dataset. The predictions are then compared to the known labels for those examples in the test set. 
Accuracy is then calculated as the number of examples in the test set that were predicted correctly, divided by the total number of predictions made on the test set.<\/p>\n<ul>\n<li>Accuracy = Correct Predictions \/ Total Predictions<\/li>\n<\/ul>\n<p>Conversely, the error rate can be calculated as the total number of incorrect predictions made on the test set divided by all predictions made on the test set.<\/p>\n<ul>\n<li>Error Rate = Incorrect Predictions \/ Total Predictions<\/li>\n<\/ul>\n<p>The accuracy and error rate are complements of each other, meaning that we can always calculate one from the other. For example:<\/p>\n<ul>\n<li>Accuracy = 1 \u2013 Error Rate<\/li>\n<li>Error Rate = 1 \u2013 Accuracy<\/li>\n<\/ul>\n<p>Another valuable way to think about accuracy is in terms of the <a href=\"https:\/\/machinelearningmastery.com\/confusion-matrix-machine-learning\/\">confusion matrix<\/a>.<\/p>\n<p>A confusion matrix is a summary of the predictions made by a classification model, organized into a table by class. Each row of the table indicates the actual class and each column represents the predicted class. A value in a cell is a count of the predictions made for one class that actually belong to a given class. The cells on the diagonal represent correct predictions, where the predicted and expected class align.<\/p>\n<blockquote>\n<p>The most straightforward way to evaluate the performance of classifiers is based on the confusion matrix analysis. 
[\u2026] From such a matrix it is possible to extract a number of widely used metrics for measuring the performance of learning systems, such as Error Rate [\u2026] and Accuracy \u2026<\/p>\n<\/blockquote>\n<p>\u2014 <a href=\"https:\/\/dl.acm.org\/citation.cfm?id=1007735\">A Study Of The Behavior Of Several Methods For Balancing Machine Learning Training Data<\/a>, 2004.<\/p>\n<p>The confusion matrix provides more insight into not only the accuracy of a predictive model, but also which classes are being predicted correctly, which incorrectly, and what type of errors are being made.<\/p>\n<p>The simplest confusion matrix is for a two-class classification problem, with negative (class 0) and positive (class 1) classes.<\/p>\n<p>In this type of confusion matrix, each cell in the table has a specific and well-understood name, summarized as follows:<\/p>\n<pre class=\"crayon-plain-tag\">| Positive Prediction | Negative Prediction\r\nPositive Class | True Positive (TP)  | False Negative (FN)\r\nNegative Class | False Positive (FP) | True Negative (TN)<\/pre>\n<p>The classification accuracy can be calculated from this confusion matrix as the sum of correct cells in the table (true positives and true negatives) divided by all cells in the table.<\/p>\n<ul>\n<li>Accuracy = (TP + TN) \/ (TP + FN + FP + TN)<\/li>\n<\/ul>\n<p>Similarly, the error rate can also be calculated from the confusion matrix as the sum of incorrect cells of the table (false positives and false negatives) divided by all cells of the table.<\/p>\n<ul>\n<li>Error Rate = (FP + FN) \/ (TP + FN + FP + TN)<\/li>\n<\/ul>\n<p>Now that we are familiar with classification accuracy and its complement error rate, let\u2019s discover why they might be a bad idea to use for imbalanced classification problems.<\/p>\n<h2>Accuracy Fails for Imbalanced Classification<\/h2>\n<p>Classification accuracy is the most-used metric for evaluating classification models.<\/p>\n<p>The reason for its wide use is because it is easy 
to calculate, easy to interpret, and is a single number that summarizes the model\u2019s capability.<\/p>\n<p>As such, it is natural to use it on imbalanced classification problems, where the distribution of examples in the training dataset across the classes is not equal.<\/p>\n<p>This is the most common mistake made by beginners to imbalanced classification.<\/p>\n<p>When the class distribution is slightly skewed, accuracy can still be a useful metric. When the skew in the class distribution is severe, accuracy can become an unreliable measure of model performance.<\/p>\n<p>The reason for this unreliability lies in the intuitions that the average machine learning practitioner has developed for classification accuracy.<\/p>\n<p>Typically, classification predictive modeling is practiced with small datasets where the class distribution is equal or very close to equal. Therefore, most practitioners develop an intuition that large accuracy scores (or, conversely, small error rates) are good, and that values above 90 percent are great.<\/p>\n<p>Achieving 90 percent classification accuracy, or even 99 percent classification accuracy, may be trivial on an imbalanced classification problem.<\/p>\n<p>This means that intuitions for classification accuracy developed on balanced class distributions will be applied and will be wrong, misleading the practitioner into thinking that a model has good or even excellent performance when it, in fact, does not.<\/p>\n<h3>Accuracy Paradox<\/h3>\n<p>Consider the case of an imbalanced dataset with a 1:100 class imbalance.<\/p>\n<p>In this problem, each example of the minority class (class 1) will have a corresponding 100 examples of the majority class (class 0).<\/p>\n<p>In problems of this type, the majority class represents \u201c<em>normal<\/em>\u201d and the minority class represents \u201c<em>abnormal<\/em>,\u201d such as a fault, a diagnosis, or a fraud. 
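To see what a 1:100 imbalance implies for accuracy, consider a minimal sketch (toy values, assumed for illustration) in which a model always predicts the majority "normal" class:

```python
# accuracy of a majority-class predictor on a 1:100 imbalance
y_true = [0] * 100 + [1]  # 100 "normal" examples, 1 "abnormal"
y_pred = [0] * 101        # always predict the majority class

correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
accuracy = correct / len(y_true)
print('Accuracy: %.3f' % accuracy)  # prints Accuracy: 0.990
```

Despite the high score, the sketched model never detects a single abnormal example.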
Good performance on the minority class will therefore be preferred over good performance on both classes.<\/p>\n<blockquote>\n<p>Considering a user preference bias towards the minority (positive) class examples, accuracy is not suitable because the impact of the least represented, but more important examples, is reduced when compared to that of the majority class.<\/p>\n<\/blockquote>\n<p>\u2014 <a href=\"https:\/\/arxiv.org\/abs\/1505.01658\">A Survey of Predictive Modelling under Imbalanced Distributions<\/a>, 2015.<\/p>\n<p>On this problem, a model that predicts the majority class (class 0) for all examples in the test set will have a classification accuracy of 99 percent, mirroring the distribution of majority and minority class examples expected in the test set on average.<\/p>\n<p>Many machine learning models are designed around the assumption of a balanced class distribution, and often learn simple rules (explicit or otherwise), such as always predicting the majority class. As a result, they can achieve an accuracy of 99 percent while in practice performing no better than an unskilled majority class classifier.<\/p>\n<p>A beginner will see a sophisticated model achieving 99 percent accuracy on an imbalanced dataset of this type and believe their work is done, when, in fact, they have been misled.<\/p>\n<p>This situation is so common that it has a name: the \u201c<a href=\"https:\/\/en.wikipedia.org\/wiki\/Accuracy_paradox\">accuracy paradox<\/a>.\u201d<\/p>\n<blockquote>\n<p>\u2026 in the framework of imbalanced data-sets, accuracy is no longer a proper measure, since it does not distinguish between the numbers of correctly classified examples of different classes. 
Hence, it may lead to erroneous conclusions \u2026<\/p>\n<\/blockquote>\n<p>\u2014 <a href=\"https:\/\/ieeexplore.ieee.org\/document\/5978225\">A Review on Ensembles for the Class Imbalance Problem: Bagging-, Boosting-, and Hybrid-Based Approaches<\/a>, 2011.<\/p>\n<p>Strictly speaking, accuracy does report a correct result; it is only the practitioner\u2019s intuition about high accuracy scores that is the point of failure. Instead of correcting faulty intuitions, it is common to use alternative metrics to summarize model performance for imbalanced classification problems.<\/p>\n<p>Now that we are familiar with the idea that classification accuracy can be misleading, let\u2019s look at a worked example.<\/p>\n<h2>Example of Accuracy for Imbalanced Classification<\/h2>\n<p>Although we have explained why accuracy is a bad idea for imbalanced classification, it is still an abstract idea.<\/p>\n<p>We can make the failure of accuracy concrete with a worked example that attempts to counter any intuitions for accuracy on balanced class distributions that you may have developed, or, more likely, dissuade you from using accuracy on imbalanced datasets.<\/p>\n<p>First, we can define a synthetic dataset with a 1:100 class distribution.<\/p>\n<p>The <a href=\"https:\/\/scikit-learn.org\/stable\/modules\/generated\/sklearn.datasets.make_blobs.html\">make_blobs() scikit-learn<\/a> function will always create synthetic datasets with an equal class distribution.<\/p>\n<p>Nevertheless, we can use this function to create synthetic classification datasets with arbitrary class distributions with a few extra lines of code. A class distribution can be defined as a dictionary where the key is the class value (e.g. 
0 or 1) and the value is the number of randomly generated examples to include in the dataset.<\/p>\n<p>The function below, named <em>get_dataset()<\/em>, will take a class distribution and return a synthetic dataset with that class distribution.<\/p>\n<pre class=\"crayon-plain-tag\"># create a dataset with a given class distribution\r\ndef get_dataset(proportions):\r\n\t# determine the number of classes\r\n\tn_classes = len(proportions)\r\n\t# determine the number of examples to generate for each class\r\n\tlargest = max([v for k,v in proportions.items()])\r\n\tn_samples = largest * n_classes\r\n\t# create dataset\r\n\tX, y = make_blobs(n_samples=n_samples, centers=n_classes, n_features=2, random_state=1, cluster_std=3)\r\n\t# collect the examples\r\n\tX_list, y_list = list(), list()\r\n\tfor k,v in proportions.items():\r\n\t\trow_ix = where(y == k)[0]\r\n\t\tselected = row_ix[:v]\r\n\t\tX_list.append(X[selected, :])\r\n\t\ty_list.append(y[selected])\r\n\treturn vstack(X_list), hstack(y_list)<\/pre>\n<p>The function can take any number of classes, although we will use it for simple binary classification problems.<\/p>\n<p>Next, we can take the code from the previous section for creating a scatter plot for a created dataset and place it in a helper function. 
Below is the <em>plot_dataset()<\/em> function that will plot the dataset and show a legend to indicate the mapping of colors to class labels.<\/p>\n<pre class=\"crayon-plain-tag\"># scatter plot of dataset, different color for each class\r\ndef plot_dataset(X, y):\r\n\t# create scatter plot for samples from each class\r\n\tn_classes = len(unique(y))\r\n\tfor class_value in range(n_classes):\r\n\t\t# get row indexes for samples with this class\r\n\t\trow_ix = where(y == class_value)[0]\r\n\t\t# create scatter of these samples\r\n\t\tpyplot.scatter(X[row_ix, 0], X[row_ix, 1], label=str(class_value))\r\n\t# show a legend\r\n\tpyplot.legend()\r\n\t# show the plot\r\n\tpyplot.show()<\/pre>\n<p>Finally, we can test these new functions.<\/p>\n<p>We will define a dataset with a 1:100 ratio, with 100 examples for the minority class and 10,000 examples for the majority class, and plot the result.<\/p>\n<p>The complete example is listed below.<\/p>\n<pre class=\"crayon-plain-tag\"># define an imbalanced dataset with a 1:100 class ratio\r\nfrom numpy import unique\r\nfrom numpy import hstack\r\nfrom numpy import vstack\r\nfrom numpy import where\r\nfrom matplotlib import pyplot\r\nfrom sklearn.datasets import make_blobs\r\n\r\n# create a dataset with a given class distribution\r\ndef get_dataset(proportions):\r\n\t# determine the number of classes\r\n\tn_classes = len(proportions)\r\n\t# determine the number of examples to generate for each class\r\n\tlargest = max([v for k,v in proportions.items()])\r\n\tn_samples = largest * n_classes\r\n\t# create dataset\r\n\tX, y = make_blobs(n_samples=n_samples, centers=n_classes, n_features=2, random_state=1, cluster_std=3)\r\n\t# collect the examples\r\n\tX_list, y_list = list(), list()\r\n\tfor k,v in proportions.items():\r\n\t\trow_ix = where(y == k)[0]\r\n\t\tselected = row_ix[:v]\r\n\t\tX_list.append(X[selected, :])\r\n\t\ty_list.append(y[selected])\r\n\treturn vstack(X_list), hstack(y_list)\r\n\r\n# scatter plot of dataset, different color for each class\r\ndef plot_dataset(X, y):\r\n\t# create scatter plot for samples from each class\r\n\tn_classes = len(unique(y))\r\n\tfor class_value in range(n_classes):\r\n\t\t# get row indexes for samples with this class\r\n\t\trow_ix = where(y == class_value)[0]\r\n\t\t# create scatter of these samples\r\n\t\tpyplot.scatter(X[row_ix, 0], X[row_ix, 1], label=str(class_value))\r\n\t# show a legend\r\n\tpyplot.legend()\r\n\t# show the plot\r\n\tpyplot.show()\r\n\r\n# define the class distribution 1:100\r\nproportions = {0:10000, 1:100}\r\n# generate dataset\r\nX, y = get_dataset(proportions)\r\n# summarize class distribution\r\nmajor = (len(where(y == 0)[0]) \/ len(X)) * 100\r\nminor = (len(where(y == 1)[0]) \/ len(X)) * 100\r\nprint('Class 0: %.3f%%, Class 1: %.3f%%' % (major, minor))\r\n# plot dataset\r\nplot_dataset(X, y)<\/pre>\n<p>Running the example first creates the dataset and prints the class distribution.<\/p>\n<p>We can see that a little over 99 percent of the examples in the dataset belong to the majority class, and a little less than 1 percent belong to the minority class.<\/p>\n<pre class=\"crayon-plain-tag\">Class 0: 99.010%, Class 1: 0.990%<\/pre>\n<p>A plot of the dataset is created and we can see that there are many more examples of the majority class than of the minority class, with a helpful legend to indicate the mapping of plot colors to class labels.<\/p>\n<div id=\"attachment_9325\" style=\"width: 1290px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-9325\" class=\"size-full wp-image-9325\" src=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2019\/10\/Scatter-Plot-of-Binary-Classification-Dataset-With-1-to-100-Class-Distribution.png\" alt=\"Scatter Plot of Binary Classification Dataset With 1 to 100 Class Distribution\" width=\"1280\" height=\"960\" 
srcset=\"http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2019\/10\/Scatter-Plot-of-Binary-Classification-Dataset-With-1-to-100-Class-Distribution.png 1280w, http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2019\/10\/Scatter-Plot-of-Binary-Classification-Dataset-With-1-to-100-Class-Distribution-300x225.png 300w, http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2019\/10\/Scatter-Plot-of-Binary-Classification-Dataset-With-1-to-100-Class-Distribution-768x576.png 768w, http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2019\/10\/Scatter-Plot-of-Binary-Classification-Dataset-With-1-to-100-Class-Distribution-1024x768.png 1024w\" sizes=\"(max-width: 1280px) 100vw, 1280px\"><\/p>\n<p id=\"caption-attachment-9325\" class=\"wp-caption-text\">Scatter Plot of Binary Classification Dataset With 1 to 100 Class Distribution<\/p>\n<\/div>\n<p>Next, we can fit a naive classifier model that always predicts the majority class.<\/p>\n<p>We can achieve this using the <a href=\"https:\/\/scikit-learn.org\/stable\/modules\/generated\/sklearn.dummy.DummyClassifier.html\">DummyClassifier<\/a> from scikit-learn and use the \u2018<em>most_frequent<\/em>\u2018 strategy that will always predict the class label that is most observed in the training dataset.<\/p>\n<pre class=\"crayon-plain-tag\">...\r\n# define model\r\nmodel = DummyClassifier(strategy='most_frequent')<\/pre>\n<p>We can then evaluate this model on the training dataset using repeated k-fold cross-validation. It is important that we use stratified cross-validation to ensure that each split of the dataset has the same class distribution as the training dataset. 
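As a quick check that stratification behaves as described, the sketch below (toy data, not the tutorial's dataset) splits a 9:1 imbalanced set with scikit-learn's StratifiedKFold and counts each class in every test fold:

```python
# verify that stratified splits preserve the class ratio
from numpy import array
from sklearn.model_selection import StratifiedKFold

X = array([[i] for i in range(100)])
y = array([0] * 90 + [1] * 10)  # 9:1 class distribution

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)
for _, test_ix in cv.split(X, y):
    # every fold of 20 examples keeps the 9:1 ratio: 18 vs 2
    print(sum(y[test_ix] == 0), sum(y[test_ix] == 1))  # 18 2
```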
This can be achieved using the <a href=\"https:\/\/scikit-learn.org\/stable\/modules\/generated\/sklearn.model_selection.RepeatedStratifiedKFold.html\">RepeatedStratifiedKFold class<\/a>.<\/p>\n<p>The <em>evaluate_model()<\/em> function below implements this and returns a list of scores for each evaluation of the model.<\/p>\n<pre class=\"crayon-plain-tag\"># evaluate a model using repeated k-fold cross-validation\r\ndef evaluate_model(X, y, metric):\r\n\t# define model\r\n\tmodel = DummyClassifier(strategy='most_frequent')\r\n\t# evaluate a model with repeated stratified k fold cv\r\n\tcv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)\r\n\tscores = cross_val_score(model, X, y, scoring=metric, cv=cv, n_jobs=-1)\r\n\treturn scores<\/pre>\n<p>We can then evaluate the model and calculate the mean of the scores across all evaluations.<\/p>\n<p>We would expect the naive classifier to achieve a classification accuracy of about 99 percent, because that is the proportion of the majority class in the training dataset.<\/p>\n<pre class=\"crayon-plain-tag\">...\r\n# evaluate model\r\nscores = evaluate_model(X, y, 'accuracy')\r\n# report score\r\nprint('Accuracy: %.3f%%' % (mean(scores) * 100))<\/pre>\n<p>Tying this all together, the complete example of evaluating a naive classifier on the synthetic dataset with a 1:100 class distribution is listed below.<\/p>\n<pre class=\"crayon-plain-tag\"># evaluate a majority class classifier on a 1:100 imbalanced dataset\r\nfrom numpy import mean\r\nfrom numpy import hstack\r\nfrom numpy import vstack\r\nfrom numpy import where\r\nfrom sklearn.datasets import make_blobs\r\nfrom sklearn.dummy import DummyClassifier\r\nfrom sklearn.model_selection import cross_val_score\r\nfrom sklearn.model_selection import RepeatedStratifiedKFold\r\n\r\n# create a dataset with a given class distribution\r\ndef get_dataset(proportions):\r\n\t# determine the number of classes\r\n\tn_classes = 
len(proportions)\r\n\t# determine the number of examples to generate for each class\r\n\tlargest = max([v for k,v in proportions.items()])\r\n\tn_samples = largest * n_classes\r\n\t# create dataset\r\n\tX, y = make_blobs(n_samples=n_samples, centers=n_classes, n_features=2, random_state=1, cluster_std=3)\r\n\t# collect the examples\r\n\tX_list, y_list = list(), list()\r\n\tfor k,v in proportions.items():\r\n\t\trow_ix = where(y == k)[0]\r\n\t\tselected = row_ix[:v]\r\n\t\tX_list.append(X[selected, :])\r\n\t\ty_list.append(y[selected])\r\n\treturn vstack(X_list), hstack(y_list)\r\n\r\n# evaluate a model using repeated k-fold cross-validation\r\ndef evaluate_model(X, y, metric):\r\n\t# define model\r\n\tmodel = DummyClassifier(strategy='most_frequent')\r\n\t# evaluate a model with repeated stratified k fold cv\r\n\tcv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)\r\n\tscores = cross_val_score(model, X, y, scoring=metric, cv=cv, n_jobs=-1)\r\n\treturn scores\r\n\r\n# define the class distribution 1:100\r\nproportions = {0:10000, 1:100}\r\n# generate dataset\r\nX, y = get_dataset(proportions)\r\n# summarize class distribution:\r\nmajor = (len(where(y == 0)[0]) \/ len(X)) * 100\r\nminor = (len(where(y == 1)[0]) \/ len(X)) * 100\r\nprint('Class 0: %.3f%%, Class 1: %.3f%%' % (major, minor))\r\n# evaluate model\r\nscores = evaluate_model(X, y, 'accuracy')\r\n# report score\r\nprint('Accuracy: %.3f%%' % (mean(scores) * 100))<\/pre>\n<p>Running the example first reports the class distribution of the training dataset again.<\/p>\n<p>Then the model is evaluated and the mean accuracy is reported. We can see that as expected, the performance of the naive classifier matches the class distribution exactly.<\/p>\n<p>Normally, achieving 99 percent classification accuracy would be cause for celebration. 
However, as we have seen, because the class distribution is severely imbalanced, 99 percent is actually the lowest acceptable accuracy for this dataset and the starting point from which more sophisticated models must improve.<\/p>\n<pre class=\"crayon-plain-tag\">Class 0: 99.010%, Class 1: 0.990%\r\nAccuracy: 99.010%<\/pre>\n<h2>Further Reading<\/h2>\n<p>This section provides more resources on the topic if you are looking to go deeper.<\/p>\n<h3>Tutorials<\/h3>\n<ul>\n<li><a href=\"https:\/\/machinelearningmastery.com\/confusion-matrix-machine-learning\/\">What is a Confusion Matrix in Machine Learning<\/a><\/li>\n<\/ul>\n<h3>Papers<\/h3>\n<ul>\n<li><a href=\"https:\/\/arxiv.org\/abs\/1505.01658\">A Survey of Predictive Modelling under Imbalanced Distributions<\/a>, 2015.<\/li>\n<li><a href=\"https:\/\/ieeexplore.ieee.org\/document\/5978225\">A Review on Ensembles for the Class Imbalance Problem: Bagging-, Boosting-, and Hybrid-Based Approaches<\/a>, 2011.<\/li>\n<\/ul>\n<h3>Books<\/h3>\n<ul>\n<li><a href=\"https:\/\/amzn.to\/32K9K6d\">Imbalanced Learning: Foundations, Algorithms, and Applications<\/a>, 2013.<\/li>\n<li><a href=\"https:\/\/amzn.to\/307Xlva\">Learning from Imbalanced Data Sets<\/a>, 2018.<\/li>\n<\/ul>\n<h3>APIs<\/h3>\n<ul>\n<li><a href=\"https:\/\/scikit-learn.org\/stable\/modules\/generated\/sklearn.datasets.make_blobs.html\">sklearn.datasets.make_blobs API<\/a>.<\/li>\n<li><a href=\"https:\/\/scikit-learn.org\/stable\/modules\/generated\/sklearn.dummy.DummyClassifier.html\">sklearn.dummy.DummyClassifier API<\/a>.<\/li>\n<li><a href=\"https:\/\/scikit-learn.org\/stable\/modules\/generated\/sklearn.model_selection.RepeatedStratifiedKFold.html\">sklearn.model_selection.RepeatedStratifiedKFold API<\/a>.<\/li>\n<\/ul>\n<h3>Articles<\/h3>\n<ul>\n<li><a href=\"https:\/\/en.wikipedia.org\/wiki\/Accuracy_and_precision\">Accuracy and precision, Wikipedia<\/a>.<\/li>\n<li><a href=\"https:\/\/en.wikipedia.org\/wiki\/Accuracy_paradox\">Accuracy paradox, 
Wikipedia<\/a>.<\/li>\n<\/ul>\n<h2>Summary<\/h2>\n<p>In this tutorial, you discovered the failure of classification accuracy for imbalanced classification problems.<\/p>\n<p>Specifically, you learned:<\/p>\n<ul>\n<li>Accuracy and error rate are the de facto standard metrics for summarizing the performance of classification models.<\/li>\n<li>Classification accuracy fails on classification problems with a skewed class distribution because of the intuitions developed by practitioners on datasets with an equal class distribution.<\/li>\n<li>Intuition for the failure of accuracy for skewed class distributions with a worked example.<\/li>\n<\/ul>\n<p>Do you have any questions?<br \/>\nAsk your questions in the comments below and I will do my best to answer.<\/p>\n<p>The post <a rel=\"nofollow\" href=\"https:\/\/machinelearningmastery.com\/failure-of-accuracy-for-imbalanced-class-distributions\/\">Failure of Classification Accuracy for Imbalanced Class Distributions<\/a> appeared first on <a rel=\"nofollow\" href=\"https:\/\/machinelearningmastery.com\/\">Machine Learning Mastery<\/a>.<\/p>\n<\/div>\n<p><a href=\"https:\/\/machinelearningmastery.com\/failure-of-accuracy-for-imbalanced-class-distributions\/\">Go to Source<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Author: Jason Brownlee Classification accuracy is a metric that summarizes the performance of a classification model as the number of correct predictions divided by the [&hellip;] <span class=\"read-more-link\"><a class=\"read-more\" href=\"https:\/\/www.aiproblog.com\/index.php\/2019\/12\/31\/failure-of-classification-accuracy-for-imbalanced-class-distributions\/\">Read 
More<\/a><\/span><\/p>\n","protected":false},"author":1,"featured_media":2986,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_bbp_topic_count":0,"_bbp_reply_count":0,"_bbp_total_topic_count":0,"_bbp_total_reply_count":0,"_bbp_voice_count":0,"_bbp_anonymous_reply_count":0,"_bbp_topic_count_hidden":0,"_bbp_reply_count_hidden":0,"_bbp_forum_subforum_count":0,"footnotes":""},"categories":[24],"tags":[],"_links":{"self":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts\/2985"}],"collection":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/comments?post=2985"}],"version-history":[{"count":0,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts\/2985\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/media\/2986"}],"wp:attachment":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/media?parent=2985"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/categories?post=2985"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/tags?post=2985"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}