{"id":3166,"date":"2020-02-23T18:00:21","date_gmt":"2020-02-23T18:00:21","guid":{"rendered":"https:\/\/www.aiproblog.com\/index.php\/2020\/02\/23\/a-gentle-introduction-to-the-fbeta-measure-for-machine-learning\/"},"modified":"2020-02-23T18:00:21","modified_gmt":"2020-02-23T18:00:21","slug":"a-gentle-introduction-to-the-fbeta-measure-for-machine-learning","status":"publish","type":"post","link":"https:\/\/www.aiproblog.com\/index.php\/2020\/02\/23\/a-gentle-introduction-to-the-fbeta-measure-for-machine-learning\/","title":{"rendered":"A Gentle Introduction to the Fbeta-Measure for Machine Learning"},"content":{"rendered":"<p>Author: Jason Brownlee<\/p>\n<div>\n<p>Fbeta-measure is a configurable single-score metric for evaluating a binary classification model based on the predictions made for the positive class.<\/p>\n<p>The Fbeta-measure is calculated using precision and recall.<\/p>\n<p><strong>Precision<\/strong> is a metric that calculates the percentage of correct predictions for the positive class. <strong>Recall<\/strong> calculates the percentage of correct predictions for the positive class out of all positive predictions that could be made. Maximizing precision will minimize the false-positive errors, whereas maximizing recall will minimize the false-negative errors.<\/p>\n<p>The <strong>F-measure<\/strong> is calculated as the harmonic mean of precision and recall, giving each the same weighting. It allows a model to be evaluated taking both the precision and recall into account using a single score, which is helpful when describing the performance of the model and in comparing models.<\/p>\n<p>The <strong>Fbeta-measure<\/strong> is a generalization of the F-measure that adds a configuration parameter called beta. A default beta value is 1.0, which is the same as the F-measure. 
A smaller beta value, such as 0.5, gives more weight to precision and less to recall, whereas a larger beta value, such as 2.0, gives less weight to precision and more weight to recall in the calculation of the score.<\/p>\n<p>It is a useful metric to use when both precision and recall are important but slightly more attention is needed on one or the other, such as when false negatives are more important than false positives, or the reverse.<\/p>\n<p>In this tutorial, you will discover the Fbeta-measure for evaluating classification algorithms for machine learning.<\/p>\n<p>After completing this tutorial, you will know:<\/p>\n<ul>\n<li>Precision and recall provide two ways to summarize the errors made for the positive class in a binary classification problem.<\/li>\n<li>F-measure provides a single score that summarizes the precision and recall.<\/li>\n<li>Fbeta-measure provides a configurable version of the F-measure to give more or less attention to the precision and recall measure when calculating a single score.<\/li>\n<\/ul>\n<p>Discover SMOTE, one-class classification, cost-sensitive learning, threshold moving, and much more <a href=\"https:\/\/machinelearningmastery.com\/imbalanced-classification-with-python\/\">in my new book<\/a>, with 30 step-by-step tutorials and full Python source code.<\/p>\n<p>Let&rsquo;s get started.<\/p>\n<div id=\"attachment_9653\" style=\"width: 809px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-9653\" class=\"size-full wp-image-9653\" src=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2020\/02\/A-Gentle-Introduction-to-the-Fbeta-Measure-for-Machine-Learning.jpg\" alt=\"A Gentle Introduction to the Fbeta-Measure for Machine Learning\" width=\"799\" height=\"533\" srcset=\"http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2020\/02\/A-Gentle-Introduction-to-the-Fbeta-Measure-for-Machine-Learning.jpg 799w, 
http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2020\/02\/A-Gentle-Introduction-to-the-Fbeta-Measure-for-Machine-Learning-300x200.jpg 300w, http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2020\/02\/A-Gentle-Introduction-to-the-Fbeta-Measure-for-Machine-Learning-768x512.jpg 768w\" sizes=\"(max-width: 799px) 100vw, 799px\"><\/p>\n<p id=\"caption-attachment-9653\" class=\"wp-caption-text\">A Gentle Introduction to the Fbeta-Measure for Machine Learning<br \/>Photo by <a href=\"https:\/\/flickr.com\/photos\/30478819@N08\/34564940376\/\">Marco Verch<\/a>, some rights reserved.<\/p>\n<\/div>\n<h2>Tutorial Overview<\/h2>\n<p>This tutorial is divided into three parts; they are:<\/p>\n<ol>\n<li>Precision and Recall\n<ol>\n<li>Confusion Matrix<\/li>\n<li>Precision<\/li>\n<li>Recall<\/li>\n<\/ol>\n<\/li>\n<li>F-Measure\n<ol>\n<li>Worst Case<\/li>\n<li>Best Case<\/li>\n<li>50% Precision, Perfect Recall<\/li>\n<\/ol>\n<\/li>\n<li>Fbeta-Measure\n<ol>\n<li>F1-Measure<\/li>\n<li>F0.5 Measure<\/li>\n<li>F2 Measure<\/li>\n<\/ol>\n<\/li>\n<\/ol>\n<h2>Precision and Recall<\/h2>\n<p>Before we can dive into the Fbeta-measure, we must review the basics of the precision and recall metrics used to evaluate the predictions made by a classification model.<\/p>\n<h3>Confusion Matrix<\/h3>\n<p>A <a href=\"https:\/\/machinelearningmastery.com\/ufaqs\/what-is-a-confusion-matrix\/\">confusion matrix<\/a> summarizes the number of predictions made by a model for each class, and the classes to which those predictions actually belong. 
It helps to understand the types of prediction errors made by a model.<\/p>\n<p>The simplest confusion matrix is for a two-class classification problem, with negative (class 0) and positive (class 1) classes.<\/p>\n<p>In this type of confusion matrix, each cell in the table has a specific and well-understood name, summarized as follows:<\/p>\n<pre class=\"crayon-plain-tag\">| Positive Prediction | Negative Prediction\r\nPositive Class | True Positive (TP)  | False Negative (FN)\r\nNegative Class | False Positive (FP) | True Negative (TN)<\/pre>\n<p>The precision and recall metrics are defined in terms of the cells in the confusion matrix, specifically terms like true positives and false negatives.<\/p>\n<h3>Precision<\/h3>\n<p>Precision is a metric that quantifies the number of correct positive predictions made.<\/p>\n<p>It is calculated as the ratio of correctly predicted positive examples divided by the total number of positive examples that were predicted.<\/p>\n<ul>\n<li>Precision = TruePositives \/ (TruePositives + FalsePositives)<\/li>\n<\/ul>\n<p>The result is a value between 0.0 for no precision and 1.0 for full or perfect precision.<\/p>\n<p>The intuition for precision is that it is not concerned with false negatives and it <strong>minimizes false positives<\/strong>. 
We can demonstrate this with a small example below.<\/p>\n<pre class=\"crayon-plain-tag\"># intuition for precision\r\nfrom sklearn.metrics import precision_score\r\n# no precision\r\ny_true = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]\r\ny_pred = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]\r\nscore = precision_score(y_true, y_pred)\r\nprint('No Precision: %.3f' % score)\r\n# some false positives\r\ny_true = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]\r\ny_pred = [0, 0, 0, 1, 1, 1, 1, 1, 1, 1]\r\nscore = precision_score(y_true, y_pred)\r\nprint('Some False Positives: %.3f' % score)\r\n# some false negatives\r\ny_true = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]\r\ny_pred = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1]\r\nscore = precision_score(y_true, y_pred)\r\nprint('Some False Negatives: %.3f' % score)\r\n# perfect precision\r\ny_true = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]\r\ny_pred = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]\r\nscore = precision_score(y_true, y_pred)\r\nprint('Perfect Precision: %.3f' % score)<\/pre>\n<p>Running the example demonstrates calculating the precision for all incorrect and all correct predicted class labels, which shows no precision and perfect precision respectively.<\/p>\n<p>An example of predicting some false positives shows a drop in precision, highlighting that the measure is concerned with minimizing false positives.<\/p>\n<p>An example of predicting some false negatives shows perfect precision, highlighting that the measure is not concerned with false negatives.<\/p>\n<pre class=\"crayon-plain-tag\">No Precision: 0.000\r\nSome False Positives: 0.714\r\nSome False Negatives: 1.000\r\nPerfect Precision: 1.000<\/pre>\n<\/p>\n<h3>Recall<\/h3>\n<p>Recall is a metric that quantifies the number of correct positive predictions made out of all positive predictions that could have been made.<\/p>\n<p>It is calculated as the ratio of correctly predicted positive examples divided by the total number of positive examples that could be predicted.<\/p>\n<ul>\n<li>Recall = TruePositives \/ (TruePositives + 
FalseNegatives)<\/li>\n<\/ul>\n<p>The result is a value between 0.0 for no recall and 1.0 for full or perfect recall.<\/p>\n<p>The intuition for recall is that it is not concerned with false positives and it <strong>minimizes false negatives<\/strong>. We can demonstrate this with a small example below.<\/p>\n<pre class=\"crayon-plain-tag\"># intuition for recall\r\nfrom sklearn.metrics import recall_score\r\n# no recall\r\ny_true = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]\r\ny_pred = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]\r\nscore = recall_score(y_true, y_pred)\r\nprint('No Recall: %.3f' % score)\r\n# some false positives\r\ny_true = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]\r\ny_pred = [0, 0, 0, 1, 1, 1, 1, 1, 1, 1]\r\nscore = recall_score(y_true, y_pred)\r\nprint('Some False Positives: %.3f' % score)\r\n# some false negatives\r\ny_true = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]\r\ny_pred = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1]\r\nscore = recall_score(y_true, y_pred)\r\nprint('Some False Negatives: %.3f' % score)\r\n# perfect recall\r\ny_true = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]\r\ny_pred = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]\r\nscore = recall_score(y_true, y_pred)\r\nprint('Perfect Recall: %.3f' % score)<\/pre>\n<p>Running the example demonstrates calculating the recall for all incorrect and all correct predicted class labels, which shows no recall and perfect recall respectively.<\/p>\n<p>An example of predicting some false positives shows perfect recall, highlighting that the measure is not concerned with false positives.<\/p>\n<p>An example of predicting&nbsp;some false negatives shows a drop in recall, highlighting that the measure is concerned with minimizing false negatives.<\/p>\n<pre class=\"crayon-plain-tag\">No Recall: 0.000\r\nSome False Positives: 1.000\r\nSome False Negatives: 0.600\r\nPerfect Recall: 1.000<\/pre>\n<p>Now that we are familiar with precision and recall, let&rsquo;s review the F-measure.<\/p>\n<\/p>\n<h2>F-Measure<\/h2>\n<p>Precision and recall measure the two types of errors that could be made for the positive class.<\/p>\n<p>Maximizing precision minimizes false positives and maximizing recall minimizes false negatives.<\/p>\n<p>F-Measure or F-Score provides a way to combine both precision and recall into a single measure that captures both properties.<\/p>\n<ul>\n<li>F-Measure = (2 * Precision * Recall) \/ (Precision + Recall)<\/li>\n<\/ul>\n<p>This is the <a href=\"https:\/\/en.wikipedia.org\/wiki\/Harmonic_mean\">harmonic mean<\/a> of the two fractions.<\/p>\n<p>The result is a value between 0.0 for the worst F-measure and 1.0 for a perfect F-measure.<\/p>\n<p>The intuition for F-measure is that both measures are balanced in importance and that only a good precision and good recall together result in a good 
F-measure.<\/p>\n<h3>Worst Case<\/h3>\n<p>First, if all examples are predicted incorrectly, we will have zero precision and zero recall, resulting in a zero F-measure; for example:<\/p>\n<pre class=\"crayon-plain-tag\"># worst case f-measure\r\nfrom sklearn.metrics import f1_score\r\nfrom sklearn.metrics import precision_score\r\nfrom sklearn.metrics import recall_score\r\n# no precision or recall\r\ny_true = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]\r\ny_pred = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]\r\np = precision_score(y_true, y_pred)\r\nr = recall_score(y_true, y_pred)\r\nf = f1_score(y_true, y_pred)\r\nprint('No Precision or Recall: p=%.3f, r=%.3f, f=%.3f' % (p, r, f))<\/pre>\n<p>Running the example, we can see that no precision or recall results in a worst-case F-measure.<\/p>\n<pre class=\"crayon-plain-tag\">No Precision or Recall: p=0.000, r=0.000, f=0.000<\/pre>\n<p>Given that precision and recall are only concerned with the positive class, we can achieve the same worst-case precision, recall, and F-measure by predicting the negative class for all examples:<\/p>\n<pre class=\"crayon-plain-tag\"># another worst case f-measure\r\nfrom sklearn.metrics import f1_score\r\nfrom sklearn.metrics import precision_score\r\nfrom sklearn.metrics import recall_score\r\n# no precision or recall\r\ny_true = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]\r\ny_pred = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]\r\np = precision_score(y_true, y_pred)\r\nr = recall_score(y_true, y_pred)\r\nf = f1_score(y_true, y_pred)\r\nprint('No Precision or Recall: p=%.3f, r=%.3f, f=%.3f' % (p, r, f))<\/pre>\n<p>Given that no positive cases were predicted, the precision and recall are both zero and, in turn, so is the F-measure.<\/p>\n<pre class=\"crayon-plain-tag\">No Precision or Recall: p=0.000, r=0.000, f=0.000<\/pre>\n<\/p>\n<h3>Best Case<\/h3>\n<p>Conversely, perfect predictions will result in a perfect precision and recall and, in turn, a perfect F-measure, for example:<\/p>\n<pre class=\"crayon-plain-tag\"># best case 
f-measure\r\nfrom sklearn.metrics import f1_score\r\nfrom sklearn.metrics import precision_score\r\nfrom sklearn.metrics import recall_score\r\n# perfect precision and recall\r\ny_true = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]\r\ny_pred = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]\r\np = precision_score(y_true, y_pred)\r\nr = recall_score(y_true, y_pred)\r\nf = f1_score(y_true, y_pred)\r\nprint('Perfect Precision and Recall: p=%.3f, r=%.3f, f=%.3f' % (p, r, f))<\/pre>\n<p>Running the example, we can see that perfect precision and recall results in a perfect F-measure.<\/p>\n<pre class=\"crayon-plain-tag\">Perfect Precision and Recall: p=1.000, r=1.000, f=1.000<\/pre>\n<\/p>\n<h3>50% Precision, Perfect Recall<\/h3>\n<p>It is not possible to have perfect precision and no recall, or no precision and perfect recall. Both precision and recall require true positives to be predicted.<\/p>\n<p>Consider the case where we predict the positive class for all cases.<\/p>\n<p>This would give us 50 percent precision as half of the predictions are false positives. It would give us perfect recall because we would have no false negatives.<\/p>\n<p>For the balanced dataset we are using in our examples, half of the predictions would be true positives, half would be false positives; therefore, the precision ratio would be 0.5 or 50 percent. 
Combining 50 percent precision with perfect recall will result in a penalized F-measure, specifically the harmonic mean between 50 percent and 100 percent.<\/p>\n<p>The example below demonstrates this.<\/p>\n<pre class=\"crayon-plain-tag\"># 50% precision, perfect recall f-measure\r\nfrom sklearn.metrics import f1_score\r\nfrom sklearn.metrics import precision_score\r\nfrom sklearn.metrics import recall_score\r\n# 50% precision, perfect recall\r\ny_true = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]\r\ny_pred = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]\r\np = precision_score(y_true, y_pred)\r\nr = recall_score(y_true, y_pred)\r\nf = f1_score(y_true, y_pred)\r\nprint('Result: p=%.3f, r=%.3f, f=%.3f' % (p, r, f))<\/pre>\n<p>Running the example confirms that we indeed have 50 percent precision and perfect recall, and that the F-score results in a value of about 0.667.<\/p>\n<pre class=\"crayon-plain-tag\">Result: p=0.500, r=1.000, f=0.667<\/pre>\n<\/p>\n<h2>Fbeta-Measure<\/h2>\n<p>The F-measure balances the precision and recall.<\/p>\n<p>On some problems, we might be interested in an F-measure with more attention put on precision, such as when false positives are more important to minimize, but false negatives are still important.<\/p>\n<p>On other problems, we might be interested in an F-measure with more attention put on recall, such as when false negatives are more important to minimize, but false positives are still important.<\/p>\n<p>The solution is the Fbeta-measure.<\/p>\n<p>The Fbeta-measure is an abstraction of the F-measure where the balance of precision and recall in the calculation of the <a href=\"https:\/\/en.wikipedia.org\/wiki\/Harmonic_mean\">harmonic mean<\/a> is controlled by a coefficient called <em>beta<\/em>.<\/p>\n<ul>\n<li>Fbeta = ((1 + beta^2) * Precision * Recall) \/ (beta^2 * Precision + Recall)<\/li>\n<\/ul>\n<p>The choice of the beta parameter will be used in the name of the Fbeta-measure.<\/p>\n<p>For example, a beta value of 2 is referred to as F2-measure or 
F2-score. A beta value of 1 is referred to as the F1-measure or the F1-score.<\/p>\n<p>Three common values for the beta parameter are as follows:<\/p>\n<ul>\n<li><strong>F0.5-Measure<\/strong> (beta=0.5): More weight on precision, less weight on recall.<\/li>\n<li><strong>F1-Measure<\/strong> (beta=1.0): Balance the weight on precision and recall.<\/li>\n<li><strong>F2-Measure<\/strong> (beta=2.0): Less weight on precision, more weight on recall.<\/li>\n<\/ul>\n<p>The impact on the calculation for different beta values is not intuitive at first.<\/p>\n<p>Let&rsquo;s take a closer look at each of these cases.<\/p>\n<h3>F1-Measure<\/h3>\n<p>The F-measure discussed in the previous section is an example of the Fbeta-measure with a <em>beta<\/em> value of 1.<\/p>\n<p>Specifically, F-measure and F1-measure calculate the same thing; for example:<\/p>\n<ul>\n<li>F-Measure = ((1 + 1^2) * Precision * Recall) \/ (1^2 * Precision + Recall)<\/li>\n<li>F-Measure = (2 * Precision * Recall) \/ (Precision + Recall)<\/li>\n<\/ul>\n<p>Consider the case where we have 50 percent precision and perfect recall. 
We can manually calculate the F1-measure for this case as follows:<\/p>\n<ul>\n<li>F-Measure = (2 * Precision * Recall) \/ (Precision + Recall)<\/li>\n<li>F-Measure = (2 * 0.5 * 1.0) \/ (0.5 + 1.0)<\/li>\n<li>F-Measure = 1.0 \/ 1.5<\/li>\n<li>F-Measure = 0.667<\/li>\n<\/ul>\n<p>We can confirm this calculation using the <a href=\"https:\/\/scikit-learn.org\/stable\/modules\/generated\/sklearn.metrics.fbeta_score.html\">fbeta_score() function<\/a> in scikit-learn with the &ldquo;<em>beta<\/em>&rdquo; argument set to 1.0.<\/p>\n<p>The complete example is listed below.<\/p>\n<pre class=\"crayon-plain-tag\"># calculate the f1-measure\r\nfrom sklearn.metrics import fbeta_score\r\nfrom sklearn.metrics import precision_score\r\nfrom sklearn.metrics import recall_score\r\n# 50% precision, perfect recall\r\ny_true = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]\r\ny_pred = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]\r\np = precision_score(y_true, y_pred)\r\nr = recall_score(y_true, y_pred)\r\nf = fbeta_score(y_true, y_pred, beta=1.0)\r\nprint('Result: p=%.3f, r=%.3f, f=%.3f' % (p, r, f))<\/pre>\n<p>Running the example confirms the 50 percent precision and perfect recall and an F1-measure of 0.667, confirming our calculation (with rounding).<\/p>\n<p>This F1-measure value of 0.667 matches the F-measure calculated for the same scenario in the previous section.<\/p>\n<pre class=\"crayon-plain-tag\">Result: p=0.500, r=1.000, f=0.667<\/pre>\n<\/p>\n<h3>F0.5-Measure<\/h3>\n<p>The F0.5-measure is an example of the Fbeta-measure with a <em>beta<\/em> value of 0.5.<\/p>\n<p>It has the effect of raising the importance of precision and lowering the importance of recall.<\/p>\n<p>If maximizing precision minimizes false positives, and maximizing recall minimizes false negatives, then the <strong>F0.5-measure puts more attention on minimizing false positives<\/strong> than minimizing false negatives.<\/p>\n<p>The F0.5-Measure is calculated as follows:<\/p>\n<ul>\n<li>F0.5-Measure = ((1 + 0.5^2) * Precision * Recall) 
\/ (0.5^2 * Precision + Recall)<\/li>\n<li>F0.5-Measure = (1.25 * Precision * Recall) \/ (0.25 * Precision + Recall)<\/li>\n<\/ul>\n<p>Consider the case where we have 50 percent precision and perfect recall. We can manually calculate the F0.5-measure for this case as follows:<\/p>\n<ul>\n<li>F0.5-Measure = (1.25 * Precision * Recall) \/ (0.25 * Precision + Recall)<\/li>\n<li>F0.5-Measure = (1.25 * 0.5 * 1.0) \/ (0.25 * 0.5 + 1.0)<\/li>\n<li>F0.5-Measure = 0.625 \/ 1.125<\/li>\n<li>F0.5-Measure = 0.556<\/li>\n<\/ul>\n<p>We would expect that a beta value of 0.5 would result in a lower score for this scenario given that precision has a poor score and the recall is excellent.<\/p>\n<p>This is exactly what we see, where an F0.5-measure of 0.556 is achieved for the same scenario where an F1-score was calculated as 0.667. Precision played more of a role in the calculation.<\/p>\n<p>We can confirm this calculation; the complete example is listed below.<\/p>\n<pre class=\"crayon-plain-tag\"># calculate the f0.5-measure\r\nfrom sklearn.metrics import fbeta_score\r\nfrom sklearn.metrics import precision_score\r\nfrom sklearn.metrics import recall_score\r\n# 50% precision, perfect recall\r\ny_true = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]\r\ny_pred = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]\r\np = precision_score(y_true, y_pred)\r\nr = recall_score(y_true, y_pred)\r\nf = fbeta_score(y_true, y_pred, beta=0.5)\r\nprint('Result: p=%.3f, r=%.3f, f=%.3f' % (p, r, f))<\/pre>\n<p>Running the example confirms the precision and recall values, then reports an F0.5-measure of 0.556 (with rounding), the same value as we calculated manually.<\/p>\n<pre class=\"crayon-plain-tag\">Result: p=0.500, r=1.000, f=0.556<\/pre>\n<\/p>\n<h3>F2-Measure<\/h3>\n<p>The F2-measure is an example of the Fbeta-measure with a <em>beta<\/em> value of 2.0.<\/p>\n<p>It has the effect of lowering the importance of precision and increasing the importance of recall.<\/p>\n<p>If maximizing 
precision minimizes false positives, and maximizing recall minimizes false negatives, then the <strong>F2-measure puts more attention on minimizing false negatives<\/strong> than minimizing false positives.<\/p>\n<p>The F2-measure is calculated as follows:<\/p>\n<ul>\n<li>F2-Measure = ((1 + 2^2) * Precision * Recall) \/ (2^2 * Precision + Recall)<\/li>\n<li>F2-Measure = (5 * Precision * Recall) \/ (4 * Precision + Recall)<\/li>\n<\/ul>\n<p>Consider the case where we have 50 percent precision and perfect recall.<\/p>\n<p>We can manually calculate the F2-measure for this case as follows:<\/p>\n<ul>\n<li>F2-Measure = (5 * Precision * Recall) \/ (4 * Precision + Recall)<\/li>\n<li>F2-Measure = (5 * 0.5 * 1.0) \/ (4 * 0.5 + 1.0)<\/li>\n<li>F2-Measure = 2.5 \/ 3.0<\/li>\n<li>F2-Measure = 0.833<\/li>\n<\/ul>\n<p>We would expect that a <em>beta<\/em> value of 2.0 would result in a higher score for this scenario given that recall has a perfect score, which will be promoted over that of the poor performance of precision.<\/p>\n<p>This is exactly what we see where an F2-measure of 0.833 is achieved for the same scenario where an F1-score was calculated as 0.667. 
Recall played more of a role in the calculation.<\/p>\n<p>We can confirm this calculation; the complete example is listed below.<\/p>\n<pre class=\"crayon-plain-tag\"># calculate the f2-measure\r\nfrom sklearn.metrics import fbeta_score\r\nfrom sklearn.metrics import precision_score\r\nfrom sklearn.metrics import recall_score\r\n# 50% precision, perfect recall\r\ny_true = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]\r\ny_pred = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]\r\np = precision_score(y_true, y_pred)\r\nr = recall_score(y_true, y_pred)\r\nf = fbeta_score(y_true, y_pred, beta=2.0)\r\nprint('Result: p=%.3f, r=%.3f, f=%.3f' % (p, r, f))<\/pre>\n<p>Running the example confirms the precision and recall values, then reports an F2-measure of 0.833, the same value as we calculated manually (with rounding).<\/p>\n<pre class=\"crayon-plain-tag\">Result: p=0.500, r=1.000, f=0.833<\/pre>\n<\/p>\n<h2>Further Reading<\/h2>\n<p>This section provides more resources on the topic if you are looking to go deeper.<\/p>\n<h3>Tutorials<\/h3>\n<ul>\n<li><a href=\"https:\/\/machinelearningmastery.com\/tour-of-evaluation-metrics-for-imbalanced-classification\/\">Tour of Evaluation Metrics for Imbalanced Classification<\/a><\/li>\n<li><a href=\"https:\/\/machinelearningmastery.com\/precision-recall-and-f-measure-for-imbalanced-classification\/\">How to Calculate Precision, Recall, and F-Measure for Imbalanced Classification<\/a><\/li>\n<\/ul>\n<h3>Papers<\/h3>\n<ul>\n<li><a href=\"https:\/\/www.toyota-ti.ac.jp\/Lab\/Denshi\/COIN\/people\/yutaka.sasaki\/F-measure-YS-26Oct07.pdf\">The truth of the F-measure<\/a>, 2007.<\/li>\n<\/ul>\n<h3>APIs<\/h3>\n<ul>\n<li><a href=\"https:\/\/scikit-learn.org\/stable\/modules\/generated\/sklearn.metrics.f1_score.html\">sklearn.metrics.f1_score API<\/a>.<\/li>\n<li><a href=\"https:\/\/scikit-learn.org\/stable\/modules\/generated\/sklearn.metrics.fbeta_score.html\">sklearn.metrics.fbeta_score 
API<\/a>.<\/li>\n<\/ul>\n<h3>Articles<\/h3>\n<ul>\n<li><a href=\"https:\/\/en.wikipedia.org\/wiki\/F1_score\">F1 score, Wikipedia<\/a>.<\/li>\n<li><a href=\"https:\/\/en.wikipedia.org\/wiki\/Harmonic_mean\">Harmonic mean, Wikipedia<\/a>.<\/li>\n<\/ul>\n<h2>Summary<\/h2>\n<p>In this tutorial, you discovered the Fbeta-measure for evaluating classification algorithms for machine learning.<\/p>\n<p>Specifically, you learned:<\/p>\n<ul>\n<li>Precision and recall provide two ways to summarize the errors made for the positive class in a binary classification problem.<\/li>\n<li>F-measure provides a single score that summarizes the precision and recall.<\/li>\n<li>Fbeta-measure provides a configurable version of the F-measure to give more or less attention to the precision and recall measure when calculating a single score.<\/li>\n<\/ul>\n<p>Do you have any questions?<br \/>\nAsk your questions in the comments below and I will do my best to answer.<\/p>\n<p>The post <a rel=\"nofollow\" href=\"https:\/\/machinelearningmastery.com\/fbeta-measure-for-machine-learning\/\">A Gentle Introduction to the Fbeta-Measure for Machine Learning<\/a> appeared first on <a rel=\"nofollow\" href=\"https:\/\/machinelearningmastery.com\/\">Machine Learning Mastery<\/a>.<\/p>\n<\/div>\n<p><a href=\"https:\/\/machinelearningmastery.com\/fbeta-measure-for-machine-learning\/\">Go to Source<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Author: Jason Brownlee Fbeta-measure is a configurable single-score metric for evaluating a binary classification model based on the predictions made for the positive class. 
The [&hellip;] <span class=\"read-more-link\"><a class=\"read-more\" href=\"https:\/\/www.aiproblog.com\/index.php\/2020\/02\/23\/a-gentle-introduction-to-the-fbeta-measure-for-machine-learning\/\">Read More<\/a><\/span><\/p>\n","protected":false},"author":1,"featured_media":3167,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_bbp_topic_count":0,"_bbp_reply_count":0,"_bbp_total_topic_count":0,"_bbp_total_reply_count":0,"_bbp_voice_count":0,"_bbp_anonymous_reply_count":0,"_bbp_topic_count_hidden":0,"_bbp_reply_count_hidden":0,"_bbp_forum_subforum_count":0,"footnotes":""},"categories":[24],"tags":[],"_links":{"self":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts\/3166"}],"collection":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/comments?post=3166"}],"version-history":[{"count":0,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts\/3166\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/media\/3167"}],"wp:attachment":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/media?parent=3166"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/categories?post=3166"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/tags?post=3166"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}