{"id":2706,"date":"2019-10-17T18:00:15","date_gmt":"2019-10-17T18:00:15","guid":{"rendered":"https:\/\/www.aiproblog.com\/index.php\/2019\/10\/17\/naive-bayes-classifier-from-scratch-in-python\/"},"modified":"2019-10-17T18:00:15","modified_gmt":"2019-10-17T18:00:15","slug":"naive-bayes-classifier-from-scratch-in-python","status":"publish","type":"post","link":"https:\/\/www.aiproblog.com\/index.php\/2019\/10\/17\/naive-bayes-classifier-from-scratch-in-python\/","title":{"rendered":"Naive Bayes Classifier From Scratch in Python"},"content":{"rendered":"<p>Author: Jason Brownlee<\/p>\n<div>\n<p style=\"text-align: left;\">In this tutorial you are going to learn about the <strong>Naive Bayes algorithm<\/strong> including how it works and how to implement it from scratch in Python (without libraries).<\/p>\n<p>We can use probability to make predictions in machine learning. Perhaps the most widely used example is called the Naive Bayes algorithm. Not only is it straightforward to understand, but it also achieves surprisingly good results on a wide range of problems.<\/p>\n<p>After completing this tutorial you will know:<\/p>\n<ul>\n<li>How to calculate the probabilities required by the Naive Bayes algorithm.<\/li>\n<li>How to implement the Naive Bayes algorithm from scratch.<\/li>\n<li>How to apply Naive Bayes to a real-world predictive modeling problem.<\/li>\n<\/ul>\n<p>Discover how to code ML algorithms from scratch including kNN, decision trees, neural nets, ensembles and much more <a href=\"https:\/\/machinelearningmastery.com\/machine-learning-algorithms-from-scratch\/\" rel=\"nofollow\">in my new book<\/a>, with full Python code and no fancy libraries.<\/p>\n<p>Let\u2019s get started.<\/p>\n<ul>\n<li><strong>Update Dec\/2014<\/strong>: Original implementation.<\/li>\n<li><strong>Update Oct\/2019<\/strong>: Rewrote the tutorial and code from the ground-up.<\/li>\n<\/ul>\n<div id=\"attachment_1947\" style=\"width: 650px\" class=\"wp-caption aligncenter\"><img 
loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-1947\" class=\"size-full wp-image-1947\" src=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2014\/12\/naive-bayes-classifier.jpg\" alt=\"naive bayes classifier\" width=\"640\" height=\"410\" srcset=\"http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2014\/12\/naive-bayes-classifier.jpg 640w, http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2014\/12\/naive-bayes-classifier-300x192.jpg 300w\" sizes=\"(max-width: 640px) 100vw, 640px\"><\/p>\n<p id=\"caption-attachment-1947\" class=\"wp-caption-text\">Code a Naive Bayes Classifier From Scratch in Python (with no libraries)<br \/>Photo by <a href=\"https:\/\/www.flickr.com\/photos\/mattbuck007\/3676624894\">Matt Buck<\/a>, some rights reserved<\/p>\n<\/div>\n<h2>Overview<\/h2>\n<p>This section provides a brief overview of the Naive Bayes algorithm and the Iris flowers dataset that we will use in this tutorial.<\/p>\n<h3>Naive Bayes<\/h3>\n<p><a href=\"https:\/\/machinelearningmastery.com\/bayes-theorem-for-machine-learning\/\">Bayes\u2019 Theorem<\/a> provides a way that we can calculate the probability of a piece of data belonging to a given class, given our prior knowledge. Bayes\u2019 Theorem is stated as:<\/p>\n<ul>\n<li>P(class|data) = (P(data|class) * P(class)) \/ P(data)<\/li>\n<\/ul>\n<p>Where P(class|data) is the probability of class given the provided data.<\/p>\n<p>For an in-depth introduction to Bayes Theorem, see the tutorial:<\/p>\n<ul>\n<li><a href=\"https:\/\/machinelearningmastery.com\/bayes-theorem-for-machine-learning\/\">A Gentle Introduction to Bayes Theorem for Machine Learning<\/a><\/li>\n<\/ul>\n<p>Naive Bayes is a classification algorithm for binary (two-class) and multiclass classification problems. 
It is called Naive Bayes or idiot Bayes because the probability calculations for each class are simplified to make them tractable.<\/p>\n<p>Rather than attempting to calculate the joint probability of all attribute values together, the attributes are assumed to be conditionally independent given the class value.<\/p>\n<p>This is a very strong assumption, namely that the attributes do not interact, and it is unlikely to hold in real data. Nevertheless, the approach often performs surprisingly well on data where this assumption does not hold.<\/p>\n<p>For an in-depth introduction to Naive Bayes, see the tutorial:<\/p>\n<ul>\n<li><a href=\"https:\/\/machinelearningmastery.com\/classification-as-conditional-probability-and-the-naive-bayes-algorithm\/\">How to Develop a Naive Bayes Classifier<\/a><\/li>\n<\/ul>\n<h3>Iris Flower Species Dataset<\/h3>\n<p>In this tutorial we will use the Iris Flower Species Dataset.<\/p>\n<p>The Iris Flower Dataset involves predicting the flower species given measurements of iris flowers.<\/p>\n<p>It is a multiclass classification problem with three species. The classes are balanced, with 50 observations each. There are 150 observations with 4 input variables and 1 output variable. 
The variable names are as follows:<\/p>\n<ul>\n<li>Sepal length in cm.<\/li>\n<li>Sepal width in cm.<\/li>\n<li>Petal length in cm.<\/li>\n<li>Petal width in cm.<\/li>\n<li>Class<\/li>\n<\/ul>\n<p>A sample of the first 5 rows is listed below.<\/p>\n<pre class=\"crayon-plain-tag\">5.1,3.5,1.4,0.2,Iris-setosa\r\n4.9,3.0,1.4,0.2,Iris-setosa\r\n4.7,3.2,1.3,0.2,Iris-setosa\r\n4.6,3.1,1.5,0.2,Iris-setosa\r\n5.0,3.6,1.4,0.2,Iris-setosa\r\n...<\/pre>\n<p>The baseline performance on the problem is approximately 33%.<\/p>\n<p>Download the dataset and save it into your current working directory with the filename <em>iris.csv<\/em>.<\/p>\n<ul>\n<li><a href=\"https:\/\/raw.githubusercontent.com\/jbrownlee\/Datasets\/master\/iris.csv\">Download Dataset (iris.csv)<\/a><\/li>\n<li><a href=\"https:\/\/raw.githubusercontent.com\/jbrownlee\/Datasets\/master\/iris.names\">More Information on Dataset (iris.names)<\/a><\/li>\n<\/ul>\n<h2>Naive Bayes Tutorial (in 5 easy steps)<\/h2>\n<p>First we will develop each piece of the algorithm in this section, then we will tie all of the elements together into a working implementation applied to a real dataset in the next section.<\/p>\n<p>This Naive Bayes tutorial is broken down into 5 parts:<\/p>\n<ul>\n<li>Step 1: Separate By Class.<\/li>\n<li>Step 2: Summarize Dataset.<\/li>\n<li>Step 3: Summarize Data By Class.<\/li>\n<li>Step 4: Gaussian Probability Density Function.<\/li>\n<li>Step 5: Class Probabilities.<\/li>\n<\/ul>\n<p>These steps will provide the foundation that you need to implement Naive Bayes from scratch and apply it to your own predictive modeling problems.<\/p>\n<p><strong>Note<\/strong>: This tutorial assumes that you are using <strong>Python 3<\/strong>. 
If you need help installing Python, see this tutorial:<\/p>\n<ul>\n<li><a href=\"https:\/\/machinelearningmastery.com\/setup-python-environment-machine-learning-deep-learning-anaconda\/\">How to Setup Your Python Environment for Machine Learning<\/a><\/li>\n<\/ul>\n<p><strong>Note<\/strong>: if you are using <strong>Python 2.7<\/strong>, calls to the <em>items()<\/em> function on dictionary objects still work, but you may prefer to change them to <em>iteritems()<\/em> to avoid building an intermediate list.<\/p>\n<h3>Step 1: Separate By Class<\/h3>\n<p>We will need to calculate probabilities separately for each class, including the probability of each class itself, the so-called base rate.<\/p>\n<p>This means that we will first need to separate our training data by class, which is a relatively straightforward operation.<\/p>\n<p>We can create a dictionary object where each key is a class value and the corresponding value is a list of all the records that belong to that class.<\/p>\n<p>Below is a function named <em>separate_by_class()<\/em> that implements this approach. It assumes that the last column in each row is the class value.<\/p>\n<pre class=\"crayon-plain-tag\"># Split the dataset by class values, returns a dictionary\r\ndef separate_by_class(dataset):\r\n\tseparated = dict()\r\n\tfor i in range(len(dataset)):\r\n\t\tvector = dataset[i]\r\n\t\tclass_value = vector[-1]\r\n\t\tif (class_value not in separated):\r\n\t\t\tseparated[class_value] = list()\r\n\t\tseparated[class_value].append(vector)\r\n\treturn separated<\/pre>\n<p>We can contrive a small dataset to test out this function.<\/p>\n<pre class=\"crayon-plain-tag\">X1\t\t\t\t\t\tX2\t\t\t\t\t\t\tY\r\n3.393533211\t\t2.331273381\t\t\t0\r\n3.110073483\t\t1.781539638\t\t\t0\r\n1.343808831\t\t3.368360954\t\t\t0\r\n3.582294042\t\t4.67917911\t\t\t0\r\n2.280362439\t\t2.866990263\t\t\t0\r\n7.423436942\t\t4.696522875\t\t\t1\r\n5.745051997\t\t3.533989803\t\t\t1\r\n9.172168622\t\t2.511101045\t\t\t1\r\n7.792783481\t\t3.424088941\t\t\t1\r\n7.939820817\t\t0.791637231\t\t\t1<\/pre>\n<p>We can plot this dataset and use separate colors 
for each class.<\/p>\n<div id=\"attachment_9369\" style=\"width: 764px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-9369\" class=\"size-full wp-image-9369\" src=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2014\/12\/Scatter-Plot-of-Small-Contrived-Dataset-for-Testing-the-Naive-Bayes-Algorithm.png\" alt=\"Scatter Plot of Small Contrived Dataset for Testing the Naive Bayes Algorithm\" width=\"754\" height=\"453\" srcset=\"http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2014\/12\/Scatter-Plot-of-Small-Contrived-Dataset-for-Testing-the-Naive-Bayes-Algorithm.png 754w, http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2014\/12\/Scatter-Plot-of-Small-Contrived-Dataset-for-Testing-the-Naive-Bayes-Algorithm-300x180.png 300w\" sizes=\"(max-width: 754px) 100vw, 754px\"><\/p>\n<p id=\"caption-attachment-9369\" class=\"wp-caption-text\">Scatter Plot of Small Contrived Dataset for Testing the Naive Bayes Algorithm<\/p>\n<\/div>\n<p>Putting this all together, we can test our <em>separate_by_class()<\/em> function on the contrived dataset.<\/p>\n<pre class=\"crayon-plain-tag\"># Example of separating data by class value\r\n\r\n# Split the dataset by class values, returns a dictionary\r\ndef separate_by_class(dataset):\r\n\tseparated = dict()\r\n\tfor i in range(len(dataset)):\r\n\t\tvector = dataset[i]\r\n\t\tclass_value = vector[-1]\r\n\t\tif (class_value not in separated):\r\n\t\t\tseparated[class_value] = list()\r\n\t\tseparated[class_value].append(vector)\r\n\treturn separated\r\n\r\n# Test separating data by class\r\ndataset = 
[[3.393533211,2.331273381,0],\r\n\t[3.110073483,1.781539638,0],\r\n\t[1.343808831,3.368360954,0],\r\n\t[3.582294042,4.67917911,0],\r\n\t[2.280362439,2.866990263,0],\r\n\t[7.423436942,4.696522875,1],\r\n\t[5.745051997,3.533989803,1],\r\n\t[9.172168622,2.511101045,1],\r\n\t[7.792783481,3.424088941,1],\r\n\t[7.939820817,0.791637231,1]]\r\nseparated = separate_by_class(dataset)\r\nfor label in separated:\r\n\tprint(label)\r\n\tfor row in separated[label]:\r\n\t\tprint(row)<\/pre>\n<p>Running the example groups the observations in the dataset by their class value, then prints each class value followed by all of the records with that class.<\/p>\n<pre class=\"crayon-plain-tag\">0\r\n[3.393533211, 2.331273381, 0]\r\n[3.110073483, 1.781539638, 0]\r\n[1.343808831, 3.368360954, 0]\r\n[3.582294042, 4.67917911, 0]\r\n[2.280362439, 2.866990263, 0]\r\n1\r\n[7.423436942, 4.696522875, 1]\r\n[5.745051997, 3.533989803, 1]\r\n[9.172168622, 2.511101045, 1]\r\n[7.792783481, 3.424088941, 1]\r\n[7.939820817, 0.791637231, 1]<\/pre>\n<p>Next we can start to develop the functions needed to collect statistics.<\/p>\n<h3>Step 2: Summarize Dataset<\/h3>\n<p>We need two statistics from a given set of data.<\/p>\n<p>We\u2019ll see how these statistics are used in the calculation of probabilities in a few steps. The two statistics we require from a given dataset are the mean and the standard deviation (a measure of how spread out the values are around the mean).<\/p>\n<p>The mean is the average value and can be calculated as:<\/p>\n<ul>\n<li>mean = sum(x) \/ count(x)<\/li>\n<\/ul>\n<p>Where <em>x<\/em> is the list of values or the column we are looking at.<\/p>\n<p>Below is a small function named <em>mean()<\/em> that calculates the mean of a list of numbers.<\/p>\n<pre class=\"crayon-plain-tag\"># Calculate the mean of a list of numbers\r\ndef mean(numbers):\r\n\treturn sum(numbers)\/float(len(numbers))<\/pre>\n<p>The sample standard deviation is the square root of the average squared difference between each value and the mean, where the average is taken by dividing by N-1 rather than N (N being the number of values). 
This can be calculated as:<\/p>\n<ul>\n<li>standard deviation = sqrt(sum((x_i - mean(x))^2 for i = 1 to N) \/ (N - 1))<\/li>\n<\/ul>\n<p>You can see that we square the difference between the mean and each value, calculate the average squared difference from the mean, then take the square root to return to the original units.<\/p>\n<p>Below is a small function named <em>stdev()<\/em> that calculates the standard deviation of a list of numbers. You will notice that it calculates the mean itself. It might be more efficient to calculate the mean of a list of numbers once and pass it to the <em>stdev()<\/em> function as a parameter. You can explore this optimization later if you\u2019re interested.<\/p>\n<pre class=\"crayon-plain-tag\">from math import sqrt\r\n\r\n# Calculate the standard deviation of a list of numbers\r\n# (relies on the mean() function defined above)\r\ndef stdev(numbers):\r\n\tavg = mean(numbers)\r\n\tvariance = sum([(x-avg)**2 for x in numbers]) \/ float(len(numbers)-1)\r\n\treturn sqrt(variance)<\/pre>\n<p>We require the mean and standard deviation statistics to be calculated for each input attribute, that is, for each column of our data.<\/p>\n<p>We can do that by gathering all of the values for each column into a list and calculating the mean and standard deviation on that list. Once calculated, we can gather the statistics together into a tuple. We then repeat this operation for each column in the dataset and return a list of tuples of statistics.<\/p>\n<p>Below is a function named <em>summarize_dataset()<\/em> that implements this approach. 
It uses some Python tricks to cut down on the number of lines required.<\/p>\n<pre class=\"crayon-plain-tag\"># Calculate the mean, stdev and count for each column in a dataset\r\ndef summarize_dataset(dataset):\r\n\tsummaries = [(mean(column), stdev(column), len(column)) for column in zip(*dataset)]\r\n\tdel(summaries[-1])\r\n\treturn summaries<\/pre>\n<p>The first trick is the use of the <a href=\"https:\/\/docs.python.org\/3.3\/library\/functions.html#zip\">zip() function<\/a>, which aggregates elements from each of the iterables passed to it. We pass the dataset to the <em>zip()<\/em> function with the * operator, which unpacks the dataset (a list of lists) so that each row is passed as a separate argument. The <em>zip()<\/em> function then groups the first element of every row, the second element of every row, and so on, returning each column of the dataset as a tuple of numbers. A clever little trick.<\/p>\n<p>We then calculate the mean, standard deviation and count of rows for each column. A tuple is created from these 3 numbers and a list of these tuples is stored. We then remove the statistics for the class variable, as we will not need them.<\/p>\n<p>Let\u2019s test all of these functions on our contrived dataset from above. 
Below is the complete example.<\/p>\n<pre class=\"crayon-plain-tag\"># Example of summarizing a dataset\r\nfrom math import sqrt\r\n\r\n# Calculate the mean of a list of numbers\r\ndef mean(numbers):\r\n\treturn sum(numbers)\/float(len(numbers))\r\n\r\n# Calculate the standard deviation of a list of numbers\r\ndef stdev(numbers):\r\n\tavg = mean(numbers)\r\n\tvariance = sum([(x-avg)**2 for x in numbers]) \/ float(len(numbers)-1)\r\n\treturn sqrt(variance)\r\n\r\n# Calculate the mean, stdev and count for each column in a dataset\r\ndef summarize_dataset(dataset):\r\n\tsummaries = [(mean(column), stdev(column), len(column)) for column in zip(*dataset)]\r\n\tdel(summaries[-1])\r\n\treturn summaries\r\n\r\n# Test summarizing a dataset\r\ndataset = [[3.393533211,2.331273381,0],\r\n\t[3.110073483,1.781539638,0],\r\n\t[1.343808831,3.368360954,0],\r\n\t[3.582294042,4.67917911,0],\r\n\t[2.280362439,2.866990263,0],\r\n\t[7.423436942,4.696522875,1],\r\n\t[5.745051997,3.533989803,1],\r\n\t[9.172168622,2.511101045,1],\r\n\t[7.792783481,3.424088941,1],\r\n\t[7.939820817,0.791637231,1]]\r\nsummary = summarize_dataset(dataset)\r\nprint(summary)<\/pre>\n<p>Running the example prints out the list of tuples of statistics on each of the two input variables.<\/p>\n<p>Interpreting the results, we can see that the mean value of X1 is 5.178333386499999 and the standard deviation of X1 is 2.7665845055177263.<\/p>\n<pre class=\"crayon-plain-tag\">[(5.178333386499999, 2.7665845055177263, 10), (2.9984683241, 1.218556343617447, 10)]<\/pre>\n<p>Now we are ready to use these functions on each group of rows in our dataset.<\/p>\n<h3>Step 3: Summarize Data By Class<\/h3>\n<p>We require statistics from our training dataset organized by class.<\/p>\n<p>Above, we have developed the <em>separate_by_class()<\/em> function to separate a dataset into rows by class. 
We have also developed the <em>summarize_dataset()<\/em> function to calculate summary statistics for each column.<\/p>\n<p>We can put all of this together and summarize the columns in the dataset organized by class value.<\/p>\n<p>Below is a function named <em>summarize_by_class()<\/em> that implements this operation. The dataset is first split by class, then statistics are calculated on each subset. The results, in the form of a list of tuples of statistics, are then stored in a dictionary keyed by class value.<\/p>\n<pre class=\"crayon-plain-tag\"># Split dataset by class then calculate statistics for each column\r\ndef summarize_by_class(dataset):\r\n\tseparated = separate_by_class(dataset)\r\n\tsummaries = dict()\r\n\tfor class_value, rows in separated.items():\r\n\t\tsummaries[class_value] = summarize_dataset(rows)\r\n\treturn summaries<\/pre>\n<p>Again, let\u2019s test out all of these behaviors on our contrived dataset.<\/p>\n<pre class=\"crayon-plain-tag\"># Example of summarizing data by class value\r\nfrom math import sqrt\r\n\r\n# Split the dataset by class values, returns a dictionary\r\ndef separate_by_class(dataset):\r\n\tseparated = dict()\r\n\tfor i in range(len(dataset)):\r\n\t\tvector = dataset[i]\r\n\t\tclass_value = vector[-1]\r\n\t\tif (class_value not in separated):\r\n\t\t\tseparated[class_value] = list()\r\n\t\tseparated[class_value].append(vector)\r\n\treturn separated\r\n\r\n# Calculate the mean of a list of numbers\r\ndef mean(numbers):\r\n\treturn sum(numbers)\/float(len(numbers))\r\n\r\n# Calculate the standard deviation of a list of numbers\r\ndef stdev(numbers):\r\n\tavg = mean(numbers)\r\n\tvariance = sum([(x-avg)**2 for x in numbers]) \/ float(len(numbers)-1)\r\n\treturn sqrt(variance)\r\n\r\n# Calculate the mean, stdev and count for each column in a dataset\r\ndef summarize_dataset(dataset):\r\n\tsummaries = [(mean(column), stdev(column), len(column)) for column in zip(*dataset)]\r\n\tdel(summaries[-1])\r\n\treturn summaries\r\n\r\n# Split 
dataset by class then calculate statistics for each column\r\ndef summarize_by_class(dataset):\r\n\tseparated = separate_by_class(dataset)\r\n\tsummaries = dict()\r\n\tfor class_value, rows in separated.items():\r\n\t\tsummaries[class_value] = summarize_dataset(rows)\r\n\treturn summaries\r\n\r\n# Test summarizing by class\r\ndataset = [[3.393533211,2.331273381,0],\r\n\t[3.110073483,1.781539638,0],\r\n\t[1.343808831,3.368360954,0],\r\n\t[3.582294042,4.67917911,0],\r\n\t[2.280362439,2.866990263,0],\r\n\t[7.423436942,4.696522875,1],\r\n\t[5.745051997,3.533989803,1],\r\n\t[9.172168622,2.511101045,1],\r\n\t[7.792783481,3.424088941,1],\r\n\t[7.939820817,0.791637231,1]]\r\nsummary = summarize_by_class(dataset)\r\nfor label in summary:\r\n\tprint(label)\r\n\tfor row in summary[label]:\r\n\t\tprint(row)<\/pre>\n<p>Running this example calculates the statistics for each input variable and prints them organized by class value. Interpreting the results, we can see that the X1 values for class 0 rows have a mean value of 2.7420144012.<\/p>\n<pre class=\"crayon-plain-tag\">0\r\n(2.7420144012, 0.9265683289298018, 5)\r\n(3.0054686692, 1.1073295894898725, 5)\r\n1\r\n(7.6146523718, 1.2344321550313704, 5)\r\n(2.9914679790000003, 1.4541931384601618, 5)<\/pre>\n<p>There is one more piece we need before we start calculating probabilities.<\/p>\n<h3>Step 4: Gaussian Probability Density Function<\/h3>\n<p>Calculating the probability or likelihood of observing a given real value like X1 is difficult.<\/p>\n<p>One way we can do this is to assume that X1 values are drawn from a distribution, such as a bell curve or Gaussian distribution.<\/p>\n<p>A <a href=\"https:\/\/machinelearningmastery.com\/continuous-probability-distributions-for-machine-learning\/\">Gaussian distribution<\/a> can be summarized using only two numbers: the mean and the standard deviation. Therefore, with a little math, we can estimate the likelihood of a given value. 
This piece of math is called a Gaussian <a href=\"https:\/\/en.wikipedia.org\/wiki\/Gaussian_function\">Probability Density Function<\/a> (or Gaussian PDF) and can be calculated as:<\/p>\n<ul>\n<li>f(x) = (1 \/ (sqrt(2 * PI) * sigma)) * exp(-((x-mean)^2 \/ (2 * sigma^2)))<\/li>\n<\/ul>\n<p>Where <em>sigma<\/em> is the standard deviation for <em>x<\/em>, <em>mean<\/em> is the mean for <em>x<\/em> and <em>PI<\/em> is the value of pi.<\/p>\n<p>Below is a function that implements this. I tried to split it up to make it more readable.<\/p>\n<pre class=\"crayon-plain-tag\">from math import sqrt\r\nfrom math import pi\r\nfrom math import exp\r\n\r\n# Calculate the Gaussian probability density function for x\r\ndef calculate_probability(x, mean, stdev):\r\n\texponent = exp(-((x-mean)**2 \/ (2 * stdev**2 )))\r\n\treturn (1 \/ (sqrt(2 * pi) * stdev)) * exponent<\/pre>\n<p>Let\u2019s test it out to see how it works. Below are some worked examples.<\/p>\n<pre class=\"crayon-plain-tag\"># Example of Gaussian PDF\r\nfrom math import sqrt\r\nfrom math import pi\r\nfrom math import exp\r\n\r\n# Calculate the Gaussian probability density function for x\r\ndef calculate_probability(x, mean, stdev):\r\n\texponent = exp(-((x-mean)**2 \/ (2 * stdev**2 )))\r\n\treturn (1 \/ (sqrt(2 * pi) * stdev)) * exponent\r\n\r\n# Test Gaussian PDF\r\nprint(calculate_probability(1.0, 1.0, 1.0))\r\nprint(calculate_probability(2.0, 1.0, 1.0))\r\nprint(calculate_probability(0.0, 1.0, 1.0))<\/pre>\n<p>Running it prints the probability density for each input value. 
You can see that when the value is 1 and the mean and standard deviation are both 1, our input is the most likely (top of the bell curve) and has a density (not a probability, since X is continuous) of about 0.399.<\/p>\n<p>We can see that when we keep the statistics the same and change the x value to 1 standard deviation either side of the mean value (2 and 0, the same distance either side of the bell curve), the densities of those input values are the same at about 0.242.<\/p>\n<pre class=\"crayon-plain-tag\">0.3989422804014327\r\n0.24197072451914337\r\n0.24197072451914337<\/pre>\n<p>Now that we have all the pieces in place, let\u2019s see how we can calculate the probabilities we need for the Naive Bayes classifier.<\/p>\n<h3>Step 5: Class Probabilities<\/h3>\n<p>Now it is time to use the statistics calculated from our training data to calculate probabilities for new data.<\/p>\n<p>Probabilities are calculated separately for each class. This means that we first calculate the probability that a new piece of data belongs to the first class, then calculate the probability that it belongs to the second class, and so on for all the classes.<\/p>\n<p>The probability that a piece of data belongs to a class is calculated, up to a constant factor (\u221d means \u201cis proportional to\u201d), as follows:<\/p>\n<ul>\n<li>P(class|data) \u221d P(data|class) * P(class)<\/li>\n<\/ul>\n<p>You may note that this is different from the Bayes Theorem described above.<\/p>\n<p>The division by P(data) has been removed to simplify the calculation. Because P(data) is the same for every class, removing it does not change which class has the largest value.<\/p>\n<p>This means that the result is no longer strictly a probability of the data belonging to a class. We still take the class with the largest value as the prediction. This is a common implementation simplification, as we are often more interested in the class prediction than in the probability.<\/p>\n<p>The input variables are treated as independent of each other given the class, giving the technique its name \u201c<em>naive<\/em>\u201d. 
For the above example where we have 2 input variables, the probability that a row belongs to the first class, 0, can be calculated, up to a constant factor (\u221d means \u201cis proportional to\u201d), as:<\/p>\n<ul>\n<li>P(class=0|X1,X2) \u221d P(X1|class=0) * P(X2|class=0) * P(class=0)<\/li>\n<\/ul>\n<p>Now you can see why we need to separate the data by class value. The Gaussian probability density function from the previous step is how we estimate the likelihood of a real value like X1 within a class, using the statistics we prepared for that class.<\/p>\n<p>Below is a function named <em>calculate_class_probabilities()<\/em> that ties all of this together.<\/p>\n<p>It takes a set of prepared summaries and a new row as input arguments.<\/p>\n<p>First, the total number of training records is calculated from the counts stored in the summary statistics. This is used to calculate the probability of a given class, <em>P(class)<\/em>, as the ratio of rows with that class to all rows in the training data.<\/p>\n<p>Next, probabilities are calculated for each input value in the row using the Gaussian probability density function and the statistics for that column and that class. 
Probabilities are multiplied together as they are accumulated.<\/p>\n<p>This process is repeated for each class in the dataset.<\/p>\n<p>Finally, a dictionary of probabilities is returned with one entry for each class.<\/p>\n<pre class=\"crayon-plain-tag\"># Calculate the probabilities of predicting each class for a given row\r\ndef calculate_class_probabilities(summaries, row):\r\n\t# Total number of training rows, from the count stored in each class summary\r\n\ttotal_rows = sum([summaries[label][0][2] for label in summaries])\r\n\tprobabilities = dict()\r\n\tfor class_value, class_summaries in summaries.items():\r\n\t\t# Start with the base rate P(class)\r\n\t\tprobabilities[class_value] = summaries[class_value][0][2]\/float(total_rows)\r\n\t\tfor i in range(len(class_summaries)):\r\n\t\t\tmean, stdev, _ = class_summaries[i]\r\n\t\t\t# Multiply in the Gaussian likelihood of each input value\r\n\t\t\tprobabilities[class_value] *= calculate_probability(row[i], mean, stdev)\r\n\treturn probabilities<\/pre>\n<p>Let\u2019s tie this together with an example on the contrived dataset.<\/p>\n<p>The example below first calculates the summary statistics by class for the training dataset, then uses these statistics to calculate the probability of the first record belonging to each class.<\/p>\n<pre class=\"crayon-plain-tag\"># Example of calculating class probabilities\r\nfrom math import sqrt\r\nfrom math import pi\r\nfrom math import exp\r\n\r\n# Split the dataset by class values, returns a dictionary\r\ndef separate_by_class(dataset):\r\n\tseparated = dict()\r\n\tfor i in range(len(dataset)):\r\n\t\tvector = dataset[i]\r\n\t\tclass_value = vector[-1]\r\n\t\tif (class_value not in separated):\r\n\t\t\tseparated[class_value] = list()\r\n\t\tseparated[class_value].append(vector)\r\n\treturn separated\r\n\r\n# Calculate the mean of a list of numbers\r\ndef mean(numbers):\r\n\treturn sum(numbers)\/float(len(numbers))\r\n\r\n# Calculate the standard deviation of a list of numbers\r\ndef stdev(numbers):\r\n\tavg = mean(numbers)\r\n\tvariance = sum([(x-avg)**2 for x in numbers]) \/ float(len(numbers)-1)\r\n\treturn sqrt(variance)\r\n\r\n# Calculate the mean, stdev and 
count for each column in a dataset\r\ndef summarize_dataset(dataset):\r\n\tsummaries = [(mean(column), stdev(column), len(column)) for column in zip(*dataset)]\r\n\tdel(summaries[-1])\r\n\treturn summaries\r\n\r\n# Split dataset by class then calculate statistics for each column\r\ndef summarize_by_class(dataset):\r\n\tseparated = separate_by_class(dataset)\r\n\tsummaries = dict()\r\n\tfor class_value, rows in separated.items():\r\n\t\tsummaries[class_value] = summarize_dataset(rows)\r\n\treturn summaries\r\n\r\n# Calculate the Gaussian probability density function for x\r\ndef calculate_probability(x, mean, stdev):\r\n\texponent = exp(-((x-mean)**2 \/ (2 * stdev**2 )))\r\n\treturn (1 \/ (sqrt(2 * pi) * stdev)) * exponent\r\n\r\n# Calculate the probabilities of predicting each class for a given row\r\ndef calculate_class_probabilities(summaries, row):\r\n\ttotal_rows = sum([summaries[label][0][2] for label in summaries])\r\n\tprobabilities = dict()\r\n\tfor class_value, class_summaries in summaries.items():\r\n\t\tprobabilities[class_value] = summaries[class_value][0][2]\/float(total_rows)\r\n\t\tfor i in range(len(class_summaries)):\r\n\t\t\tmean, stdev, _ = class_summaries[i]\r\n\t\t\tprobabilities[class_value] *= calculate_probability(row[i], mean, stdev)\r\n\treturn probabilities\r\n\r\n# Test calculating class probabilities\r\ndataset = [[3.393533211,2.331273381,0],\r\n\t[3.110073483,1.781539638,0],\r\n\t[1.343808831,3.368360954,0],\r\n\t[3.582294042,4.67917911,0],\r\n\t[2.280362439,2.866990263,0],\r\n\t[7.423436942,4.696522875,1],\r\n\t[5.745051997,3.533989803,1],\r\n\t[9.172168622,2.511101045,1],\r\n\t[7.792783481,3.424088941,1],\r\n\t[7.939820817,0.791637231,1]]\r\nsummaries = summarize_by_class(dataset)\r\nprobabilities = calculate_class_probabilities(summaries, dataset[0])\r\nprint(probabilities)<\/pre>\n<p>Running the example prints the unnormalized probabilities calculated for each class.<\/p>\n<p>We can see that the probability of the first row belonging to the 0 
class (0.0503) is higher than the probability of it belonging to the 1 class (0.0001). We would therefore correctly conclude that it belongs to the 0 class.<\/p>\n<pre class=\"crayon-plain-tag\">{0: 0.05032427673372075, 1: 0.00011557718379945765}<\/pre>\n<p>Now that we have seen how to implement the Naive Bayes algorithm, let\u2019s apply it to the Iris flowers dataset.<\/p>\n<h2>Iris Flower Species Case Study<\/h2>\n<p>This section applies the Naive Bayes algorithm to the Iris flowers dataset.<\/p>\n<p>The first step is to load the dataset and convert the loaded data to numbers that we can use with the mean and standard deviation calculations. For this we will use the helper function <em>load_csv()<\/em> to load the file, <em>str_column_to_float()<\/em> to convert string numbers to floats and <em>str_column_to_int()<\/em> to convert the class column to integer values.<\/p>\n<p>We will evaluate the algorithm using <a href=\"https:\/\/machinelearningmastery.com\/k-fold-cross-validation\/\">k-fold cross-validation<\/a> with 5 folds. This means that 150\/5=30 records will be in each fold. 
We will use the helper functions <em>evaluate_algorithm()<\/em> to evaluate the algorithm with cross-validation and <em>accuracy_metric()<\/em> to calculate the accuracy of predictions.<\/p>\n<p>A new function named <em>predict()<\/em> was developed to manage the calculation of the probabilities of a new row belonging to each class and selecting the class with the largest probability value.<\/p>\n<p>Another new function named <em>naive_bayes()<\/em> was developed to manage the application of the Naive Bayes algorithm, first learning the statistics from a training dataset and using them to make predictions for a test dataset.<\/p>\n<p>If you would like more help with the data loading functions used below, see the tutorial:<\/p>\n<ul>\n<li><a href=\"https:\/\/machinelearningmastery.com\/load-machine-learning-data-scratch-python\/\">How to Load Machine Learning Data From Scratch In Python<\/a><\/li>\n<\/ul>\n<p>If you would like more help with the way the model is evaluated using cross validation, see the tutorial:<\/p>\n<ul>\n<li><a href=\"https:\/\/machinelearningmastery.com\/implement-resampling-methods-scratch-python\/\">How to Implement Resampling Methods From Scratch In Python<\/a><\/li>\n<\/ul>\n<p>The complete example is listed below.<\/p>\n<pre class=\"crayon-plain-tag\"># Naive Bayes On The Iris Dataset\r\nfrom csv import reader\r\nfrom random import seed\r\nfrom random import randrange\r\nfrom math import sqrt\r\nfrom math import exp\r\nfrom math import pi\r\n\r\n# Load a CSV file\r\ndef load_csv(filename):\r\n\tdataset = list()\r\n\twith open(filename, 'r') as file:\r\n\t\tcsv_reader = reader(file)\r\n\t\tfor row in csv_reader:\r\n\t\t\tif not row:\r\n\t\t\t\tcontinue\r\n\t\t\tdataset.append(row)\r\n\treturn dataset\r\n\r\n# Convert string column to float\r\ndef str_column_to_float(dataset, column):\r\n\tfor row in dataset:\r\n\t\trow[column] = float(row[column].strip())\r\n\r\n# Convert string column to integer\r\ndef str_column_to_int(dataset, 
column):\r\n\tclass_values = [row[column] for row in dataset]\r\n\tunique = set(class_values)\r\n\tlookup = dict()\r\n\tfor i, value in enumerate(unique):\r\n\t\tlookup[value] = i\r\n\tfor row in dataset:\r\n\t\trow[column] = lookup[row[column]]\r\n\treturn lookup\r\n\r\n# Split a dataset into k folds\r\ndef cross_validation_split(dataset, n_folds):\r\n\tdataset_split = list()\r\n\tdataset_copy = list(dataset)\r\n\tfold_size = int(len(dataset) \/ n_folds)\r\n\tfor _ in range(n_folds):\r\n\t\tfold = list()\r\n\t\twhile len(fold) < fold_size:\r\n\t\t\tindex = randrange(len(dataset_copy))\r\n\t\t\tfold.append(dataset_copy.pop(index))\r\n\t\tdataset_split.append(fold)\r\n\treturn dataset_split\r\n\r\n# Calculate accuracy percentage\r\ndef accuracy_metric(actual, predicted):\r\n\tcorrect = 0\r\n\tfor i in range(len(actual)):\r\n\t\tif actual[i] == predicted[i]:\r\n\t\t\tcorrect += 1\r\n\treturn correct \/ float(len(actual)) * 100.0\r\n\r\n# Evaluate an algorithm using a cross validation split\r\ndef evaluate_algorithm(dataset, algorithm, n_folds, *args):\r\n\tfolds = cross_validation_split(dataset, n_folds)\r\n\tscores = list()\r\n\tfor fold in folds:\r\n\t\ttrain_set = list(folds)\r\n\t\ttrain_set.remove(fold)\r\n\t\ttrain_set = sum(train_set, [])\r\n\t\ttest_set = list()\r\n\t\tfor row in fold:\r\n\t\t\trow_copy = list(row)\r\n\t\t\ttest_set.append(row_copy)\r\n\t\t\trow_copy[-1] = None\r\n\t\tpredicted = algorithm(train_set, test_set, *args)\r\n\t\tactual = [row[-1] for row in fold]\r\n\t\taccuracy = accuracy_metric(actual, predicted)\r\n\t\tscores.append(accuracy)\r\n\treturn scores\r\n\r\n# Split the dataset by class values, returns a dictionary\r\ndef separate_by_class(dataset):\r\n\tseparated = dict()\r\n\tfor i in range(len(dataset)):\r\n\t\tvector = dataset[i]\r\n\t\tclass_value = vector[-1]\r\n\t\tif (class_value not in separated):\r\n\t\t\tseparated[class_value] = list()\r\n\t\tseparated[class_value].append(vector)\r\n\treturn separated\r\n\r\n# Calculate the 
mean of a list of numbers\r\ndef mean(numbers):\r\n\treturn sum(numbers)\/float(len(numbers))\r\n\r\n# Calculate the standard deviation of a list of numbers\r\ndef stdev(numbers):\r\n\tavg = mean(numbers)\r\n\tvariance = sum([(x-avg)**2 for x in numbers]) \/ float(len(numbers)-1)\r\n\treturn sqrt(variance)\r\n\r\n# Calculate the mean, stdev and count for each column in a dataset\r\ndef summarize_dataset(dataset):\r\n\tsummaries = [(mean(column), stdev(column), len(column)) for column in zip(*dataset)]\r\n\tdel(summaries[-1])\r\n\treturn summaries\r\n\r\n# Split dataset by class then calculate statistics for each row\r\ndef summarize_by_class(dataset):\r\n\tseparated = separate_by_class(dataset)\r\n\tsummaries = dict()\r\n\tfor class_value, rows in separated.items():\r\n\t\tsummaries[class_value] = summarize_dataset(rows)\r\n\treturn summaries\r\n\r\n# Calculate the Gaussian probability distribution function for x\r\ndef calculate_probability(x, mean, stdev):\r\n\texponent = exp(-((x-mean)**2 \/ (2 * stdev**2 )))\r\n\treturn (1 \/ (sqrt(2 * pi) * stdev)) * exponent\r\n\r\n# Calculate the probabilities of predicting each class for a given row\r\ndef calculate_class_probabilities(summaries, row):\r\n\ttotal_rows = sum([summaries[label][0][2] for label in summaries])\r\n\tprobabilities = dict()\r\n\tfor class_value, class_summaries in summaries.items():\r\n\t\tprobabilities[class_value] = summaries[class_value][0][2]\/float(total_rows)\r\n\t\tfor i in range(len(class_summaries)):\r\n\t\t\tmean, stdev, _ = class_summaries[i]\r\n\t\t\tprobabilities[class_value] *= calculate_probability(row[i], mean, stdev)\r\n\treturn probabilities\r\n\r\n# Predict the class for a given row\r\ndef predict(summaries, row):\r\n\tprobabilities = calculate_class_probabilities(summaries, row)\r\n\tbest_label, best_prob = None, -1\r\n\tfor class_value, probability in probabilities.items():\r\n\t\tif best_label is None or probability > best_prob:\r\n\t\t\tbest_prob = 
probability\r\n\t\t\tbest_label = class_value\r\n\treturn best_label\r\n\r\n# Naive Bayes Algorithm\r\ndef naive_bayes(train, test):\r\n\tsummarize = summarize_by_class(train)\r\n\tpredictions = list()\r\n\tfor row in test:\r\n\t\toutput = predict(summarize, row)\r\n\t\tpredictions.append(output)\r\n\treturn(predictions)\r\n\r\n# Test Naive Bayes on Iris Dataset\r\nseed(1)\r\nfilename = 'iris.csv'\r\ndataset = load_csv(filename)\r\nfor i in range(len(dataset[0])-1):\r\n\tstr_column_to_float(dataset, i)\r\n# convert class column to integers\r\nstr_column_to_int(dataset, len(dataset[0])-1)\r\n# evaluate algorithm\r\nn_folds = 5\r\nscores = evaluate_algorithm(dataset, naive_bayes, n_folds)\r\nprint('Scores: %s' % scores)\r\nprint('Mean Accuracy: %.3f%%' % (sum(scores)\/float(len(scores))))<\/pre>\n<p>Running the example prints the classification accuracy score for each cross-validation fold, as well as the mean accuracy across all folds.<\/p>\n<p>We can see that the mean accuracy of about 95% is dramatically better than the baseline accuracy of about 33% that we would get by always predicting a single class (the three classes are equally represented in the dataset).<\/p>\n<pre class=\"crayon-plain-tag\">Scores: [93.33333333333333, 96.66666666666667, 100.0, 93.33333333333333, 93.33333333333333]\r\nMean Accuracy: 95.333%<\/pre>\n<p>We can also fit the model on the entire dataset and then use it to make predictions for new observations (rows of data).<\/p>\n<p>The model is simply the set of per-class statistics (the mean, standard deviation and count of each input column) calculated by the <em>summarize_by_class()<\/em> function.<\/p>\n<pre class=\"crayon-plain-tag\">...\r\n# fit model\r\nmodel = summarize_by_class(dataset)<\/pre>\n<p>Once calculated, we can pass the model to the <em>predict()<\/em> function along with a row representing our new observation to predict its class label.<\/p>\n<pre class=\"crayon-plain-tag\">...\r\n# predict the label\r\nlabel = predict(model, row)<\/pre>\n<p>We might also like to know the class name (string) for a prediction. 
We can update the str_column_to_int() function to print the mapping of string class names to integers so we can interpret the prediction by the model.<\/p>\n<pre class=\"crayon-plain-tag\"># Convert string column to integer\r\ndef str_column_to_int(dataset, column):\r\n\tclass_values = [row[column] for row in dataset]\r\n\tunique = set(class_values)\r\n\tlookup = dict()\r\n\tfor i, value in enumerate(unique):\r\n\t\tlookup[value] = i\r\n\t\tprint('[%s] => %d' % (value, i))\r\n\tfor row in dataset:\r\n\t\trow[column] = lookup[row[column]]\r\n\treturn lookup<\/pre>\n<p>Tying this together, a complete example of fitting the Naive Bayes model on the entire dataset and making a single prediction for a new observation is listed below.<\/p>\n<pre class=\"crayon-plain-tag\"># Make Predictions with Naive Bayes On The Iris Dataset\r\nfrom csv import reader\r\nfrom math import sqrt\r\nfrom math import exp\r\nfrom math import pi\r\n\r\n# Load a CSV file\r\ndef load_csv(filename):\r\n\tdataset = list()\r\n\twith open(filename, 'r') as file:\r\n\t\tcsv_reader = reader(file)\r\n\t\tfor row in csv_reader:\r\n\t\t\tif not row:\r\n\t\t\t\tcontinue\r\n\t\t\tdataset.append(row)\r\n\treturn dataset\r\n\r\n# Convert string column to float\r\ndef str_column_to_float(dataset, column):\r\n\tfor row in dataset:\r\n\t\trow[column] = float(row[column].strip())\r\n\r\n# Convert string column to integer\r\ndef str_column_to_int(dataset, column):\r\n\tclass_values = [row[column] for row in dataset]\r\n\tunique = set(class_values)\r\n\tlookup = dict()\r\n\tfor i, value in enumerate(unique):\r\n\t\tlookup[value] = i\r\n\t\tprint('[%s] => %d' % (value, i))\r\n\tfor row in dataset:\r\n\t\trow[column] = lookup[row[column]]\r\n\treturn lookup\r\n\r\n# Split the dataset by class values, returns a dictionary\r\ndef separate_by_class(dataset):\r\n\tseparated = dict()\r\n\tfor i in range(len(dataset)):\r\n\t\tvector = dataset[i]\r\n\t\tclass_value = vector[-1]\r\n\t\tif (class_value not in 
separated):\r\n\t\t\tseparated[class_value] = list()\r\n\t\tseparated[class_value].append(vector)\r\n\treturn separated\r\n\r\n# Calculate the mean of a list of numbers\r\ndef mean(numbers):\r\n\treturn sum(numbers)\/float(len(numbers))\r\n\r\n# Calculate the standard deviation of a list of numbers\r\ndef stdev(numbers):\r\n\tavg = mean(numbers)\r\n\tvariance = sum([(x-avg)**2 for x in numbers]) \/ float(len(numbers)-1)\r\n\treturn sqrt(variance)\r\n\r\n# Calculate the mean, stdev and count for each column in a dataset\r\ndef summarize_dataset(dataset):\r\n\tsummaries = [(mean(column), stdev(column), len(column)) for column in zip(*dataset)]\r\n\tdel(summaries[-1])\r\n\treturn summaries\r\n\r\n# Split dataset by class then calculate statistics for each row\r\ndef summarize_by_class(dataset):\r\n\tseparated = separate_by_class(dataset)\r\n\tsummaries = dict()\r\n\tfor class_value, rows in separated.items():\r\n\t\tsummaries[class_value] = summarize_dataset(rows)\r\n\treturn summaries\r\n\r\n# Calculate the Gaussian probability distribution function for x\r\ndef calculate_probability(x, mean, stdev):\r\n\texponent = exp(-((x-mean)**2 \/ (2 * stdev**2 )))\r\n\treturn (1 \/ (sqrt(2 * pi) * stdev)) * exponent\r\n\r\n# Calculate the probabilities of predicting each class for a given row\r\ndef calculate_class_probabilities(summaries, row):\r\n\ttotal_rows = sum([summaries[label][0][2] for label in summaries])\r\n\tprobabilities = dict()\r\n\tfor class_value, class_summaries in summaries.items():\r\n\t\tprobabilities[class_value] = summaries[class_value][0][2]\/float(total_rows)\r\n\t\tfor i in range(len(class_summaries)):\r\n\t\t\tmean, stdev, _ = class_summaries[i]\r\n\t\t\tprobabilities[class_value] *= calculate_probability(row[i], mean, stdev)\r\n\treturn probabilities\r\n\r\n# Predict the class for a given row\r\ndef predict(summaries, row):\r\n\tprobabilities = calculate_class_probabilities(summaries, row)\r\n\tbest_label, best_prob = None, -1\r\n\tfor class_value, 
probability in probabilities.items():\r\n\t\tif best_label is None or probability > best_prob:\r\n\t\t\tbest_prob = probability\r\n\t\t\tbest_label = class_value\r\n\treturn best_label\r\n\r\n# Make a prediction with Naive Bayes on Iris Dataset\r\nfilename = 'iris.csv'\r\ndataset = load_csv(filename)\r\nfor i in range(len(dataset[0])-1):\r\n\tstr_column_to_float(dataset, i)\r\n# convert class column to integers\r\nstr_column_to_int(dataset, len(dataset[0])-1)\r\n# fit model\r\nmodel = summarize_by_class(dataset)\r\n# define a new record\r\nrow = [5.7,2.9,4.2,1.3]\r\n# predict the label\r\nlabel = predict(model, row)\r\nprint('Data=%s, Predicted: %s' % (row, label))<\/pre>\n<p>Running the example first prints the mapping of class labels to integers and then fits the model on the entire dataset.<\/p>\n<p>Then a new observation is defined (in this case I took a row from the dataset), and a predicted label is calculated. In this case our observation is predicted as belonging to class 1, which the printed mapping tells us is \u201cIris-versicolor\u201d. Note that <em>str_column_to_int()<\/em> assigns integers by enumerating a set of strings, and Python does not guarantee the same set ordering across runs, so the mapping (and therefore the integer printed for the prediction) may differ on your machine. Always read the class name from the printed mapping.<\/p>\n<pre class=\"crayon-plain-tag\">[Iris-virginica] => 0\r\n[Iris-versicolor] => 1\r\n[Iris-setosa] => 2\r\n\r\nData=[5.7, 2.9, 4.2, 1.3], Predicted: 1<\/pre>\n<\/p>\n<h2>Extensions<\/h2>\n<p>This section lists extensions to the tutorial that you may wish to explore.<\/p>\n<ul>\n<li><strong>Log Probabilities<\/strong>: The conditional probabilities for each class given an attribute value are small. When they are multiplied together they result in very small values, which can lead to floating point underflow (numbers too small to represent in Python). A common fix is to sum the logs of the probabilities rather than multiplying the probabilities themselves. Research and implement this improvement.<\/li>\n<li><strong>Nominal Attributes<\/strong>: Update the implementation to support nominal attributes. This is very similar, except that the summary information collected for each attribute is the proportion of each category value within each class. 
Dive into the references for more information.<\/li>\n<li><strong>Different Density Function<\/strong> (<em>Bernoulli<\/em> or <em>multinomial<\/em>): We have looked at Gaussian Naive Bayes, but you can also look at other distributions. Implement a different variant such as multinomial, Bernoulli or kernel Naive Bayes, each of which makes different assumptions about the distribution of attribute values and\/or their relationship with the class value.<\/li>\n<\/ul>\n<p>If you try any of these extensions, let me know in the comments below.<\/p>\n<h2>Further Reading<\/h2>\n<h3>Tutorials<\/h3>\n<ul>\n<li><a href=\"https:\/\/machinelearningmastery.com\/bayes-theorem-for-machine-learning\/\">A Gentle Introduction to Bayes Theorem for Machine Learning<\/a><\/li>\n<li><a href=\"https:\/\/machinelearningmastery.com\/classification-as-conditional-probability-and-the-naive-bayes-algorithm\/\">How to Develop a Naive Bayes Classifier from Scratch in Python<\/a><\/li>\n<li><a href=\"https:\/\/machinelearningmastery.com\/naive-bayes-tutorial-for-machine-learning\/\">Naive Bayes Tutorial for Machine Learning<\/a><\/li>\n<li><a href=\"https:\/\/machinelearningmastery.com\/naive-bayes-for-machine-learning\/\">Naive Bayes for Machine Learning<\/a><\/li>\n<li><a href=\"https:\/\/machinelearningmastery.com\/better-naive-bayes\/\">Better Naive Bayes: 12 Tips To Get The Most From The Naive Bayes Algorithm<\/a><\/li>\n<\/ul>\n<h3>Books<\/h3>\n<ul>\n<li>Section 13.6, Naive Bayes, page 353, <a href=\"http:\/\/amzn.to\/2e3lNXF\">Applied Predictive Modeling<\/a>, 2013.<\/li>\n<li>Section 4.2, Statistical modeling, page 88, <a href=\"http:\/\/amzn.to\/2fj3SYY\">Data Mining: Practical Machine Learning Tools and Techniques<\/a>, 2nd edition, 2005.<\/li>\n<\/ul>\n<h2>Summary<\/h2>\n<p>In this tutorial you discovered how to implement the Naive Bayes algorithm from scratch in Python.<\/p>\n<p>Specifically, you learned:<\/p>\n<ul>\n<li>How to calculate the probabilities required by the Naive interpretation 
of Bayes Theorem.<\/li>\n<li>How to use probabilities to make predictions on new data.<\/li>\n<li>How to apply Naive Bayes to a real-world predictive modeling problem.<\/li>\n<\/ul>\n<h3>Next Step<\/h3>\n<p>Take action!<\/p>\n<ol>\n<li>Follow the tutorial and implement Naive Bayes from scratch.<\/li>\n<li>Adapt the example to another dataset.<\/li>\n<li>Follow the extensions and improve upon the implementation.<\/li>\n<\/ol>\n<p>Leave a comment and share your experiences.<\/p>\n<p>The post <a rel=\"nofollow\" href=\"https:\/\/machinelearningmastery.com\/naive-bayes-classifier-scratch-python\/\">Naive Bayes Classifier From Scratch in Python<\/a> appeared first on <a rel=\"nofollow\" href=\"https:\/\/machinelearningmastery.com\/\">Machine Learning Mastery<\/a>.<\/p>\n<\/div>\n<p><a href=\"https:\/\/machinelearningmastery.com\/naive-bayes-classifier-scratch-python\/\">Go to Source<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Author: Jason Brownlee In this tutorial you are going to learn about the Naive Bayes algorithm including how it works and how to implement it [&hellip;] <span class=\"read-more-link\"><a class=\"read-more\" href=\"https:\/\/www.aiproblog.com\/index.php\/2019\/10\/17\/naive-bayes-classifier-from-scratch-in-python\/\">Read 
More<\/a><\/span><\/p>\n","protected":false},"author":1,"featured_media":2707,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_bbp_topic_count":0,"_bbp_reply_count":0,"_bbp_total_topic_count":0,"_bbp_total_reply_count":0,"_bbp_voice_count":0,"_bbp_anonymous_reply_count":0,"_bbp_topic_count_hidden":0,"_bbp_reply_count_hidden":0,"_bbp_forum_subforum_count":0,"footnotes":""},"categories":[24],"tags":[],"_links":{"self":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts\/2706"}],"collection":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/comments?post=2706"}],"version-history":[{"count":0,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts\/2706\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/media\/2707"}],"wp:attachment":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/media?parent=2706"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/categories?post=2706"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/tags?post=2706"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}