{"id":2605,"date":"2019-09-22T19:00:43","date_gmt":"2019-09-22T19:00:43","guid":{"rendered":"https:\/\/www.aiproblog.com\/index.php\/2019\/09\/22\/continuous-probability-distributions-for-machine-learning\/"},"modified":"2019-09-22T19:00:43","modified_gmt":"2019-09-22T19:00:43","slug":"continuous-probability-distributions-for-machine-learning","status":"publish","type":"post","link":"https:\/\/www.aiproblog.com\/index.php\/2019\/09\/22\/continuous-probability-distributions-for-machine-learning\/","title":{"rendered":"Continuous Probability Distributions for Machine Learning"},"content":{"rendered":"<p>Author: Jason Brownlee<\/p>\n<div>\n<p>The probability for a continuous random variable can be summarized with a continuous probability distribution.<\/p>\n<p>Continuous probability distributions are encountered in machine learning, most notably in the distribution of numerical input and output variables for models and in the distribution of errors made by models. Knowledge of the normal continuous probability distribution is also required more generally in the density and parameter estimation performed by many machine learning models.<\/p>\n<p>As such, continuous probability distributions play an important role in applied machine learning and there are a few distributions that a practitioner must know about.<\/p>\n<p>In this tutorial, you will discover continuous probability distributions used in machine learning.<\/p>\n<p>After completing this tutorial, you will know:<\/p>\n<ul>\n<li>The probability of outcomes for continuous random variables can be summarized using continuous probability distributions.<\/li>\n<li>How to parametrize, define, and randomly sample from common continuous probability distributions.<\/li>\n<li>How to create probability density and cumulative density plots for common continuous probability distributions.<\/li>\n<\/ul>\n<p>Let\u2019s get started.<\/p>\n<div id=\"attachment_8742\" style=\"width: 650px\" class=\"wp-caption aligncenter\"><img 
loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-8742\" class=\"size-full wp-image-8742\" src=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2019\/09\/Continuous-Probability-Distributions-for-Machine-Learning.jpg\" alt=\"Continuous Probability Distributions for Machine Learning\" width=\"640\" height=\"360\" srcset=\"http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2019\/09\/Continuous-Probability-Distributions-for-Machine-Learning.jpg 640w, http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2019\/09\/Continuous-Probability-Distributions-for-Machine-Learning-300x169.jpg 300w\" sizes=\"(max-width: 640px) 100vw, 640px\"><\/p>\n<p id=\"caption-attachment-8742\" class=\"wp-caption-text\">Continuous Probability Distributions for Machine Learning<br \/>Photo by <a href=\"https:\/\/www.flickr.com\/photos\/blmoregon\/11343624354\/\">Bureau of Land Management<\/a>, some rights reserved.<\/p>\n<\/div>\n<h2>Tutorial Overview<\/h2>\n<p>This tutorial is divided into four parts; they are:<\/p>\n<ol>\n<li>Continuous Probability Distributions<\/li>\n<li>Normal Distribution<\/li>\n<li>Exponential Distribution<\/li>\n<li>Pareto Distribution<\/li>\n<\/ol>\n<h2>Continuous Probability Distributions<\/h2>\n<p>A random variable is a quantity produced by a random process.<\/p>\n<p>A continuous random variable is a random variable that can take any real numerical value within some range.<\/p>\n<p>Each numerical outcome of a continuous random variable can be assigned a probability density; actual probabilities are assigned to ranges of outcomes.<\/p>\n<p>The relationship between the events for a continuous random variable and their probabilities is called the continuous probability distribution and is summarized by a <a href=\"https:\/\/en.wikipedia.org\/wiki\/Probability_density_function\">probability density function<\/a>, or PDF for short.<\/p>\n<p>Unlike a discrete random variable, the probability of a specific outcome of a continuous random variable cannot be specified directly; 
instead, it is calculated as an integral (area under the curve) over a small interval around the specific outcome.<\/p>\n<p>The probability of an event equal to or less than a given value is defined by the cumulative distribution function, or CDF for short. The inverse of the CDF is called the percentage-point function; given a probability, it returns the outcome value at or below which that fraction of the distribution lies.<\/p>\n<ul>\n<li><strong>PDF<\/strong>: Probability Density Function, returns the probability density (relative likelihood) of a given continuous outcome; the density is not itself a probability and can be greater than 1.<\/li>\n<li><strong>CDF<\/strong>: Cumulative Distribution Function, returns the probability of a value less than or equal to a given outcome.<\/li>\n<li><strong>PPF<\/strong>: Percent-Point Function, returns the outcome value whose cumulative probability equals the given probability (the inverse of the CDF).<\/li>\n<\/ul>\n<p>There are many common continuous probability distributions, the most common being the normal probability distribution. Several of them, including the normal and exponential distributions, belong to the so-called <a href=\"https:\/\/en.wikipedia.org\/wiki\/Exponential_family\">exponential family of distributions<\/a>, a family of parameterized probability distributions whose densities share a particular mathematical form.<\/p>\n<p>Continuous probability distributions play an important role in machine learning, from the distribution of input variables to models, to the distribution of errors made by models, to the models themselves when estimating the mapping between inputs and outputs.<\/p>\n<p>In the following sections, we will take a closer look at some of the more common continuous probability distributions.<\/p>\n
<h2>Normal Distribution<\/h2>\n<p>The <a href=\"https:\/\/en.wikipedia.org\/wiki\/Normal_distribution\">normal distribution<\/a> is also called the Gaussian distribution (named for <a href=\"https:\/\/en.wikipedia.org\/wiki\/Carl_Friedrich_Gauss\">Carl Friedrich Gauss<\/a>) or the bell curve distribution.<\/p>\n<p>The distribution describes the probability of real-valued events from many different problem domains, making it a common and well-known distribution, hence the name \u201c<em>normal<\/em>.\u201d A continuous random variable that has a normal distribution is said to be \u201c<em>normal<\/em>\u201d or \u201c<em>normally distributed<\/em>.\u201d<\/p>\n<p>Some examples of domains that have normally distributed events include:<\/p>\n<ul>\n<li>The heights of people.<\/li>\n<li>The weights of babies.<\/li>\n<li>The scores on a test.<\/li>\n<\/ul>\n<p>The distribution can be defined using two parameters:<\/p>\n<ul>\n<li><strong>Mean<\/strong> (<em>mu<\/em>): The expected value.<\/li>\n<li><strong>Variance<\/strong> (<em>sigma^2<\/em>): The average squared deviation from the mean.<\/li>\n<\/ul>\n<p>Often, the standard deviation is used instead of the variance; it is calculated as the square root of the variance and is expressed in the same units as the data.<\/p>\n
<ul>\n<li><strong>Standard Deviation<\/strong> (<em>sigma<\/em>): The typical spread from the mean, in the same units as the data.<\/li>\n<\/ul>\n<p>A distribution with a mean of zero and a standard deviation of 1 is called a standard normal distribution, and data is often rescaled or \u201c<em>standardized<\/em>\u201d to this form (by subtracting the mean and dividing by the standard deviation) for ease of interpretation and comparison.<\/p>\n<p>We can define a distribution with a mean of 50 and a standard deviation of 5 and sample random numbers from this distribution. We can achieve this using the <a href=\"https:\/\/docs.scipy.org\/doc\/numpy\/reference\/generated\/numpy.random.normal.html\">normal() NumPy function<\/a>.<\/p>\n<p>The example below samples and prints 10 numbers from this distribution.<\/p>\n<pre class=\"crayon-plain-tag\"># sample a normal distribution\r\nfrom numpy.random import normal\r\n# define the distribution\r\nmu = 50\r\nsigma = 5\r\nn = 10\r\n# generate the sample\r\nsample = normal(mu, sigma, n)\r\nprint(sample)<\/pre>\n<p>Running the example prints 10 numbers randomly sampled from the defined normal distribution.<\/p>\n<pre class=\"crayon-plain-tag\">[48.71009029 49.36970461 45.58247748 51.96846616 46.05793544 40.3903483\r\n 48.39189421 50.08693721 46.85896352 44.83757824]<\/pre>\n<p>A sample of data can be checked to see if it is normally distributed by plotting it and looking for the familiar bell shape, or by using statistical normality tests. If the observations are normally distributed, then they can be summarized by just the mean and variance, estimated directly from the sample.<\/p>\n<p>We can calculate the probability density of each observation using the probability density function. 
A plot of these values would give us the tell-tale bell shape.<\/p>\n<p>We can define a normal distribution using the <a href=\"https:\/\/docs.scipy.org\/doc\/scipy\/reference\/generated\/scipy.stats.norm.html\">norm() SciPy function<\/a> and then calculate properties such as the moments, PDF, CDF, and more.<\/p>\n<p>The example below calculates the probability density for integer values from 30 to 69 (range(30, 70) excludes its upper bound) in our distribution and plots the result, then does the same for the cumulative probability.<\/p>\n<pre class=\"crayon-plain-tag\"># pdf and cdf for a normal distribution\r\nfrom scipy.stats import norm\r\nfrom matplotlib import pyplot\r\n# define distribution parameters\r\nmu = 50\r\nsigma = 5\r\n# create distribution\r\ndist = norm(mu, sigma)\r\n# plot pdf\r\nvalues = [value for value in range(30, 70)]\r\nprobabilities = [dist.pdf(value) for value in values]\r\npyplot.plot(values, probabilities)\r\npyplot.show()\r\n# plot cdf\r\ncprobs = [dist.cdf(value) for value in values]\r\npyplot.plot(values, cprobs)\r\npyplot.show()<\/pre>\n<p>Running the example first calculates the probability density for integers in the range [30, 70) and creates a line plot of values and densities.<\/p>\n<p>The plot shows the Gaussian or bell shape, with the peak at the expected value or mean of 50, where the density is about 0.08 (that is, 1 \/ (5 * sqrt(2 * pi))).<\/p>\n<div id=\"attachment_8736\" style=\"width: 1290px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-8736\" class=\"size-full wp-image-8736\" src=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2019\/08\/Line-Plot-of-Events-vs-Probability-or-the-Probability-Density-Function-for-the-Normal-Distribution.png\" alt=\"Line Plot of Events vs Probability or the Probability Density Function for the Normal Distribution\" width=\"1280\" height=\"960\" 
srcset=\"http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2019\/08\/Line-Plot-of-Events-vs-Probability-or-the-Probability-Density-Function-for-the-Normal-Distribution.png 1280w, http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2019\/08\/Line-Plot-of-Events-vs-Probability-or-the-Probability-Density-Function-for-the-Normal-Distribution-300x225.png 300w, http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2019\/08\/Line-Plot-of-Events-vs-Probability-or-the-Probability-Density-Function-for-the-Normal-Distribution-768x576.png 768w, http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2019\/08\/Line-Plot-of-Events-vs-Probability-or-the-Probability-Density-Function-for-the-Normal-Distribution-1024x768.png 1024w\" sizes=\"(max-width: 1280px) 100vw, 1280px\"><\/p>\n<p id=\"caption-attachment-8736\" class=\"wp-caption-text\">Line Plot of Events vs Probability or the Probability Density Function for the Normal Distribution<\/p>\n<\/div>\n<p>The cumulative probabilities are then calculated for observations over the same range, showing that 50% of the probability lies at or below the mean and very close to 100% (about 99.9%) lies at or below a value of 65, which is 3 standard deviations above the mean (50 + (3 * 5)).<\/p>\n<div id=\"attachment_8737\" style=\"width: 1290px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-8737\" class=\"size-full wp-image-8737\" src=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2019\/08\/Line-Plot-of-Events-vs-Cumulative-Probability-or-the-Cumulative-Density-Function-for-the-Normal-Distribution.png\" alt=\"Line Plot of Events vs. 
Cumulative Probability or the Cumulative Density Function for the Normal Distribution\" width=\"1280\" height=\"960\" srcset=\"http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2019\/08\/Line-Plot-of-Events-vs-Cumulative-Probability-or-the-Cumulative-Density-Function-for-the-Normal-Distribution.png 1280w, http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2019\/08\/Line-Plot-of-Events-vs-Cumulative-Probability-or-the-Cumulative-Density-Function-for-the-Normal-Distribution-300x225.png 300w, http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2019\/08\/Line-Plot-of-Events-vs-Cumulative-Probability-or-the-Cumulative-Density-Function-for-the-Normal-Distribution-768x576.png 768w, http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2019\/08\/Line-Plot-of-Events-vs-Cumulative-Probability-or-the-Cumulative-Density-Function-for-the-Normal-Distribution-1024x768.png 1024w\" sizes=\"(max-width: 1280px) 100vw, 1280px\"><\/p>\n<p id=\"caption-attachment-8737\" class=\"wp-caption-text\">Line Plot of Events vs. Cumulative Probability or the Cumulative Density Function for the Normal Distribution<\/p>\n<\/div>\n<p>In fact, the normal distribution has a heuristic or rule of thumb that gives the approximate percentage of data falling within a given number of standard deviations of the mean. 
It is called the <a href=\"https:\/\/en.wikipedia.org\/wiki\/68%E2%80%9395%E2%80%9399.7_rule\">68-95-99.7 rule<\/a>, after the approximate percentages of the data that fall within 1, 2, and 3 standard deviations of the mean.<\/p>\n<p>For example, in our distribution with a mean of 50 and a standard deviation of 5, we would expect about 95% of the data to fall within 2 standard deviations of the mean, that is, between 50 \u2013 (2 * 5) and 50 + (2 * 5), or between 40 and 60.<\/p>\n<p>We can confirm this by calculating the exact values using the percentage-point function.<\/p>\n<p>The middle 95% is bounded by the percentage-point function value for 2.5% at the low end and 97.5% at the high end, since 97.5% \u2013 2.5% leaves the middle 95%.<\/p>\n<p>The complete example is listed below.<\/p>\n<pre class=\"crayon-plain-tag\"># calculate the values that define the middle 95%\r\nfrom scipy.stats import norm\r\n# define distribution parameters\r\nmu = 50\r\nsigma = 5\r\n# create distribution\r\ndist = norm(mu, sigma)\r\nlow_end = dist.ppf(0.025)\r\nhigh_end = dist.ppf(0.975)\r\nprint('Middle 95%% between %.1f and %.1f' % (low_end, high_end))<\/pre>\n<p>Running the example gives the exact values that bound the middle 95% of outcomes, which are very close to the heuristic bounds of 40 and 60.<\/p>\n<pre class=\"crayon-plain-tag\">Middle 95% between 40.2 and 59.8<\/pre>\n<p>An important related distribution is the <a href=\"https:\/\/en.wikipedia.org\/wiki\/Log-normal_distribution\">Log-Normal probability distribution<\/a>.<\/p>\n<h2>Exponential Distribution<\/h2>\n<p>The <a href=\"https:\/\/en.wikipedia.org\/wiki\/Exponential_distribution\">exponential distribution<\/a> is a continuous probability distribution where the smallest outcomes (values near zero) are the most likely and the probability density decreases exponentially as the outcome value increases.<\/p>\n<p>It is the continuous random variable equivalent to the <a 
href=\"https:\/\/en.wikipedia.org\/wiki\/Geometric_distribution\">geometric probability distribution<\/a> for discrete random variables.<\/p>\n<p>Some examples of domains that have exponentially distributed events include:<\/p>\n<ul>\n<li>The time between clicks on a Geiger counter.<\/li>\n<li>The time until the failure of a part.<\/li>\n<li>The time until the default of a loan.<\/li>\n<\/ul>\n<p>The distribution can be defined using one parameter:<\/p>\n<ul>\n<li><strong>Scale<\/strong> (<em>beta<\/em>): The scale of the distribution, which is equal to both its mean and its standard deviation.<\/li>\n<\/ul>\n<p>Sometimes the distribution is instead defined with a rate parameter, <em>lambda<\/em>. The <em>beta<\/em> parameter is the reciprocal of the <em>lambda<\/em> parameter (<em>beta = 1\/lambda<\/em>).<\/p>\n<ul>\n<li><strong>Rate<\/strong> (<em>lambda<\/em>): The rate at which events occur, e.g. the average number of events per unit of time.<\/li>\n<\/ul>\n<p>We can define a distribution with a mean of 50 (a <em>beta<\/em> of 50) and sample random numbers from this distribution. 
We can achieve this using the <a href=\"https:\/\/docs.scipy.org\/doc\/numpy\/reference\/generated\/numpy.random.exponential.html\">exponential() NumPy function<\/a>.<\/p>\n<p>The example below samples and prints 10 numbers from this distribution.<\/p>\n<pre class=\"crayon-plain-tag\"># sample an exponential distribution\r\nfrom numpy.random import exponential\r\n# define the distribution\r\nbeta = 50\r\nn = 10\r\n# generate the sample\r\nsample = exponential(beta, n)\r\nprint(sample)<\/pre>\n<p>Running the example prints 10 numbers randomly sampled from the defined distribution.<\/p>\n<pre class=\"crayon-plain-tag\">[  3.32742946  39.10165624  41.86856606  85.0030387   28.18425491\r\n  68.20434637 106.34826579  19.63637359  17.13805423  15.91135881]<\/pre>\n<p>We can define an exponential distribution using the <a href=\"https:\/\/docs.scipy.org\/doc\/scipy\/reference\/generated\/scipy.stats.expon.html\">expon() SciPy function<\/a> and then calculate properties such as the moments, PDF, CDF, and more. Note that the first positional argument of expon() is the location (<em>loc<\/em>), so <em>beta<\/em> must be passed as the <em>scale<\/em> argument.<\/p>\n<p>The example below defines a range of observations between 0 and 249 and calculates the probability density and cumulative probability for each and plots the result.<\/p>\n<pre class=\"crayon-plain-tag\"># pdf and cdf for an exponential distribution\r\nfrom scipy.stats import expon\r\nfrom matplotlib import pyplot\r\n# define distribution parameter\r\nbeta = 50\r\n# create distribution (beta is the scale; a positional argument would set loc instead)\r\ndist = expon(scale=beta)\r\n# plot pdf\r\nvalues = [value for value in range(0, 250)]\r\nprobabilities = [dist.pdf(value) for value in values]\r\npyplot.plot(values, probabilities)\r\npyplot.show()\r\n# plot cdf\r\ncprobs = [dist.cdf(value) for value in values]\r\npyplot.plot(values, cprobs)\r\npyplot.show()<\/pre>\n<p>Running the example first creates a line plot of outcomes versus probability densities, showing a familiar exponential probability distribution shape.<\/p>\n<div id=\"attachment_8738\" style=\"width: 1290px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" 
aria-describedby=\"caption-attachment-8738\" class=\"size-full wp-image-8738\" src=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2019\/08\/Line-Plot-of-Events-vs-Probability-or-the-Probability-Density-Function-for-the-Exponential-Distribution.png\" alt=\"Line Plot of Events vs. Probability or the Probability Density Function for the Exponential Distribution\" width=\"1280\" height=\"960\" srcset=\"http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2019\/08\/Line-Plot-of-Events-vs-Probability-or-the-Probability-Density-Function-for-the-Exponential-Distribution.png 1280w, http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2019\/08\/Line-Plot-of-Events-vs-Probability-or-the-Probability-Density-Function-for-the-Exponential-Distribution-300x225.png 300w, http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2019\/08\/Line-Plot-of-Events-vs-Probability-or-the-Probability-Density-Function-for-the-Exponential-Distribution-768x576.png 768w, http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2019\/08\/Line-Plot-of-Events-vs-Probability-or-the-Probability-Density-Function-for-the-Exponential-Distribution-1024x768.png 1024w\" sizes=\"(max-width: 1280px) 100vw, 1280px\"><\/p>\n<p id=\"caption-attachment-8738\" class=\"wp-caption-text\">Line Plot of Events vs. 
Probability or the Probability Density Function for the Exponential Distribution<\/p>\n<\/div>\n<p>Next, the cumulative probabilities for each outcome are calculated and graphed as a line plot, showing a cumulative probability of about 63% at the mean of 50 and that, by the end of the range near 250, more than 99% of the expected values will be observed.<\/p>\n<div id=\"attachment_8739\" style=\"width: 1290px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-8739\" class=\"size-full wp-image-8739\" src=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2019\/08\/Line-Plot-of-Events-vs-Cumulative-Probability-or-the-Cumulative-Density-Function-for-the-Exponential-Distribution.png\" alt=\"Line Plot of Events vs. Cumulative Probability or the Cumulative Density Function for the Exponential Distribution\" width=\"1280\" height=\"960\" srcset=\"http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2019\/08\/Line-Plot-of-Events-vs-Cumulative-Probability-or-the-Cumulative-Density-Function-for-the-Exponential-Distribution.png 1280w, http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2019\/08\/Line-Plot-of-Events-vs-Cumulative-Probability-or-the-Cumulative-Density-Function-for-the-Exponential-Distribution-300x225.png 300w, http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2019\/08\/Line-Plot-of-Events-vs-Cumulative-Probability-or-the-Cumulative-Density-Function-for-the-Exponential-Distribution-768x576.png 768w, http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2019\/08\/Line-Plot-of-Events-vs-Cumulative-Probability-or-the-Cumulative-Density-Function-for-the-Exponential-Distribution-1024x768.png 1024w\" sizes=\"(max-width: 1280px) 100vw, 1280px\"><\/p>\n<p id=\"caption-attachment-8739\" class=\"wp-caption-text\">Line Plot of Events vs. 
Cumulative Probability or the Cumulative Density Function for the Exponential Distribution<\/p>\n<\/div>\n<p>An important related distribution is the double exponential distribution, also called the <a href=\"https:\/\/en.wikipedia.org\/wiki\/Laplace_distribution\">Laplace distribution<\/a>.<\/p>\n<h2>Pareto Distribution<\/h2>\n<p>A <a href=\"https:\/\/en.wikipedia.org\/wiki\/Pareto_distribution\">Pareto distribution<\/a> is named after <a href=\"https:\/\/en.wikipedia.org\/wiki\/Vilfredo_Pareto\">Vilfredo Pareto<\/a> and may be referred to as a <a href=\"https:\/\/en.wikipedia.org\/wiki\/Power_law\">power-law distribution<\/a>.<\/p>\n<p>It is also related to the <a href=\"https:\/\/en.wikipedia.org\/wiki\/Pareto_principle\">Pareto principle<\/a> (or 80\/20 rule), a heuristic for quantities that follow a Pareto distribution, which states that roughly 80% of the total is accounted for by the largest 20% of the observations, e.g. 20% of households receive 80% of the income.<\/p>\n<p>The Pareto principle is only a heuristic; it holds exactly only for the Pareto distribution with one specific value of the shape parameter, given below.<\/p>\n<p>Some examples of domains that have Pareto distributed events include:<\/p>\n<ul>\n<li>The income of households in a country.<\/li>\n<li>The total sales of books.<\/li>\n<li>The scores by players on a sports team.<\/li>\n<\/ul>\n<p>The distribution can be defined using one parameter, assuming the minimum possible value (the scale) is fixed at 1:<\/p>\n<ul>\n<li><strong>Shape<\/strong> (<em>alpha<\/em>): The steepness of the decrease in probability as the value increases; smaller values give a heavier tail.<\/li>\n<\/ul>\n<p>Values for the shape parameter are often small, such as between 1 and 3, with the Pareto principle holding when alpha is set to about 1.161.<\/p>\n<p>We can define a distribution with a shape of 1.1 and sample random numbers from this distribution. 
We can achieve this using the <a href=\"https:\/\/docs.scipy.org\/doc\/numpy\/reference\/generated\/numpy.random.pareto.html\">pareto() NumPy function<\/a>.<\/p>\n<pre class=\"crayon-plain-tag\"># sample a pareto distribution\r\nfrom numpy.random import pareto\r\n# define the distribution\r\nalpha = 1.1\r\nn = 10\r\n# generate the sample\r\nsample = pareto(alpha, n)\r\nprint(sample)<\/pre>\n<p>Running the example prints 10 numbers randomly sampled from the defined distribution.<\/p>\n<pre class=\"crayon-plain-tag\">[0.5049704  0.0140647  2.13105224 3.10991217 2.87575892 1.06602639\r\n 0.22776379 0.37405415 0.96618778 3.94789299]<\/pre>\n<p>Note that the NumPy pareto() function draws samples from the Pareto Type II (Lomax) distribution, whose values start at zero, which is why some of the samples above are less than 1. Adding 1 to each sample gives a classical Pareto distribution with a minimum value of 1.<\/p>\n<p>We can define a Pareto distribution using the <a href=\"https:\/\/docs.scipy.org\/doc\/scipy\/reference\/generated\/scipy.stats.pareto.html\">pareto() SciPy function<\/a> and then calculate properties, such as the moments, PDF, CDF, and more.<\/p>\n<p>The example below defines a Pareto distribution with a shape of 1.5, then defines a range of observations between 1.0 and 9.9 and calculates the probability density and cumulative probability for each and plots the result.<\/p>\n<pre class=\"crayon-plain-tag\"># pdf and cdf for a pareto distribution\r\nfrom scipy.stats import pareto\r\nfrom matplotlib import pyplot\r\n# define distribution parameter\r\nalpha = 1.5\r\n# create distribution\r\ndist = pareto(alpha)\r\n# plot pdf\r\nvalues = [value\/10.0 for value in range(10, 100)]\r\nprobabilities = [dist.pdf(value) for value in values]\r\npyplot.plot(values, probabilities)\r\npyplot.show()\r\n# plot cdf\r\ncprobs = [dist.cdf(value) for value in values]\r\npyplot.plot(values, cprobs)\r\npyplot.show()<\/pre>\n<p>Running the example first creates a line plot of outcomes versus probability densities, showing a familiar Pareto probability distribution shape.<\/p>\n<div id=\"attachment_8740\" style=\"width: 1290px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-8740\" class=\"size-full wp-image-8740\" 
src=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2019\/08\/Line-Plot-of-Events-vs-Probability-or-the-Probability-Density-Function-for-the-Pareto-Distribution.png\" alt=\"Line Plot of Events vs. Probability or the Probability Density Function for the Pareto Distribution\" width=\"1280\" height=\"960\" srcset=\"http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2019\/08\/Line-Plot-of-Events-vs-Probability-or-the-Probability-Density-Function-for-the-Pareto-Distribution.png 1280w, http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2019\/08\/Line-Plot-of-Events-vs-Probability-or-the-Probability-Density-Function-for-the-Pareto-Distribution-300x225.png 300w, http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2019\/08\/Line-Plot-of-Events-vs-Probability-or-the-Probability-Density-Function-for-the-Pareto-Distribution-768x576.png 768w, http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2019\/08\/Line-Plot-of-Events-vs-Probability-or-the-Probability-Density-Function-for-the-Pareto-Distribution-1024x768.png 1024w\" sizes=\"(max-width: 1280px) 100vw, 1280px\"><\/p>\n<p id=\"caption-attachment-8740\" class=\"wp-caption-text\">Line Plot of Events vs. 
Probability or the Probability Density Function for the Pareto Distribution<\/p>\n<\/div>\n<p>Next, the cumulative probabilities for each outcome are calculated and graphed as a line plot, showing a rise that is less steep than the exponential distribution seen in the previous section.<\/p>\n<div id=\"attachment_8741\" style=\"width: 1290px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-8741\" class=\"size-full wp-image-8741\" src=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2019\/08\/Line-Plot-of-Events-vs-Cumulative-Probability-or-the-Cumulative-Density-Function-for-the-Pareto-Distribution.png\" alt=\"Line Plot of Events vs. Cumulative Probability or the Cumulative Density Function for the Pareto Distribution\" width=\"1280\" height=\"960\" srcset=\"http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2019\/08\/Line-Plot-of-Events-vs-Cumulative-Probability-or-the-Cumulative-Density-Function-for-the-Pareto-Distribution.png 1280w, http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2019\/08\/Line-Plot-of-Events-vs-Cumulative-Probability-or-the-Cumulative-Density-Function-for-the-Pareto-Distribution-300x225.png 300w, http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2019\/08\/Line-Plot-of-Events-vs-Cumulative-Probability-or-the-Cumulative-Density-Function-for-the-Pareto-Distribution-768x576.png 768w, http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2019\/08\/Line-Plot-of-Events-vs-Cumulative-Probability-or-the-Cumulative-Density-Function-for-the-Pareto-Distribution-1024x768.png 1024w\" sizes=\"(max-width: 1280px) 100vw, 1280px\"><\/p>\n<p id=\"caption-attachment-8741\" class=\"wp-caption-text\">Line Plot of Events vs. 
Cumulative Probability or the Cumulative Density Function for the Pareto Distribution<\/p>\n<\/div>\n<h2>Further Reading<\/h2>\n<p>This section provides more resources on the topic if you are looking to go deeper.<\/p>\n<h3>Books<\/h3>\n<ul>\n<li>Chapter 2: Probability Distributions, <a href=\"https:\/\/amzn.to\/2JwHE7I\">Pattern Recognition and Machine Learning<\/a>, 2006.<\/li>\n<li>Section 3.9: Common Probability Distributions, <a href=\"https:\/\/amzn.to\/2lnc3vL\">Deep Learning<\/a>, 2016.<\/li>\n<li>Section 2.3: Some common discrete distributions, <a href=\"https:\/\/amzn.to\/2xKSTCP\">Machine Learning: A Probabilistic Perspective<\/a>, 2012.<\/li>\n<\/ul>\n<h3>API<\/h3>\n<ul>\n<li><a href=\"https:\/\/docs.scipy.org\/doc\/scipy\/reference\/tutorial\/stats\/continuous.html\">Continuous Statistical Distributions, SciPy<\/a>.<\/li>\n<li><a href=\"https:\/\/docs.scipy.org\/doc\/numpy\/reference\/routines.random.html\">Random sampling (numpy.random), NumPy<\/a>.<\/li>\n<\/ul>\n<h3>Articles<\/h3>\n<ul>\n<li><a href=\"https:\/\/en.wikipedia.org\/wiki\/Normal_distribution\">Normal distribution, Wikipedia<\/a>.<\/li>\n<li><a href=\"https:\/\/en.wikipedia.org\/wiki\/68%E2%80%9395%E2%80%9399.7_rule\">68\u201395\u201399.7 rule, Wikipedia<\/a>.<\/li>\n<li><a href=\"https:\/\/en.wikipedia.org\/wiki\/Exponential_distribution\">Exponential distribution, Wikipedia<\/a>.<\/li>\n<li><a href=\"https:\/\/en.wikipedia.org\/wiki\/Pareto_distribution\">Pareto distribution, Wikipedia<\/a>.<\/li>\n<\/ul>\n<h2>Summary<\/h2>\n<p>In this tutorial, you discovered continuous probability distributions used in machine learning.<\/p>\n<p>Specifically, you learned:<\/p>\n<ul>\n<li>The probability of outcomes for continuous random variables can be summarized using continuous probability distributions.<\/li>\n<li>How to parametrize, define, and randomly sample from common continuous probability distributions.<\/li>\n<li>How to create probability density and cumulative density plots for common 
continuous probability distributions.<\/li>\n<\/ul>\n<p>Do you have any questions?<br \/>\nAsk your questions in the comments below and I will do my best to answer.<\/p>\n<p>The post <a rel=\"nofollow\" href=\"https:\/\/machinelearningmastery.com\/continuous-probability-distributions-for-machine-learning\/\">Continuous Probability Distributions for Machine Learning<\/a> appeared first on <a rel=\"nofollow\" href=\"https:\/\/machinelearningmastery.com\/\">Machine Learning Mastery<\/a>.<\/p>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>Author: Jason Brownlee The probability for a continuous random variable can be summarized with a continuous probability distribution. Continuous probability distributions are encountered in machine [&hellip;] <span class=\"read-more-link\"><a class=\"read-more\" href=\"https:\/\/www.aiproblog.com\/index.php\/2019\/09\/22\/continuous-probability-distributions-for-machine-learning\/\">Read 
More<\/a><\/span><\/p>\n","protected":false},"author":1,"featured_media":2606,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_bbp_topic_count":0,"_bbp_reply_count":0,"_bbp_total_topic_count":0,"_bbp_total_reply_count":0,"_bbp_voice_count":0,"_bbp_anonymous_reply_count":0,"_bbp_topic_count_hidden":0,"_bbp_reply_count_hidden":0,"_bbp_forum_subforum_count":0,"footnotes":""},"categories":[24],"tags":[],"_links":{"self":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts\/2605"}],"collection":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/comments?post=2605"}],"version-history":[{"count":0,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts\/2605\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/media\/2606"}],"wp:attachment":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/media?parent=2605"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/categories?post=2605"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/tags?post=2605"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}