{"id":2619,"date":"2019-09-26T19:00:57","date_gmt":"2019-09-26T19:00:57","guid":{"rendered":"https:\/\/www.aiproblog.com\/index.php\/2019\/09\/26\/a-gentle-introduction-to-joint-marginal-and-conditional-probability\/"},"modified":"2019-09-26T19:00:57","modified_gmt":"2019-09-26T19:00:57","slug":"a-gentle-introduction-to-joint-marginal-and-conditional-probability","status":"publish","type":"post","link":"https:\/\/www.aiproblog.com\/index.php\/2019\/09\/26\/a-gentle-introduction-to-joint-marginal-and-conditional-probability\/","title":{"rendered":"A Gentle Introduction to Joint, Marginal, and Conditional Probability"},"content":{"rendered":"<p>Author: Jason Brownlee<\/p>\n<div>\n<p>Probability quantifies the uncertainty of the outcomes of a random variable.<\/p>\n<p>It is relatively easy to understand and compute the probability for a single variable. Nevertheless, in machine learning, we often have many random variables that interact in often complex and unknown ways.<\/p>\n<p>There are specific techniques that can be used to quantify the probability for multiple random variables, such as the joint, marginal, and conditional probability. 
These techniques provide the basis for a probabilistic understanding of fitting a predictive model to data.<\/p>\n<p>In this post, you will discover a gentle introduction to joint, marginal, and conditional probability for multiple random variables.<\/p>\n<p>After reading this post, you will know:<\/p>\n<ul>\n<li>Joint probability is the probability of two events occurring simultaneously.<\/li>\n<li>Marginal probability is the probability of an event irrespective of the outcome of another variable.<\/li>\n<li>Conditional probability is the probability of one event occurring in the presence of a second event.<\/li>\n<\/ul>\n<p>Discover Bayesian optimization, naive Bayes, maximum likelihood, distributions, cross entropy, and much more <a href=\"https:\/\/machinelearningmastery.com\/probability-for-machine-learning\/\" rel=\"nofollow\">in my new book<\/a>, with 28 step-by-step tutorials and full Python source code.<\/p>\n<p>Let\u2019s get started.<\/p>\n<div id=\"attachment_8772\" style=\"width: 650px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-8772\" class=\"size-full wp-image-8772\" src=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2019\/09\/A-Gentle-Introduction-to-Joint-Marginal-and-Conditional-Probability.jpg\" alt=\"A Gentle Introduction to Joint, Marginal, and Conditional Probability\" width=\"640\" height=\"427\" srcset=\"http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2019\/09\/A-Gentle-Introduction-to-Joint-Marginal-and-Conditional-Probability.jpg 640w, http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2019\/09\/A-Gentle-Introduction-to-Joint-Marginal-and-Conditional-Probability-300x200.jpg 300w\" sizes=\"(max-width: 640px) 100vw, 640px\"><\/p>\n<p id=\"caption-attachment-8772\" class=\"wp-caption-text\">A Gentle Introduction to Joint, Marginal, and Conditional Probability<br \/>Photo by <a 
href=\"https:\/\/www.flickr.com\/photos\/alwbutler\/6929964622\/\">Masterbutler<\/a>, some rights reserved.<\/p>\n<\/div>\n<h2>Overview<\/h2>\n<p>This tutorial is divided into three parts; they are:<\/p>\n<ol>\n<li>Probability of One Random Variable<\/li>\n<li>Probability of Multiple Random Variables<\/li>\n<li>Probability of Independence and Exclusivity<\/li>\n<\/ol>\n<h2>Probability of One Random Variable<\/h2>\n<p>Probability quantifies the likelihood of an event.<\/p>\n<p>Specifically, it quantifies how likely a specific outcome is for a random variable, such as the flip of a coin, the roll of a die, or drawing a playing card from a deck.<\/p>\n<blockquote>\n<p>Probability gives a measure of how likely it is for something to happen.<\/p>\n<\/blockquote>\n<p>\u2014 Page 57, <a href=\"https:\/\/amzn.to\/2jULJsu\">Probability: For the Enthusiastic Beginner<\/a>, 2016.<\/p>\n<p>For a random variable <em>x<\/em>, <em>P(x)<\/em> is a function that assigns a probability to each value of <em>x<\/em>.<\/p>\n<ul>\n<li>Probability Distribution of x = P(x)<\/li>\n<\/ul>\n<p>The probability of a specific event <em>A<\/em> for a random variable <em>x<\/em> is denoted as <em>P(x=A)<\/em>, or simply as <em>P(A).<\/em><\/p>\n<ul>\n<li>Probability of Event A = P(A)<\/li>\n<\/ul>\n<p>Probability is calculated as the number of desired outcomes divided by the total number of possible outcomes, in the case where all outcomes are equally likely.<\/p>\n<ul>\n<li>Probability = (number of desired outcomes) \/ (total number of possible outcomes)<\/li>\n<\/ul>\n<p>This is intuitive if we think about a discrete random variable such as the roll of a die. For example, the probability of a die rolling a 5 is calculated as one outcome of rolling a 5 (1) divided by the total number of discrete outcomes (6) or 1\/6 or about 0.167 or about 16.7%.<\/p>\n<p>The sum of the probabilities of all outcomes must equal one. 
If not, we do not have valid probabilities.<\/p>\n<ul>\n<li>Sum of the Probabilities for All Outcomes = 1.0<\/li>\n<\/ul>\n<p>The probability of an impossible outcome is zero. For example, it is impossible to roll a 7 with a standard six-sided die.<\/p>\n<ul>\n<li>Probability of Impossible Outcome = 0.0<\/li>\n<\/ul>\n<p>The probability of a certain outcome is one. For example, it is certain that a value between 1 and 6 will occur when rolling a six-sided die.<\/p>\n<ul>\n<li>Probability of Certain Outcome = 1.0<\/li>\n<\/ul>\n<p>The probability of an event not occurring is called the complement.<\/p>\n<p>This can be calculated as one minus the probability of the event, or <em>1 \u2013 P(A)<\/em>. For example, the probability of not rolling a 5 would be 1 \u2013 P(5) or 1 \u2013 0.167 or about 0.833 or about 83.3%.<\/p>\n<ul>\n<li>Probability of Not Event A = 1 \u2013 P(A)<\/li>\n<\/ul>\n<p>Now that we are familiar with the probability of one random variable, let\u2019s consider probability for multiple random variables.<\/p>\n<h2>Probability of Multiple Random Variables<\/h2>\n<p>In machine learning, we are likely to work with many random variables.<\/p>\n<p>For example, given a table of data, such as in Excel, each row represents a separate observation or event, and each column represents a separate random variable.<\/p>\n<p>Variables may be either discrete, meaning that they take on values from a finite or countable set, or continuous, meaning that they take on real values.<\/p>\n<p>As such, we are interested in the probability across two or more random variables.<\/p>\n<p>This is complicated as there are many ways that random variables can interact, which, in turn, impacts their probabilities.<\/p>\n<p>This can be simplified by reducing the discussion to just two random variables (<em>X, Y<\/em>), although the principles generalize to multiple variables.<\/p>\n<p>We can simplify further by discussing the probability of just two events, one for each variable (<em>X=A, Y=B<\/em>), although we could just as easily be discussing groups of events for each variable.<\/p>\n<p>Therefore, we will introduce the probability of multiple random variables as the probability of event <em>A<\/em> and event <em>B<\/em>, which is shorthand for <em>X=A<\/em> and <em>Y=B<\/em>.<\/p>\n<p>We assume that the two variables are related or dependent in some way.<\/p>\n<p>As such, there are three main types of probability we might want to consider; they are:<\/p>\n<ul>\n<li><strong>Joint Probability<\/strong>: Probability of events <em>A<\/em> and <em>B<\/em>.<\/li>\n<li><strong>Marginal Probability<\/strong>: Probability of event <em>A<\/em> irrespective of the outcome of variable <em>Y<\/em>.<\/li>\n<li><strong>Conditional 
Probability<\/strong>: Probability of event <em>A<\/em> given event <em>B<\/em>.<\/li>\n<\/ul>\n<p>These types of probability form the basis of much of predictive modeling with problems such as classification and regression. For example:<\/p>\n<ul>\n<li>The probability of a row of data is the joint probability across each input variable.<\/li>\n<li>The probability of a specific value of one input variable is the marginal probability across the values of the other input variables.<\/li>\n<li>The predictive model itself is an estimate of the conditional probability of an output given an input example.<\/li>\n<\/ul>\n<p>Joint, marginal, and conditional probability are foundational in machine learning.<\/p>\n<p>Let\u2019s take a closer look at each in turn.<\/p>\n<h3>Joint Probability of Two Variables<\/h3>\n<p>We may be interested in the probability of two simultaneous events, e.g. the outcomes of two different random variables.<\/p>\n<p>The probability of two (or more) events is called the <a href=\"https:\/\/en.wikipedia.org\/wiki\/Joint_probability_distribution\">joint probability<\/a>. 
The probabilities of all combinations of outcomes for two or more random variables are referred to as the joint probability distribution.<\/p>\n<p>For example, the joint probability of event <em>A<\/em> and event <em>B<\/em> is written formally as:<\/p>\n<ul>\n<li>P(A and B)<\/li>\n<\/ul>\n<p>The \u201c<em>and<\/em>\u201d or conjunction is denoted using the intersection operator, an upside down capital \u201c<em>U<\/em>\u201d (\u201c<em>\u2229<\/em>\u201d, often typed as \u201c<em>^<\/em>\u201d), or sometimes a comma \u201c,\u201d.<\/p>\n<ul>\n<li>P(A ^ B)<\/li>\n<li>P(A, B)<\/li>\n<\/ul>\n<p>The joint probability for events <em>A<\/em> and <em>B<\/em> is calculated as the probability of event <em>A<\/em> given event <em>B<\/em> multiplied by the probability of event <em>B<\/em>.<\/p>\n<p>This can be stated formally as follows:<\/p>\n<ul>\n<li>P(A and B) = P(A given B) * P(B)<\/li>\n<\/ul>\n<p>The calculation of the joint probability is sometimes called the fundamental rule of probability or the \u201c<em>product rule<\/em>\u201d of probability.<\/p>\n<p>Here, <em>P(A given B)<\/em> is the probability of event <em>A<\/em> given that event <em>B<\/em> has occurred, called the conditional probability, described below.<\/p>\n<p>The joint probability is symmetrical, meaning that <em>P(A and B)<\/em> is the same as <em>P(B and A)<\/em>.<\/p>\n<h3>Marginal Probability<\/h3>\n<p>We may be interested in the probability of an event for one random variable, irrespective of the outcome of another random variable.<\/p>\n<p>For example, we may want the probability of <em>X=A<\/em> regardless of the outcome of <em>Y<\/em>.<\/p>\n<p>The probability of one event in the presence of all (or a subset of) outcomes of the other random variable is called the <a href=\"https:\/\/en.wikipedia.org\/wiki\/Marginal_distribution\">marginal probability<\/a> or the marginal distribution. 
The marginal probability of one random variable in the presence of additional random variables is referred to as the marginal probability distribution.<\/p>\n<p>It is called the marginal probability because if all outcomes and probabilities for the two variables were laid out together in a table (<em>X<\/em> as columns, <em>Y<\/em> as rows), then the marginal probability of one variable (<em>X<\/em>) would be the sum of probabilities for the other variable (Y rows) on the margin of the table.<\/p>\n<p>There is no special notation for the marginal probability; it is just the sum or union over all the probabilities of all events for the second variable for a given fixed event for the first variable.<\/p>\n<ul>\n<li>P(X=A) = sum P(X=A, Y=yi) for all y<\/li>\n<\/ul>\n<p>This is another important foundational rule in probability, referred to as the \u201c<em>sum rule<\/em>.\u201d<\/p>\n<p>The marginal probability is different from the conditional probability (described next) because it considers the union of all events for the second variable rather than the probability of a single event.<\/p>\n<h3>Conditional Probability<\/h3>\n<p>We may be interested in the probability of an event given the occurrence of another event.<\/p>\n<p>The probability of one event given the occurrence of another event is called the <a href=\"https:\/\/en.wikipedia.org\/wiki\/Conditional_probability\">conditional probability<\/a>. 
The conditional probability of one random variable given one or more other random variables is referred to as the conditional probability distribution.<\/p>\n<p>For example, the conditional probability of event <em>A<\/em> given event <em>B<\/em> is written formally as:<\/p>\n<ul>\n<li>P(A given B)<\/li>\n<\/ul>\n<p>The \u201c<em>given<\/em>\u201d is denoted using the pipe \u201c|\u201d operator; for example:<\/p>\n<ul>\n<li>P(A | B)<\/li>\n<\/ul>\n<p>The conditional probability of event <em>A<\/em> given event <em>B<\/em> is calculated as follows:<\/p>\n<ul>\n<li>P(A given B) = P(A and B) \/ P(B)<\/li>\n<\/ul>\n<p>This calculation assumes that the probability of event <em>B<\/em> is not zero, i.e. that event <em>B<\/em> is not impossible.<\/p>\n<p>The notion of event <em>A<\/em> given event <em>B<\/em> does not mean that event <em>B<\/em> has occurred (i.e. is certain); instead, it is the probability of event <em>A<\/em> occurring after or in the presence of event <em>B<\/em> for a given trial.<\/p>\n<h2>Probability of Independence and Exclusivity<\/h2>\n<p>When considering multiple random variables, it is possible that they do not interact.<\/p>\n<p>We may know or assume that two variables are not dependent upon each other and are instead independent.<\/p>\n<p>Alternatively, the variables may interact but their events may not occur simultaneously, referred to as exclusivity.<\/p>\n<p>We will take a closer look at the probability of multiple random variables under these circumstances in this section.<\/p>\n<h3>Independence<\/h3>\n<p>If one variable is not dependent on a second variable, this is called <a href=\"https:\/\/en.wikipedia.org\/wiki\/Independence_(probability_theory)\">independence<\/a> or statistical independence.<\/p>\n<p>This has an impact on calculating the probabilities of the two variables.<\/p>\n<p>For example, we may be interested in the joint probability of independent events <em>A<\/em> and <em>B<\/em>, which can be computed from the probability of <em>A<\/em> and the probability of 
<em>B.<\/em><\/p>\n<p>The probabilities of independent events are combined using multiplication; therefore, the joint probability of independent events is calculated as the probability of event <em>A<\/em> multiplied by the probability of event <em>B<\/em>.<\/p>\n<p>This can be stated formally as follows:<\/p>\n<ul>\n<li><strong>Joint Probability<\/strong>: P(A and B) = P(A) * P(B)<\/li>\n<\/ul>\n<p>As we might intuit, the marginal probability for an event for an independent random variable is simply the probability of the event.<\/p>\n<p>It is the idea of probability of a single random variable that we are already familiar with:<\/p>\n<ul>\n<li><strong>Marginal Probability<\/strong>: P(A)<\/li>\n<\/ul>\n<p>We refer to the marginal probability of an independent random variable as simply the probability.<\/p>\n<p>Similarly, the conditional probability of <em>A<\/em> given <em>B<\/em> when the variables are independent is simply the probability of <em>A<\/em>, as the occurrence of <em>B<\/em> has no effect. For example:<\/p>\n<ul>\n<li><strong>Conditional Probability<\/strong>: P(A given B) = P(A)<\/li>\n<\/ul>\n<p>We may be familiar with the notion of statistical independence from sampling. This assumes that one sample is unaffected by prior samples and does not affect future samples.<\/p>\n<p>Many machine learning algorithms assume that samples from a domain are independent of each other and come from the same probability distribution, referred to as <a href=\"https:\/\/en.wikipedia.org\/wiki\/Independent_and_identically_distributed_random_variables\">independent and identically distributed<\/a>, or i.i.d. 
for short.<\/p>\n<h3>Exclusivity<\/h3>\n<p>If the occurrence of one event excludes the occurrence of other events, then the events are said to be <a href=\"https:\/\/en.wikipedia.org\/wiki\/Mutual_exclusivity\">mutually exclusive<\/a>.<\/p>\n<p>The events are said to be disjoint, meaning that they cannot occur together. Note that mutually exclusive events are not independent: if one event occurs, the probability of the other becomes zero.<\/p>\n<p>If event <em>A<\/em> is mutually exclusive with event <em>B<\/em>, then the joint probability of event <em>A<\/em> and event <em>B<\/em> is zero.<\/p>\n<ul>\n<li>P(A and B) = 0.0<\/li>\n<\/ul>\n<p>Instead, the probability of an outcome can be described as event <em>A<\/em> or event <em>B<\/em>, stated formally as follows:<\/p>\n<ul>\n<li>P(A or B) = P(A) + P(B)<\/li>\n<\/ul>\n<p>The \u201cor\u201d is also called a union and is denoted as a capital \u201c<em>U<\/em>\u201d letter; for example:<\/p>\n<ul>\n<li>P(A or B) = P(A U B)<\/li>\n<\/ul>\n<p>If the events are not mutually exclusive, we may be interested in the outcome of either event.<\/p>\n<p>The probability of non-mutually exclusive events is calculated as the probability of event <em>A<\/em> plus the probability of event <em>B<\/em> minus the probability of both events occurring simultaneously.<\/p>\n<p>This can be stated formally as follows:<\/p>\n<ul>\n<li>P(A or B) = P(A) + P(B) \u2013 P(A and B)<\/li>\n<\/ul>\n<h2>Further Reading<\/h2>\n<p>This section provides more resources on the topic if you are looking to go deeper.<\/p>\n<h3>Books<\/h3>\n<ul>\n<li><a href=\"https:\/\/amzn.to\/2jULJsu\">Probability: For the Enthusiastic Beginner<\/a>, 2016.<\/li>\n<li><a href=\"https:\/\/amzn.to\/2JwHE7I\">Pattern Recognition and Machine Learning<\/a>, 2006.<\/li>\n<li><a href=\"https:\/\/amzn.to\/2xKSTCP\">Machine Learning: A Probabilistic Perspective<\/a>, 2012.<\/li>\n<\/ul>\n<h3>Articles<\/h3>\n<ul>\n<li><a href=\"https:\/\/en.wikipedia.org\/wiki\/Probability\">Probability, Wikipedia<\/a>.<\/li>\n<li><a 
href=\"https:\/\/en.wikipedia.org\/wiki\/Notation_in_probability_and_statistics\">Notation in probability and statistics, Wikipedia<\/a>.<\/li>\n<li><a href=\"https:\/\/en.wikipedia.org\/wiki\/Independence_(probability_theory)\">Independence (probability theory), Wikipedia<\/a>.<\/li>\n<li><a href=\"https:\/\/en.wikipedia.org\/wiki\/Independent_and_identically_distributed_random_variables\">Independent and identically distributed random variables, Wikipedia<\/a>.<\/li>\n<li><a href=\"https:\/\/en.wikipedia.org\/wiki\/Mutual_exclusivity\">Mutual exclusivity, Wikipedia<\/a>.<\/li>\n<li><a href=\"https:\/\/en.wikipedia.org\/wiki\/Marginal_distribution\">Marginal distribution, Wikipedia<\/a>.<\/li>\n<li><a href=\"https:\/\/en.wikipedia.org\/wiki\/Joint_probability_distribution\">Joint probability distribution, Wikipedia<\/a>.<\/li>\n<li><a href=\"https:\/\/en.wikipedia.org\/wiki\/Conditional_probability\">Conditional probability, Wikipedia<\/a>.<\/li>\n<\/ul>\n<h2>Summary<\/h2>\n<p>In this post, you discovered a gentle introduction to joint, marginal, and conditional probability for multiple random variables.<\/p>\n<p>Specifically, you learned:<\/p>\n<ul>\n<li>Joint probability is the probability of two events occurring simultaneously.<\/li>\n<li>Marginal probability is the probability of an event irrespective of the outcome of another variable.<\/li>\n<li>Conditional probability is the probability of one event occurring in the presence of a second event.<\/li>\n<\/ul>\n<p>Do you have any questions?<br \/>\nAsk your questions in the comments below and I will do my best to answer.<\/p>\n<p>The post <a rel=\"nofollow\" href=\"https:\/\/machinelearningmastery.com\/joint-marginal-and-conditional-probability-for-machine-learning\/\">A Gentle Introduction to Joint, Marginal, and Conditional Probability<\/a> appeared first on <a rel=\"nofollow\" href=\"https:\/\/machinelearningmastery.com\/\">Machine Learning Mastery<\/a>.<\/p>\n<\/div>\n<p><a 
href=\"https:\/\/machinelearningmastery.com\/joint-marginal-and-conditional-probability-for-machine-learning\/\">Go to Source<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Author: Jason Brownlee Probability quantifies the uncertainty of the outcomes of a random variable. It is relatively easy to understand and compute the probability for [&hellip;] <span class=\"read-more-link\"><a class=\"read-more\" href=\"https:\/\/www.aiproblog.com\/index.php\/2019\/09\/26\/a-gentle-introduction-to-joint-marginal-and-conditional-probability\/\">Read More<\/a><\/span><\/p>\n","protected":false},"author":1,"featured_media":2620,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_bbp_topic_count":0,"_bbp_reply_count":0,"_bbp_total_topic_count":0,"_bbp_total_reply_count":0,"_bbp_voice_count":0,"_bbp_anonymous_reply_count":0,"_bbp_topic_count_hidden":0,"_bbp_reply_count_hidden":0,"_bbp_forum_subforum_count":0,"footnotes":""},"categories":[24],"tags":[],"_links":{"self":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts\/2619"}],"collection":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/comments?post=2619"}],"version-history":[{"count":0,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts\/2619\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/media\/2620"}],"wp:attachment":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/media?parent=2619"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/categories?post=2619"},
{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/tags?post=2619"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}