{"id":2634,"date":"2019-10-01T06:31:59","date_gmt":"2019-10-01T06:31:59","guid":{"rendered":"https:\/\/www.aiproblog.com\/index.php\/2019\/10\/01\/correlation-does-not-equal-causation-but-how-exactly-do-you-determine-causation\/"},"modified":"2019-10-01T06:31:59","modified_gmt":"2019-10-01T06:31:59","slug":"correlation-does-not-equal-causation-but-how-exactly-do-you-determine-causation","status":"publish","type":"post","link":"https:\/\/www.aiproblog.com\/index.php\/2019\/10\/01\/correlation-does-not-equal-causation-but-how-exactly-do-you-determine-causation\/","title":{"rendered":"Correlation does not equal causation but How exactly do you determine causation?"},"content":{"rendered":"<p>Author: Ajit Jaokar<\/p>\n<div>\n<h1><a href=\"https:\/\/storage.ning.com\/topology\/rest\/1.0\/file\/get\/3636885413?profile=original\" target=\"_blank\" rel=\"noopener noreferrer\"><img decoding=\"async\" src=\"https:\/\/storage.ning.com\/topology\/rest\/1.0\/file\/get\/3636885413?profile=RESIZE_710x\" class=\"align-full\"><\/a><\/h1>\n<p>\u00a0<\/p>\n<p>\u00a0<\/p>\n<h2>Introduction<\/h2>\n<p>\u00a0<\/p>\n<p><strong><em>Correlation does not equal causation<\/em><\/strong> is a mantra drilled into every data scientist from an early age.<\/p>\n<p>That\u2019s fine.<\/p>\n<p>But very few people discuss the follow-on question:<\/p>\n<p><strong><em>How exactly do you determine causation?<\/em><\/strong><\/p>\n<p>This problem is compounded because most books and examples are based on standard datasets (e.g., Boston, Iris, etc.).<\/p>\n<p>These examples do not discuss causation because the chosen features are already assumed to be causal (e.g., the factors affecting house prices are taken to be causal).<\/p>\n<p>So, if we start from the beginning (without simplified examples), how do you know whether a particular variable is a causal variable?<\/p>\n<p>Firstly, causality cannot be determined from data alone.<\/p>\n<p>Data gives correlation, but data alone cannot determine 
causation.<\/p>\n<p>To determine causation, we need to perform an <strong>experiment or a controlled study<\/strong>.<\/p>\n<h2>Background<\/h2>\n<p>In a statistical sense, two or more variables are correlated if their values change together, i.e., they tend to increase or decrease together (positive correlation), or one tends to increase as the other decreases (negative correlation). On the other hand, if there is a causal relationship between two variables, then the occurrence of one depends on the other, i.e., they exhibit a cause-and-effect relationship. For example, the link between smoking and lung cancer is causal, whereas smoking is correlated with alcoholism but does not cause it.<\/p>\n<p>Correlation is typically measured using Pearson\u2019s correlation coefficient (for linear relationships) or Spearman\u2019s rank correlation coefficient (for monotonic relationships). If there is correlation, then further investigation is needed to establish whether there is a causal relationship.<\/p>\n<h2>How can causation be established?<\/h2>\n<p>The most effective way of establishing causation is a controlled study.<\/p>\n<p>In a controlled study, the sample or population is split into two groups (ideally by random assignment) that are comparable in almost every way.<\/p>\n<p>The two groups then receive different treatments, and the outcomes of each group are assessed.<\/p>\n<p>For example, in medical research, one group is given a placebo whereas the other group is given a new medication.<\/p>\n<p>So, in a nutshell: <em>&#8220;To find out what happens when you change something, it is necessary to change it.&#8221;&#8230;<\/em> There are things you learn from perturbing a system that you&#8217;ll never find out from any amount of passive observation.<\/p>\n<p>Source: <a href=\"http:\/\/people.umass.edu\/~stanek\/pdffiles\/causal-holland.pdf\">http:\/\/people.umass.edu\/~stanek\/pdffiles\/causal-holland.pdf<\/a><\/p>\n<p>\u00a0<\/p>\n<p>The design of controlled experiments is a non-trivial exercise:<\/p>\n<ul>\n<li>You may have measurement error problems.<\/li>\n<li>Subjects might drop out of the study or not follow instructions, among other 
issues.<\/li>\n<li>You will need to make assumptions about how variables are related in order to draw inferences.<\/li>\n<li>You may have incomplete or imprecise data.<\/li>\n<li>The target causal quantity of interest may not be well defined.<\/li>\n<li>Confounding variables. A confounder is a variable that influences both the dependent variable and the independent variable, causing a spurious association.<\/li>\n<li>Selection bias (self-selection, truncated samples)<\/li>\n<li>Measurement error (that can induce confounding, not only noise)<\/li>\n<li>Misspecification (e.g., wrong functional form)<\/li>\n<li>External validity problems (wrong inference to target population)<\/li>\n<\/ul>\n<p>Adapted from <a href=\"https:\/\/stats.stackexchange.com\/questions\/2245\/statistics-and-causal-inference\">source<\/a><\/p>\n<p>Finally, there are statistical methods such as the <a href=\"https:\/\/en.wikipedia.org\/wiki\/Granger_causality\">Granger causality<\/a> test, which checks whether past values of one time series help predict another. This indicates predictive causality only (with limitations), not true cause and effect.<\/p>\n<p>\u00a0<\/p>\n<h2>Sources<\/h2>\n<p><a href=\"https:\/\/abs.gov.au\/websitedbs\/a3121120.nsf\/home\/statistical+language+-+correlation+and+causation\">https:\/\/abs.gov.au\/websitedbs\/a3121120.nsf\/home\/statistical+language+-+correlation+and+causation<\/a><\/p>\n<p><a href=\"https:\/\/towardsdatascience.com\/why-do-we-need-causality-in-data-science-aec710da021e\">Why do we need causality in data science<\/a><\/p>\n<p>Image source: <a href=\"https:\/\/www.khanacademy.org\/science\/high-school-biology\/hs-biology-foundations\/hs-biology-and-the-scientific-method\/a\/experiments-and-observations\">Khan Academy<\/a><\/p>\n<\/div>\n<p><a href=\"https:\/\/www.datasciencecentral.com\/xn\/detail\/6448529:BlogPost:892693\">Go to Source<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Author: Ajit Jaokar \u00a0 \u00a0 Introduction \u00a0 Correlation does not equal causation is a mantra drilled into every data scientist from an early age [&hellip;] <span 
class=\"read-more-link\"><a class=\"read-more\" href=\"https:\/\/www.aiproblog.com\/index.php\/2019\/10\/01\/correlation-does-not-equal-causation-but-how-exactly-do-you-determine-causation\/\">Read More<\/a><\/span><\/p>\n","protected":false},"author":1,"featured_media":469,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_bbp_topic_count":0,"_bbp_reply_count":0,"_bbp_total_topic_count":0,"_bbp_total_reply_count":0,"_bbp_voice_count":0,"_bbp_anonymous_reply_count":0,"_bbp_topic_count_hidden":0,"_bbp_reply_count_hidden":0,"_bbp_forum_subforum_count":0,"footnotes":""},"categories":[26],"tags":[],"_links":{"self":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts\/2634"}],"collection":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/comments?post=2634"}],"version-history":[{"count":0,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts\/2634\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/media\/465"}],"wp:attachment":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/media?parent=2634"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/categories?post=2634"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/tags?post=2634"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}