{"id":586,"date":"2018-06-06T06:37:58","date_gmt":"2018-06-06T06:37:58","guid":{"rendered":"https:\/\/www.aiproblog.com\/index.php\/2018\/06\/06\/is-model-bias-a-threat-to-equal-and-fair-treatment-maybe-maybe-not\/"},"modified":"2018-06-06T06:37:58","modified_gmt":"2018-06-06T06:37:58","slug":"is-model-bias-a-threat-to-equal-and-fair-treatment-maybe-maybe-not","status":"publish","type":"post","link":"https:\/\/www.aiproblog.com\/index.php\/2018\/06\/06\/is-model-bias-a-threat-to-equal-and-fair-treatment-maybe-maybe-not\/","title":{"rendered":"Is Model Bias a Threat to Equal and Fair Treatment?  Maybe, Maybe Not."},"content":{"rendered":"<p>Author: William Vorhies<\/p>\n<div>\n<p><strong><em>Summary:<\/em><\/strong><em>\u00a0 There is a great hue and cry about the danger of bias in our predictive models when applied to high significance events like who gets a loan, insurance, a good school assignment, or bail.\u00a0 It\u2019s not as simple as it seems and here we try to take a more nuanced look.\u00a0 The result is not as threatening as many headlines make it seem.<\/em><\/p>\n<p>\u00a0<\/p>\n<p><a href=\"http:\/\/api.ning.com\/files\/YGhOOL-l9HJlz2fXBmEEHXCgh3bjOEDDGUFc1Ocp5WysVJU1jtLc3Gz1RMitG4fLW4PpMU04xyYEfN84EjR6QVIeDfaRk0dT\/historyoffairness.png\" target=\"_self\"><img decoding=\"async\" src=\"http:\/\/api.ning.com\/files\/YGhOOL-l9HJlz2fXBmEEHXCgh3bjOEDDGUFc1Ocp5WysVJU1jtLc3Gz1RMitG4fLW4PpMU04xyYEfN84EjR6QVIeDfaRk0dT\/historyoffairness.png?width=350\" width=\"350\" class=\"align-right\"><\/a>Is social bias in our models a threat to equal and fair treatment?\u00a0 Here\u2019s a sample of recent headlines:<\/p>\n<ul>\n<li>Biased Algorithms Are Everywhere, and No One Seems to Care<\/li>\n<li>Researchers Combat Gender and Racial Bias in Artificial Intelligence<\/li>\n<li>Bias in Machine Learning and How to Stop It<\/li>\n<li>AI and Machine Learning Bias Has Dangerous Implications<\/li>\n<li>AI Professor Details Real-World Dangers of Algorithm Bias<\/li>\n<li><a 
target=\"_blank\" rel=\"noopener\"><\/a><span>When Algorithms Discriminate<\/span><\/li>\n<\/ul>\n<p>Holy smokes!\u00a0 The sky is falling.\u00a0 There\u2019s even an entire conference dedicated to the topic:\u00a0 the conference on Fairness, Accountability, and Transparency (FAT* \u2013 <span style=\"font-size: 8pt;\">it\u2019s their acronym, I didn\u2019t make this up<\/span>) now in its fifth year.<\/p>\n<p>But as you dig into this topic, it\u2019s a lot more nuanced.\u00a0 As H.L. Mencken said, \u201cFor every complex problem there is an answer that is clear, simple, and wrong.\u201d<\/p>\n<p>\u00a0<\/p>\n<p><span style=\"font-size: 12pt;\"><strong>Exactly What Type of Bias are We Talking About?<\/strong><\/span><\/p>\n<p>There are actually many different types of bias.\u00a0 Microsoft identifies five types: association bias, automation bias, interaction bias, confirmation bias, and dataset bias.\u00a0 Others use different taxonomies.<\/p>\n<p>However, we shouldn\u2019t mix up socially harmful bias that can have a concrete and negative impact on our lives (who gets a loan, insurance, a job, a house, or bail) with types of bias based on our self-selection, like what we choose to read or who we elect to friend.\u00a0 It\u2019s the important stuff we want to focus on.<\/p>\n<p>\u00a0<\/p>\n<p><span style=\"font-size: 12pt;\"><strong>How Many Sources of Bias in Modeling Are There?<\/strong><\/span><\/p>\n<p>First, we\u2019re not talking about the academic and well-understood tradeoff between bias and variance.\u00a0 <strong>We\u2019re talking about what causes models to be wrong, but with a high degree of confidence for some subsets of the population<\/strong>.<\/p>\n<p>\u00a0<\/p>\n<p><span style=\"font-size: 12pt;\"><strong>It\u2019s Not the Type of Model That\u2019s Used<\/strong><\/span><\/p>\n<p>Black box models, meaning ANNs of all stripes and particularly deep neural nets (DNNs), are constantly being fingered as the culprit.\u00a0 I\u2019ll grant you 
that ANNs in general and particularly those used in AI for image, text, and facial recognition generally lack sufficient transparency to describe exactly why a specific person was scored as accepted or rejected by the model.<\/p>\n<p>But the same is true of boosted trees, deep forests, ensembles, and many other techniques.\u00a0 Right now, only CNNs and RNN\/LSTMs are being used in AI applications, but there are no examples I could find of systems using these models which have actually been widely adopted and are causing harm.\u00a0<\/p>\n<p>Yes, some DNNs did categorize people of color as gorillas, but no one is being denied a valuable human or social service based on that type of system.\u00a0 We recognized the problem before it impacted anyone. \u00a0It\u2019s the good old \u2018customer preference\u2019 scoring models that pick winners and losers where we need to look.<\/p>\n<p>The important thing to understand is that even using the most transparent of models, GLMs and simple decision trees, as our regulated industries are required to do, bias can still sneak in.<\/p>\n<p>\u00a0<\/p>\n<p><span style=\"font-size: 12pt;\"><strong>Costs and Benefits<\/strong><\/span><\/p>\n<p>Remember that the value of using more complex if less transparent techniques is increased accuracy.\u00a0 That means lower cost, greater efficiency, and even the elimination of some types of hidden human bias.<\/p>\n<p>We\u2019ve decided as a society that we want our regulated industries, primarily insurance and lending, to use simple and completely explainable models.\u00a0 We intentionally gave up some of this efficiency.\u00a0<\/p>\n<p>While transparency has been agreed to be of value here, don\u2019t lose sight of the fact that this explicitly means that some individuals are paying more than they have to, or less than they should, if the risks and rewards were more accurately modeled.<\/p>\n<p>Since we mentioned how models can eliminate some types of hidden human bias, here\u2019s a short note on a <a 
href=\"http:\/\/www.nber.org\/papers\/w9873\"><em>2003 study<\/em><\/a> that showed that when recruiters were given identical resumes to review, they selected more applicants with white-sounding names.\u00a0 When reviewing resumes with the names redacted but selected by the algorithm as potential good hires, the bias was eliminated.\u00a0 So perhaps as often as algorithms can introduce bias, they can also protect against it.<\/p>\n<p>\u00a0<\/p>\n<p><span style=\"font-size: 12pt;\"><strong>The Problem Is Always in the Data<\/strong><\/span><\/p>\n<p><a href=\"http:\/\/api.ning.com\/files\/YGhOOL-l9HLFtCjGWTL1XW0kgKGcfXZml6zFWUrBNMMglbIVojKIHNfOg6BHhrrH478xjEA63CJMg3pQHPRWmXsrV60l8aBf\/gender_balance.png\" target=\"_self\"><img decoding=\"async\" src=\"http:\/\/api.ning.com\/files\/YGhOOL-l9HLFtCjGWTL1XW0kgKGcfXZml6zFWUrBNMMglbIVojKIHNfOg6BHhrrH478xjEA63CJMg3pQHPRWmXsrV60l8aBf\/gender_balance.png?width=250\" width=\"250\" class=\"align-right\"><\/a>If there were enough data that equally represented outcomes for each social characteristic we want to protect (typically race, gender, sex, age, and religion in regulated industries) then modeling could always be fair.<\/p>\n<p>Sounds simple but it\u2019s actually more nuanced than that.\u00a0 It requires that you <strong>first define exactly what type of fairness you want<\/strong>.\u00a0 Is it by representation?\u00a0 Should each protected group be represented in equal numbers (equal parity) or proportionate to their percentage in the population (proportional parity\/disparate impact)?<\/p>\n<p>Or are you more concerned about the impact of the false positives and false negatives that can be minimized but never eliminated in modeling?\u00a0<\/p>\n<p>This is particularly important if your model impacts a very small percentage of the population (e.g. 
criminals or people with uncommon diseases).\u00a0 In which case you have to further decide if you want to protect from false positives or false negatives, or at least have parity in these occurrences for each protected group.<\/p>\n<p>IBM, Microsoft, and others are in the process of trying to provide tools to detect different types of bias, but the <a href=\"http:\/\/dsapp.uchicago.edu\/\"><em>Center for Data Science and Public Policy<\/em><\/a> at University of Chicago has already released an open source toolkit called Aequitas which can evaluate models for bias.\u00a0 They offer this decision tree for deciding which type of bias you want to focus on (it probably is not possible to solve for all four types and six variations at once).\u00a0<a href=\"http:\/\/api.ning.com\/files\/YGhOOL-l9HIjQcU6TSwrK1sB8hhSDtOS7j4Xy7lme79yKe9PdPVdb6k-7y5cGOY7iP2U4HnHfTar4yxQZXztFYqfRvax2yOg\/fairnesstree.png\" target=\"_self\"><img decoding=\"async\" src=\"http:\/\/api.ning.com\/files\/YGhOOL-l9HIjQcU6TSwrK1sB8hhSDtOS7j4Xy7lme79yKe9PdPVdb6k-7y5cGOY7iP2U4HnHfTar4yxQZXztFYqfRvax2yOg\/fairnesstree.png?width=550\" width=\"550\" class=\"align-center\"><\/a><\/p>\n<p>Too often the press and even some well-educated pundits have suggested radical solutions that simply don\u2019t take these facts into consideration.\u00a0 For example, the AI Now Institute published this recommendation as first among its 10 suggestions.\u00a0<\/p>\n<p><em>\u201c<\/em><em>Core public agencies, such as those responsible for criminal justice, healthcare, welfare, and education (e.g. 
\u201chigh stakes\u201d domains) should no longer use \u2018black box\u2019 AI and algorithmic systems.\u201d<\/em><\/p>\n<p>Their suggestion: submit any proposed model to field proof testing, not unlike the trials the FDA requires before drugs can be prescribed.\u00a0 That might only delay the benefit, but for how long, and at what cost to conduct these tests?\u00a0 And what about model drift and refresh?<\/p>\n<p>Before we consider such radical steps, we need to think through the utility these systems are providing, what human biases they are eliminating, and specifically what type of bias we want to protect against.<\/p>\n<p>Having said that, this organization along with others is onto something when they point the finger at public agencies.<\/p>\n<p><strong>\u00a0<\/strong><\/p>\n<p><span style=\"font-size: 12pt;\"><strong>Where Are We Most Likely to be Harmed?<\/strong><\/span><\/p>\n<p>In addition to reviewing the literature I also called up some friends responsible for modeling in regulated industries, specifically insurance and lending.\u00a0 I\u2019ll talk more about that a little further down, but I came away with the very strong impression that where we\u2019ve defined \u2018regulated\u2019 industries for modeling purposes, defined specifically what data they can and cannot use, and then made them accountable to typically state-level agencies who review these issues, bias is a very small problem.<\/p>\n<p><strong>\u00a0<\/strong><\/p>\n<p><span style=\"font-size: 12pt;\"><strong>We Need to Watch Out for Unregulated Industries \u2013 The Most Important of Which are Public Agencies<\/strong><\/span><\/p>\n<p>This is not a pitch to extend data regulation to lots of other private sector industries.\u00a0 We\u2019ve already pretty much covered that waterfront.\u00a0 Turns out that the sort of \u201chigh stakes\u201d domains referred to above are pretty much all in the public sector.\u00a0<\/p>\n<p>Since the public sector isn\u2019t known for investing heavily in 
data science talent, this leaves us with the double whammy of high impact and modest insight into the problem.<\/p>\n<p>However, the examples called out by these sources do not necessarily show that the models these agencies use are biased.\u00a0 In many cases they are simply wrong in their assumptions.\u00a0 Two examples called out by the AI Now Institute in fact date back three and four years and don\u2019t clearly show bias.<\/p>\n<p><span style=\"font-size: 12pt;\"><strong>Teacher Evaluation:<\/strong>\u00a0<\/span> This is a controversial model currently being litigated in court in NY that rates teachers based on how much their students have progressed <a href=\"http:\/\/www.slate.com\/blogs\/schooled\/2015\/08\/11\/vam_lawsuit_in_new_york_state_here_s_why_the_entire_education_reform_movement.html\">(<em>student growth percent)<\/em><\/a>.\u00a0 Long story short, a teacher on Long Island regularly rated highly effective was suddenly demoted to ineffective based on the improvement rate of her cohort of students, something outside of her control.\u00a0 It\u2019s a little complicated, but it smacks of bad modeling and bad assumptions, not bias in the model.<\/p>\n<p><strong><span style=\"font-size: 12pt;\">Student School Matching Algorithms:<\/span> \u00a0<\/strong>Good schools have become a scarce resource sought after by parents.\u00a0 The nonprofit IIPSC created an <a href=\"https:\/\/www.edweek.org\/ew\/articles\/2013\/12\/04\/13algorithm_ep.h33.html\"><em>allocation model used to assign students<\/em><\/a> to schools in New York, Boston, Denver, and New Orleans.\u00a0 The core is an algorithm that generates one best school offer for every student.\u00a0<\/p>\n<p>The model combines data from three sources: the schools families actually want their children to attend, listed in order of preference; the number of available seats in each grade at every school in the system; and the set of rules that governs admission to each school.\u00a0<\/p>\n<p>From the write-up 
this sounds more like an expert system than a predictive model.\u00a0 Further, the evidence is that it does not improve the lot of the most disadvantaged students.\u00a0 At best the system fails transparency.\u00a0 At worst the underlying model may be completely flawed.<\/p>\n<p>It also illustrates a risk unique to the public sector.\u00a0 The system is widely praised by school administrators since it dramatically decreased the work created by overlapping deadlines, multiple applications, and some admissions game playing.\u00a0 So it appears to have benefited the agency but not necessarily the students.<\/p>\n<p><span style=\"font-size: 12pt;\"><strong>COMPAS Recidivism Prediction:<\/strong><\/span> There is one example of bias we can all probably agree on from the public sector and that\u2019s COMPAS, a predictive model widely used in the courts to predict who will reoffend.\u00a0 Judges across the United States use COMPAS to guide their decisions about sentencing and bail.\u00a0 A <a href=\"https:\/\/www.propublica.org\/article\/machine-bias-risk-assessments-in-criminal-sentencing\"><em>well-known study<\/em><\/a> showed that the system was biased against blacks, but not in the way you might expect.\u00a0<\/p>\n<p>COMPAS was found to correctly predict recidivism for black and white defendants at roughly the same rate.\u00a0 However, the false positive rate was almost twice as high for blacks as for whites.\u00a0 That is, when COMPAS was wrong (predicted to reoffend but did not), it was wrong twice as often for blacks.\u00a0 Interestingly, it made the symmetrical false negative error for whites (predicted not to reoffend but did).<\/p>\n<p>Curiously, these were the only three examples offered for risk in the public sector, only one of which seems to be legitimately a case of modeling bias.\u00a0 However, given the expansion of predictive analytics and AI, the public sector as a \u201chigh stakes\u201d arena for our personal freedoms looks like a good place for 
this discussion to begin.<\/p>\n<p>\u00a0<\/p>\n<p><span style=\"font-size: 12pt;\"><strong>How Regulated Industries Really Work<\/strong><\/span><\/p>\n<p>Space doesn\u2019t allow a deep dive into this topic but let\u2019s start with these three facts:<\/p>\n<ol>\n<li>Certain types of information are off limits for modeling. This includes the obvious protected categories, typically race, gender, sex, age, and religion.\u00a0 This extends to data which could be proxies for these variables like geography.\u00a0 I\u2019m told that these businesses also elect not to use variables that might look like \u2018bad PR\u2019.\u00a0 These include variables such as having children or lifestyle patterns like LGBTQ even though these are probably correlated with different levels of risk.<\/li>\n<li>Modeling techniques are restricted to completely transparent and explainable simple techniques like GLM and simple decision trees. Yes that negatively impacts accuracy.<\/li>\n<li>State agencies like the Department of Insurance can and do question the variables chosen in modeling. 
In businesses like insurance, in most states they have the authority to approve profit margin ranges for the company\u2019s total portfolio of offerings.\u00a0 In practice this means that some may be high and others may be loss-leaders, but competition levels this out in the medium time frame.<\/li>\n<\/ol>\n<p>What doesn\u2019t occur is testing for bias beyond these regulated formulas and restrictions.\u00a0 There\u2019s no specific test for disparate impact or for bias in false positives or false negatives.\u00a0 The regulation is presumed to suffice, backed up by the fact that there are no monopolies here, and competition rapidly weeds out the outliers.<\/p>\n<p>Similarly, you may have wondered about equal parity or proportionate parity testing.\u00a0 It\u2019s just not possible.\u00a0 An insurance company, for example, will have dozens of policy programs, each targeting some subset of the population where they believe there is competitive advantage.\u00a0<\/p>\n<p>So long as those targeted subsets don\u2019t encroach on the protected variables they are OK.\u00a0 So for example, it\u2019s perfectly OK to have a policy aimed at school teachers and another at a city-wide geography in a city dominated by blue collar manufacturing.\u00a0 There\u2019s no way to correctly test for parity in these examples since they are designed to be demographically unique.<\/p>\n<p>\u00a0<\/p>\n<p><span style=\"font-size: 12pt;\"><strong>How Much Bias is Too Much Bias?<\/strong><\/span><\/p>\n<p>You may be interested to know that the government has already ruled on this question, and while there are somewhat different rules used in special circumstances, the answer is 80%.<\/p>\n<p>The calculation is simple: if you hired 60% of the applicants in an unprotected class and 40% in a protected class, the ratio is 40\/60, or about 67%, which does not meet the 80% threshold.\u00a0 But 40% versus 50% would be calculated as 80% and would meet the requirement.<\/p>\n<p>This rule dates back to 1971 when the 
State of California Fair Employment Practice Commission (FEPC) assembled a working group of 32 professionals to make this determination.\u00a0 By 1978 it was codified at the federal level by the EEOC and the DOJ for Title VII enforcement.<\/p>\n<p>\u00a0<\/p>\n<p><span style=\"font-size: 12pt;\"><strong>So Is Bias in Modeling Really a Problem?<\/strong><\/span><\/p>\n<p>Well, yes and no.\u00a0 Our regulated industries seem to be doing a pretty good job.\u00a0 Not perhaps up to the statistical standards of data science.\u00a0 They don\u2019t guard against all four types of bias identified by the University of Chicago, but from a practical standpoint, there are not a lot of complaints.<\/p>\n<p>The public sector as a \u201chigh stakes\u201d arena deserves our attention, but of the limited number of examples put forth to prove bias, only one, COMPAS, clearly illustrates a statistical bias problem.\u00a0<\/p>\n<p>Still, given the rapid expansion of analytics and the limited data science talent in this sector, I vote for keeping an eye on it.\u00a0 Not, however, with the requirement to immediately stop using algorithms at all.<\/p>\n<p>To paraphrase a well-turned observation, if I evaluate the threat of modeling bias on a scale of 0 to 10, where 0 is the tooth fairy and 10 is Armageddon, I\u2019m going to give this about a 2 until better proof is presented.<\/p>\n<p>\u00a0\u00a0<\/p>\n<p><a href=\"https:\/\/www.datasciencecentral.com\/profiles\/blog\/list?user=0h5qapp2gbuf8\"><em>Other articles by Bill Vorhies.<\/em><\/a><\/p>\n<p>\u00a0<\/p>\n<p>About the author:\u00a0 Bill Vorhies is Editorial Director for Data Science Central and has practiced as a data scientist since 2001.\u00a0 \u00a0\u00a0He can be reached at:<\/p>\n<p><a href=\"mailto:Bill@DataScienceCentral.com\">Bill@DataScienceCentral.com<\/a><\/p>\n<\/div>\n<p><a href=\"https:\/\/www.datasciencecentral.com\/xn\/detail\/6448529:BlogPost:729778\">Go to 
Source<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Author: William Vorhies Summary:\u00a0 There is a great hue and cry about the danger of bias in our predictive models when applied to high significance [&hellip;] <span class=\"read-more-link\"><a class=\"read-more\" href=\"https:\/\/www.aiproblog.com\/index.php\/2018\/06\/06\/is-model-bias-a-threat-to-equal-and-fair-treatment-maybe-maybe-not\/\">Read More<\/a><\/span><\/p>\n","protected":false},"author":1,"featured_media":471,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_bbp_topic_count":0,"_bbp_reply_count":0,"_bbp_total_topic_count":0,"_bbp_total_reply_count":0,"_bbp_voice_count":0,"_bbp_anonymous_reply_count":0,"_bbp_topic_count_hidden":0,"_bbp_reply_count_hidden":0,"_bbp_forum_subforum_count":0,"footnotes":""},"categories":[26],"tags":[],"_links":{"self":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts\/586"}],"collection":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/comments?post=586"}],"version-history":[{"count":0,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts\/586\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/media\/472"}],"wp:attachment":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/media?parent=586"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/categories?post=586"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/tags?post=586"}],"curies":[{"name":
"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}