{"id":8850,"date":"2026-02-19T23:25:00","date_gmt":"2026-02-19T23:25:00","guid":{"rendered":"https:\/\/www.aiproblog.com\/index.php\/2026\/02\/19\/study-ai-chatbots-provide-less-accurate-information-to-vulnerable-users\/"},"modified":"2026-02-19T23:25:00","modified_gmt":"2026-02-19T23:25:00","slug":"study-ai-chatbots-provide-less-accurate-information-to-vulnerable-users","status":"publish","type":"post","link":"https:\/\/www.aiproblog.com\/index.php\/2026\/02\/19\/study-ai-chatbots-provide-less-accurate-information-to-vulnerable-users\/","title":{"rendered":"Study: AI chatbots provide less-accurate information to vulnerable users"},"content":{"rendered":"<p>Author: Media Lab<\/p>\n<div>\n<p>Large language models (LLMs) have been championed as tools that could democratize access to information worldwide, offering knowledge in a user-friendly interface regardless of a person\u2019s background or location. However, new research from MIT\u2019s Center for Constructive Communication (CCC) suggests these artificial intelligence systems may actually perform worse for the very users who could most benefit from them.<\/p>\n<p>A study conducted by researchers at CCC, which is based at the MIT Media Lab, found that state-of-the-art AI chatbots \u2014 including OpenAI\u2019s GPT-4, Anthropic\u2019s Claude 3 Opus, and Meta\u2019s Llama 3 \u2014 sometimes provide less-accurate and less-truthful responses to users who have lower English proficiency, less formal education, or who originate from outside the United States. The models also refuse to answer questions at higher rates for these users, and in some cases, respond with condescending or patronizing language.<\/p>\n<p>\u201cWe were motivated by the prospect of LLMs helping to address inequitable information accessibility worldwide,\u201d says lead author Elinor Poole-Dayan SM \u201925, a technical associate in the MIT Sloan School of Management who led the research as a CCC affiliate and master\u2019s student in media arts and sciences. \u201cBut that vision cannot become a reality without ensuring that model biases and harmful tendencies are safely mitigated for all users, regardless of language, nationality, or other demographics.\u201d<\/p>\n<p>A paper describing the work, \u201c<a href=\"https:\/\/arxiv.org\/abs\/2406.17737\" target=\"_blank\" rel=\"noopener\">LLM Targeted Underperformance Disproportionately Impacts Vulnerable Users<\/a>,\u201d was presented at the AAAI Conference on Artificial Intelligence in January.<\/p>\n<p><strong>Systematic underperformance across multiple dimensions<\/strong><\/p>\n<p>For this research, the team tested how the three LLMs responded to questions from two datasets: TruthfulQA and SciQ. TruthfulQA is designed to measure a model\u2019s truthfulness (by relying on common misconceptions and literal truths about the real world), while SciQ contains science exam questions testing factual accuracy. The researchers prepended short user biographies to each question, varying three traits: education level, English proficiency, and country of origin.<\/p>\n<p>Across all three models and both datasets, the researchers found significant drops in accuracy when questions came from users described as having less formal education or being non-native English speakers. 
Across all three models and both datasets, the researchers found significant drops in accuracy when questions came from users described as having less formal education or being non-native English speakers. The effects were most pronounced for users at the intersection of these categories: those with less formal education who were also non-native English speakers saw the largest declines in response quality.

The research also examined how country of origin affected model performance. Testing users from the United States, Iran, and China with equivalent educational backgrounds, the researchers found that Claude 3 Opus in particular performed significantly worse for users from Iran on both datasets.

"We see the largest drop in accuracy for the user who is both a non-native English speaker and less educated," says Jad Kabbara, a research scientist at CCC and a co-author on the paper. "These results show that the negative effects of model behavior with respect to these user traits compound in concerning ways, thus suggesting that such models deployed at scale risk spreading harmful behavior or misinformation downstream to those who are least able to identify it."

**Refusals and condescending language**

Perhaps most striking were the differences in how often the models refused to answer questions altogether. For example, Claude 3 Opus refused to answer nearly 11 percent of questions from less-educated, non-native English-speaking users, compared to just 3.6 percent for the control condition with no user biography.

When the researchers manually analyzed these refusals, they found that Claude responded with condescending, patronizing, or mocking language 43.7 percent of the time for less-educated users, compared to less than 1 percent of the time for highly educated users. In some cases, the model mimicked broken English or adopted an exaggerated dialect.

The model also refused to provide information on certain topics specifically for less-educated users from Iran or Russia, including questions about nuclear power, anatomy, and historical events, even though it answered the same questions correctly for other users.

"This is another indicator suggesting that the alignment process might incentivize models to withhold information from certain users to avoid potentially misinforming them, although the model clearly knows the correct answer and provides it to other users," says Kabbara.
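As a rough illustration of the per-condition bookkeeping behind figures like these, here is a small sketch that tallies accuracy and refusal rates by user condition. It is not the authors' evaluation code, and the keyword-based refusal detector is a hypothetical stand-in for the manual refusal analysis described above.

```python
# Rough sketch (assumed, not the authors' evaluation code) of tallying accuracy
# and refusal rates per user condition. The keyword-based refusal check is a
# hypothetical stand-in for the manual refusal analysis described in the article.
from collections import defaultdict

REFUSAL_MARKERS = ("i can't", "i cannot", "i won't", "i am not able", "i'm not able")


def is_refusal(response):
    """Crude heuristic: flag responses that open with a refusal phrase."""
    return response.strip().lower().startswith(REFUSAL_MARKERS)


def summarize(results):
    """results: iterable of dicts with keys 'condition', 'response', 'correct'."""
    stats = defaultdict(lambda: {"n": 0, "correct": 0, "refused": 0})
    for r in results:
        s = stats[r["condition"]]
        s["n"] += 1
        s["correct"] += int(r["correct"])
        s["refused"] += int(is_refusal(r["response"]))
    return {
        cond: {"accuracy": s["correct"] / s["n"], "refusal_rate": s["refused"] / s["n"]}
        for cond, s in stats.items()
    }


# Toy example comparing a no-biography control with a less-educated,
# non-native-English condition (made-up responses, for illustration only).
demo = [
    {"condition": "control", "response": "Mitochondria.", "correct": True},
    {"condition": "control", "response": "Oxygen.", "correct": True},
    {"condition": "less_edu_non_native", "response": "I can't help with that question.", "correct": False},
    {"condition": "less_edu_non_native", "response": "Oxygen.", "correct": True},
]
print(summarize(demo))
# e.g. {'control': {'accuracy': 1.0, 'refusal_rate': 0.0},
#       'less_edu_non_native': {'accuracy': 0.5, 'refusal_rate': 0.5}}
```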
**Echoes of human bias**

The findings mirror documented patterns of human sociocognitive bias. Research in the social sciences has shown that native English speakers often perceive non-native speakers as less educated, intelligent, and competent, regardless of their actual expertise. Similar biased perceptions have been documented among teachers evaluating non-native English-speaking students.

"The value of large language models is evident in their extraordinary uptake by individuals and the massive investment flowing into the technology," says Deb Roy, professor of media arts and sciences, CCC director, and a co-author on the paper. "This study is a reminder of how important it is to continually assess systematic biases that can quietly slip into these systems, creating unfair harms for certain groups without any of us being fully aware."

The implications are particularly concerning given that personalization features, such as ChatGPT's Memory, which tracks user information across conversations, are becoming increasingly common. Such features risk treating already-marginalized groups differently.

"LLMs have been marketed as tools that will foster more equitable access to information and revolutionize personalized learning," says Poole-Dayan. "But our findings suggest they may actually exacerbate existing inequities by systematically providing misinformation or refusing to answer queries to certain users. The people who may rely on these tools the most could receive subpar, false, or even harmful information."

[Go to Source](https://news.mit.edu/2026/study-ai-chatbots-provide-less-accurate-information-vulnerable-users-0219)