{"id":5069,"date":"2021-10-03T06:33:59","date_gmt":"2021-10-03T06:33:59","guid":{"rendered":"https:\/\/www.aiproblog.com\/index.php\/2021\/10\/03\/data-labeling-instructions-gateway-to-success-in-crowdsourcing-and-enduring-impact-on-ai\/"},"modified":"2021-10-03T06:33:59","modified_gmt":"2021-10-03T06:33:59","slug":"data-labeling-instructions-gateway-to-success-in-crowdsourcing-and-enduring-impact-on-ai","status":"publish","type":"post","link":"https:\/\/www.aiproblog.com\/index.php\/2021\/10\/03\/data-labeling-instructions-gateway-to-success-in-crowdsourcing-and-enduring-impact-on-ai\/","title":{"rendered":"Data-Labeling Instructions: Gateway to Success in Crowdsourcing and Enduring Impact on AI"},"content":{"rendered":"<p>Author: Daria Baidakova<\/p>\n<div>\n<p><a href=\"https:\/\/storage.ning.com\/topology\/rest\/1.0\/file\/get\/9623544096?profile=original\" target=\"_blank\" rel=\"noopener\"><img decoding=\"async\" src=\"https:\/\/storage.ning.com\/topology\/rest\/1.0\/file\/get\/9623544096?profile=RESIZE_710x\" width=\"720\" class=\"align-full\"><\/a><\/p>\n<p>Photo by <a href=\"https:\/\/unsplash.com\/@claytonrobbins?utm_source=unsplash&amp;utm_medium=referral&amp;utm_content=creditCopyText\">Clayton Robbins<\/a> on <a href=\"https:\/\/unsplash.com\/s\/photos\/guidelines?utm_source=unsplash&amp;utm_medium=referral&amp;utm_content=creditCopyText\">Unsplash<\/a><\/p>\n<p><span style=\"font-weight: 400;\">AI development today rests on the shoulders of Machine Learning algorithms that require huge amounts of data to be fed into training models. This data needs to be of consistently high quality to correctly represent the real world, and to achieve that, the data needs to be labeled accurately throughout. A number of data labeling methods exist today, from in-house to synthetic labeling. 
Crowdsourcing is among the most cost- and time-effective of these approaches (<\/span><a href=\"https:\/\/www.ijcai.org\/Proceedings\/16\/Papers\/301.pdf\"><span style=\"font-weight: 400;\">Wang and Zhou, 2016<\/span><\/a><span style=\"font-weight: 400;\">).<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Crowdsourcing is manual data labeling, performed by humans, that uses the principle of aggregation to complete assignments. In this scenario, a large number of performers complete various tasks\u2014from transcribing audio files and classifying images to visiting on-site locations and measuring walking distance\u2014and their best efforts are subsequently combined to achieve the desired outcome.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Research shows that crowdsourcing has become one of the most sought-after data labeling approaches to date, with companies like<\/span> <a href=\"https:\/\/www.mturk.com\/\"><span style=\"font-weight: 400;\">MTurk<\/span><\/a><span style=\"font-weight: 400;\">,<\/span> <a href=\"https:\/\/thehive.ai\/\"><span style=\"font-weight: 400;\">Hive<\/span><\/a><span style=\"font-weight: 400;\">,<\/span> <a href=\"https:\/\/www.clickworker.com\/\"><span style=\"font-weight: 400;\">Clickworker<\/span><\/a><span style=\"font-weight: 400;\">, and<\/span> <a href=\"https:\/\/toloka.ai\/\"><span style=\"font-weight: 400;\">Toloka<\/span><\/a> <span style=\"font-weight: 400;\">attracting millions of performers the world over (<\/span><a href=\"https:\/\/www.researchgate.net\/publication\/283842772_Crowdsourcing_and_the_Evolution_of_a_Business_Ecosystem\"><span style=\"font-weight: 400;\">Guittard et al., 2015<\/span><\/a><span style=\"font-weight: 400;\">). 
In some cases, such as with <a href=\"https:\/\/www.linkedin.com\/feed\/update\/urn:li:activity:6830489133316624384\" target=\"_blank\" rel=\"noopener\">Toloka App Services<\/a>, the process has been refined to become almost automatic: requesters supply only clear guidelines and examples, and receive the labeled data shortly after.<\/span><\/p>\n<p><b>Importance of instructions<\/b><\/p>\n<p><span style=\"font-weight: 400;\">This brings us to the main point \u2013 instructions. As our lives become more AI-dependent, the evolution of AI is in turn becoming increasingly reliant on ML algorithms for training. These algorithms cannot survive without accurate data labeling. Therefore, instructions on how to label data correctly are the gateway to success in both crowdsourcing and AI development. Ultimately, poor instructions lead to a poor AI product, regardless of the other factors.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">While many crowdsourcing platforms work to hone their delivery pipelines, simplifying the procedure as much as possible, instructions often remain a sore point. No matter how well-oiled the whole data-labeling mechanism is, there\u2019s no way around clear instructions that crowd workers can easily understand and follow.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Since 95% of all ML labels are supervised, i.e. done by hand (<\/span><a href=\"https:\/\/www.researchgate.net\/publication\/347079938_Data_Labeling_An_Empirical_Investigation_into_Industrial_Challenges_and_Mitigation_Strategies\"><span style=\"font-weight: 400;\">Fredriksson et al., 2020<\/span><\/a><span style=\"font-weight: 400;\">), the instructions aspect of crowdsourcing should never be overlooked or underplayed. 
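The aggregation principle mentioned earlier is, at its simplest, a majority vote over redundant labels: several performers label the same item, and the most common answer wins. A minimal sketch of that idea (the function and data names here are hypothetical, not any platform's actual API):

```python
from collections import Counter

def majority_vote(labels_by_item):
    """Aggregate redundant crowd labels: each item receives the label
    chosen by the largest number of performers (ties broken arbitrarily)."""
    return {
        item: Counter(labels).most_common(1)[0][0]
        for item, labels in labels_by_item.items()
    }

# Three performers label two images; the disagreement on "img2"
# is resolved by the majority.
raw_labels = {
    "img1": ["cat", "cat", "cat"],
    "img2": ["dog", "dog", "cat"],
}
print(majority_vote(raw_labels))  # {'img1': 'cat', 'img2': 'dog'}
```

Real platforms layer further quality controls on top of this, which is exactly why the instructions shaping each individual vote matter so much.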
However, research also indicates that when it comes to a systematic approach to labeling data and prepping crowd workers, requesters often don\u2019t know what to do beyond general notions (<\/span><a href=\"https:\/\/www.researchgate.net\/publication\/347079938_Data_Labeling_An_Empirical_Investigation_into_Industrial_Challenges_and_Mitigation_Strategies\"><span style=\"font-weight: 400;\">Fredriksson et al., 2020<\/span><\/a><span style=\"font-weight: 400;\">).<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Disagreements between requesters (as well as expert annotators) and crowd workers continue to pop up and can only be resolved by refining instructions (<\/span><a href=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2017\/04\/pn4864-changA.pdf\"><span style=\"font-weight: 400;\">Chang et al., 2017<\/span><\/a><span style=\"font-weight: 400;\">). For instance, Konstantin Kashkarov, a crowd worker with Toloka, admits that he has disagreed with the instructions from various requesters more than a few times in his career as a labeler (<\/span><a href=\"https:\/\/crowdscience.ai\/conference_events\/vldb21\"><span style=\"font-weight: 400;\">VLDB Discussion, 2021<\/span><\/a><span style=\"font-weight: 400;\">); according to him, these instructions contained errors and inconsistencies.<\/span> <a href=\"https:\/\/www.researchgate.net\/publication\/311490300_Parting_Crowds_Characterizing_Divergent_Interpretations_in_Crowdsourced_Annotation_Tasks\"><span style=\"font-weight: 400;\">Kairam and Heer (2016)<\/span><\/a> <span style=\"font-weight: 400;\">stipulate that these inconsistencies indeed translate into labeling troubles unless they\u2019re swiftly addressed by the majority voting of crowd workers.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">However, how to do this efficiently remains an open question: practice shows that greater numbers (as opposed to fewer experts) are not necessarily a guarantee of accuracy, especially 
in narrow, domain-specific tasks \u2013 and the narrower the domain, the more this holds. In other words, just because there are many crowd performers involved in a given project doesn\u2019t mean these performers won\u2019t make labeling mistakes; in fact, most of them might make the same or similar mistakes if instructions don\u2019t resolve ambiguity. And since instructions are the stepping stone to the whole ML ecosystem, even the tiniest misinterpretation and subsequent labeling irregularity can lead to noisy data sets and imprecise AI. In some cases, this can even potentially endanger our lives, such as when AI is used to help diagnose illnesses or administer drugs.\u00a0\u00a0\u00a0<\/span><\/p>\n<p><span style=\"font-weight: 400;\">So, how do we make instructions accurate and make sure data labeling is error-free? To answer this question, we must first look at the types of problems many labelers face.\u00a0\u00a0<\/span><\/p>\n<p><b>Frequent issues and grey areas<\/b><\/p>\n<p><span style=\"font-weight: 400;\">When it comes to crowdsourcing instructions, research indicates that things are not at all cut and dried. On the one hand, it\u2019s been shown that crowd performers will only go as far as they have to, and hence all responsibility for the comprehension of tasks ultimately falls on the requester. Both confusing interfaces and unclear instructions tend to result in improperly completed assignments. Any ambiguity is bound to lead to labeling mistakes, and crowd workers cannot be expected to ask clarifying questions\u2014as a general rule, they won\u2019t (<\/span><a href=\"https:\/\/jmlr.org\/papers\/volume18\/17-234\/17-234.pdf\"><span style=\"font-weight: 400;\">Rao and Michel, 2016<\/span><\/a><span style=\"font-weight: 400;\">).<\/span><\/p>\n<p><span style=\"font-weight: 400;\">At the same time, crowdsourcing practitioners, Toloka among them, have experimented with this notion, and it turned out that crowd workers were savvier than previously expected. 
Ivan Stelmakh, a PhD student at Carnegie Mellon University and a panelist at the recent<\/span> <a href=\"https:\/\/vldb.org\/2021\/\"><span style=\"font-weight: 400;\">VLDB conference<\/span><\/a><span style=\"font-weight: 400;\">, explains that his team deliberately gave confusing instructions to performers on Toloka, expecting poor performance. They were surprised to discover that the results were still very robust, which implies that somehow the performers were able to understand\u2014whether instinctively or, more likely, through experience\u2014what to do and how. This suggests that (a) it\u2019s not just up to the instructions but also to those who read them, and (b) the more experienced the readers are, the higher the probability of satisfactory results.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Another conclusion that follows has to do with how simple or complex the task in question is. According to<\/span> <a href=\"https:\/\/www.cmu.edu\/tepper\/faculty-and-research\/faculty-by-area\/profiles\/lipton-zachary.html\"><span style=\"font-weight: 400;\">Zack Lipton<\/span><\/a><span style=\"font-weight: 400;\">, who conducts ML research at Carnegie Mellon, the outcome very much depends on whether it is a standard or non-standard task. A simple task with confusing instructions can be completed by experienced performers without major problems. This isn\u2019t the case with unusual or rare tasks: even experienced crowd workers may struggle to come up with acceptable answers if the instructions aren\u2019t clear, because they have no domain-specific experience to fall back on.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Importantly,<\/span> <a href=\"https:\/\/arxiv.org\/abs\/2010.02114\"><span style=\"font-weight: 400;\">Lipton\u2019s experiments demonstrated<\/span><\/a> <span style=\"font-weight: 400;\">that with such tasks, different versions of the instructions play a direct role in the ultimate outcome. 
Therefore, it appears that as the task\u2019s difficulty rises, Rao and Michel\u2019s argument about the role of initial guidelines tends to outweigh Ivan Stelmakh\u2019s observation of the performers\u2019 self-guiding ability.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Furthermore, according to<\/span> <a href=\"https:\/\/scholar.google.co.uk\/citations?user=yFIbz2kAAAAJ&amp;hl=en\"><span style=\"font-weight: 400;\">Mohamed Amgad<\/span><\/a><span style=\"font-weight: 400;\">, a Fellow at Northwestern University who also made an appearance at VLDB, this rule applies to unusual tasks even when the instructions are completely clear. In other words, a degree of error is inherent in manual task completion, and this problem becomes more pronounced as the tasks become less common and more advanced. In the end, it comes down to variability that can only be eliminated with experience (not just clear instructions), so the underlying issue\u2014according to him\u2014is sometimes embedded in the task itself, not its explanation. To put it bluntly, even if there are very clear instructions on how to build a rocket, most of us will probably struggle with this task unless we have some background in engineering and physics.<\/span><\/p>\n<p><b>Confusion and the bias problem<\/b><\/p>\n<p><span style=\"font-weight: 400;\">As we\u2019ve seen, ambiguity in instructions seems to become more of a worry as the task becomes more sophisticated, finally reaching a stage when even clear instructions may lead to substandard results. And sometimes, it turns out, this inherent error tendency goes beyond the crowd workers\u2019 experience right into the realm of personal interpretation. 
According to<\/span> <a href=\"https:\/\/wearetechwomen.com\/inspirational-woman-olga-megorskaya-ceo-co-founder-toloka\/\"><span style=\"font-weight: 400;\">Olga Megorskaya<\/span><\/a><span style=\"font-weight: 400;\">, CEO of Toloka, inherent biases exist in datasets that are related to the actual data, the guidelines, and the personality and background of the labelers.\u00a0<\/span><\/p>\n<p><span style=\"font-weight: 400;\">This is known as<\/span> <i><span style=\"font-weight: 400;\">subjective data<\/span><\/i> <span style=\"font-weight: 400;\">and<\/span> <i><span style=\"font-weight: 400;\">biased labeling<\/span><\/i> <span style=\"font-weight: 400;\">in the scientific community, a ubiquitous problem that\u2019s qualitatively different from individual errors, because it reflects a common, sometimes hidden group tendency (<\/span><a href=\"https:\/\/www.researchgate.net\/publication\/304747211_Learning_from_crowdsourced_labeled_data_a_survey\"><span style=\"font-weight: 400;\">Zhang and Sheng, 2016<\/span><\/a> <span style=\"font-weight: 400;\">via<\/span> <span style=\"font-weight: 400;\">Wauthier &amp; Jordan 2011 and Faltings et al. 2014<\/span><span style=\"font-weight: 400;\">). In the best-case scenario, this tendency can reflect a particular view that another group might not share, making the labeling results only partially accurate. In the worst-case scenario, the outcome can be prejudicial and offensive, such as in a<\/span> <a href=\"https:\/\/algorithmwatch.org\/en\/google-vision-racism\/\"><span style=\"font-weight: 400;\">widely publicized Google case<\/span><\/a><span style=\"font-weight: 400;\">, where dark-skinned individuals were mislabeled as holding a gun, while light-skinned individuals with the very same device were judged to be holding a harmless thermometer.\u00a0<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Importantly, biased labeling arises not merely from expert vs. 
non-expert differences or individual preferences, but rather from<\/span> <span style=\"font-weight: 400;\">varying<\/span> <span style=\"font-weight: 400;\">metrics and scales used in decision-making. Often this is the result of one\u2019s socio-cultural background and points of reference. What\u2019s more, this phenomenon may not be apparent to requesters, and so the detection of these biases and their modeling can be very challenging, resulting in a \u201cnegative effect on inference algorithms and training\u201d (<\/span><a href=\"https:\/\/www.researchgate.net\/publication\/304747211_Learning_from_crowdsourced_labeled_data_a_survey\"><span style=\"font-weight: 400;\">Zhang and Sheng, 2016<\/span><\/a><span style=\"font-weight: 400;\">).<\/span><\/p>\n<p><span style=\"font-weight: 400;\">From the standpoint of statistics, such biases are essentially systematic errors that can potentially be overcome by diversifying sample sources or collecting different datasets. This means that what\u2019s often more important is not clear instructions, but clear examples \u2013 and enough of them for the crowd workers to see the pattern and steer clear of errors. At the same time, these biases can be so pronounced that they\u2019re often entirely culture-based. A question like \u201cwho has the prettiest face in this picture?\u201d or \u201cidentify the most dangerous animal\u201d can result in different answers from individuals belonging to different socio-cultural groups, where standards of beauty and local fauna can vary significantly. Often, these differences come down to geography.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">A question like \u201cidentify a blue object in this image\u201d can likewise yield very different results from individuals from Russia vs. Japan vs. 
India, where<\/span> <a href=\"https:\/\/www.mentalfloss.com\/article\/515205\/why-does-japan-have-blue-traffic-lights-instead-green\"><span style=\"font-weight: 400;\">the colors green, blue, and yellow are not classified in the same way<\/span><\/a><span style=\"font-weight: 400;\">.<\/span><\/p>\n<p><span style=\"font-weight: 400;\"><a href=\"https:\/\/storage.ning.com\/topology\/rest\/1.0\/file\/get\/9623556269?profile=original\" target=\"_blank\" rel=\"noopener\"><img decoding=\"async\" src=\"https:\/\/storage.ning.com\/topology\/rest\/1.0\/file\/get\/9623556269?profile=RESIZE_710x\" class=\"align-full\"><\/a><\/span><\/p>\n<p><span style=\"font-weight: 400;\"><span>ANTTI T. NISSINEN,\u00a0<\/span><a href=\"https:\/\/www.flickr.com\/photos\/veisto\/8389740249\/\">FLICKR<\/a><span>\u00a0\/\/\u00a0<\/span><a href=\"https:\/\/creativecommons.org\/licenses\/by\/2.0\/\">CC BY 2.0<\/a><\/span><\/p>\n<p><span style=\"font-weight: 400;\">Yet another example comes from a recent survey that was meant to detect hateful speech and abusive language. It turns out that English speakers may not share the standards and acceptance levels of speakers from other linguistic backgrounds; for this reason, to get the bigger picture, members of other, smaller groups should also be consulted.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">According to<\/span> <a href=\"https:\/\/www.tudelft.nl\/ewi\/over-de-faculteit\/afdelingen\/software-technology\/web-information-systems\/people\/jie-yang\"><span style=\"font-weight: 400;\">Jie Yang<\/span><\/a><span style=\"font-weight: 400;\">, Assistant Professor at Delft University in the Netherlands, the bias problem should be handled with a top-down approach to labeling. This means that all potential biases have to be considered in advance, i.e. 
when deciding on the kinds of results that are required and thus who exactly should be completing the tasks to obtain these results.\u00a0<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Moreover, according to<\/span> <a href=\"http:\/\/www.vldb.org\/pvldb\/vol13\/p2522-krivosheev.pdf\"><span style=\"font-weight: 400;\">Krivosheev et al. (2020)<\/span><\/a><span style=\"font-weight: 400;\">, this overlaps with<\/span> <span style=\"font-weight: 400;\">the issue of<\/span> <i><span style=\"font-weight: 400;\">confusion of observations<\/span><\/i><span style=\"font-weight: 400;\">, when crowd workers\u2014including those who try their best to do everything by the book\u2014confuse items of similar classes. This happens because the items\u2019 interpretability is embedded within the task, but the description fails to provide enough examples and explanations to point to the desired interpretation. An instance of this phenomenon would be having to identify Freddie Mercury in an image \u2013 but does actor Rami Malek playing Freddie count or not?<\/span><\/p>\n<p><span style=\"font-weight: 400;\"><a href=\"https:\/\/storage.ning.com\/topology\/rest\/1.0\/file\/get\/9623568073?profile=original\" target=\"_blank\" rel=\"noopener\"><img decoding=\"async\" src=\"https:\/\/storage.ning.com\/topology\/rest\/1.0\/file\/get\/9623568073?profile=RESIZE_710x\" width=\"400\" class=\"align-full\"><\/a><\/span><\/p>\n<p><span style=\"font-weight: 400;\">If this confusion issue is present, then the effect observed by Rao and Michel and corroborated by Lipton\u2019s experiments can be multiplied manifold.<\/span><\/p>\n<p><b>Suggested solutions<\/b><\/p>\n<p><span style=\"font-weight: 400;\">Despite some inherent issues related to the type of tasks and performers involved, instructions still remain a pivotal factor in the success of data-labeling projects. 
According to Megorskaya, even one small change in the guidelines can affect the whole data set; the question that needs to be addressed is therefore to what extent changes in instructions shape the AI end product, and how any negative effect can be minimized.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Jie Yang stipulates that while bias poses a serious obstacle to accurate labeling, crucially, there\u2019s no panacea available: any attempt to resolve bias would be entirely domain-dependent. At the same time, as we\u2019ve seen, this is a multi-faceted problem \u2013 accurate results rely on clear instructions and, beyond that, on the tasks themselves (how rare\/difficult) and also on the performers (their experience and socio-cultural background). Consequently, and somewhat expectedly, no universal solution encompassing all of these aspects currently exists.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Nonetheless,<\/span> <a href=\"https:\/\/www.researchgate.net\/publication\/335221485_Machine_Learning_with_Crowdsourcing_A_Brief_Summary_of_the_Past_Research_and_Future_Directions\"><span style=\"font-weight: 400;\">Zhang et al. (2013)<\/span><\/a> <span style=\"font-weight: 400;\">propose a quality-control strategy based on periodic checkpoints meant to discard both low-quality labels and underperforming labelers mid-task.<\/span> <a href=\"https:\/\/www.jmlr.org\/papers\/volume18\/17-234\/17-234.pdf\"><span style=\"font-weight: 400;\">Vaughan (2018)<\/span><\/a> <span style=\"font-weight: 400;\">further suggests that before proceeding with any task, projects should be piloted by creating micro pools and testing both the UI and the crowd workers. 
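The checkpoint idea can be sketched as periodically scoring each labeler against control tasks with known answers ("golden" tasks) and discarding those who fall below a threshold. A minimal sketch under that interpretation, with hypothetical names and data rather than the authors' actual implementation:

```python
def passing_labelers(golden_answers, responses, min_accuracy=0.7):
    """Checkpoint filter: keep only labelers whose accuracy on the
    'golden' control tasks meets the threshold; labels from the rest
    would be discarded mid-project."""
    kept = set()
    for worker, answers in responses.items():
        control = [(task, ans) for task, ans in answers.items() if task in golden_answers]
        if not control:
            continue  # no control tasks seen yet; defer judgment
        accuracy = sum(golden_answers[task] == ans for task, ans in control) / len(control)
        if accuracy >= min_accuracy:
            kept.add(worker)
    return kept

golden = {"g1": "cat", "g2": "dog"}  # control tasks with known answers
responses = {
    "worker_a": {"g1": "cat", "g2": "dog", "t1": "bird"},  # 2/2 on controls
    "worker_b": {"g1": "dog", "g2": "cat", "t1": "fish"},  # 0/2 on controls
}
print(passing_labelers(golden, responses))  # {'worker_a'}
```

Running such a check at regular intervals, rather than once, is what turns it into the periodic checkpoint scheme described above.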
It\u2019s been shown that there\u2019s a negative correlation between confusion in instructions and labeling accuracy, as well as acceptance of tasks<\/span> <a href=\"https:\/\/www.jmlr.org\/papers\/volume18\/17-234\/17-234.pdf\"><span style=\"font-weight: 400;\">(Vaughan, 2018 via Jain et al., 2017<\/span><\/a><span style=\"font-weight: 400;\">). In other words, the more examples there are and the clearer the task, the more workers will be willing to participate, and the better and quicker the result will be. Be that as it may, while this strategy can help resolve confusion, research indicates that these steps may still be insufficient to combat bias in subjective data.<\/span><\/p>\n<p><a href=\"https:\/\/www.researchgate.net\/publication\/304747211_Learning_from_crowdsourced_labeled_data_a_survey\"><span style=\"font-weight: 400;\">Zhang and Sheng (2016)<\/span><\/a> <span style=\"font-weight: 400;\">suggest a different track \u2013 historical information on labelers should be evaluated with a view to assigning a different \u201cweight\u201d, or impact factor, to different workers. This weight should depend on their level of domain expertise, as well as their relevant socio-cultural and educational background. To put it in simpler terms, for better or worse, not all crowd workers should always be treated equally in the context of their labeling output.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Among other suggestions that follow the same logic is Active Multi-label Crowd Consensus (AMCC) put forth by<\/span> <a href=\"https:\/\/arxiv.org\/pdf\/1911.02789.pdf\"><span style=\"font-weight: 400;\">Tu et al. (2019)<\/span><\/a><span style=\"font-weight: 400;\">. This model assesses crowd workers to account for their commonalities and differences, and subsequently groups them according to this rubric. In this scenario, each group shares a particular trait that\u2019s reflected in the labeling results, which can then be followed and dissected much more easily. 
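One way to read the weighting suggestion above is as a weighted majority vote, where each worker's ballot counts in proportion to a reliability weight derived from their history. A minimal sketch under that interpretation (the names and weights are illustrative, not the paper's actual algorithm):

```python
from collections import defaultdict

def weighted_vote(votes, worker_weight):
    """Pick the label with the largest total worker weight.
    votes: list of (worker, label) pairs; worker_weight: reliability
    score per worker, defaulting to 1.0 for workers with no history."""
    totals = defaultdict(float)
    for worker, label in votes:
        totals[label] += worker_weight.get(worker, 1.0)
    return max(totals, key=totals.get)

# A domain expert (weight 3.0) outweighs two novices (1.0 each),
# so the expert's label wins despite being in the numerical minority.
label = weighted_vote(
    [("expert", "melanoma"), ("novice1", "mole"), ("novice2", "mole")],
    {"expert": 3.0, "novice1": 1.0, "novice2": 1.0},
)
print(label)  # melanoma
```

With all weights equal, this reduces to the plain majority vote; the weights are where historical performance and background enter the aggregation.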
This model is supposed to reduce the influence of any unreliable workers and those lacking the right background or expertise for successful task completion.<\/span><\/p>\n<p><b>The bottom line<\/b><\/p>\n<p><span style=\"font-weight: 400;\">Clear instructions are instrumental in realizing data-labeling projects. Concurrently, other factors emerge to share responsibility as tasks become rarer and more challenging. At some point, problems can be expected to arise even when instructions are clear, because the crowd workers tackling the task have little experience to fall back on.\u00a0<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Accordingly, inherent biases and, to a lesser extent, confusion of observations will persist; these stem not only from the clarity of instructions and examples, but also from the choice of performers. In certain situations, the crowd workers\u2019 socio-cultural background may play as much of a role as their domain expertise.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">While some of these problematic factors can be addressed using widely accepted quality assurance tools, no universal solution exists apart from (a) selecting workers who have the right experience and expertise, and (b) choosing those whose specific background has been judged pertinent to the assignment.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Since instructions remain at the epicenter of the accuracy problem all the same, it is recommended that the<\/span> <b>following points be considered<\/b> <span style=\"font-weight: 400;\">when preparing instructions:<\/span><\/p>\n<ul>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">Instructions must always be written so that they\u2019re easy to understand.<\/span><\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">Crowdsourcing platforms may help with preparing instructions: they can enforce these instructions, 
facilitate the labeling process, check for consistency, and verify results; however, it is the requester who ultimately needs to explain beyond any doubt what is being asked, and how exactly they want the data to be labeled.\u00a0<\/span><\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">Plentiful, clear, and unambiguous examples must always be supplied.<\/span><\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">Keep in mind that as a rule of thumb, the harder the task, the clearer the instructions need to be: less clarity means less accuracy.<\/span><\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">Understanding instructions shouldn\u2019t require any extraordinary skills; if such skills are implicit, you need to acknowledge that only experienced workers will know what to do.<\/span><\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">If you have an uncommon task, instructions must be crystal clear and include contrasting examples (i.e. 
what is wrong) so that even the most experienced workers can follow them.<\/span><\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">Confusion can be resolved with clear examples, but in some cases, even experienced crowd workers might provide noisy data sets if the bias problem is not addressed beforehand.<\/span><\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">The best countermeasure to bias is to select crowd workers based not merely on their expertise, but also on a socio-cultural background that matches the task\u2019s demands.<\/span><\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">Managerial responsibility has to be maintained throughout the planning process: micro decisions will lead to macro results, with even the tiniest detail potentially having far-reaching implications.<\/span><\/li>\n<\/ul>\n<\/div>\n<p><a href=\"https:\/\/www.datasciencecentral.com\/xn\/detail\/6448529:BlogPost:1070299\">Go to Source<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Author: Daria Baidakova Photo by Clayton Robbins on Unsplash AI development today rests on the shoulders of Machine Learning algorithms that require huge amounts of [&hellip;] <span class=\"read-more-link\"><a class=\"read-more\" href=\"https:\/\/www.aiproblog.com\/index.php\/2021\/10\/03\/data-labeling-instructions-gateway-to-success-in-crowdsourcing-and-enduring-impact-on-ai\/\">Read 
More<\/a><\/span><\/p>\n","protected":false},"author":1,"featured_media":474,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_bbp_topic_count":0,"_bbp_reply_count":0,"_bbp_total_topic_count":0,"_bbp_total_reply_count":0,"_bbp_voice_count":0,"_bbp_anonymous_reply_count":0,"_bbp_topic_count_hidden":0,"_bbp_reply_count_hidden":0,"_bbp_forum_subforum_count":0,"footnotes":""},"categories":[26],"tags":[],"_links":{"self":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts\/5069"}],"collection":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/comments?post=5069"}],"version-history":[{"count":0,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts\/5069\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/media\/465"}],"wp:attachment":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/media?parent=5069"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/categories?post=5069"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/tags?post=5069"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}