{"id":2695,"date":"2019-10-15T06:31:33","date_gmt":"2019-10-15T06:31:33","guid":{"rendered":"https:\/\/www.aiproblog.com\/index.php\/2019\/10\/15\/surprise-model-improvements-dont-always-drive-business-impact\/"},"modified":"2019-10-15T06:31:33","modified_gmt":"2019-10-15T06:31:33","slug":"surprise-model-improvements-dont-always-drive-business-impact","status":"publish","type":"post","link":"https:\/\/www.aiproblog.com\/index.php\/2019\/10\/15\/surprise-model-improvements-dont-always-drive-business-impact\/","title":{"rendered":"Surprise \u2013 Model Improvements Don\u2019t Always Drive Business Impact"},"content":{"rendered":"<p>Author: William Vorhies<\/p>\n<div>\n<p><strong><em>Summary:<\/em><\/strong><em>\u00a0 Data Scientists from Booking.com share many lessons learned in the process of constantly improving their sophisticated ML models.\u00a0 Not the least of these is that improving your models doesn\u2019t always lead to improving business outcomes.<\/em><\/p>\n<p>\u00a0<\/p>\n<p><a href=\"https:\/\/storage.ning.com\/topology\/rest\/1.0\/file\/get\/3662070231?profile=original\" target=\"_blank\" rel=\"noopener noreferrer\"><img decoding=\"async\" src=\"https:\/\/storage.ning.com\/topology\/rest\/1.0\/file\/get\/3662070231?profile=RESIZE_710x\" width=\"350\" class=\"align-right\"><\/a>The adoption of AI\/ML in business is at an inflection point.\u00a0 Recent adopters are still building teams and trying to plan portfolios of projects that make sense and can create value.\u00a0 On the other side of the curve are mature adopters, many of whom are large, well-known ecommerce companies that now have years of experience in implementation, team management, and, most importantly, the extraction of value.<\/p>\n<p>There are obviously not as many mature adopters, and even fewer who have been willing to share the broader lessons they\u2019ve learned.\u00a0 Eventually, perhaps a decade from now, this knowledge and experience will flow around more evenly, but for now it\u2019s 
rare.<\/p>\n<p>All the more important, then, to recognize the contribution made by Lucas Bernardi, Themis Mavridis, and Pablo Estevez of Booking.com, who shared their paper \u201c<a href=\"https:\/\/www.kdd.org\/kdd2019\/accepted-papers\/view\/150-successful-machine-learning-models-6-lessons-learned-at-booking.com\"><em><u>150 Successful Machine Learning Models: 6 Lessons Learned<\/u><\/em><\/a>\u201d at KDD \u201919 this last August.\u00a0 The paper speaks to the issue of how ML can create value in an industrial environment where measurable commercial gain is the goal.<\/p>\n<p>There are many things that make Booking.com\u2019s environment unique as the world\u2019s largest online travel agency.\u00a0 Their technical focus is on recommenders and information retrieval, where almost all users are cold-start, given that we don\u2019t travel that often or to the same places.\u00a0 This is not music, books, movies, or even fashion, where some patterns can be built up.<\/p>\n<p>Booking.com has multiple interdisciplinary teams of data scientists, business managers, UX designers, and systems engineers who constantly work together to tease out meaningful patterns from the frequently incomplete input provided by potential travelers.\u00a0<\/p>\n<p>There are models that focus on specific triggers in an event stream, but also models that are semantic in nature and can provide clues for many different internal uses.\u00a0 For example, what is the likely flexibility of the user for travel dates, destinations, or accommodation type?\u00a0 Is the user actually shopping for a family vacation (users frequently omit the number of children) or a business trip?<\/p>\n<p>All of these lesser-known factors need to be teased out of navigation, scrolling, and search models, along with the little information provided by the user.\u00a0 One lesson shared is that optimal user presentation, including fonts, use of images, text, offers, and all the other elements of UX, are not 
represented by a single optimal standard, but vary for a fairly large number of subgroups of users, and then across multiple languages and cultures.<\/p>\n<p>And the stakes are high.\u00a0 Give a user a bad experience and they won\u2019t come back.\u00a0 The supply of accommodations may technically be constrained, but its variety and detail can overwhelm a user.\u00a0 And of course, pricing changes by the hour based on availability and elasticity.<\/p>\n<p>\u00a0<\/p>\n<p><span style=\"font-size: 12pt;\"><strong>Improved Model Performance Does Not Necessarily Mean Improved Business Impact<\/strong><\/span><\/p>\n<p>The one lesson out of all this great shared learning that struck me, however, was the disconnect between model performance and business impact.\u00a0 Let\u2019s start at the end by looking briefly at their conclusions.<\/p>\n<p>Of the roughly 150 business-critical models, they looked at 23 classifiers and rankers that, in the normal course of business, had come through their ML management process as candidates for improvement.\u00a0 The technical measure of improvement was either ROC AUC for classifiers or Mean Reciprocal Rank for rankers.<\/p>\n<p>By business category, they ranged over preference models, interface optimization, copy curation and augmentation, time and destination flexibility, and booking likelihood models.<\/p>\n<p>Before-and-after performance is judged by Randomized Controlled Trials, which can be multi-armed and are designed to consider only the response of users who are targeted by the change.<\/p>\n<p>Their internal process is hypothesis-based and experimentally driven.\u00a0 The majority of the 46 paired comparisons showed improvement, but not all.\u00a0 However, when compared against KPIs that matter to business performance, like conversion rate, customer service tickets, or cancellations, the results were quite different.<\/p>\n<p>\u00a0<a href=\"https:\/\/storage.ning.com\/topology\/rest\/1.0\/file\/get\/3662071066?profile=original\" 
target=\"_blank\" rel=\"noopener noreferrer\"><img decoding=\"async\" src=\"https:\/\/storage.ning.com\/topology\/rest\/1.0\/file\/get\/3662071066?profile=RESIZE_710x\" width=\"500\" class=\"align-center\"><\/a><\/p>\n<p>Visually, we can see that there\u2019s little correlation between model improvement and business improvement (vertical axis \u2013 conversion rate), confirmed by a Pearson correlation of -0.1 with a 90% confidence interval.<\/p>\n<p>\u00a0<\/p>\n<p><span style=\"font-size: 12pt;\"><strong>So What Happened?<\/strong><\/span><\/p>\n<p>The authors offer up the following cautionary lessons:<\/p>\n<p><strong>It\u2019s as Good as It\u2019s Going to Get:<\/strong>\u00a0 If you\u2019ve been working at improving your system for quite a while, it\u2019s possible you\u2019re approaching \u2018value performance saturation\u2019.\u00a0 You can\u2019t improve a system incrementally forever, and future gains may simply be too small to matter.<\/p>\n<p><strong>Test Segments are Saturated:<\/strong>\u00a0 Booking\u2019s test protocol is to run A\/B or multi-armed tests of models against one another, further enhanced by testing only those users who are impacted by the change.\u00a0 As model performance improves, the rate at which the old and new models disagree for targeted users goes down, limiting the number of users who can be tested.\u00a0 The ability to detect any gains from the improvement therefore also shrinks.<\/p>\n<p><strong>Too Much Accuracy Can Seem Creepy to Customers:<\/strong>\u00a0 Otherwise known as the uncanny valley effect: if your models get too good, this can be \u2018unsettling\u2019 to customers and actually work against their user experience and your business gain.\u00a0<a href=\"https:\/\/storage.ning.com\/topology\/rest\/1.0\/file\/get\/3662071652?profile=original\" target=\"_blank\" rel=\"noopener noreferrer\"><img decoding=\"async\" src=\"https:\/\/storage.ning.com\/topology\/rest\/1.0\/file\/get\/3662071652?profile=RESIZE_710x\" width=\"400\" 
class=\"align-center\"><\/a><\/p>\n<p><strong>You Have to Pick Something to Model:<\/strong>\u00a0 Since these are basically supervised ML models, we have to have a measurable outcome closely related to the behavior being modeled.\u00a0 That might, for example, be Click-Through Rate (CTR), which is supposed to have a strong correlation with the business KPI of Conversion Rate.\u00a0 But as models get better, driving CTR may just be driving clicks and not conversions.<\/p>\n<p>Here\u2019s an excerpt from the authors showing an example of this over-optimization:<\/p>\n<p><em>\u201cAn example of this is a model that learned to recommend very similar hotels to the one a user is looking at, encouraging the user to click (presumably to compare all the very similar hotels), eventually drowning them into the paradox of choice and hurting conversion. In general, over-optimizing proxies leads to distracting the user away from their goal.\u201d<\/em><\/p>\n<p>There are many good lessons in this paper, although the authors warn that they may not completely generalize beyond their own environment.\u00a0 Still, their explanation of their \u201cProblem Construction Process\u201d, as well as their measurement protocols, is well worth considering for adoption, along with the several other, more technical findings they present.\u00a0 And of course, there\u2019s the warning to make sure model improvement is actually driving the desired business outcomes.<\/p>\n<p>\u00a0<\/p>\n<p>\u00a0<\/p>\n<p><a href=\"https:\/\/www.datasciencecentral.com\/profiles\/blog\/list?user=0h5qapp2gbuf8\"><em><u>Other articles by Bill Vorhies<\/u><\/em><\/a><\/p>\n<p>\u00a0<\/p>\n<p>About the author:\u00a0 Bill is Contributing Editor for Data Science Central.\u00a0 Bill is also President &#038; Chief Data Scientist at Data-Magnum and has practiced as a data scientist since 2001.\u00a0 His articles have been read more than 2 million times.<\/p>\n<p>He can be reached at:<\/p>\n<p><a 
href=\"mailto:Bill@DataScienceCentral.com\">Bill@DataScienceCentral.com<\/a> <span>or<\/span> <a href=\"mailto:Bill@Data-Magnum.com\">Bill@Data-Magnum.com<\/a><\/p>\n<\/div>\n<p><a href=\"https:\/\/www.datasciencecentral.com\/xn\/detail\/6448529:BlogPost:898268\">Go to Source<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Author: William Vorhies Summary:\u00a0 Data Scientists from Booking.com share many lessons learned in the process of constantly improving their sophisticated ML models.\u00a0 Not the least [&hellip;] <span class=\"read-more-link\"><a class=\"read-more\" href=\"https:\/\/www.aiproblog.com\/index.php\/2019\/10\/15\/surprise-model-improvements-dont-always-drive-business-impact\/\">Read More<\/a><\/span><\/p>\n","protected":false},"author":1,"featured_media":460,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_bbp_topic_count":0,"_bbp_reply_count":0,"_bbp_total_topic_count":0,"_bbp_total_reply_count":0,"_bbp_voice_count":0,"_bbp_anonymous_reply_count":0,"_bbp_topic_count_hidden":0,"_bbp_reply_count_hidden":0,"_bbp_forum_subforum_count":0,"footnotes":""},"categories":[26],"tags":[],"_links":{"self":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts\/2695"}],"collection":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/comments?post=2695"}],"version-history":[{"count":0,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts\/2695\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/media\/457"}],"wp:attachment":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v
2\/media?parent=2695"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/categories?post=2695"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/tags?post=2695"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}