{"id":589,"date":"2018-06-07T06:47:01","date_gmt":"2018-06-07T06:47:01","guid":{"rendered":"https:\/\/www.aiproblog.com\/index.php\/2018\/06\/07\/outlier-detection-with-time-series-data-mining\/"},"modified":"2018-06-07T06:47:01","modified_gmt":"2018-06-07T06:47:01","slug":"outlier-detection-with-time-series-data-mining","status":"publish","type":"post","link":"https:\/\/www.aiproblog.com\/index.php\/2018\/06\/07\/outlier-detection-with-time-series-data-mining\/","title":{"rendered":"Outlier detection with time-series data mining"},"content":{"rendered":"<p>Author: Mab Alam<\/p>\n<div>\n<p><span>In a <a href=\"https:\/\/www.datasciencecentral.com\/profiles\/blogs\/time-series-data-mining-amp-applications\" target=\"_blank\" rel=\"noopener\">previous blog I wrote about 6 potential applications of time series data<\/a>. To recap,<\/span> they are the following:<\/p>\n<ol>\n<li>Trend analysis<\/li>\n<li>Outlier\/anomaly detection<\/li>\n<li>Examining shocks\/unexpected variation<\/li>\n<li>Association analysis<\/li>\n<li>Forecasting<\/li>\n<li>Predictive analytics<\/li>\n<\/ol>\n<p>Here I am focusing on outlier and anomaly <span>detection. It is important to note that outliers and anomalies are often treated as synonyms; there are a few differences, but I am not going into those nuances here.<\/span><\/p>\n<\/p>\n<p><span><b>WHAT IS AN OUTLIER?<\/b><\/span><\/p>\n<p>By definition, an outlier is an observation that differs significantly from other observations of the same feature. If a time series is plotted, outliers usually appear as unexpected spikes or dips at particular points in time. 
A temporal dataset with outliers has several characteristics:<\/p>\n<ul>\n<li>There is a systematic pattern (which is deterministic) and some variation (which is stochastic)<\/li>\n<li><span>Only a few data points are outliers<\/span><\/li>\n<li><span>Outliers are significantly different from the rest of the data<\/span><\/li>\n<\/ul>\n<p><span><strong>WHY DETECT OUTLIERS?<\/strong><\/span><\/p>\n<p><span>Broadly for two reasons:<\/span><\/p>\n<p><span>(1) In business applications, project managers should know whether an outlier represents an error, or whether there is a specific reason to be concerned about it (if undesired) or excited about it (if desired).<\/span><\/p>\n<p><span>(2) In research and statistical modeling projects, outliers affect model performance, so they are often removed before model fitting to improve prediction accuracy.<\/span><\/p>\n<\/p>\n<p><span><b>REAL WORLD APPLICATION DOMAINS (a few of many)<\/b><\/span><\/p>\n<ul>\n<li><a href=\"https:\/\/ieeexplore.ieee.org\/document\/1623916\/\" target=\"_self\">Financial market<\/a><span>: Detecting price manipulation and fraudulent transactions in banking and stock exchanges<\/span><\/li>\n<li><span>Credit card: Fraud detection algorithms flag unusual or fraudulent transactions and credit card theft<\/span><\/li>\n<li><span>Computer network: Detecting network intrusion based on anomalous traffic<\/span><\/li>\n<li><a href=\"http:\/\/publications.lib.chalmers.se\/records\/fulltext\/242944\/242944.pdf\"><span>Process industries<\/span><\/a><span>: Anomaly detection in pulp and paper and other process industries<\/span><\/li>\n<li><span>Aviation: Aircraft sensor monitoring to detect potential malfunctions<\/span><\/li>\n<li><span>Healthcare: Detecting abnormal patient conditions from electrocardiogram (ECG) recordings of heartbeats<\/span><\/li>\n<li><a href=\"http:\/\/cs.dartmouth.edu\/~ac\/Pubs\/kdd06-attack.pdf\"><span>Recommender 
systems<\/span><\/a><span>: Detecting attacks that try to manipulate recommendations<\/span><\/li>\n<li><a href=\"https:\/\/www.hindawi.com\/journals\/mpe\/2014\/879736\/\"><span>Hydrology<\/span><\/a><span>: Real-time hydrological monitoring and management of water resources<\/span><\/li>\n<li><span>Web analytics: Detecting unexpected growth or drops in website visits and monitoring statistically significant variations and anomalies<\/span><\/li>\n<li><span>Weather forecast: Real-time weather monitoring based on satellite, radar and ground measurements to detect extreme events<\/span><\/li>\n<li><span>Acoustic monitoring: Real-time acoustic monitoring of<\/span> <a href=\"https:\/\/www.pmel.noaa.gov\/acoustics\/\"><span>oceanic activities<\/span><\/a> <span>for research and other applications such as<\/span> <a href=\"https:\/\/www.wwf.org.uk\/conservationtechnology\/acoustic.html\"><span>environmental conservation<\/span><\/a><\/li>\n<li><span>Geology: Observation of earthquakes and of seismic activity due to anthropogenic causes such as nuclear tests<\/span><\/li>\n<li><span>Astronomy: Detecting outliers in the observed features and characteristics of stars and galaxies. 
The most famous example is probably the recent<\/span>\u00a0detection of gravitational waves.<\/li>\n<\/ul>\n<p style=\"text-align: center;\"><img loading=\"lazy\" decoding=\"async\" class=\"align-center\" src=\"https:\/\/lh5.googleusercontent.com\/qAMMCWlp-QTtS5piHmgPgQwG3C6MMXfvXoK3C9jRut2_g20KyCcKaGmK0-tenNvaTpoNhIL3x5nDCogHfxJtYX4BRAllKzu7TgiwEgjXMs-pckcXGfRrojslJiNZc5FUcu3QkkionOHmthoWlw\" width=\"422\" height=\"357\"><span style=\"font-size: 8pt;\">LIGO measurement of the gravitational waves at the Livingston (right) and Hanford (left) detectors,<\/span><\/p>\n<p style=\"text-align: center;\"><span style=\"font-size: 8pt;\">compared with the theoretically predicted values (Source: Wikipedia)<\/span><\/p>\n<\/p>\n<p><span><b>TOOLS AND METHODS (a few of many)<\/b><\/span><\/p>\n<ul>\n<li><span>STL: Decomposes a time series into three components: seasonal, trend and remainder (residual)<\/span><\/li>\n<li><span>Generalized Extreme Studentized Deviate (ESD) test<\/span><\/li>\n<li><span>ARIMA: There is an R package called<\/span> <a href=\"https:\/\/cran.r-project.org\/web\/packages\/tsoutliers\/tsoutliers.pdf\"><span>tsoutliers<\/span><\/a> <span>to do exactly this.<\/span><\/li>\n<li><a href=\"https:\/\/github.com\/twitter\/AnomalyDetection\"><span>AnomalyDetection R package<\/span><\/a><\/li>\n<li><span>Mean Absolute Deviation (MAD) for real-time monitoring of streaming data<\/span><\/li>\n<li><span>Exponential smoothing: Outliers are identified by how far the unsmoothed data deviate from the smoothed values<\/span><\/li>\n<li><a href=\"https:\/\/www.hindawi.com\/journals\/mpe\/2014\/879736\/\"><span>Sliding window<\/span><\/a><span>-based forecasting method: This approach builds a forecasting model from past data within a given window and uses it to predict a future value. 
If an observed value differs significantly from the predicted value, it\u2019s an outlier.<\/span><\/li>\n<li><a href=\"https:\/\/pdfs.semanticscholar.org\/22ae\/4f1325fc14e061adbfa378562dac7fa6974e.pdf\"><span>Peer Group Analysis<\/span><\/a><\/li>\n<\/ul>\n<\/div>\n<p><a href=\"https:\/\/www.datasciencecentral.com\/xn\/detail\/6448529:BlogPost:726446\">Go to Source<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Author: Mab Alam In a previous blog I wrote about 6 potential applications of time series data. To recap, they are the following: Trend analysis [&hellip;] <span class=\"read-more-link\"><a class=\"read-more\" href=\"https:\/\/www.aiproblog.com\/index.php\/2018\/06\/07\/outlier-detection-with-time-series-data-mining\/\">Read More<\/a><\/span><\/p>\n","protected":false},"author":1,"featured_media":464,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_bbp_topic_count":0,"_bbp_reply_count":0,"_bbp_total_topic_count":0,"_bbp_total_reply_count":0,"_bbp_voice_count":0,"_bbp_anonymous_reply_count":0,"_bbp_topic_count_hidden":0,"_bbp_reply_count_hidden":0,"_bbp_forum_subforum_count":0,"footnotes":""},"categories":[26],"tags":[],"_links":{"self":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts\/589"}],"collection":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/comments?post=589"}],"version-history":[{"count":0,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts\/589\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/media\/468"}],"wp:attachment":[{"href":"https:\/\/www.aiproblog.c
om\/index.php\/wp-json\/wp\/v2\/media?parent=589"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/categories?post=589"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/tags?post=589"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}