{"id":2009,"date":"2019-04-13T06:31:23","date_gmt":"2019-04-13T06:31:23","guid":{"rendered":"https:\/\/www.aiproblog.com\/index.php\/2019\/04\/13\/big-data-isnt-a-concept-its-a-problem-to-solve\/"},"modified":"2019-04-13T06:31:23","modified_gmt":"2019-04-13T06:31:23","slug":"big-data-isnt-a-concept-its-a-problem-to-solve","status":"publish","type":"post","link":"https:\/\/www.aiproblog.com\/index.php\/2019\/04\/13\/big-data-isnt-a-concept-its-a-problem-to-solve\/","title":{"rendered":"Big Data Isn\u2019t a Concept \u2014 It\u2019s a Problem to Solve"},"content":{"rendered":"<p>Author: datascience@berkeley Staff<\/p>\n<div>\n<style type=\"text\/css\"><!--\nh2 { color:#826ac7; } strong{ color: #54489D; }\n--><\/style>\n<p><img decoding=\"async\" src=\"https:\/\/corp-mktg.s3.amazonaws.com\/cask\/prod\/ucb-mids\/content\/f8bba2809d6f412abdb2c8e9a44b4bfd\/4093_What-Is-Big-Data-SOLP_hero.jpg\" alt=\"\" width=\"768\"><\/p>\n<p>An estimated <a href=\"https:\/\/www.bbc.com\/news\/uk-30978995\" target=\"_blank\" rel=\"noopener noreferrer\">5.9 million surveillance cameras keep watch over the United Kingdom.<\/a> While this may sound intimidating to those unaware they are being surveilled, this network of closed-circuit TV cameras helped British authorities piece together the mysterious poisoning of Sergei Skripal, a former Russian intelligence officer turned double agent, and his daughter, Yulia. Super recognizers, people hired for their above-average ability to recognize faces, sorted through thousands of hours of video footage and eventually homed in on two particular suspects. The pair had flown into London\u2019s Gatwick Airport and then traveled to Salisbury, where they carried out the attack. 
Through the investigation, the British police identified and charged the men, Russian intelligence officers, with attempted murder.<\/p>\n<div class=\"display-6 u--margin-bottom-4 u--margin-top-4 u--ac\" style=\"color: #1fa0cd;\">\n<hr>\n<p>Having a goal, an explicit purpose in collecting and analyzing a dataset, is how scientists can harness the power of data to solve problems and answer questions.<\/p>\n<hr>\n<\/div>\n<p>The public would call this big data in action: a mass of video footage combed by specially trained police analysts to solve an international crime. The case is a story of heroes and villains, of cracking a case with an attention to detail worthy of Sherlock Holmes. But when we reduce our language to the catch-all term <em>big data<\/em>, we lose the story. We run the risk of forgetting why we collect data in the first place: to make our world better through granular details, like an oil painter with a palette knife.\u00a0<\/p>\n<p>Knowing the story makes data valuable. Having a goal, an explicit purpose in collecting and analyzing a dataset, is how scientists can harness the power of data to solve problems and answer questions, ranging from the query of who poisoned the Skripals to the lighthearted question of <a href=\"https:\/\/www.nytimes.com\/interactive\/2018\/08\/09\/opinion\/do-songs-of-the-summer-sound-the-same.html\" target=\"_blank\" rel=\"noopener noreferrer\">why contemporary summer songs tend to sound similar<\/a>.<\/p>\n<div class=\"display-6 u--margin-bottom-4 u--margin-top-4 u--ac\" style=\"color: #1fa0cd;\">\n<hr>\n<p>The way we talk about data matters, because it shapes the way we think about data. And the ways we apply, fund, and support data today will shape the future of our society.<\/p>\n<hr>\n<\/div>\n<p>Modern data analytics allows scientists to answer complex questions using highly specific techniques. 
However, the public continues to use the generalized term big data and all of its iterations \u2014 big data technology, big data analytics, and big data tools \u2014 to describe these methods.<\/p>\n<p>The way we talk about data matters, because it shapes the way we think about data. And the ways we apply, fund, and support data today will shape the future of our society, according to AnnaLee Saxenian, dean of the UC Berkeley School of Information (I School).<\/p>\n<p>So, why do we still hear the term \u201cbig data\u201d? Dean Saxenian offers her insights on where the term came from and which words we should use instead.\u00a0<\/p>\n<h2>What Is Big Data?<\/h2>\n<p>At the beginning of the information age, big data seemed to aptly describe the technological, cultural, and economic shifts of the early 2000s.<\/p>\n<p>\u201cWe started to have access to a whole bunch of new forms of data: data from the web, data from mobile devices, and, more recently, data from sensor networks,\u201d said Dean Saxenian. Previously, much of the data that scholars used was based on surveys and other kinds of administrative information. 
The numbers were neatly organized into predetermined categories: for example, the number of employees who rate their job experiences as satisfactory, or the number of college graduates who earn more than $50,000 per year.\u00a0<\/p>\n<p>But this new digital data was different and demonstrated what theorists call the three V\u2019s: variety, velocity, and volume.<\/p>\n<p><img decoding=\"async\" src=\"https:\/\/corp-mktg.s3.amazonaws.com\/cask\/prod\/ucb-mids\/content\/bb567329176048e8ab6a640b3117dc15\/4093_What-Is-Big-Data-SOLP_3Vs-Final.jpg\" alt=\"\" width=\"768\"><\/p>\n<div class=\"display-6 u--margin-top-4 u--margin-bottom-1 u--ac\" style=\"color: #1fa0cd;\">\n<hr>\n<p>\u201cI think [the big data concept] became popular because it did capture the fact that we felt like we were all of a sudden flooded with data.\u201d<\/p><\/div>\n<div class=\"u--margin-bottom-4 u--ac\">\n<p>\u2014 AnnaLee Saxenian, dean of the UC Berkeley School of Information (I School)<\/p>\n<hr>\n<\/div>\n<p>Digitally sourced data has<strong> variety <\/strong>in that it is collected with varying degrees of structure. Data can be heavily unstructured; audio, video, and social media posts can be considered unstructured data. A company can gather more structured data on customers\u2019 clicks on its website, or a person can track her heart rate and physical activity with a wearable device, but the data must then be organized in order to be useful. Multi-structured data can involve combinations of structured and unstructured data, organized by similar attributes.<\/p>\n<p>This new kind of data has<strong> velocity<\/strong>, meaning the numbers come in fast and can be processed very quickly. Today, a company can process data collected from mobile devices in real time using analytics and data mining tools.\u00a0<\/p>\n<p>Most compellingly, this data has<strong> volume<\/strong>. 
The latest technologies developed around the turn of the millennium yielded what Dean Saxenian calls \u201ca firehose of data.\u201d Around this time, the term big data was born.<\/p>\n<p>\u201cI think [the big data concept] became popular because it did capture the fact that we felt like we were all of a sudden flooded with data,\u201d Dean Saxenian said. The magnitude of this moment is difficult to overstate. As a graduate student before the dawning of the digital age, Dean Saxenian had to go to the library at UC Berkeley to make photocopies of government census data from hard copy volumes. With millennium-era technology, anyone can access this same data in seconds \u2014 and not just from one census, but from all of them. \u201cNobody [at the time] knew how to deal with it,\u201d Saxenian said.<\/p>\n<div class=\"display-6 u--margin-bottom-4 u--margin-top-4 u--ac\" style=\"color: #1fa0cd;\">\n<hr>\n<p>Today, the concept of big data is not only less compelling, but it\u2019s also potentially misleading.<\/p>\n<hr>\n<\/div>\n<p>But in the past two decades, big data has been cut down to size. Data scientists have created new tools for collecting, storing, and analyzing these vast amounts of information. \u201cIn some sense, the \u2018big\u2019 part has become less compelling,\u201d Saxenian said.<\/p>\n<h2>Moving Away from Big Data<\/h2>\n<p>Today, the concept of big data is not only less compelling, but it\u2019s also potentially misleading. Size is only one of many important aspects of a data set. The term big data hints at a misconception that high volume means good data and strong insights.<\/p>\n<p>\u201cWe want students and consumers of our research to understand that volume isn\u2019t sufficient to getting good answers,\u201d Saxenian said.<\/p>\n<p>The story we tell about the data \u2014 the questions we ask about the numbers and the way we organize them \u2014 matters as much as, if not more than, the size of the set. 
Professionals working with data should focus on cleaning the data well, classifying it correctly, and understanding the causal story.<\/p>\n<div class=\"display-6 u--margin-bottom-4 u--margin-top-4 u--ac\" style=\"color: #1fa0cd;\">\n<hr>\n<p>By thinking systematically about data, from our language to our methods, we can better position ourselves to use data science for the good of our communities.<\/p>\n<hr>\n<\/div>\n<p>The UC Berkeley I School challenges students in the online Master of Information and Data Science program to approach data with intentionality, beginning with the way they talk about data. They learn to dig deeper by asking basic questions: Where does the data come from? How was it collected, and was the process ethical? What kinds of questions can this data set answer, and which can it not?<\/p>\n<div class=\"u--text-align-center\"><a class=\"button button--large button--cta show-form u--margin-top-3 u--margin-bottom-3\" href=\"https:\/\/datascience.berkeley.edu\/blog\/recent.atom\/#\">Learn to approach data with intentionality through the online Master of Information and Data Science program. <\/a><\/div>\n<p>This process is part of data science. A more useful shorthand than big data, the term implies the rigorous approach to analytics and data mining that Dean Saxenian supports.<\/p>\n<p>\u201cData science, like social and other sciences, is not just about using the tools,\u201d she said. \u201cIt\u2019s also using the tools in a way that allows you to solve problems and make sense of data in a systematic way.\u201d Ultimately, a data set is not so much a painting to be admired as a window to be used; scientists use data to see the world and our society\u2019s problems more clearly.<\/p>\n<p>By thinking systematically about data, from our language to our methods, we can better position ourselves to use data science for the good of our communities. 
\u201cApproaching it more intentionally,\u201d Dean Saxenian concluded, \u201cwill give us the best shot at being good stewards for future generations of the technology.\u201d<\/p>\n<div class=\"container container--md u--background-color-light u--padding-3 u--text-align-center u--margin-bottom-3\">\n<h2 class=\"h5\">Share this on social media:<\/h2>\n<p><a href=\"https:\/\/www.facebook.com\/sharer\/sharer.php?u=https:\/\/datascience.berkeley.edu\/blog\/big-data-is-a-problem-to-solve\/?utm_source=facebook&#038;utm_medium=social&#038;utm_campaign=blog\" target=\"_blank\" rel=\"noopener noreferrer\">Facebook<\/a>\u00a0|\u00a0 <a href=\"https:\/\/www.linkedin.com\/shareArticle?mini=true&#038;url=u=https:\/\/datascience.berkeley.edu\/blog\/big-data-is-a-problem-to-solve\/L?utm_source=linkedin&#038;utm_medium=social&#038;utm_campaign=blog\" target=\"_blank\" rel=\"noopener noreferrer\">LinkedIn<\/a>\u00a0|\u00a0 <a href=\"https:\/\/twitter.com\/intent\/tweet?text=u=https:\/\/datascience.berkeley.edu\/blog\/big-data-is-a-problem-to-solve\/?utm_source=twitter&#038;utm_medium=social&#038;utm_campaign=blog\" target=\"_blank\" rel=\"noopener noreferrer\">Twitter<\/a><\/p>\n<\/div>\n<p>Citation for this content: datascience@berkeley, <a href=\"https:\/\/datascience.berkeley.edu\/\">the online Master of Information and Data Science from UC Berkeley<\/a><\/p>\n<p>\u00a0<\/p>\n<\/div>\n<p><a href=\"https:\/\/datascience.berkeley.edu\/blog\/what-is-big-data\/\">Go to Source<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Author: datascience@berkeley Staff An estimated 5.9 million surveillance cameras keep watch over the United Kingdom. 
While this may sound intimidating to those unaware they are [&hellip;] <span class=\"read-more-link\"><a class=\"read-more\" href=\"https:\/\/www.aiproblog.com\/index.php\/2019\/04\/13\/big-data-isnt-a-concept-its-a-problem-to-solve\/\">Read More<\/a><\/span><\/p>\n","protected":false},"author":1,"featured_media":2010,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_bbp_topic_count":0,"_bbp_reply_count":0,"_bbp_total_topic_count":0,"_bbp_total_reply_count":0,"_bbp_voice_count":0,"_bbp_anonymous_reply_count":0,"_bbp_topic_count_hidden":0,"_bbp_reply_count_hidden":0,"_bbp_forum_subforum_count":0,"footnotes":""},"categories":[26],"tags":[],"_links":{"self":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts\/2009"}],"collection":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/comments?post=2009"}],"version-history":[{"count":0,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts\/2009\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/media\/2010"}],"wp:attachment":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/media?parent=2009"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/categories?post=2009"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/tags?post=2009"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}