{"id":3034,"date":"2020-01-16T06:33:03","date_gmt":"2020-01-16T06:33:03","guid":{"rendered":"https:\/\/www.aiproblog.com\/index.php\/2020\/01\/16\/multi-gigabyte-r-data-table-for-ohio-voter-registration-history\/"},"modified":"2020-01-16T06:33:03","modified_gmt":"2020-01-16T06:33:03","slug":"multi-gigabyte-r-data-table-for-ohio-voter-registration-history","status":"publish","type":"post","link":"https:\/\/www.aiproblog.com\/index.php\/2020\/01\/16\/multi-gigabyte-r-data-table-for-ohio-voter-registration-history\/","title":{"rendered":"Multi Gigabyte R data.table for Ohio Voter Registration\/History"},"content":{"rendered":"<p>Author: steve miller<\/p>\n<div>\n<p><a href=\"https:\/\/storage.ning.com\/topology\/rest\/1.0\/file\/get\/3820585650?profile=original\" target=\"_blank\" rel=\"noopener noreferrer\"><img decoding=\"async\" src=\"https:\/\/storage.ning.com\/topology\/rest\/1.0\/file\/get\/3820585650?profile=RESIZE_710x\" class=\"align-full\"><\/a><\/p>\n<p><em>Summary: This blog details R data.table programming to handle multi-gigabyte data. It shows how the data can be efficiently loaded, &#8220;normalized&#8221;, and counted. Readers can readily copy and enhance the code below for their own analytic needs. An intermediate level of R coding sophistication is assumed.<\/em><\/p>\n<p>In my travels over the holidays, I came across an<span>&nbsp;<\/span><a href=\"https:\/\/www.nytimes.com\/2019\/10\/14\/us\/politics\/ohio-voter-purge.html?searchResultPosition=1\" rel=\"noopener noreferrer\" target=\"_blank\">article in the New York Times<\/a><span>&nbsp;<\/span>on voter purging in Ohio. 
Much of the research surrounding the article was driven by data on the<span>&nbsp;<\/span><a href=\"https:\/\/www6.sos.state.oh.us\/ords\/f?p=VOTERFTP:STWD:::#stwdVtrFiles\" rel=\"noopener noreferrer\" target=\"_blank\">voting history of Ohio residents<\/a><span>&nbsp;<\/span>readily available to the public.<\/p>\n<p>When I returned home, I downloaded the four large CSV files and began to investigate. The data consisted of over 7.7M voter records with in excess of 100 attributes. The &#8220;denormalized&#8221; structure included roughly 50 person-location variables such as address and ward, and a &#8220;repeating group&#8221; of close to 50 variables indicating voter participation in specific election events, characterized by a concatenated type-date attribute with an accompanying voted-or-not attribute.<\/p>\n<p>My self-directed task for this blog was to load the denormalized data as is, then create auxiliary &#8220;melted&#8221; data.tables that could readily be queried, i.e., transformed from wide to long. The query type of interest revolved around counts\/frequencies of the dimensions election type, date, and participation. The text will hopefully elucidate both the power and ease of programming with R&#8217;s data.table and tidyverse packages.<\/p>\n<p>The technology used is Wintel 10 along with JupyterLab 1.2.4 and R 3.6.2. 
The R data.table, tidyverse, magrittr, fst, feather, and knitr packages are featured.<\/p>\n<p>See the entire blog&nbsp;<a href=\"https:\/\/storage.ning.com\/topology\/rest\/1.0\/file\/get\/3820593655?profile=original\" target=\"_blank\" rel=\"noopener noreferrer\">here.<\/a><\/p>\n<p><a href=\"https:\/\/storage.ning.com\/topology\/rest\/1.0\/file\/get\/3820585650?profile=original\" target=\"_blank\" rel=\"noopener noreferrer\"><\/a><\/p>\n<\/div>\n<p><a href=\"https:\/\/www.datasciencecentral.com\/xn\/detail\/6448529:BlogPost:922827\">Go to Source<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Author: steve miller Summary: This blog details R data.table programming to handle multi-gigabyte data. It shows how the data can be efficiently loaded, &#8220;normalized&#8221;, and [&hellip;] <span class=\"read-more-link\"><a class=\"read-more\" href=\"https:\/\/www.aiproblog.com\/index.php\/2020\/01\/16\/multi-gigabyte-r-data-table-for-ohio-voter-registration-history\/\">Read More<\/a><\/span><\/p>\n","protected":false},"author":1,"featured_media":472,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_bbp_topic_count":0,"_bbp_reply_count":0,"_bbp_total_topic_count":0,"_bbp_total_reply_count":0,"_bbp_voice_count":0,"_bbp_anonymous_reply_count":0,"_bbp_topic_count_hidden":0,"_bbp_reply_count_hidden":0,"_bbp_forum_subforum_count":0,"footnotes":""},"categories":[26],"tags":[],"_links":{"self":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts\/3034"}],"collection":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/comments?post=3034"}],"version-history":[{"count":0,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts\/3034\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/media\/475"}],"wp:attachment":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/media?parent=3034"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/categories?post=3034"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/tags?post=3034"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}