Unless you have been living under a rock, you have probably heard the term Big Data. You may wonder just what Big Data is and what all the fuss is about. Perhaps you've read too many stories about huge data breaches or fathers finding out that their teenage daughters are pregnant by receiving baby-related promotions (Target, anyone?) and decided you wanted no part of that sort of potential privacy intrusion. Or you may simply have dismissed the term as one more tech buzzword.
The fact is, however, that harnessing the power of Big Data can lend an air of authority and depth to your writing that might tip the scales in your favor with an editor or publisher. And while you may have privacy concerns, Big Data resources that are likely to be useful to you as a writer usually divulge little or no personal information. What IS required to utilize Big Data in your writing is an instinct for asking the right questions and the persistence to pursue information -- skills many writers possess in abundance.
A 2013 article on the Forbes website offers one of the simplest definitions of Big Data:
"Big data is a collection of data from traditional and digital sources inside and outside your company that represents a source for ongoing discovery and analysis."
Information drives Big Data, and information is exploding at an exponential rate. To gain perspective on just how much information is involved, consider the fact that since the 1980s, per-capita data storage capacity has doubled every 40 months.
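To put that doubling rate in more familiar terms, the arithmetic can be sketched in a few lines of Python. This is only a back-of-the-envelope illustration: it assumes the 40-month doubling holds steadily over time, and the function name growth_factor is made up for this example.

```python
# Doubling period quoted in the text: capacity doubles every 40 months.
DOUBLING_PERIOD_MONTHS = 40


def growth_factor(months, doubling_period=DOUBLING_PERIOD_MONTHS):
    """Return how many times larger capacity is after `months`,
    assuming it doubles steadily every `doubling_period` months."""
    return 2 ** (months / doubling_period)


annual = growth_factor(12)    # growth over one year
decade = growth_factor(120)   # growth over ten years

print(f"One year: x{annual:.2f} (about {100 * (annual - 1):.0f}% growth)")
print(f"Ten years: x{decade:.0f}")
```

Under that assumption, capacity grows by roughly 23 percent a year and about eightfold over a decade.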
Many retail and other corporations such as Target and Amazon collect huge reservoirs of data. Customer preferences and purchase records are maintained not only to fill orders and stock shelves but to create customer profiles. When cross-referenced with demographic categories such as age, gender, zip code and ethnic background, customer profiles can be utilized to create targeted ads and other directed outreach efforts, such as the infamous Target miscues involving baby-related marketing.
Do you need or want to know the daily count of passengers for each stop on the public bus system in your town to include in a blog post or a magazine assignment? Would it be useful to include detailed information about flight delays and on-time performance for domestic airlines for a book you are writing? No matter what you are writing, chances are your work could benefit from including data and statistics.
For instance, if you are writing about ways to prevent a repeat of the housing crisis that began in 2008, you could include statistics comparing average housing prices and foreclosure rates for different neighborhoods. You could drill deeper to include information about vacant properties, crime statistics or seemingly mundane but relevant details such as the average distance to the nearest grocery stores. Consider the following examples of paragraphs that might be included in a book or article about on-time airline performance and the passenger experience.
Without Big Data: "If you have had the feeling during the past several months that flying has become a hit-or-miss prospect with regard to on-time performance, you may very well be right. Thousands of flights are delayed and cancelled across North America every month. Sometimes weather is the culprit; other times holiday rush periods are responsible."
With Big Data: "If you have had the feeling during the past several months that flying has become a hit-or-miss prospect with regard to on-time performance, you may very well be right. As of July 2014, there were more than 38,000 flight cancellations and 600,000+ total flight delays worldwide during the previous 30 days. Of these, nearly 18,000 cancellations and approximately 175,000 delays occurred in North America alone. Sometimes weather is the culprit; other times holiday rush periods are responsible."
As this example illustrates, detailed data can add a sense of authority to your work, especially if you are writing long-form nonfiction articles or books. In this case, the source is the FlightStats website. But similar examples can be generated by inserting statistical data from other categories into general narrative text, including Chicago neighborhood crime statistics, Hawaii tourism, and household clothing expenditures.
Chicago Neighborhood Crime Data:
Without Big Data: "The large number of shootings in Chicago over the 2014 Independence Day weekend generated nationwide headlines. A casual observer might form the conclusion from Chicago news stories reported during the holiday weekend that the entire city is plagued with violence and bloodshed. In fact, the overall violent crime rate for Chicago is lower than the rate for property crimes, which are more likely to occur in Chicago's affluent and densely populated downtown."
With Big Data: "The large number of shootings in Chicago over the 2014 Independence Day weekend generated nationwide headlines. A casual observer might form the conclusion from Chicago news stories reported during the holiday weekend that the entire city is plagued with violence and bloodshed. But according to the Crime in Chicago database, maintained by the Chicago Tribune, violent crime reports for West Englewood total 3.2 per 1,000 people, the highest in the city. By contrast, property crime reports in the Loop total 10.7 per 1,000 people, far and away the highest rate in the entire city. In other words, the overall violent crime rate is much lower than for property crimes, which are more likely to occur in Chicago's affluent and densely populated downtown."
Hawaii Tourism:
Without Big Data: "Hawaii tourism suffered a steep decline as a result of the Great Recession in the United States and the tsunami that devastated Japan in March 2011. But since the US economy has moved into recovery, and as Japan has rebounded from its nuclear and natural disaster, tourism numbers in Hawaii have been on the upswing. Recent tourism figures are trending close to the record number of visitors notched before the recession. Indications point to a continued recovery in tourism from the rest of the United States and from Japan -- the state's two most important markets."
With Big Data: "Hawaii tourism suffered a steep decline as a result of the Great Recession in the United States and the tsunami that devastated Japan in March 2011. But since the US economy has moved into recovery, and as Japan has rebounded from its nuclear and natural disaster, tourism numbers in Hawaii have been on the upswing. Recent figures are trending close to the pre-recession record of nearly 7.5 million visitors arriving by air, documented in the 2006 Annual Visitor Research Report issued by the government of Hawaii. Indications point to continued tourism recovery according to the Hawaii Tourism Authority, with more than 3.2 million visitors arriving by air from the rest of the United States through May 2014, and nearly 590,000 visitors arriving by air from Japan during the same period -- the state's two most important markets."
Household Clothing Expenditures:
Without Big Data: "From 1985 to 2005, the average annual household expenditure for clothing increased every year, with the highest proportion of spending devoted to clothing for adult women. But household spending for clothing experienced a significant drop in 2010, largely due to the recession. Nonetheless, the proportions for spending in 2010 were comparable to those of pre-recession levels, with spending for clothing for adult women leading all other categories."
With Big Data: "From 1985 to 2005, according to the Bureau of Labor Statistics, the average annual household expenditure for clothing increased every year, from $1,405 in 1985 to $1,886 in 2005, with the highest proportion of spending, $499 in 1985 and $633 in 2005, devoted to clothing for adult women. But household spending for clothing experienced a significant drop in 2010, to $1,700, largely due to the recession. Nonetheless, the proportions for spending in 2010 were comparable to those of pre-recession levels, with $562 in spending for clothing for adult women leading all other categories."
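The claim in the clothing example that the 2010 proportions were "comparable" to earlier years is easy to check with simple arithmetic before you commit it to print. The sketch below uses only the dollar figures quoted in that example. The helper names are invented for illustration, and it makes no inflation adjustment, since the example does not say whether the figures are adjusted.

```python
# Dollar figures quoted in the clothing example (attributed there to the
# Bureau of Labor Statistics): average annual household spending on clothing.
spending = {
    1985: {"total": 1405, "women": 499},
    2005: {"total": 1886, "women": 633},
    2010: {"total": 1700, "women": 562},
}


def percent_change(old, new):
    """Percentage change from `old` to `new`."""
    return (new - old) / old * 100


def share(part, whole):
    """`part` as a percentage of `whole`."""
    return part / whole * 100


# Share of clothing spending that went to adult women's clothing, per year.
for year, row in spending.items():
    pct = share(row["women"], row["total"])
    print(f"{year}: women's clothing = {pct:.1f}% of total")

# Change in total clothing spending between the quoted years.
rise = percent_change(spending[1985]["total"], spending[2005]["total"])
drop = percent_change(spending[2005]["total"], spending[2010]["total"])
print(f"1985 to 2005: {rise:+.1f}%")
print(f"2005 to 2010: {drop:+.1f}%")
```

With these figures, spending on adult women's clothing works out to roughly a third of the total in each of the three years, while total spending rose about 34 percent from 1985 to 2005 and fell about 10 percent from 2005 to 2010.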
In past decades, obtaining statistics to add meat to the bones of a story often required lengthy sessions in the library poring over dusty newspapers or rolling through reels of microfilm. Nowadays, many online resources have already done much of the heavy lifting of streamlining Big Data into manageable parcels. Statistics are readily available for nearly every subject in the form of online data sets created by marketing departments, government entities and industrious individuals. Many of the most useful Big Data sets are available online free of charge; others charge modest fees. Accessing a few data sets still necessitates a trip to the public library -- just like the good old days.
Uncle Sam has traditionally represented one of the best sources of Big Data. Online portals such as the Library of Congress, the Bureau of Labor Statistics, the Centers for Disease Control and Prevention and the Bureau of the Census are virtual treasure troves. Since the beginning of the Obama administration, the Data.gov website has also become a go-to, user-friendly database of federal information.
Think tanks like the Brookings Institution (liberal) and the Reason Foundation (conservative/libertarian), along with nonpartisan research portals like the Pew Research Internet Project, represent excellent sources for both data and analysis. Local municipalities and county governments are increasingly likely to make raw administrative and demographic data available to the public. If there are no online open data resources available in your area, send an email or place a call to the relevant government or municipal agency to inquire.
The advantage of using official government and other well-regarded sources like these is that you are more likely to obtain data that is rigorously collected and reported. However, even data from impeccable sources may be flawed. For instance, official crime statistics from a municipality may count a shooting where the victim was robbed only as a shooting -- or worse -- count the incident as two separate occurrences. Exercise due diligence in sourcing the data: check the About Us, FAQ or similar section of the website or portal to determine how the data is collected and analyzed, so that the data you include is as accurate as possible.
Do not eliminate commercial sources of data out of hand. Many commercial websites draw their reports from well-regarded data sources. If you find a commercial site with data that you would like to use, check out its About Us or FAQ pages just as you would for any other site. For instance, FlightStats, the source of the statistics quoted in the airline example above, is a commercial website. However, reading through the FAQ section of the website reveals that FlightStats sources its airline performance data from the Federal Aviation Administration ASDI Data Feed, the European Data Feed, the GDS (Sabre, Amadeus, Apollo, Galileo) data set and from direct airport and airline data feeds.
You don't need to be a professional researcher to unearth Big Data statistics and information to enhance your writing. A combination of keyword searching and examining various websites and portals should yield plentiful information. The list below represents search terms I used with a standard browser-based search engine to generate the data for the four examples listed above.
That said, extracting the data to enhance your writing frequently involves some digging even after you have discovered useful web portals. For instance, on the FlightStats website, I navigated to the Delays tab, which revealed the information I was seeking. The numbers included in the Hawaii tourism example were extracted from two sources: the Annual Visitor Research Report, a PDF document posted on the Visitor Statistics site of the Department of Business, Economic Development and Tourism portal maintained by Hawaii.gov; and the 2006 Annual Visitor Research Report, a second PDF document available on the Hawaii Tourism Authority website. The Chicago crime data stats were easier to access, displayed on the splash (entry) page of the Crime in Chicagoland portal. Finally, the statistics included in the fashion example were pulled from a June 2012 Spotlight on Statistics in Fashion slide show posted on the BLS website.
If your searches fail to turn up viable leads, or if you are mystified by the data you uncover, the reference librarian at your local or university library will undoubtedly be happy to assist you. Be prepared to provide a general description of the type of book or article you are researching as well as its potential market. You should also inform the librarian of previous search terms you have used and websites you have already consulted. This information will assist him or her in narrowing down potential resources. Many libraries provide research services free of charge; others charge a modest fee. If you have extensive search needs, you may need to set an appointment.
So don't be afraid to seek out and use the resources provided by Big Data in your next Big Project. It might provide just the details you need to earn a Big Paycheck!
Rather than functioning as an exhaustive list, the resources included below should serve as examples to inspire your own searches. The vast majority of these and other data sets are available online for access on your home computer. Several of these data sets also have mobile-enabled sites or specialized mobile apps for your smartphone or tablet. Sites listed are free of charge unless otherwise indicated.