Big data: You'll have it, but can you handle it?
Massive datasets created by new cloud platforms will put a heavy burden on government IT managers. Here’s what you can do about it.
In 1999, I was called in to troubleshoot a customer’s client/server application that had recently failed a government acceptance test by taking more than 20 minutes to complete queries during stress testing. After months of intense software redesign that included overcoming pushback from a recalcitrant software development team, we were able to increase query performance by 2,000 percent, and the system subsequently passed its acceptance test.
That experience taught me two hard-won lessons: First, even though I am a staunch advocate of Donald Knuth's admonition that "premature optimization is the root of all evil," performance matters. And second, scalability is hard to achieve.
Or at least it used to be. Cloud computing is changing that. It makes scalability easier and enables a proportional increase in the size and scope of the data organizations can process. These two ramifications, instant scalability and the advent of "big data," are reshaping the computing and information management landscapes. Previously, big data would significantly degrade an application's scalability, so programmers introduced throttling mechanisms or looked to Moore's law to bail them out of performance problems. But now you can have your cake (big data) and eat it, too (scalability)! As we will see, nowhere is the need for processing big data more urgent than in the U.S. government.
For the Defense Department, the ability to rapidly exploit huge volumes of data can mean the difference between life and death. Thus, the Army recently announced it had deployed its first tactical cloud to Afghanistan. The Health and Human Services Department is funding grants to sift the huge volumes of data expected to follow adoption of electronic health care records. Meanwhile, the National Oceanic and Atmospheric Administration and Environmental Protection Agency routinely create huge quantities of sensor data as they monitor the physical environment.
Agency by agency, from the Securities and Exchange Commission to the Justice and Homeland Security departments and every other large government organization, data volumes are growing exponentially. These agencies are already struggling with big data, want to harness it for better analysis and decision-making, or both.
In fact, I am beginning to see big data as an emerging data type with a unique set of properties and challenges. Big data has different metadata and processing requirements, as is evident in parallel processing models such as MapReduce.
With big data, all the common metadata attributes for accuracy, lineage, security and privacy take on increased importance because of the sheer volume of data in question. Meanwhile, parallelization is key to processing big data and producing useful results in a reasonable time frame. Along with parallelization, visualization and summarization are core processing techniques for big data.
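To make the map/reduce idea concrete, here is a minimal word-count sketch in Python using only the standard library. It is illustrative, not a production pipeline: real big-data frameworks such as Hadoop run the same map and reduce phases distributed across many machines rather than across local processes.

```python
# Minimal map/reduce sketch: count words across independent chunks of data.
from collections import Counter
from functools import reduce
from multiprocessing import Pool

def map_phase(chunk: str) -> Counter:
    """Map: count words independently within one chunk of the dataset."""
    return Counter(chunk.split())

def reduce_phase(a: Counter, b: Counter) -> Counter:
    """Reduce: merge the partial counts from two chunks."""
    return a + b

if __name__ == "__main__":
    # Stand-in for a large dataset split into chunks.
    chunks = ["big data big cloud", "cloud data data", "big big data"]
    with Pool() as pool:
        partial_counts = pool.map(map_phase, chunks)  # map runs in parallel
    totals = reduce(reduce_phase, partial_counts)     # reduce merges results
    print(totals.most_common(3))
```

The point of the pattern is that each map task touches only its own slice of the data, so adding machines (or processes) scales the work almost linearly; only the comparatively cheap reduce step combines results.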
Given the scale of these datasets, a processing mistake or unauthorized spillage of big data means big trouble. In addition to increased prudence, we must add metadata attributes that are unique to big data, such as granularity, degree of aggregation, use of heuristics and degree of preprocessing. Other attributes that might apply include time span, geospatial extent, transience and transactional capabilities.
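As a thought experiment, the sketch below shows how those attributes might be captured in code. The field names and types are hypothetical illustrations of the attributes just listed, not a proposed standard.

```python
# Hypothetical metadata record for a big dataset; fields mirror the
# attributes discussed above and are illustrative only.
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class BigDataMetadata:
    source_lineage: str                       # where the data came from
    security_label: str                       # e.g., "public" or "restricted"
    granularity: str                          # e.g., "raw sensor reading"
    aggregation_level: Optional[str] = None   # degree of aggregation, if any
    heuristics_applied: list[str] = field(default_factory=list)   # sampling, dedup, ...
    preprocessing_steps: list[str] = field(default_factory=list)  # cleaning, normalization, ...
    time_span: Optional[str] = None           # e.g., "2010-01 through 2011-06"
    geospatial_extent: Optional[str] = None   # bounding box or region name
    transient: bool = False                   # safe to discard after processing?

# Example record for an environmental sensor feed (values are made up).
meta = BigDataMetadata(
    source_lineage="buoy sensor network feed",
    security_label="public",
    granularity="raw sensor reading",
    time_span="2010-01 through 2011-06",
    transient=True,
)
```

Carrying a record like this alongside each dataset lets downstream consumers judge whether results were aggregated, sampled or preprocessed before they rely on them.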
For government IT managers and CIOs, big data is at the doorstep. Now is the time to rethink your data architectures to accommodate this new type of data. Big data holds great promise or great peril, depending on your ability to understand and take advantage of it.