Big metadata: 7 ways to leverage your data in the cloud
Well-designed metadata will enable the description, discovery and reuse of data assets in the cloud and help break down agency silos.
In President Obama’s 2012 re-election campaign, his technical team did something very bold: They broke down data silos by moving all the data into a single cloud repository. On top of Amazon's services, the team built Narwhal, a set of services that acted as an interface to a single shared data store for all of the campaign's applications, making it possible to quickly develop new applications and to integrate existing ones into the campaign's system. Those apps included sophisticated analytics programs like Dreamcatcher, a tool developed to “microtarget voters based on sentiments within text.”
In February of that year, Slate called this technology, “Obama’s White Whale,” and it is a fitting description for technology that is almost mythical in federal agencies that have talked about “breaking down data silos” for years.
Fortunately, migrating agency data to the cloud offers IT managers another opportunity to break down those silos, integrate their data and develop a unified data layer for all applications. In this article, I want to examine how to design cloud metadata that enables the description, discovery and reuse of data assets. Here are the basic metadata description methods (what I like to think of as the “Magnificent Seven” of metadata!) and how to apply them to data in the cloud, followed by a short sketch that pulls them together:
- Identification: the ability to distinguish one data asset from another. Examples are attributes like name, ID, location and signatures. Most cloud-based NoSQL stores use key-value pairs where the key is a unique identifier. A best practice is to make identifiers globally unique; for linked data, the identifier should also be dereferenceable (like a URL).
- Static measurement: measures constant or very slowly changing characteristics of a target data asset. Examples include fixed attributes like format, size, creation date, creator, security classification and other content-specific measurements. In the cloud, lineage metadata becomes critical to centralizing data and establishing trust.
- Dynamic measurement: details variable or changing aspects of a data asset. A typical example requiring dynamic measurement is capturing state information on a data asset. Other examples are usage counts, ratings, sales, plays, location tracking, ranking, etc. Many of the cloud's benefits involve dynamic characteristics like metered billing, uptime, server utilization and storage utilization.
- Degree scales: measure both an artifact’s progress along a continuum and the meaningful inflection points along that continuum. Of course, in a numeric scale, the inflection points are a given. Examples of this are time scales, performance scales and opinion scales. In the cloud, usage scales and thresholds are key for automated scalability. Additionally, degree scales can be used to measure subjective characteristics like user satisfaction. Some interesting cloud examples are the types of infrastructure-as-a-service instances (tiny, small, medium, large, etc.) and the degree of application migration complexity in a migration scorecard.
- Categorization: enables the division of a population of data assets into manageable groups based on commonalities among the members of each group. A hierarchical arrangement of the groups facilitates discovery and roll-up. Examples of taxonomies are genre/subgenres in music, product taxonomies on Amazon.com, and even NIST’s Cloud Taxonomy. Categorization is very important for improving the discovery of your data assets and even your cloud applications.
- Relationships: create predicates that link the metadata record to its target data asset or to other metadata artifacts. Examples of relationships are Facebook “friends” (the social graph), Amazon recommendations and linked data. In the cloud, relationships are key to modeling how cloud items should interact with one another.
- Commentary: provides free-form textual description for human readers of the metadata record. This is the most common form of metadata and should be indexed for discovery.
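To make these seven methods concrete, here is a minimal sketch, in Python, of what a metadata record for a cloud-hosted data asset might look like. All field names, identifiers and values are hypothetical illustrations rather than a standard schema; the numbered comments map back to the seven methods above.

```python
# Hypothetical metadata record for a cloud-hosted data asset,
# illustrating the "Magnificent Seven" description methods.

from datetime import datetime, timezone

metadata_record = {
    # 1. Identification: a unique, dereferenceable identifier (linked-data style).
    "id": "https://data.example.gov/assets/contact-records-2012",

    # 2. Static measurement: fixed attributes, including lineage for trust.
    "static": {
        "format": "parquet",
        "size_bytes": 52_428_800,
        "created": "2012-06-01T00:00:00Z",
        "creator": "field-operations",
        "classification": "internal",
        "lineage": ["s3://agency-raw/contacts/", "etl-job-42"],
    },

    # 3. Dynamic measurement: values that change as the asset is used.
    "dynamic": {
        "access_count": 1873,
        "last_accessed": datetime.now(timezone.utc).isoformat(),
        "storage_tier": "standard",
    },

    # 5. Categorization: a hierarchical taxonomy path for discovery and roll-up.
    "category": ["outreach", "constituent-contact", "phone-banking"],

    # 6. Relationships: predicates linking this record to other assets and apps.
    "relationships": [
        {"predicate": "derivedFrom", "object": "https://data.example.gov/assets/raw-contacts"},
        {"predicate": "usedBy", "object": "https://apps.example.gov/analytics"},
    ],

    # 7. Commentary: free-form text, indexed for search.
    "description": "Aggregated contact records used for outreach analytics.",
}


def usage_scale(access_count: int) -> str:
    """4. Degree scale: bucket a dynamic measurement into meaningful inflection points."""
    if access_count < 100:
        return "cold"
    if access_count < 1000:
        return "warm"
    return "hot"


print(usage_scale(metadata_record["dynamic"]["access_count"]))  # -> "hot"
```

In practice, a record like this would live in the metadata catalog alongside the asset it describes, with the degree-scale buckets feeding the kind of automated scaling and tiering decisions mentioned above.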
Once security and privacy concerns are alleviated by cloud providers certifying their compliance with security and privacy controls (like FedRAMP), the push will be on to integrate and centralize data stores to give agencies a 360-degree view of key customers or end users.
Migrating agency data to the cloud is the perfect time to finally smash those data silos!
Michael C. Daconta (mdaconta@incadencecorp.com) is the Vice President of Advanced Technology at InCadence Strategic Solutions and the former Metadata Program Manager for the Homeland Security Department. His new book is entitled The Great Cloud Migration: Your Roadmap to Cloud Computing, Big Data and Linked Data.