Big data system being tailored for feds
Lockheed Martin and Cleversafe are developing a federal version of the Dispersed Compute Storage solution, which combines Hadoop MapReduce and Cleversafe's dsNet to handle massive computation and storage requirements.
Lockheed Martin and Cleversafe are developing a federal version of the Cleversafe Dispersed Compute Storage solution designed for the unique needs of federal government agencies.
“The federal community has been out in front of big data, well ahead of many other market segments, and needs technology solutions today that are well suited for exabyte scale storage as well as massive computation,” said Tom Gordon, CTO and vice president of engineering of Lockheed Martin’s Information Systems and Global Solutions-National.
In March, the Obama administration launched a “Big Data Research and Development Initiative” aimed at improving the tools and techniques required to access, organize and glean pertinent information from huge volumes of digital data.
Six federal departments and agencies announced more than $200 million in new commitments to achieve the goals of the initiative, including the Defense and Energy departments, Defense Advanced Research Projects Agency, National Institutes of Health, National Science Foundation and the U.S. Geological Survey.
Big data involves datasets that grow so large they become awkward to work with using traditional database management tools. Similarly, traditional storage systems are not designed for large-scale distributed computation and data analysis.
Current implementations treat data storage and data analysis as separate functions. Data is transferred from Storage Area Networks or Network Attached Storage across the network to the systems that perform the computations used to gather insight. However, the network quickly becomes the bottleneck, making multi-site computation over the wide-area network particularly challenging.
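To give a sense of why the network becomes the bottleneck, here is a rough back-of-the-envelope sketch. The 10 Gbps link speed and the 1 petabyte dataset are illustrative assumptions, not figures from Cleversafe or Lockheed Martin, and the function name is hypothetical.

```python
def transfer_days(data_bytes, link_gbps, efficiency=1.0):
    """Days needed to move data_bytes over a link of link_gbps gigabits per second.

    efficiency is the fraction of the nominal link rate actually achieved
    (1.0 means a perfectly utilized link, which is optimistic).
    """
    bits = data_bytes * 8
    seconds = bits / (link_gbps * 1e9 * efficiency)
    return seconds / 86_400  # seconds per day


PETABYTE = 10**15  # decimal petabyte, in bytes (assumed unit)

days = transfer_days(PETABYTE, link_gbps=10)
print(f"1 PB over a 10 Gbps link: about {days:.1f} days")
```

Under these assumptions, moving a single petabyte takes a little over nine days even on a fully utilized link, which is the kind of delay that running the computation where the data lives is meant to avoid.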
Cleversafe aims to solve this problem by running Hadoop MapReduce and its Dispersed Storage Network (dsNet) system on the same platform. The company also replaces the Hadoop Distributed File System (HDFS) with Information Dispersal Algorithms, allowing analytics at a scale previously unattainable through traditional HDFS configurations, according to Russ Kennedy, vice president of product strategy, marketing and customer solutions at Cleversafe.
Hadoop MapReduce is a programming model and software framework for writing applications that rapidly process vast amounts of data in parallel on large clusters of compute nodes. Although Hadoop MapReduce allows computations to be performed where the data resides, it does have some limitations, Kennedy noted.
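For readers unfamiliar with the MapReduce programming model, the following is a minimal single-process sketch of the idea, using the classic word-count example. It does not use Hadoop's actual API; the function names are hypothetical, and a real cluster would run the map step on many nodes at once, close to where each piece of data is stored.

```python
from collections import defaultdict


def map_phase(document):
    """Emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group emitted values by key, as the framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    pairs = []
    for doc in documents:  # on a cluster, each split is mapped on a different node
        pairs.extend(map_phase(doc))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


counts = word_count(["big data big storage", "data analysis"])
print(counts)
```

The map and reduce functions are independent of one another for different inputs and keys, which is what lets a framework spread the work across a large cluster.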
For one, HDFS uses a single server for metadata operations. If that server fails, users lose access to their data, or data can be lost. Additionally, HDFS stores three copies of the data to guard against failures. This is not a big problem if users have terabytes of data, but if the data scales to petabytes or exabytes, management of that data becomes more difficult and overhead costs rise, Kennedy said.
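The overhead Kennedy describes can be put in rough numbers. The sketch below compares the raw capacity needed for three-way replication with that of a generic erasure-coded layout in which any `threshold` of `width` slices can rebuild the data. The 10-of-16 figure is a hypothetical example chosen for illustration, not a configuration published by Cleversafe, and the function names are assumptions.

```python
def raw_replicated(usable, copies=3):
    """Raw capacity needed to hold `usable` data with full replication."""
    return usable * copies


def raw_dispersed(usable, threshold, width):
    """Raw capacity for a generic threshold-of-width erasure code (assumed model)."""
    if not 0 < threshold <= width:
        raise ValueError("threshold must be between 1 and width")
    return usable * width / threshold


# Usable data sizes expressed in terabytes (decimal units assumed).
for label, usable_tb in [("1 TB", 1), ("1 PB", 1_000), ("1 EB", 1_000_000)]:
    replicated = raw_replicated(usable_tb)
    dispersed = raw_dispersed(usable_tb, threshold=10, width=16)  # hypothetical 10-of-16
    print(f"{label}: {replicated:,} TB raw replicated, {dispersed:,.0f} TB raw dispersed")
```

The ratio stays the same at every scale, but the absolute cost does not: two extra copies of a terabyte are trivial, while two extra copies of an exabyte amount to two more exabytes of hardware to buy, power and manage.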
Cleversafe’s dsNet system protects both data and metadata equally. By applying the company’s Information Dispersal technology to slice and disperse data, the system eliminates single points of failure.
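The general idea of slicing and dispersing data can be illustrated with a deliberately simple toy scheme. The sketch below is not Cleversafe's Information Dispersal Algorithm, whose details the company does not describe here; it swaps in a basic XOR-parity code that splits data into `k` slices plus one parity slice and can rebuild the data after losing any single slice. All names are hypothetical.

```python
from functools import reduce


def xor_bytes(a, b):
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def disperse(data, k):
    """Split data into k equal-size slices plus one XOR parity slice."""
    slice_len = -(-len(data) // k)  # ceiling division
    padded = data.ljust(slice_len * k, b"\0")  # pad so every slice has equal length
    slices = [padded[i * slice_len:(i + 1) * slice_len] for i in range(k)]
    parity = reduce(xor_bytes, slices)
    return slices + [parity], len(data)


def recover(slices, original_len):
    """Rebuild the data when at most one slice (marked None) is missing."""
    missing = [i for i, s in enumerate(slices) if s is None]
    if len(missing) > 1:
        raise ValueError("this toy scheme tolerates only one lost slice")
    slices = list(slices)
    if missing:
        present = [s for s in slices if s is not None]
        # XOR of all surviving slices reproduces the lost one.
        slices[missing[0]] = reduce(xor_bytes, present)
    return b"".join(slices[:-1])[:original_len]  # drop parity and padding


stored, length = disperse(b"federal big data", k=4)
stored[2] = None  # simulate the loss of one storage node
restored = recover(stored, length)
print(restored)
```

Because each slice would live on a different storage node, no single node holds the whole object and no single failure makes it unreadable. Production dispersal schemes use stronger erasure codes that tolerate several simultaneous losses.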
Cleversafe uses three devices in its product offering: an Accesser node, which slices up and then retrieves data; the Slicestor, which is the storage array that holds the data; and the Manager, a client that manages the storage network and offers various capacity reporting tools.
As data is distributed evenly across all Slicestor nodes, metadata can scale linearly and infinitely as new nodes are added, reducing any scalability bottlenecks and increasing performance, Kennedy noted. Cleversafe’s approach delivers the combination of analytics and storage in a geographically distributed single system, which lets organizations efficiently scale their big data environments to hundreds of petabytes and even exabytes, he said.
Base functionality for Dispersed Compute Storage is expected by the end of the year as part of the Cleversafe 3.0 software release. Additional functionality for tools support will be available in the first half of 2013 as part of the Cleversafe 3.1 software release.