As big data grows, technologies evolve into ecosystems
Big data platforms, both proprietary and open source, emerge as comprehensive solutions to handle the magnitude, complexity and variety of data.
Government agencies face the same massive, data-driven challenges as other sectors, such as health care, financial services and retail. Meeting those challenges requires strategies that lead to the deployment of sophisticated big data platforms and analytics.
But what is the best way to accomplish this: by cobbling together various “point products” that each address part of the big data process, or by building a “big data platform” that integrates all of the capabilities organizations need to apply deep analytics?
A data tsunami – driven by the volume, variety and velocity of data – is straining IT infrastructures. A new and fundamentally different approach to ingesting, processing, analyzing and storing data is therefore inevitable, according to industry experts, who advocate a strategic approach to managing the challenges.
Government agencies want to harness big data to tackle a host of issues, including crime, cybersecurity, fraud detection and prevention, intellectual property protection, operational efficiency, resource optimization and situational awareness.
“The ideal solution is to have comprehensive products that easily integrate together as a cohesive platform,” said Eric Sall, vice president of information management of IBM’s Software Group.
Applying individual products is only a band-aid approach because current IT systems were not designed to deal with the magnitude or complexity of the data or advanced workloads agencies now need to process, he said. Instead, a platform approach allows organizations to start small with the ability to scale, as opposed to taking a piecemeal approach that can often cost more, take longer, and deliver less in the long run, Sall said. “Combining the right platform approach with the appropriate big data and analytics architecture helps ensure full value is achieved from big data,” he said.
For its part, IBM has acquired, built and assembled a set of products that fits into its InfoSphere big data platform. The key components of the platform include Hadoop-based analytics, which processes and analyzes any data type across server clusters; stream computing software, which provides continuous analysis of massive volumes of streaming data with sub-millisecond response times; and a data warehouse for operational insight.
The IBM platform includes a set of supporting application services – such as accelerators and tools for application development, information integration and governance, and systems management – as well as business intelligence and analytic tools.
To avoid being locked in by one vendor, however, government agencies should carefully assess whether “one-stop shop” big data platforms from one company can meet all of their needs, said Charles Lewis, principal information systems engineer with Mitre, a research organization that helps government agencies assess technology. Lewis leans more toward open platforms that let users plug in any type of tool they need to address their data analytic requirements.
Over the past two years, Lockheed Martin has been building an ecosystem of big data tools using open-source technology, said Caron Kogan, its big data strategic planning director. “We are working with open-source technologies, and in some cases, we will substitute a commercial product if that is the preference of the customer,” Kogan said. Tools from IBM or Teradata, for example, can be inserted into the framework, though many of Lockheed’s customers are asking for open-source tools, she said.
But because open-source tools require a lot of work to make them production-ready, Lockheed is building a framework so users do not have to build a platform from scratch, Kogan said. The biggest challenge for organizations is getting data analytics out of the traditional databases and business intelligence systems and into big data infrastructures, she said. The ecosystem will handle the entire range of big data activity, including data ingestion, processing, storage and querying capabilities, along with deployment of analytics. Lockheed also is focusing on tools and libraries that can be pre-packaged to help accelerate data analysis.
“Big data was born and lives in the open-source world,” and many of the solutions run on Linux servers, said Gunnar Hellekson, chief technology strategist for Red Hat. The Apache Hadoop framework for processing large volumes of structured and unstructured data kick-started the category, followed by NoSQL databases such as Cassandra and MongoDB that can handle massive amounts of data.
Red Hat is focused on reducing the gap between big data and actionable information by making all data available for analytics through a platform for capturing, processing, and integrating big data. Red Hat Enterprise Linux provides the underlying platform that incorporates Red Hat software-defined storage, middleware and the OpenStack hybrid cloud, Hellekson noted.
Even networking vendor Cisco Systems has built a big data ecosystem. The Cisco Common Big Data Platform powers big data applications and integrates them with traditional apps and systems, according to Kapil Bakshi, chief architect and strategist of Cisco Public Sector. The platform is based on Cisco’s Unified Computing System, which integrates compute, networking and storage capabilities into a single platform, and works with a slew of vendor offerings, including Cloudera, DataStax, Hortonworks, Intel, Oracle, Pivotal and SAP HANA.
Cisco offers reference architectures to tie these vendors’ tools into the big data platform, providing data analytics, Hadoop, NoSQL databases, and massively parallel processing capabilities. A massively parallel processing database is a single system with many independent microprocessors running in parallel, as opposed to a distributed system, which runs a massive number of separate computers to solve a single problem.
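The parallel-processing idea described above can be illustrated with a small sketch. It is not taken from Cisco or any vendor product: the function names, the sample records and the use of Python threads as stand-ins for independent processors are all assumptions made for illustration. Each worker counts only its own slice of the data, and the partial counts are merged at the end.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def partition(records, n_workers):
    """Split records into n_workers disjoint slices (round-robin)."""
    return [records[i::n_workers] for i in range(n_workers)]


def local_count(records_slice):
    """Count records per agency within one slice only."""
    return Counter(agency for agency, _topic in records_slice)


def parallel_count(records, n_workers=4):
    """Scatter slices to workers, then merge their partial counts."""
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(local_count, partition(records, n_workers)))
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


# Hypothetical sample data: (agency, topic) pairs invented for this sketch.
records = [
    ("DOJ", "fraud"),
    ("DHS", "cyber"),
    ("DOJ", "crime"),
    ("IRS", "fraud"),
    ("DHS", "cyber"),
]
counts = parallel_count(records, n_workers=2)
```

Production systems spread the slices across separate processors or machines rather than threads in one process, but the pattern of splitting the data, working on each piece independently and merging the results is the same.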
The bottom line is that organizations need these massively parallel processing systems and other big data tools that can scale out to address the volume, velocity and variety of big data, whether they come from a proprietary vendor’s platform or a platform based on open technologies. It makes life simpler for organizations if their workforce can unlock the value of their data via an ecosystem of integrated tools, industry experts said.