How to avoid lock-in and ensure data portability in the cloud
Despite the many obstacles to data portability, existing tools and techniques can help manage data in the cloud, keep it accessible and portable and ensure cross-provider compatibility.
In 2012, a Government Accountability Office report outlined seven major challenges to adopting the Office of Management and Budget’s cloud-first policy for IT deployments. Of these, ensuring data portability and interoperability within the cloud is the most daunting due to the large number of competing cloud technologies for data storage and retrieval.
Despite the many options available, a lack of mature and open computing standards for data interoperability has hindered cloud adoption. Each technology carries the risk of cloud provider lock-in that creates impediments to easily moving data into, around and out of the cloud.
Lock-in has the potential to obstruct portability and interoperability, so it has been a significant source of frustration for organizations looking to take advantage of the many proven benefits of cloud computing.
Even so, existing tools and techniques can help agencies manage data in the cloud, keep it accessible and portable, and ensure cross-provider compatibility.
Tip 1: Develop a cloud data approach
Before leaping into the cloud, it is essential to develop an enterprise plan for data portability and interoperability that encompasses both current and future states. The current state matters because the existing infrastructure already reflects many trade-offs between system benefits and risks. That knowledge helps determine the level of risk, including proprietary lock-in, that is acceptable to the organization.
Make sure to take into account not just the initial move to the cloud, but also the ability to move between cloud providers, share data among cloud providers and move out of the cloud. Many early cloud adopters underestimated the difficulty of switching cloud providers; it can be as disruptive as changing database engines in a legacy system. Moving large volumes of data also carries a high cost in transfer time, bandwidth and provider fees.
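To make the cost of moving data concrete, the following rough sketch estimates transfer time and outbound transfer charges for a bulk migration. The function name, the link speed and the per-gigabyte price are illustrative assumptions, not figures from any provider; it also assumes the link is fully utilized, which real transfers rarely achieve.

```python
def estimate_transfer(size_gb, link_mbps, egress_price_per_gb):
    """Return (hours, cost) for moving size_gb over a link of link_mbps.

    All inputs are hypothetical planning numbers. Decimal units are used:
    1 GB = 1000 MB = 8000 megabits.
    """
    size_megabits = size_gb * 1000 * 8
    seconds = size_megabits / link_mbps      # assumes a fully utilized link
    hours = seconds / 3600
    cost = size_gb * egress_price_per_gb     # flat per-GB price, an assumption
    return hours, cost


# Example: 10 TB over a 1 Gbps link at an assumed price of 0.05 per GB.
hours, cost = estimate_transfer(size_gb=10_000, link_mbps=1000,
                                egress_price_per_gb=0.05)
print(f"about {hours:.1f} hours, about {cost:.0f} in transfer fees")
```

Under these assumptions, 10 TB takes roughly 22 hours to move, before accounting for retries, validation or any reformatting of the data at the destination.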
Finally, government agencies cannot make decisions about tools and solutions without first classifying data and identifying the appropriate security controls. As with internally hosted systems, cloud providers offer different levels of data protection and access control. These differences may limit what types of data can be moved into the cloud, as well as where and how the data is stored.
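One way to picture how classification constrains placement is a simple lookup from sensitivity level to approved hosting options. The sketch below is only an illustration: the three sensitivity labels, the hosting option names and the approval ceilings are all hypothetical, and a real agency would derive them from its own classification policy and security controls.

```python
from enum import IntEnum


class Sensitivity(IntEnum):
    """Hypothetical classification levels, ordered from least to most sensitive."""
    PUBLIC = 1
    INTERNAL = 2
    RESTRICTED = 3


# Hypothetical: the highest sensitivity each hosting option is approved to hold.
APPROVED_CEILING = {
    "public_cloud": Sensitivity.INTERNAL,
    "on_premises": Sensitivity.RESTRICTED,
}


def allowed_locations(level):
    """Return the hosting options approved for data at the given level."""
    return sorted(
        name for name, ceiling in APPROVED_CEILING.items() if level <= ceiling
    )


print(allowed_locations(Sensitivity.PUBLIC))      # both options qualify
print(allowed_locations(Sensitivity.RESTRICTED))  # only on_premises qualifies
```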
Tip 2: Design applications for the cloud
Ideally, the data consumer is not affected by where and how the data is hosted. The best way to achieve that independence is to build applications that use Web services for data access. New tools have dramatically reduced Web services development time, and API-based Web services keep context and data together as a single unit. This allows them to be easily consumed by multiple applications and moved among cloud providers. Consider abstracting data access behind Web services whenever a greenfield development effort or major redesign is undertaken.
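As a minimal sketch of the idea, the following Python WSGI application exposes records over HTTP as JSON, so consumers depend only on the URL and the response format rather than on where the data lives. The record contents, the /records/<id> path and the function names are assumptions made for illustration.

```python
import json

# Hypothetical records; in practice these would come from whatever
# storage backs the service, on premises or in any cloud.
RECORDS = {
    "42": {"id": "42", "title": "Quarterly report", "classification": "public"},
}


def _respond(start_response, status, payload):
    """Serialize payload as JSON and send it with the given HTTP status."""
    body = json.dumps(payload).encode("utf-8")
    start_response(status, [
        ("Content-Type", "application/json"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


def data_service(environ, start_response):
    """WSGI app: GET /records/<id> returns one record as JSON."""
    parts = environ.get("PATH_INFO", "").strip("/").split("/")
    is_get = environ.get("REQUEST_METHOD") == "GET"
    if is_get and len(parts) == 2 and parts[0] == "records":
        record = RECORDS.get(parts[1])
        if record is not None:
            return _respond(start_response, "200 OK", record)
    return _respond(start_response, "404 Not Found", {"error": "not found"})


def serve(port=8000):
    """Run the service locally with the standard library's reference server."""
    from wsgiref.simple_server import make_server
    with make_server("127.0.0.1", port, data_service) as server:
        server.serve_forever()
```

Calling serve() starts a local server for experimentation. Because the application follows the WSGI convention, the same code can run behind any WSGI-capable host, which is part of what keeps it movable.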
Tip 3: Use an abstraction layer in front of proprietary cloud services
Many cloud providers have developed their own API-based RESTful Web services for data storage and retrieval. Using these existing cloud data services can dramatically cut the cost and time of custom Web service development. The trade-off, however, is the risk of vendor lock-in, which restricts portability. Agencies should consider placing an abstraction layer between the application and the cloud service. Such a layer aids portability by minimizing the recoding required should an agency wish to migrate to a new vendor.
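A minimal sketch of such an abstraction layer follows. The ObjectStore interface and both backends are hypothetical names invented for this illustration; the in-memory and directory-based classes stand in for adapters that would wrap two different vendors' storage services.

```python
from abc import ABC, abstractmethod
from pathlib import Path


class ObjectStore(ABC):
    """Hypothetical provider-neutral interface the application codes against."""

    @abstractmethod
    def put(self, key, data):
        """Store bytes under the given key."""

    @abstractmethod
    def get(self, key):
        """Return the bytes stored under the given key."""


class InMemoryStore(ObjectStore):
    """Stand-in for an adapter around one vendor's storage service."""

    def __init__(self):
        self._objects = {}

    def put(self, key, data):
        self._objects[key] = data

    def get(self, key):
        return self._objects[key]


class DirectoryStore(ObjectStore):
    """Stand-in for a second vendor; keeps objects as files under a root."""

    def __init__(self, root):
        self._root = Path(root)

    def put(self, key, data):
        path = self._root / key
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(data)

    def get(self, key):
        return (self._root / key).read_bytes()


def archive_report(store, report_id, body):
    """Application code: knows only the ObjectStore interface."""
    key = f"reports/{report_id}.txt"
    store.put(key, body)
    return key
```

In this arrangement, migrating to a new vendor means writing one new ObjectStore subclass; archive_report and any other application code stay unchanged.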
Tip 4: Choose cloud services with multivendor adoption
Some cloud services have gained such widespread adoption that they are becoming pseudo open standards, with many vendors emulating the functionality on different cloud platforms. Using these services greatly reduces the risk of lock-in. One example is the Amazon S3 API; many cloud vendors have developed S3-compliant APIs of their own.
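With an S3-style API, switching vendors can be largely a matter of configuration. The sketch below assumes the boto3 SDK, whose client factory accepts an endpoint_url argument; the provider names and the example.com endpoint are hypothetical placeholders, and real values would come from each vendor's documentation. How completely a given vendor emulates the S3 API varies, so compatibility should be tested rather than assumed.

```python
# Hypothetical endpoints keyed by provider name. None means "use the SDK's
# default AWS endpoint"; the vendor_b URL is a placeholder, not a real service.
PROVIDER_ENDPOINTS = {
    "aws": None,
    "vendor_b": "https://s3.vendor-b.example.com",
}


def s3_client_kwargs(provider):
    """Build keyword arguments for an S3 client aimed at the chosen provider."""
    endpoint = PROVIDER_ENDPOINTS[provider]
    kwargs = {"service_name": "s3"}
    if endpoint is not None:
        kwargs["endpoint_url"] = endpoint
    return kwargs


def make_client(provider):
    """Create the client. Requires the third-party boto3 package and credentials."""
    import boto3  # imported lazily so the configuration logic runs without it
    return boto3.client(**s3_client_kwargs(provider))


print(s3_client_kwargs("vendor_b"))
```

Application code that calls make_client("aws") today could call make_client("vendor_b") later, leaving the storage calls themselves untouched as long as the second vendor supports the operations in use.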
Tip 5: Favor portability and interoperability
The tools and solutions for different types of data differ greatly in their levels of maturity and openness. For instance, authorization and identity management solutions typically are highly mature and frequently use open standards, allowing various types of systems in diverse locations to interact freely with one another. Likewise, standards-compliant networked file systems are very mature with no proprietary lock-in. Cloud-based, large block storage services also have matured dramatically.
Data repositories, however, are still in a state of transition. Many cloud providers offer service-based versions of traditional databases, such as Oracle, Microsoft SQL Server and MySQL, as well as popular NoSQL newcomers, alongside their proprietary services. Newer open-source systems such as Apache Cassandra are maturing as data components; they provide more portable and interoperable solutions and reduce the risk of proprietary lock-in.
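Whatever repository is chosen, one practical portability check is whether its contents can be exported to an open format. The sketch below dumps a table to CSV using only the cursor features defined by Python's standard database API (execute, description and fetchall), with SQLite standing in for any compliant database. The function name and the sample table are assumptions for illustration.

```python
import csv
import io
import sqlite3


def export_table_csv(conn, table, out):
    """Write every row of a table to out as CSV, with a header row.

    The table name is interpolated into the SQL text, so only trusted,
    hard-coded names should be passed in.
    """
    cur = conn.cursor()
    cur.execute(f"SELECT * FROM {table}")
    writer = csv.writer(out)
    writer.writerow([column[0] for column in cur.description])
    writer.writerows(cur.fetchall())


# SQLite is only a stand-in here; the sample data is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE datasets (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO datasets VALUES (?, ?)", [(1, "alpha"), (2, "beta")])

buffer = io.StringIO()
export_table_csv(conn, "datasets", buffer)
print(buffer.getvalue())
```

A plain CSV export does not capture indexes, constraints or stored procedures, so it addresses data portability only, not the portability of the database design.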
Leverage the cloud with minimal lock-in
For federal agencies, portability and interoperability ensure the ability to deliver data to the right location at the right time and to use it effectively. No attempt to move to the cloud should be made without first determining the requirements for enterprisewide portability and interoperability.
Define what portability means to the organization, identify necessary tools to meet those needs and establish criteria for evaluating what is appropriate for the type of data involved. By taking these steps, agencies will be better poised to move into, around and out of the cloud as designed.