Tips for replicating data in the cloud
The replication of data across multi-tenant environments will be a growing concern for agencies moving big storage to the cloud. Here are ways to do it right.
In a recent request for information, the Social Security Administration asked how a private cloud might be used to provide enterprise disk storage for the agency’s entire data storage needs.
Providing storage across an enterprise as complex and multi-faceted as SSA’s would mean replicating data across nests of different offices and IT environments. SSA said, for example, that it had applications running on a host of operating systems, including Hewlett-Packard Unix, IBM mainframes, Linux, Microsoft Windows, Solaris and VMware.
In the RFI, SSA officials wanted to know how replication would work in a multi-vendor environment, how vendors would approach changing an existing replicated environment, and what the limits and capabilities of replication techniques were.
If the SSA’s RFI is a harbinger of things to come, then how vendors and cloud providers answer these questions will be a vital concern for agencies moving their storage to cloud infrastructures.
Replication technology duplicates stored or archived data in real time over a storage-area network. Other terms for this type of service include file replication, data replication and remote storage replication.
How agencies and their cloud providers replicate data securely and reliably depends on the type of data in play, whether structured or unstructured, as well as the cloud infrastructure deployed — private, public, community or hybrid — industry experts say.
“Replication is absolutely important for multiple people looking at the same data at the same time, depending on where they are,” said Steve Sicola, chief technology officer of XIO, a provider of performance-driven storage. “And it is very important for disaster recovery.”
5 tips for replication
Sicola offered agencies five tips to keep in mind as they consider replication in a private cloud. It starts with good storage.
- First, if you are deploying a private cloud, you have to start off with good storage that adheres to the four tenets of storage — availability, capacity, performance and reliability, Sicola said. “That is the basis for everything because once you start replicating, you are cutting stuff in two,” he added.
- Second, decide where you are going to put replication: between the storage boxes or between the servers. Those decisions are based on the type of storage or server operating systems you have. If you have a big mix, one size might not fit all, Sicola said.
- Third, decide how much data has to be online in order to be replicated. This will determine whether you use synchronous or asynchronous replication. Synchronous replication keeps data online but can cost a lot of money if you don’t have the wiring and cable infrastructure between sites that are far apart. “You have to have wire at the same speed as if they are next to one another,” Sicola said.
- Fourth, decide which data you are replicating and why. Are you doing it just to have another copy to play with in a multi-tenant environment, or are you doing it for disaster recovery? If you are doing it for both reasons, then you’ll have to invest for two different scenarios.
- Fifth, rate how homogeneous your environment is. If the infrastructure is all homogeneous — for example, Intel servers running the Microsoft Windows operating system — life becomes easier. But if you have to mix and match or keep older, legacy systems, you will have to make as much of the environment homogeneous as you can to save money, and figure out how to support the older machines.
“You’ll have to learn how to deal with exceptions as they come, and in the cloud exceptions cost money,” Sicola said.
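Sicola's third tip, the choice between synchronous and asynchronous replication, can be sketched in a few lines. The toy Python model below is illustrative only, not any vendor's implementation: the synchronous path acknowledges a write only after the replica has it, while the asynchronous path acknowledges immediately and ships the update later, which is cheaper over long distances but risks losing queued writes in a disaster.

```python
import queue
import threading

class ReplicatedStore:
    """Toy model contrasting synchronous and asynchronous replication.
    Hypothetical class for illustration -- real replication runs at the
    block or file level over a storage-area network, not in dicts."""

    def __init__(self):
        self.primary = {}
        self.replica = {}
        self.pending = queue.Queue()  # backlog for asynchronous mode
        threading.Thread(target=self._drain, daemon=True).start()

    def write_sync(self, key, value):
        # Synchronous: the write is not acknowledged until the replica
        # holds it -- safest, but every write pays the round-trip
        # latency to the remote site, hence Sicola's point about wire speed.
        self.primary[key] = value
        self.replica[key] = value      # stands in for a remote ack
        return "ack"

    def write_async(self, key, value):
        # Asynchronous: acknowledge immediately, ship the update later.
        # A disaster can lose whatever is still sitting in the queue.
        self.primary[key] = value
        self.pending.put((key, value))
        return "ack"

    def _drain(self):
        # Background shipper that applies queued writes to the replica.
        while True:
            key, value = self.pending.get()
            self.replica[key] = value
            self.pending.task_done()

store = ReplicatedStore()
store.write_sync("case-1001", "claim record")
store.write_async("case-1002", "claim record")
store.pending.join()   # wait for the asynchronous backlog to flush
```

The trade-off Sicola describes falls out of the queue: synchronous mode never has a backlog to lose, while asynchronous mode tolerates slow links by letting the backlog grow.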
Focus on apps and related links
To prepare for replication, agencies must do some fundamental things from a disaster recovery and continuity-of-operations perspective, regardless of whether the agency is moving to the cloud, said Dale Wickizer, CTO of NetApp’s U.S. Public Sector.
Agencies must focus not only on the applications that are going to be replicated but also on the apps that are linked to those applications to perform a particular business or agency mission function. “You have to pay attention to application interfaces,” Wickizer said.
Then, if an agency does business with other agencies and has joint workflows that deliver business functions, managers need to make sure those interagency functions are in place as well.
“If I’ve created my own disaster recovery site or use a cloud provider, I have to make sure the interfaces to other agencies work as well,” Wickizer said. Then, if the agency has to cut over in a disaster, other agencies still have access to the data.
Testing the environment is also critical, something that some organizations tend to avoid, Wickizer said. Agencies that take it seriously test their environments at least once a quarter by switching operations and running them at their disaster recovery sites to make sure their procedures are crisp, he added.
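Wickizer's two points — replicate the linked apps along with the primary ones, and test the cutover regularly — can be combined into a simple pre-failover check. The sketch below is hypothetical (the endpoint names and URLs are invented for illustration): before a quarterly DR exercise, probe every application interface the workflow depends on, including interagency links, at the disaster recovery site.

```python
# Hypothetical pre-failover check: before switching operations to a
# disaster recovery site, verify that every application interface the
# workflow depends on -- including interagency links -- answers there.
from urllib.request import urlopen
from urllib.error import URLError

# Invented example endpoints; a real agency would list the health
# checks of its own apps and its partner agencies' interfaces.
DR_INTERFACES = {
    "claims-app":     "https://dr.example.gov/claims/health",
    "records-app":    "https://dr.example.gov/records/health",
    "partner-agency": "https://partner.example.gov/api/health",
}

def check_dr_interfaces(interfaces, timeout=5):
    """Probe each named interface; return the names that did NOT respond."""
    failed = []
    for name, url in interfaces.items():
        try:
            with urlopen(url, timeout=timeout) as resp:
                if resp.status >= 400:
                    failed.append(name)
        except (URLError, OSError):
            failed.append(name)
    return failed
```

Running this before each quarterly exercise surfaces a broken interagency interface while it is still a test finding rather than a disaster-day surprise, which is the crispness Wickizer is after.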
Type of data replicated?
Susie Adams, CTO of Microsoft’s federal civilian business, said having a firm grasp on the type of data you want to replicate — whether it is structured, unstructured or within a relational database — is key.
Microsoft’s Azure cloud platform offers several storage options, including Binary Large Object storage for large volumes of unstructured data, Table Storage for large volumes of data that need additional structure, Queues for persistent messaging between applications, and the Windows Azure Drive that allows users to move virtual hard disks from private to public clouds.
There is not a single, one-size-fits-all model for storage, Adams agreed. Users have to consider whether the data is a binary large object, video, audio, image or relational data and fit it to the particular cloud model, she said.
If they’re dealing with transactional data, does it need to be updated on a regular basis in real time? Which application programming interfaces are available for them to access data? These are all questions that need to be considered.
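The questions Adams raises can be folded into a simple triage step that maps a workload's characteristics onto the Azure storage options named above. The function below is an illustrative sketch of that decision, not Microsoft guidance; the workload descriptors (`messaging`, `vhd`, `structured`) are invented for the example.

```python
# Illustrative triage of a workload onto the Windows Azure storage
# options described above. The rules sketch Adams's questions; they
# are not an official decision tree.
def pick_azure_store(workload):
    """workload: dict of invented descriptors, e.g.
    {"kind": "video"} or {"structured": True}."""
    if workload.get("messaging"):
        return "Queues"                # persistent app-to-app messaging
    if workload.get("vhd"):
        return "Windows Azure Drive"   # virtual hard disks moving private-to-public
    if workload.get("structured"):
        return "Table Storage"         # large volumes needing added structure
    return "Blob storage"              # unstructured binary: video, audio, images

pick_azure_store({"kind": "video"})    # → "Blob storage"
```

The same triage extends to Adams's final point: once each classification is known, the agency can match it to a cloud model — dedicated for sensitive data, multi-tenant or public for data it wants cheap and widely available.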
Finally, users need to evaluate different types of clouds — private, public, shared tenancy — break their data workloads into different classifications and choose the appropriate provider for each. If the cloud provider offers a multi-tenant environment to anyone, for example, and the agency manages sensitive data, that model wouldn’t be a comfortable fit; the agency in that case would want a more dedicated cloud.
But there are other government users who might want to just make public data available instead of paying a lot for the storage, Adams said.