Cloud performance: How to measure the things that matter
Agency leaders need systems in place to assess performance across multiple vendors and environments and over the course of a dynamic transformation.
We know that cloud-native services offer agencies the promise of increased agility, efficiency and innovation. But when navigating a multifaceted, multiyear modernization, one of the common challenges for IT leaders is determining the effectiveness of their cloud operations.
At any given moment, agencies should be able to gauge cloud performance within individual environments and in aggregate, both for tactical decision-making and for long-term strategy. With a multitude of metrics and KPIs in the cloud, it’s easy to get caught in the weeds. Here are a few ways senior leaders can maintain oversight of the measurements that matter.
Focus on key pillars of success
The insights and indicators enterprise leaders require to make their decisions can look entirely different from what engineering teams are interested in tracking on a day-to-day basis.
For a meaningful view of cloud operations, senior dashboards should focus on a few key areas:
- Environment security and health. The most critical area for leaders to monitor closely is security across the enterprise. In the move to cloud, agencies are outsourcing critical aspects of IT to external providers, so it’s essential to aggregate activity across cloud service providers to understand what’s happening and pinpoint where there could be gaps in compliance or security operations. This level of visibility helps maintain trust in the network, assign responsibility and address threats in real time -- particularly for zero-day vulnerabilities.
- Cost optimization. Moving to cloud can reduce the number of physical servers and racks, but a one-to-one migration falls short of delivering the cloud’s advantages. The real indicators of success are when agencies decommission hardware instead of recapitalizing it; when they shut down a data center rather than continue paying for it; or when a virtual server goes away because an application has been rearchitected for a serverless, consumption-based model. In those instances, cost savings truly start to become a meaningful performance indicator for senior leadership.
- Enterprise resilience. Mission-critical services and applications can’t risk going down or being disrupted by changes in utilization. For some agencies, traffic may surge at various points, and capacity may need to scale dramatically to keep up. Leadership should be able to track site reliability metrics such as mean time to recovery (MTTR), recovery time objective (RTO) and delivery lead time -- and know how quickly systems scale resources up or down based on the amount of data being processed.
- Cloud-native development and utilization. Once an agency is established in the cloud, with multiple applications migrated and tenants onboarded from across the enterprise, it’s essential to be able to answer two questions: Are we taking advantage of cloud-native tools to innovate in ways that weren’t possible on premises? How quickly are we bringing new services to market? The native services offered by cloud providers open the door to new functionality, from natural language processing to advanced data analytics, and they offer longer-term benefits such as lower maintenance costs. As those new capabilities are brought to bear, leaders can start to assess the rate of cloud-native adoption across environments.
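As a rough illustration of how two of these pillars might roll up into a senior dashboard, the Python sketch below computes MTTR (here, the average time from detection to restoration across incidents), both per provider and in aggregate, along with a cloud-native adoption rate (the share of workloads running on serverless or managed services). The `Incident` and `Workload` record shapes, the provider and workload names, and the set of hosting models counted as cloud-native are all hypothetical assumptions; real inputs would come from an agency’s own incident-management and inventory systems, and some teams measure MTTR from incident onset rather than detection.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta


# Hypothetical record shapes; real data would come from incident-management
# and inventory tools.
@dataclass
class Incident:
    provider: str
    detected: datetime
    restored: datetime


@dataclass
class Workload:
    name: str
    provider: str
    hosting: str  # assumed labels: "vm", "serverless", "managed"


# Assumption: which hosting models count as cloud-native is a policy choice.
CLOUD_NATIVE = {"serverless", "managed"}


def mttr(incidents: list[Incident]) -> timedelta:
    """Mean time to recovery: average of (restored - detected)."""
    if not incidents:
        return timedelta(0)
    total = sum((i.restored - i.detected for i in incidents), timedelta(0))
    return total / len(incidents)


def mttr_by_provider(incidents: list[Incident]) -> dict[str, timedelta]:
    """MTTR for each provider's environment, for side-by-side comparison."""
    groups: dict[str, list[Incident]] = defaultdict(list)
    for incident in incidents:
        groups[incident.provider].append(incident)
    return {provider: mttr(group) for provider, group in groups.items()}


def adoption_rate(workloads: list[Workload]) -> float:
    """Fraction of workloads running on cloud-native hosting models."""
    if not workloads:
        return 0.0
    native = sum(1 for w in workloads if w.hosting in CLOUD_NATIVE)
    return native / len(workloads)


# Illustrative sample data (not real measurements).
t0 = datetime(2024, 1, 1, 9, 0)
incidents = [
    Incident("provider_a", t0, t0 + timedelta(minutes=30)),
    Incident("provider_b", t0, t0 + timedelta(minutes=90)),
]
workloads = [
    Workload("permits-api", "provider_a", "serverless"),
    Workload("case-db", "provider_a", "managed"),
    Workload("legacy-portal", "provider_b", "vm"),
    Workload("batch-reports", "provider_b", "vm"),
]
print(mttr(incidents))                            # 1:00:00
print(mttr_by_provider(incidents)["provider_b"])  # 1:30:00
print(adoption_rate(workloads))                   # 0.5
```

Keeping the per-provider figures alongside the aggregate reflects the need to gauge performance both within individual environments and across the enterprise.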
Bring it all together
When evaluating cloud performance, there’s often an emphasis on how many applications have been migrated. But cloud adoption is a marathon, not a sprint. While it can be tempting to measure success in terms of short-term activities (e.g., we got five apps into production this year), it’s more meaningful to focus on multiyear victories that position an agency for greater resilience, more secure and effective operations and faster delivery.
Ultimately, measuring enterprise success in the cloud requires thinking differently about what matters most for decision-making. Leaders need systems in place to assess performance across multiple vendors and environments and over the course of a dynamic transformation.
The major cloud providers each have their own way of tracking and presenting operating data, but they also offer APIs and other built-in tools for achieving aggregated visibility -- in security, performance and other areas. As part of an agency's long-term transformation, data can be standardized and aggregated across environments. This strategic line of sight may not have been possible on premises, and data center operations can’t be the template for cloud strategy. We must now change the model and bring things together so that leaders have a central way of understanding performance as they balance short-term decisions with future-state goals.
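One way to picture that standardization step is a small adapter layer: each provider’s security findings are mapped into a common schema and then rolled up for an enterprise-wide view. The sketch below assumes findings have already been exported as Python dictionaries; the provider names and raw field names (`res`, `sev`, `resource_id`, `score`) are hypothetical placeholders, not any real provider’s API format, and the score thresholds only loosely follow the CVSS v3.x qualitative severity bands.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Finding:
    """Common schema every provider's findings are mapped into."""
    provider: str
    resource: str
    severity: str  # normalized to "low", "medium", "high" or "critical"


def from_provider_a(raw: dict) -> Finding:
    # Hypothetical format: {"res": "<id>", "sev": "HIGH"}
    return Finding("provider_a", raw["res"], raw["sev"].lower())


def from_provider_b(raw: dict) -> Finding:
    # Hypothetical format: {"resource_id": "<id>", "score": 0.0-10.0}.
    # Bucket the numeric score into the common labels (assumed thresholds).
    score = raw["score"]
    if score >= 9.0:
        severity = "critical"
    elif score >= 7.0:
        severity = "high"
    elif score >= 4.0:
        severity = "medium"
    else:
        severity = "low"
    return Finding("provider_b", raw["resource_id"], severity)


# One adapter per provider; adding an environment means adding one entry.
ADAPTERS: dict[str, Callable[[dict], Finding]] = {
    "provider_a": from_provider_a,
    "provider_b": from_provider_b,
}


def normalize(feeds: dict[str, list[dict]]) -> list[Finding]:
    """Map every provider's raw findings into the common schema."""
    return [ADAPTERS[name](raw) for name, raws in feeds.items() for raw in raws]


def severity_rollup(findings: list[Finding]) -> tuple[Counter, Counter]:
    """Counts per (provider, severity) plus enterprise-wide totals."""
    by_provider = Counter((f.provider, f.severity) for f in findings)
    overall = Counter(f.severity for f in findings)
    return by_provider, overall


# Illustrative sample feeds (not real provider output).
feeds = {
    "provider_a": [{"res": "vm-1", "sev": "HIGH"}],
    "provider_b": [
        {"resource_id": "bucket-7", "score": 9.8},
        {"resource_id": "fn-3", "score": 5.0},
    ],
}
findings = normalize(feeds)
by_provider, overall = severity_rollup(findings)
print(overall["critical"], overall["high"], overall["medium"])  # 1 1 1
```

Adding a provider means writing one more adapter while the rollup logic stays the same, which is what lets leaders compare environments on a single scale.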
Osama Malik is a vice president at Booz Allen and leads digital transformation initiatives for civil government.
Andy Rheuban is a chief technologist at Booz Allen, specializing in cloud migration, application portfolio rationalization, and cost optimization.