Risk management through resilient technology
Resilient technology creates a more stable IT environment that saves money, increases cybersecurity and requires fewer resources to support it, a new report says.
Resilient technology is necessary for all organizations, but for the public sector it is especially important for building trust, disaster recovery, innovation and increased capacity, an expert says. The public sector also faces unique challenges in achieving it.
“Resilient technology is critical in maintaining uninterrupted services for customers and servicing them during peak times,” Daniel Wallance, an associate partner at McKinsey and Co., and two co-authors wrote in “A technology survival guide for resilience,” which McKinsey published March 24. “This requires a resilient infrastructure with heightened visibility and transparency across the technology stack to keep an organization functioning in the event of a cyberattack, data corruption, catastrophic system failure, or other types of incidents.”
The organizational structure of state and local governments complicates efforts to achieve resilience, however, Wallance said. To overcome that, the first step is to understand who owns the technology (a central agency or individual functions), where funding comes from and who the decision-makers are.
Then, agencies can look at what applications run on the technology in question. Do they affect critical systems or a lunch menu, for instance? “I’m less concerned about the latter,” Wallance said. “If it’s out for a day, that’s not as big a deal as if it contains massive amounts of critical data that I need to operate.”
Signs that a government organization may need to increase resilience are an increasing number of data corruption events or outages. “That would be an indicator that there’s a … larger issue,” he said.
Another challenge is overcoming a culture of stagnation—the idea that if it ain’t broke, don’t fix it. That is changing, though, Wallance said. “With the advent of digital or new technologies, new ways of working, remote access, [more agencies are] realizing, ‘Hey, these services that I want to provide my constituents, my customer base, I can’t provide as effectively or efficiently with the technology stack that I have in place today,’” he said.
The report offers three levers to build technology resilience. The first is prioritizing services based on their criticality. The second is assessing agencies’ current level of resilience and how they performed in past crises; for that assessment, the report provides four maturity levels against which agencies can measure themselves.
The first is ad hoc resilience, meaning resilience is up to individual users and system owners, and monitoring involves users’ reporting of problems. The second is passive resilience, or resilience through manual backups, duplicate systems and data replication. Third is active resilience through failover, which is resilience through active synchronization of systems, applications and databases as well as active monitoring at the application level. The most mature level is called inherent resilience by design because resilience is architected into the technology stack from the get-go, and the technology is being actively monitored.
“It’s then possible to identify and ultimately remediate common factors that led to these incidents, which may include the technology environment itself, the architecture of applications, interfaces between systems and third parties, and the way resilience was built into individual applications and systems,” the report states.
The third lever of building resilience requires agencies to remediate gaps using a cross-functional approach, increasing resilience of individual applications and groups, strengthening on-premises or cloud-based hosts and implementing regular resilience testing.
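As a rough illustration of the first two levers, the sketch below ranks services by criticality and compares each one's maturity level with a target. The four level names come from the report; the numeric criticality scale, the target policy and the service names other than the lunch menu are hypothetical assumptions made for this example, not part of the report.

```python
from dataclasses import dataclass
from enum import IntEnum


class Maturity(IntEnum):
    """The four maturity levels named in the report, least to most mature."""
    AD_HOC = 1    # resilience left to individual users and system owners
    PASSIVE = 2   # manual backups, duplicate systems, data replication
    ACTIVE = 3    # failover with active synchronization and monitoring
    INHERENT = 4  # resilience architected into the technology stack


@dataclass
class Service:
    name: str
    criticality: int  # hypothetical scale: 1 = most critical, 3 = least
    maturity: Maturity


# Hypothetical policy: minimum maturity expected at each criticality tier.
TARGET = {1: Maturity.ACTIVE, 2: Maturity.PASSIVE, 3: Maturity.AD_HOC}


def resilience_gaps(services):
    """Return services below their tier's target, most critical first."""
    gaps = [s for s in services if s.maturity < TARGET[s.criticality]]
    return sorted(gaps, key=lambda s: (s.criticality, s.maturity))


# Illustrative inventory; only the lunch menu example comes from the article.
services = [
    Service("lunch menu", 3, Maturity.AD_HOC),
    Service("benefits payments", 1, Maturity.PASSIVE),
    Service("permit portal", 2, Maturity.AD_HOC),
]

for s in resilience_gaps(services):
    print(s.name, s.maturity.name, "->", TARGET[s.criticality].name)
```

Under these assumed targets, the lunch menu is left alone while the two more critical services are flagged for remediation, which mirrors the point that a day-long outage matters far less for some systems than for others.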
Benefits of resilient technology include cost savings and increased cybersecurity. When agencies replace legacy and end-of-life systems with newer ones, the IT environment becomes more stable and requires fewer resources to support.
Cloud provides a way to take advantage of new technology and increase resilience, Wallance said, adding that applications should not be lifted and shifted “because then a legacy application and assets are in a contemporary environment, [which is] not going to help me as much.”
Another benefit of resilience is workforce improvement. That’s because agencies must upskill existing employees or hire people who are knowledgeable about these new, resilient technologies.
Agencies must create a culture of resilience among employees. The report has three tips for doing that:
- Institute a blame-free culture in which teams and managers focus on problem-solving, not pointing fingers.
- Use metrics to monitor performance and focus on repeat incidents that have the same root cause.
- Do simulations so that teams can “rehearse the outage … and iteratively build up and train to respond.”
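The second tip, focusing on repeat incidents that share a root cause, can be sketched in a few lines. The incident log, the service names and the threshold of two occurrences are hypothetical assumptions for illustration; the report does not prescribe a data format or a cutoff.

```python
from collections import Counter

# Hypothetical incident log: (service, root cause) pairs.
incidents = [
    ("permit portal", "expired certificate"),
    ("benefits payments", "database failover misconfigured"),
    ("permit portal", "expired certificate"),
    ("public records search", "expired certificate"),
    ("benefits payments", "disk full"),
]


def repeat_root_causes(incidents, threshold=2):
    """Return (root cause, count) pairs seen at least `threshold` times."""
    counts = Counter(cause for _service, cause in incidents)
    return [(cause, n) for cause, n in counts.most_common() if n >= threshold]


for cause, n in repeat_root_causes(incidents):
    print(f"{n} incidents share root cause: {cause}")
```

Counting by root cause rather than by service surfaces a problem that spans several systems, which is the kind of common factor a team would want to remediate once rather than incident by incident.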
The notion of resilient technology is taking root, albeit at varying levels nationwide, Wallance said, based on maturity level and the ability to attract or upskill talent to drive resiliency. But it’s crucial because the longer agencies wait to incorporate resilience, the harder it will be to address cyber incidents and outages, and the tougher it will be to build resilience into disparate systems.
“There’s also the security,” he said. “Even if I am able to provide digital and contemporary services to my constituents, if it’s not secure, they’re not going to trust and use those systems.”
Stephanie Kanowitz is a freelance writer based in northern Virginia.