Defending IT infrastructure with analytics
The multicloud CyLab environment will develop, test and deploy AI-based analytic tools and software to counter existing and emerging threats.
To make it easier for next-generation threat hunters to analyze cybersecurity data across cloud environments, the Department of Homeland Security’s Cybersecurity and Infrastructure Security Agency and the DHS Science and Technology Directorate are developing an environment where new analytic tools and software can be researched and tested to counter existing and emerging threats.
CyLab will be a logical data warehouse that supports improving CISA’s analytics and architecture by leveraging different cloud vendors and testing analytic solutions from development to production, according to Gary Jones, CISA’s associate chief of strategic technology. Speaking in a July 26 webcast, he described how machine learning and threat hunting capabilities are being developed for DHS staff and contractors to help defend not only federal systems and networks, but also the nation's critical infrastructure.
CyLab’s data, though, is the basic ingredient for all the analysis, said Preston Werntz, the assistant chief data officer in the CTO's office in CISA. “We're really trying to make sure that data we've got is going to be in the best shape possible that we can move it into a CyLab and use it for these more advanced purposes,” he said. That entails bringing together what’s considered big data, cyber data, structured data and wide, or siloed, data that resides in smaller, perhaps unstructured data sets.
“All those different datas, even at the unclassified level, have certain sensitivities, maybe privacy sensitive, maybe critical infrastructure sensitive. So that governance and stewardship is so important,” he said.
CyLab is working to map all the data to different concepts and classes and increase the amount of captured metadata so the team can determine what data is appropriate for which ML model and help minimize the algorithms’ drift. It’s also important to keep on top of changes to the data, Werntz said.
The two things the team is focused on, he said, are getting CISA’s data ready to be used in CyLab and then putting policies in place to ensure that the machine learning models built on that data are shared with stakeholders, industry or critical infrastructure operators in machine-readable formats.
Alexandria Phounsavath, director of S&T’s Data Analytics Technology Center, outlined CyLab’s three-part research plan.
The first part concerns the ecosystem, the multicloud environment where various cloud providers’ capabilities can be reviewed. The CyLab team will consider how to move data and run computations across clouds and solve information-sharing and privacy issues so researchers can easily collaborate. The environment will also feature high-performance computing resources necessary for training artificial intelligence algorithms.
The second part of the research plan, she said, addresses the AI/ML tools for the environment, as well as the data wrangling, model building and natural language processing tools.
The final area is what Phounsavath called a “stretch goal.” It involves bringing academic researchers into the collaborative, problem-solving space. “So, where is this space? What data sets go in there? What do you do with folks who may not be fully cleared?” she asked. In the event of another Colonial Pipeline incident where there’s a flurry of initial activity, CyLab wants to be able to sustain and maintain not just that energy but the whole environment, she said.
CyLab is expected to become operational in 2024, with additional capabilities to be added afterward, according to Jones. The environment will probably start with basic machine learning capabilities along with DevOps-type development, he said.
“CyLab isn't one and done. It's going to be an enduring capability for systems missions to benefit from innovation,” Phounsavath said. “In the field of analytics, the players, the landscape of … products changes in months, not years. We're going to be creating an environment where, although the threats are changing and evolving, so will the capabilities that CISA has to address them.”