NYC's real-time cyber defense platform
New York City's Cyber Command built an open-source, cloud-based data pipeline -- a security log aggregation platform that analysts use to quickly detect and mitigate cyber threats.
Two years after New York City Mayor Bill de Blasio created the NYC Cyber Command to lead the Big Apple’s cybersecurity defense efforts, the team built an open-source, cloud-based data pipeline to serve as a security log aggregation platform that analysts could use to quickly detect and mitigate threats to city networks and systems.
In accordance with its cloud-first strategy, NYC Cyber Command built the pipeline on Google Cloud Platform (GCP) using Google products such as Cloud Pub/Sub, a scalable messaging service that handles data ingestion. Security events are published to Cloud Pub/Sub, and pull subscriptions then make the data available to log parsers and other services via Google’s Cloud Dataflow, a fully managed service for stream and batch processing that puts the data into formats security analysts can use.
“We have data coming from external vendors, and all this data is ingested through Pub/Sub, and Pub/Sub pushes it through to Dataflow, which can parse or enrich the data,” said Noam Dorogoyer, a data engineer and IT project specialist at the command. “The way the data comes in can be simple such as comma-separated. Other times it’s a mess. There is not a common format among the vendors.”
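To illustrate the parsing problem Dorogoyer describes, here is a minimal sketch of normalizing events that arrive in different vendor formats into one common record. It is not the command's actual parser: the field names in `FIELDS`, the three input formats and the `parse_event` function are all assumptions made for illustration, and a production Dataflow job would need far more robust format detection.

```python
import csv
import io
import json

# Hypothetical common schema; these field names are illustrative only.
FIELDS = ("timestamp", "source_ip", "action")


def parse_event(raw: str) -> dict:
    """Normalize one raw log line into a dict keyed by FIELDS."""
    text = raw.strip()
    if text.startswith("{"):
        # Assumed vendor format 1: a JSON object
        data = json.loads(text)
    elif "=" in text:
        # Assumed vendor format 2: whitespace-separated key=value pairs
        data = dict(pair.split("=", 1) for pair in text.split())
    else:
        # Assumed vendor format 3: comma-separated values in FIELDS order
        values = next(csv.reader(io.StringIO(text)))
        data = dict(zip(FIELDS, values))
    # Keep only the known fields; anything missing becomes None
    return {field: data.get(field) for field in FIELDS}
```

Each branch produces the same set of keys, so downstream steps can treat every event alike regardless of which vendor sent it.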
The command uses logic in Dataflow to move the data into BigQuery, Google’s serverless cloud data warehouse, which puts it into a tabular format that’s easy for analysts to access.
All of the data is captured in real time. “Real time is king, and that’s the only data valuable to us,” Dorogoyer said. “If data comes in late, especially when it comes to cybersecurity, it’s no longer valuable, especially during an emergency. So, from a data engineering standpoint, the way we constructed the pipeline is to minimize latency at every single step. If it’s maybe a Dataflow job, we designed it so that as many elements as possible are happening in parallel so at no point is there a step that’s waiting for a previous one.”
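The idea of processing many elements in parallel, so that no event waits on an unrelated one, can be sketched locally with Python's standard thread pool. This is only an analogy: Dataflow distributes work across managed workers on its own, and the `enrich` step and its "internal address" rule below are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def enrich(event: dict) -> dict:
    """Hypothetical enrichment step: flag events from 10.x addresses."""
    enriched = dict(event)  # copy so the input event is left unchanged
    enriched["internal"] = event["source_ip"].startswith("10.")
    return enriched


def process_batch(events: list, workers: int = 4) -> list:
    """Enrich events concurrently; results come back in input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(enrich, events))
```

Because each event is enriched independently, adding workers shortens the wait for a batch without changing the result.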
Security analysts can access the data in several ways, said Anthony Bocekci, Computer Emergency Response Team specialist. They can run queries in BigQuery or use other tools that will provide visualizations of the data, such as Data Studio, a reporting solution.
“It gives us a very robust amount of options to deal with this cleaner data, which still has retained context, which is the key thing,” Bocekci added. “When it comes to incident response, you are oftentimes reacting to ongoing activities, so having the data available live in front of you -- again, parsed with the context still there -- it allows you to react appropriately…. It allows you to focus on particular facets of the incident that you may not have the ability to do if the logs were provided to you in a slower format.”
The amount of data flowing through the command varies each day. On weekdays during peak times, it could be 5 or 6 terabytes, Dorogoyer said. On weekends, that can drop to 2 to 3 terabytes. As NYC Cyber Command increases visibility across agencies, it will deal with petabytes of data.
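For a sense of scale, those volumes can be converted into an average sustained rate. The sketch below assumes the figures are daily totals and uses decimal units (1 terabyte = 10^12 bytes); neither assumption comes from the command, and real traffic would peak well above the average.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def average_mb_per_second(terabytes_per_day: float) -> float:
    """Average rate in megabytes per second for a given daily volume."""
    bytes_per_day = terabytes_per_day * 10**12  # decimal terabytes (assumed)
    return bytes_per_day / SECONDS_PER_DAY / 10**6
```

Under those assumptions, 5 terabytes a day works out to roughly 58 megabytes every second, around the clock.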
Pipeline security is of obvious importance. To manage who is taking what actions on which data, NYC Cyber Command uses Google’s Cloud Identity and Access Management. To let engineers access GCP resources from untrusted networks without using a virtual private network, it uses Cloud Identity-Aware Proxy, a foundational element of Google’s BeyondCorp, a zero-trust enterprise security model on which the pipeline is built.
“We wouldn’t be much of a cybersecurity firm if we weren’t careful with who had what permissions,” Dorogoyer said. “We want everybody to be able to do what they have to do for their job, but they don’t really need more than what they need…. There isn’t any account that would just be able to destroy the entire project and wipe it.”
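The least-privilege principle Dorogoyer describes can be sketched as a simple role-to-permission lookup, plus an audit that flags any role holding a dangerous combination of permissions. The role names and permission strings below are invented for illustration; they are not actual Cloud IAM roles, and this is not how the command configures its project.

```python
# Hypothetical roles and permissions; not actual Cloud IAM role names.
ROLE_PERMISSIONS = {
    "analyst": {"tables.read", "queries.run"},
    "pipeline_engineer": {"jobs.deploy", "topics.publish", "tables.write"},
}


def is_allowed(role: str, permission: str) -> bool:
    """True only if the role was explicitly granted the permission."""
    return permission in ROLE_PERMISSIONS.get(role, set())


def over_privileged(role_permissions: dict, forbidden: set) -> list:
    """Return roles that hold every permission in the forbidden set."""
    return sorted(
        role for role, perms in role_permissions.items() if forbidden <= perms
    )
```

Unknown roles get nothing by default, and the audit makes it easy to confirm that no single role combines permissions that together could wipe a project.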
Before de Blasio established NYC Cyber Command by executive order on June 11, 2017, the city’s Department of Information Technology and Telecommunications and individual agencies were responsible for cybersecurity. The command features a year-round security operations center and works with the city’s 100-plus agencies to prevent, detect, respond to and recover from threats.
“Coming together and having one umbrella approach to cybersecurity gives agencies a very different and new sense of comfort, where they have one group they can look to and one group they can rely on and work with,” Bocekci said.
Editor's note: This article was changed Aug. 6 to correct Anthony Bocekci's title.