When security matters, sometimes only a digital twin will do
GEMINAI creates a verifiable, privacy-protected synthetic twin of real-world datasets that can be manipulated and analyzed without disturbing or compromising the original data.
Technology that essentially creates a digital twin of sensitive data was selected as one of the top solutions in the Air Force’s AFWERX Base of the Future Challenge.
Hazardous Software won the spot using technology from its affiliate, Diveplane, which developed the platform. Called GEMINAI, it creates a verifiable, privacy-protected synthetic twin of real-world, user-selected datasets that can be manipulated and analyzed without disturbing or compromising the original data. The synthetic equivalents have the same statistical properties as the original data, but without any of its private, sensitive or classified information.
“Synthetic data generation is the whole notion of being able to create statistically equivalent, realistic yet not real datasets based upon the statistical distributions and mathematical properties of real datasets,” said Newton Grant, director of business development at Diveplane.
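To make the idea of "statistically equivalent, realistic yet not real" data concrete, here is a minimal Python sketch. It is not GEMINAI's method, which the article does not describe. It assumes a toy two-column dataset and a simple bivariate normal model, and the names `fit_model`, `sample_synthetic` and `real_rows` are hypothetical. The sketch estimates the means, spreads and correlation of the original columns, then draws entirely new rows from that fitted model.

```python
import math
import random
import statistics


def fit_model(rows):
    """Estimate the means, spreads and correlation of two numeric columns."""
    xs = [x for x, _ in rows]
    ys = [y for _, y in rows]
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sx, sy = statistics.stdev(xs), statistics.stdev(ys)
    cov = sum((x - mx) * (y - my) for x, y in rows) / (len(rows) - 1)
    return {"mx": mx, "my": my, "sx": sx, "sy": sy, "rho": cov / (sx * sy)}


def sample_synthetic(model, n, rng):
    """Draw n brand-new rows from the fitted bivariate normal model."""
    rho = model["rho"]
    resid = math.sqrt(max(0.0, 1.0 - rho * rho))
    rows = []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        x = model["mx"] + model["sx"] * z1
        y = model["my"] + model["sy"] * (rho * z1 + resid * z2)
        rows.append((x, y))
    return rows


# Hypothetical stand-in for a sensitive dataset: two correlated columns.
rng = random.Random(7)
real_rows = []
for _ in range(5000):
    a = rng.gauss(50.0, 10.0)
    real_rows.append((a, 2.0 * a + rng.gauss(0.0, 8.0)))

real_model = fit_model(real_rows)
synthetic_rows = sample_synthetic(real_model, 5000, rng)
synthetic_model = fit_model(synthetic_rows)

# Compare the summary statistics of the original and its synthetic twin.
for key in ("mx", "my", "sx", "sy", "rho"):
    print(f"{key}: real={real_model[key]:.3f} synthetic={synthetic_model[key]:.3f}")
```

The printed statistics for the two datasets should land close together even though none of the synthetic rows is copied from the original. Real tabular data is rarely this well behaved, so a production system would need a far richer model than a single normal distribution.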
When a military service or civilian agency wants to share data with a potential research partner, but doesn’t want to share the actual data because of information security concerns, it can instead share the digital twin for research, development and modeling. GEMINAI can create statistically equivalent data points -- but without including any of the underlying classified information. “It can preserve the principal properties of the data points without necessarily compromising anything that might relate to a specific time of collection, as an example, or location of collection,” Grant said.
“What the Air Force could do with a product like GEMINAI is create synthetic data that they can then share with private industry to develop a particular solution or to test a particular prototype that they might be building,” Grant said.
This is particularly important for the Air Force, which often works with sensitive and classified data and may want to work with a vendor that is not cleared to handle it. Getting that clearance can take time, slowing the development of new solutions. Sharing realistic synthetic data removes that barrier.
Or, if the Air Force wanted to work with researchers on COVID-19 treatment models using a database containing private medical information of airmen, it could generate twin records instead of sharing the actual medical files. The synthetic records would have the same statistical properties as the real patients’ data and be just as useful for predictive modeling, but because synthetic patient data cannot be traced back to identify an actual individual, the system complies with HIPAA and other privacy regulations.
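One simple way to sanity-check that synthetic records are not just copies of real ones is to look for exact matches and to measure how far each synthetic row sits from its nearest real row. The Python sketch below illustrates that check under stated assumptions: the records are small numeric tuples, the function names `exact_copies` and `nearest_real_distance` are hypothetical, and nothing here reflects how GEMINAI verifies privacy. A check like this is only a first screen, not evidence of regulatory compliance.

```python
import math


def exact_copies(synthetic_rows, real_rows):
    """Return synthetic rows that are identical to some real row."""
    real = set(real_rows)
    return [row for row in synthetic_rows if row in real]


def nearest_real_distance(synthetic_row, real_rows):
    """Euclidean distance from one synthetic row to its closest real row."""
    return min(math.dist(synthetic_row, real_row) for real_row in real_rows)


# Hypothetical toy records (for example, two numeric measurements per patient).
real_rows = [(1.0, 2.0), (3.0, 4.0)]
synthetic_rows = [(1.0, 2.0), (2.0, 2.0)]

leaked = exact_copies(synthetic_rows, real_rows)
distances = [nearest_real_distance(row, real_rows) for row in synthetic_rows]

print("exact copies:", leaked)
print("distance to closest real record:", distances)
```

Here the first synthetic row is flagged because it duplicates a real record (distance zero), while the second sits a full unit away from its nearest real neighbor.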
GEMINAI can be applied even if the statistical distributions of the original data are classified along with the individual data points. In that case, users can adjust the synthetic data generation process to change the distribution for the synthetic data equivalent so that the synthetic data doesn’t share the same distribution as the original.
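The idea of deliberately changing the distribution can be sketched in a few lines of Python. This is an illustration only, assuming a single numeric column modeled as a normal distribution. The function `sample_with_altered_distribution` and its `mean_shift` and `scale` parameters are hypothetical and are not GEMINAI settings.

```python
import random
import statistics


def sample_with_altered_distribution(values, n, mean_shift, scale, rng):
    """Fit a normal model to values, then sample from a deliberately altered one.

    mean_shift moves the center and scale stretches the spread, so the
    synthetic column no longer reveals the original distribution.
    """
    mu = statistics.fmean(values)
    sigma = statistics.stdev(values)
    return [rng.gauss(mu + mean_shift, sigma * scale) for _ in range(n)]


# Hypothetical stand-in for a column whose true distribution is sensitive.
rng = random.Random(11)
real_values = [rng.gauss(100.0, 5.0) for _ in range(2000)]

synthetic_values = sample_with_altered_distribution(
    real_values, n=2000, mean_shift=20.0, scale=2.0, rng=rng
)

print(f"real mean={statistics.fmean(real_values):.2f} "
      f"stdev={statistics.stdev(real_values):.2f}")
print(f"synthetic mean={statistics.fmean(synthetic_values):.2f} "
      f"stdev={statistics.stdev(synthetic_values):.2f}")
```

The trade-off is that the more the distribution is altered, the less the synthetic data tells a research partner about the behavior of the original.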
Working on structured-data problems is the primary application for GEMINAI, which must run in a cloud environment or on an air-gapped server. Primary users are data scientists using Python clients, but a graphical user interface is in the works for users who aren’t proficient in the programming language.
Next, the company wants to be able to automate particular manual functions and work on scaling the system, Grant said.
For the AFWERX challenge, Hazardous Software used a synthetic equivalent of telemetry data from the SES-10 rocket. The company highlighted the mathematical principles behind the technology and showed how it can be integrated with other systems through open application programming interfaces, which can be generated in 50 languages.
The AFWERX Base of the Future initiative is made up of six concurrent challenges focused on different topics. This one homes in on “creating a culture of innovation to allow the free sharing of ideas on base and increasing the speed of technology adoption,” according to AFWERX, which was established in 2017 to spur collaboration among servicemembers, industry and researchers.
The AFWERX Fusion 2020 Showcase featured 370 teams selected from a record 1,500-plus submissions for the Base of the Future Challenge. Throughout the event, teams pitched their solutions to a panel of subject-matter experts from the relevant sectors of the Air Force. The top 92 selections were invited to further engage with the Air Force during the week of Aug. 31 with the hope of obtaining contracts.
As a result of its success in the challenge, Hazardous Software has connected with an Air Force activity that judges felt could benefit most from synthetic data generation. GEMINAI was also selected to be part of AFWERX’s EngageSpace, a challenge that looks at what’s possible in the space domain.