Census preps open data site for 2020 information
Census is working to make data.census.gov more interactive and give customers better access to data.
As the Census Bureau prepares for the 2020 population count, it's also readying its data.census.gov platform to make accessing the collected census data easier.
The impetus for the new site was the push to make data available to the public as soon as possible. The consensus among data analysts is that they spend 80 percent of their time collecting data. Census wants to help ease that pain point by putting interactive datasets online.
The data.census.gov site is the preview environment for a single census data dissemination platform. The data on the site is currently test data that is used for evaluating the platform.
“We are taking an enterprise-level approach to consolidate our submission services,” Census Bureau Chief Data Officer Zach Whitman said. “We are hoping to centralize the experience with development and processes so we can have a more [consolidated] data model that we can share more broadly with our end consumer and do it in a way that [is] repeatable and standardized.”
Census is taking a “ground-up approach” to developing data.census.gov and adding new features into the system through two-week sprint cycles. The goal is to add public releases to the site every 40 days, Whitman said at a June 13 Amazon Web Services Public Sector Summit panel.
Census wants the new platform's interface to resemble its existing American FactFinder site, Whitman said, but with far more extensive capabilities. “Everything that we are doing in terms of the user interface” will also be available through the API itself, he added.
Through the site’s application programming interface, users will be able to download and manipulate the data to serve their own purposes; ensuring that the API can drive all of data.census.gov's core functions means outside users will have more power as well. “The more that we make this API capable, then we can serve our customers better by providing them with ways to extend the API in their own platforms for their customer base,” Whitman said.
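To give a sense of what downloading and manipulating the data through an API could look like, here is a minimal Python sketch. It is not taken from Whitman's presentation. It assumes the query style of the existing Census Data API at api.census.gov, where a request names a dataset, a list of variables and a geography, and the response is a JSON array whose first row is the header. The dataset path, the variable code and the helper names build_query_url and rows_to_records are illustrative assumptions, and the sample response holds placeholder values, not real estimates.

```python
import json
from urllib.parse import urlencode

# Assumed base endpoint of the existing Census Data API; the article does not name one.
BASE_URL = "https://api.census.gov/data"


def build_query_url(dataset, variables, geography):
    """Build a query URL from a dataset path, variable codes and a geography clause."""
    # Keep commas, colons and asterisks readable instead of percent-encoding them.
    params = urlencode({"get": ",".join(variables), "for": geography}, safe=",:*")
    return f"{BASE_URL}/{dataset}?{params}"


def rows_to_records(payload):
    """Turn a header-first JSON array of rows into a list of dictionaries."""
    header, *rows = json.loads(payload)
    return [dict(zip(header, row)) for row in rows]


# Hypothetical request: a total-population variable for every state.
url = build_query_url("2015/acs/acs5", ["NAME", "B01003_001E"], "state:*")

# Placeholder response in the assumed shape (values are NOT real estimates).
sample = (
    '[["NAME","B01003_001E","state"],'
    '["Example State A","100","01"],'
    '["Example State B","200","02"]]'
)
records = rows_to_records(sample)
```

Once the rows are converted to dictionaries, an outside developer could filter or join them with other data in their own platform, which is the kind of extension Whitman describes.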
Whitman’s larger presentation was focused on the American Community Survey, which is the smaller annual survey performed by Census that collects information about the American people and housing units. Census made the ACS Public Use Microdata Sample (PUMS) available as an AWS Public Dataset near the end of April 2017. By putting the data in the cloud, Census let users access and analyze it without downloading and storing their own copy. Census also worked with data.world -- the self-described "social network for data people" -- to make the information easier to use.
“We took on the project with data.world because they have a National Science Foundation grant to take a look at how they could derive a semantic approximation for the current tabulation,” Whitman said. “In this case, we looked at PUMS data, which is the holy grail of social science research, so it is a good place to start.”
The goal of the ACS project was to add more depth to demographic data to provide “far better information and granularity,” he said.
“If you are an end user, then you currently need to learn a lot about Census before you can end up using the data,” Whitman said. “We have about a trillion [estimates] that we are trying to disseminate, but it doesn’t include the interoperability on top of that.”
Data.census.gov was first released to the public in September 2016, and integrations are planned on the site through June 2018.
Editor's note: This article was changed June 19 to clarify statements made by Zach Whitman.