With Open Data, Delicate Questions Arise Over When to Release What
In Philadelphia, the process for publishing municipal data can be a balancing act.
Stored in a mainframe database in Philadelphia is a vast number of records about financial payments the city has made.
City staffers tried diving into that data about a year and a half ago. Their goal, according to Philadelphia’s chief data officer, Tim Wisniewski, was to see if the information in the database could be published on the city’s open data portal. But some of what they found in a column containing descriptions of each transaction was quite sensitive, including Social Security numbers, credit card numbers and the names of foster care recipients.
“Those are only the things that we happened to stumble upon,” Wisniewski said during an interview on Tuesday. “The problem is that’s a decentralized system, so no one person actually knows all the stuff that’s in there and what to redact even if we could do it automatically.”
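Wisniewski’s point about automated redaction can be illustrated with a short example. This Python sketch is a hypothetical illustration, not the city’s actual process: it assumes free-text transaction descriptions and masks strings shaped like Social Security numbers, plus digit runs shaped like payment card numbers that also pass the standard Luhn checksum. The function names are invented for this example. It also shows the limitation he describes, since names, such as those of foster care recipients, follow no fixed pattern and pass through untouched.

```python
import re

# SSN-shaped strings such as 123-45-6789 (hyphens or spaces optional).
# Note: this also matches any bare nine-digit run, so it over-redacts.
SSN_PATTERN = re.compile(r"\b\d{3}[- ]?\d{2}[- ]?\d{4}\b")

# Runs of 13 to 19 digits, optionally separated by single spaces or hyphens,
# the typical length range for payment card numbers.
CARD_PATTERN = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")


def luhn_valid(number: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    # Double every second digit, counting from the rightmost digit.
    for i, ch in enumerate(reversed(number)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def redact(text: str) -> str:
    """Mask card-like numbers that pass Luhn, then SSN-shaped strings."""

    def mask_card(match: re.Match) -> str:
        digits = re.sub(r"\D", "", match.group())
        # Leave digit runs that fail the checksum as they are.
        return "[REDACTED CARD]" if luhn_valid(digits) else match.group()

    text = CARD_PATTERN.sub(mask_card, text)
    return SSN_PATTERN.sub("[REDACTED SSN]", text)
```

For example, `redact("SSN 123-45-6789 on file")` returns `"SSN [REDACTED SSN] on file"`, while `redact("Payment for Jane Doe")` comes back unchanged: pattern matching catches only what has a predictable shape.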
Wisniewski and the other city staff considered removing the transaction description column. But then they noticed that one of the categories in the database documented fees paid for legal settlements. “We thought, wow, what if those are the settlements that we agree not to disclose the amounts for,” he said. “Are we then violating our settlement agreement?”
“These are the types of questions that make a dataset like that hugely complicated,” Wisniewski said. “And god forbid we put something like that out there and the city gets sued.”
Philadelphia is considered a leader when it comes to local government open data. The Open Knowledge Foundation ranks the city fifth out of 98 U.S. cities in its Open Data Census. To date, city departments there have published 171 datasets. These cover a broad range of areas, such as the city’s operating budget, crime, contracts and property assessments.
But there are thousands of other datasets from more than 50 departments across the city that remain unpublished.
Deciding which ones to release sooner rather than later is an ongoing and complex challenge. It’s a balancing act the city is trying to manage by weighing the public value certain data can provide against the cost, time and departmental support required to get it online.
To work through decisions about what to release, Philadelphia rolled out a new process last fall. This involved creating a comprehensive inventory of all the datasets each city department produces. The city is also looking to gauge demand for various types of data. One way it is doing this is through an Open Data Advisory Group. The group includes academics, technologists, transparency advocates and others from outside city government.
Wisniewski has played a key role in advancing these efforts.
“I think he’s been a leader in terms of trying to be thoughtful about how the city understands what data it holds, and prioritizing the release of data that is of interest to people,” said Emily Shaw, national policy manager at the Sunlight Foundation.
She added: “He’s really pushed forward a model we like.”
Open data is seen as valuable for a wide range of reasons. For instance, putting city budget figures online in an easy-to-digest format can offer residents a window into how their local government is using tax dollars. Journalists and watchdog groups can use readily available datasets to glean information about how agencies are performing. And, across the country, developers are using municipal data to support mobile applications meant to improve the way government functions and services get delivered. These apps provide information and insights about things ranging from blighted properties to bus arrival times.
But, as the financial payment database example illustrates, there are a number of roadblocks a city can encounter when trying to publish a dataset online.
Beyond the difficulty of redacting private or sensitive information, Philadelphia has hit other technical hurdles. For instance, some pending releases have been bogged down by the steps required to get data off the city mainframe and into a format where it can be shared with the public and updated automatically.
A webpage featuring Philadelphia’s own data census, which tracks datasets that have and have not been published, includes a scatterplot. The vertical axis shows the demand and impact of a given dataset; the horizontal axis shows the cost and complexity of getting it online. Much of the low-demand, low-cost data is available. But the top right corner of the chart is peppered with about 30 red dots representing high-demand, high-cost data that remains unreleased.
(Chart via phila.gov)
“There used to be more dots in that quadrant,” Wisniewski noted. He also pointed out that some of the datasets classified as unreleased on the chart involve records that exist in paper format, or information that is available online but not through the open data portal.
“We’ll never get to all of them,” he said in an email, referring to the unavailable datasets. “So let’s make sure the ones we spend time on are fulfilling actual public interest, rather than doing open data for open data’s sake.”
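The prioritization the chart depicts can be sketched in a few lines of code. In this hypothetical Python example, each inventoried dataset gets a demand-and-impact score and a cost-and-complexity score on an assumed 0-to-10 scale, with 5 as the dividing line between quadrants. The scale, threshold, field names and tie-breaking rule are all assumptions for illustration, not Philadelphia’s published method.

```python
from dataclasses import dataclass

# Assumed scoring scale: both axes run from 0 to 10, split at 5.
MIDPOINT = 5.0


@dataclass
class Dataset:
    name: str
    demand: float   # demand and impact (the chart's vertical axis)
    cost: float     # cost and complexity (the chart's horizontal axis)
    released: bool


def quadrant(ds: Dataset, midpoint: float = MIDPOINT) -> str:
    """Label which quadrant of the demand/cost chart a dataset falls in."""
    vertical = "high demand" if ds.demand >= midpoint else "low demand"
    horizontal = "high cost" if ds.cost >= midpoint else "low cost"
    return f"{vertical}, {horizontal}"


def backlog(datasets: list[Dataset]) -> list[str]:
    """Unreleased datasets, highest demand first; cheaper ones break ties."""
    pending = [ds for ds in datasets if not ds.released]
    pending.sort(key=lambda ds: (-ds.demand, ds.cost))
    return [ds.name for ds in pending]
```

Sorting the unreleased backlog by demand first, and by cost only to break ties, is one way to express the aim of spending time on datasets that serve actual public interest rather than on whatever is cheapest to publish.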
The city’s current push to open more data can be traced back to an executive order that Mayor Michael Nutter signed in 2012.
There have been some bumps along the way.
The city’s first chief data officer, Mark Headd, stepped down last April after a disagreement over granting wider access to a property tax balance dataset. Following his departure, he criticized some aspects of the city’s approach to its open data initiative. He even critiqued the city’s open data policy online earlier this week.
That said, Headd acknowledges that Philadelphia is making strides. “I’m not criticizing what they’re doing, I think they’re a clear leader in this area,” he said during an interview on Wednesday. He added: “I love what the city of Philadelphia is doing with the advisory board.”
From his perspective, Philadelphia has had two blind spots when it comes to prioritizing which datasets to release.
One, he said, is that the city doesn’t give enough weight to the number of public records requests for particular datasets. The other is that it does not consider which data gets “scraped” most frequently. Web scrapers are small pieces of software that extract data that is available on a website but not in an open format, meaning a person can’t download all of it in bulk, at one time, and then easily analyze it or compare it with other data.
Web scrapers can slow down or even crash websites. Headd said a city site in Philadelphia was overwhelmed by a scraper during his tenure.
Asked about his predecessor’s concerns, Wisniewski said the city has analyzed public records requests, called “right-to-know” requests in Pennsylvania, and is taking these into consideration. “They are an indicator, but they’re skewed towards journalists and attorneys. Nonetheless, that’s an indicator, that’s a starting point,” he said. “We’re expanding that to include more communities.” He also reemphasized the importance of the dataset inventory, saying: “The problem with right-to-know is people don’t know what data exists.”
As for scraping, he said the property data that has been most frequently targeted will be made available for bulk download.
“We prioritized that because there’s public demand around it,” he said.
Getting a better grasp of where public interest lies has also helped Wisniewski in his discussions with city departments about opening up more of their data.
“It’s been helpful for me, at least internally,” he said. That, he explained, is because he can show departments that they’re not being asked to drop everything else they’re doing and devote an outsized amount of attention to releasing huge amounts of data, but rather to work toward making specific high-demand datasets more readily available.
In the view of Wisniewski’s predecessor, Headd, there’s another issue that can sometimes come into play during discussions between data officers and city departments.
“When people have data to evaluate the performance of government they can start to ask tough questions,” Headd said. “A shrinking minority in government are not necessarily comfortable with that dynamic.”
Wisniewski, however, pointed to performance-related data, such as 311 service requests, that Philadelphia has either released or is working toward publishing online. He also acknowledged that there are limits to what data might eventually end up on the city’s portal.
“There’s always going to be datasets that you do have to put in a right-to-know request for,” Wisniewski said. “There’s some datasets that just don’t make sense to publish on the Web.”
Going forward, he would like to see open data integrated more deeply into the way the city does business.
A good example of this, in his view, is Philadelphia’s bikeshare program, which launched in late April. From the start, the program incorporated a live feed of JSON data, which includes information about docking station locations and available bikes. Conversations about how to make the bikeshare data publicly accessible, and shareable across departments, began during the request for proposals process for the program, according to Wisniewski.
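To give a sense of what a live JSON feed makes possible, here is a minimal Python sketch that reads a station-availability payload using only the standard library. The payload shape, field names and station values are hypothetical placeholders, not the actual schema of Philadelphia’s bikeshare feed, and a real client would fetch the feed over HTTP rather than use an embedded sample.

```python
import json

# Hypothetical payload; the real feed's structure and field names may differ,
# and the station names, coordinates and counts are illustrative placeholders.
SAMPLE_FEED = """
{
  "stations": [
    {"name": "Station A", "lat": 39.95, "lon": -75.16,
     "bikes_available": 4, "docks_available": 11},
    {"name": "Station B", "lat": 39.96, "lon": -75.17,
     "bikes_available": 0, "docks_available": 15}
  ]
}
"""


def stations_with_bikes(feed_text: str) -> list[str]:
    """Names of stations that report at least one available bike."""
    feed = json.loads(feed_text)
    return [s["name"] for s in feed["stations"] if s["bikes_available"] > 0]


def total_bikes(feed_text: str) -> int:
    """Sum of available bikes across every station in the feed."""
    feed = json.loads(feed_text)
    return sum(s["bikes_available"] for s in feed["stations"])


if __name__ == "__main__":
    print(stations_with_bikes(SAMPLE_FEED))  # ['Station A']
    print(total_bikes(SAMPLE_FEED))  # 4
```

Because the data is published in a machine-readable format from launch, a consumer like this one can read it directly instead of scraping a webpage to get at it.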
“What we’re aiming for,” he said, “is that when you build a new program, or initiative, or department, that from the beginning you’re talking about how can the data be shared, at least internally, how can we capitalize on the data we’re producing here to maximize government efficiency, and also how are we going to share this with the public.”