With Open Data, Delicate Questions Arise Over When to Release What

In Philadelphia, the process for publishing municipal data can be a balancing act.

Stored in a mainframe database in Philadelphia is a vast number of records about financial payments the city has made.

City staffers tried diving into that data about a year and a half ago. Their goal, according to Philadelphia’s chief data officer, Tim Wisniewski, was to see if the information in the database could be published on the city’s open data portal. But some of what they found in a column containing descriptions of each transaction was quite sensitive: Social Security numbers, credit card numbers and the names of foster care recipients.

“Those are only the things that we happened to stumble upon,” Wisniewski said during an interview on Tuesday. “The problem is that’s a decentralized system, so no one person actually knows all the stuff that’s in there and what to redact even if we could do it automatically.”
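To see why automatic redaction is harder than it sounds, consider a minimal sketch of pattern-based scrubbing. This is not the city's process; the patterns, function name and sample text are all hypothetical. It masks strings shaped like Social Security or 16-digit card numbers, but it has no way to recognize a name or any other sensitive detail that does not follow a fixed format.

```python
import re

# Hypothetical patterns for illustration only; real records would need
# far more careful rules and human review.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
CARD_PATTERN = re.compile(r"\b(?:\d{4}[- ]?){3}\d{4}\b")


def redact(description: str) -> str:
    """Mask substrings that look like SSNs or 16-digit card numbers."""
    description = SSN_PATTERN.sub("[REDACTED-SSN]", description)
    description = CARD_PATTERN.sub("[REDACTED-CARD]", description)
    return description


if __name__ == "__main__":
    # Made-up free-text descriptions, not real city data.
    print(redact("Refund to card 4111 1111 1111 1111 for SSN 123-45-6789"))
    # A name passes through untouched because it matches no pattern.
    print(redact("Payment for foster care of Jane Doe"))
```

The second description comes back unchanged, which is the gap Wisniewski describes: a filter can only remove what someone already knew to look for.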

Wisniewski and the other city staff considered removing the transaction description column. But then they noticed that one of the categories in the database documented fees paid for legal settlements. “We thought, wow, what if those are the settlements that we agree not to disclose the amounts for,” he said. “Are we then violating our settlement agreement?”

“These are the types of questions that make a dataset like that hugely complicated,” Wisniewski said. “And god forbid we put something like that out there and the city gets sued.”

Philadelphia is considered a leader when it comes to local government open data. The Open Knowledge Foundation ranks the city fifth out of 98 U.S. cities in its Open Data Census. To date, city departments there have published 171 datasets. These cover a broad range of areas, such as the city’s operating budget, crime, contracts and property assessments.

But there are thousands of other datasets from more than 50 departments across the city that remain unpublished.

Deciding which ones to release sooner, rather than later, is an ongoing and complex challenge. It’s a balancing act the city is trying to manage by factoring in both the public value certain data can provide, and the cost, time and departmental support required to get it online.

To work through decisions about what to release, Philadelphia rolled out a new process last fall. This involved creating a comprehensive inventory of all the datasets each city department produces. The city is also looking to gauge demand for various types of data. One way it is doing this is through an Open Data Advisory Group. The group includes academics, technologists, transparency advocates and others from outside city government.

Wisniewski has played a key role in advancing these efforts.

“I think he’s been a leader in terms of trying to be thoughtful about how the city understands what data it holds, and prioritizing the release of data that is of interest to people,” said Emily Shaw, national policy manager at the Sunlight Foundation.

She added: “He’s really pushed forward a model we like.”

Open data is seen as valuable for a wide range of reasons. For instance, putting city budget figures online in an easy-to-digest format can offer residents a window into how their local government is using tax dollars. Journalists and watchdog groups can use readily available datasets to glean information about how agencies are performing. And, across the country, developers are using municipal data to support mobile applications meant to improve the way government functions and services get delivered. These apps provide information and insights about things ranging from blighted properties to bus arrival times.

But, as the financial payment database example illustrates, there are a number of roadblocks a city can encounter when trying to publish a dataset online.

In addition to the difficulties of redacting private or sensitive information, Philadelphia has hit other technical hurdles. For instance, some pending releases in the city have been bogged down by the steps required to get data off the city mainframe and into a format where it can be shared with the public and updated automatically.

On a webpage featuring Philadelphia’s own data census, which looks at datasets that have and have not been published, there’s a scatterplot chart. The vertical axis shows the demand and impact of a given dataset; the horizontal axis shows the cost and complexity of getting it online. Much of the low-demand, low-cost data is available. But the top right corner of the chart is peppered with about 30 red dots showing high-demand, high-cost data that is unreleased.

(via phila.gov)
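The logic behind a chart like this can be sketched in a few lines. The following is an illustration only, not the city's scoring method: the 0-to-10 scales, the midpoint, the field names and the sample entries are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    name: str
    demand: float    # hypothetical 0-10 score for demand and impact
    cost: float      # hypothetical 0-10 score for cost and complexity
    released: bool


def quadrant(ds: Dataset, midpoint: float = 5.0) -> str:
    """Label a dataset by which quadrant of the chart it falls in."""
    demand = "high demand" if ds.demand >= midpoint else "low demand"
    cost = "high cost" if ds.cost >= midpoint else "low cost"
    return f"{demand}, {cost}"


def unreleased_priorities(datasets: list[Dataset]) -> list[Dataset]:
    """Unreleased datasets, highest demand first; lower cost breaks ties."""
    pending = [d for d in datasets if not d.released]
    return sorted(pending, key=lambda d: (-d.demand, d.cost))


if __name__ == "__main__":
    # Invented entries for illustration; not taken from the city's census.
    sample = [
        Dataset("example_a", demand=9, cost=8, released=False),
        Dataset("example_b", demand=2, cost=1, released=True),
        Dataset("example_c", demand=9, cost=3, released=False),
    ]
    for d in unreleased_priorities(sample):
        print(d.name, "->", quadrant(d))
```

Sorting by demand first and cost second is one possible rule among many; a city could just as reasonably weight the two scores differently.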

“There used to be more dots in that quadrant,” Wisniewski noted. He also pointed out that some of the datasets that are classified as unreleased on the chart involve records that exist in paper format, or information that is available online, but not through the open data portal.

“We’ll never get to all of them,” he said in an email, referring to the unavailable datasets. “So let’s make sure the ones we spend time on are fulfilling actual public interest, rather than doing open data for open data’s sake.”

The city’s current push to open more data can be traced back to an executive order that Mayor Michael Nutter enacted in 2012.

There have been some bumps along the way.

The city’s first chief data officer, Mark Headd, stepped down last April after a disagreement over granting wider access to a property tax balance dataset. Following his departure, he criticized some aspects of the city’s approach to its open data initiative. He even critiqued the city’s open data policy online earlier this week.

That said, Headd acknowledges that Philadelphia is making strides. “I’m not criticizing what they’re doing, I think they’re a clear leader in this area,” he said during an interview on Wednesday. He added: “I love what the city of Philadelphia is doing with the advisory board.”

From his perspective, Philadelphia has had two blind spots when it comes to prioritizing which datasets to release.

One, he said, is that the city doesn’t give enough weight to the number of public records requests for particular datasets. The other is that it does not consider which data gets most frequently “scraped.” Web scrapers are small pieces of software that can be used to extract data that is available on a website but not in an open format, meaning that a person can’t download all of it, in bulk, at one time and easily analyze it, or compare it to other data.

Web scrapers can slow down, or even crash websites. Headd said a city site in Philadelphia was overwhelmed by a scraper during his tenure.

Asked about his predecessor's concerns, Wisniewski said the city has analyzed public records requests, called “right-to-know” requests in Pennsylvania, and is taking these into consideration. “They are an indicator, but they’re skewed towards journalists and attorneys, nonetheless that’s an indicator, that’s a starting point,” he said. “We’re expanding that to include more communities.” He also reemphasized the importance of the dataset inventory, saying: “The problem with right-to-know is people don’t know what data exists.”

As for scraping, he said the property data that has been most frequently targeted will be made available for bulk download.

“We prioritized that because there’s public demand around it,” he said.

Getting a better grasp of where public interest lies has also been useful for Wisniewski as he has had discussions with city departments about opening up more of their data.

“It’s been helpful for me, at least internally,” he said. This, he explained, is because he can show departments that they’re not being asked to drop everything else they’re doing and devote an outsized amount of attention to releasing huge amounts of data. Rather, they are working toward making specific high-demand datasets more readily available.

In the view of Wisniewski’s predecessor, Headd, there’s another issue that can sometimes come into play during discussions between data officers and city departments.

“When people have data to evaluate the performance of government they can start to ask tough questions,” Headd said. “A shrinking minority in government are not necessarily comfortable with that dynamic.”

Wisniewski, however, pointed to data related to performance, such as 311 service requests, that Philadelphia has either released or is working toward publishing online. He also recognized that there are limits around what data might eventually end up on the city's portal.

“There’s always going to be datasets that you do have to put in a right-to-know request for,” Wisniewski said. “There’s some datasets that just don’t make sense to publish on the Web.”

Going forward, he would like to see open data integrated more deeply into the way the city does business.

A good example of this, in his view, is Philadelphia’s bikeshare program, which launched in late April. From the program’s start it incorporated a live feed of JSON data, which includes information about docking station locations and available bikes. Conversations about how to make the bikeshare data publicly accessible, and shareable across departments, began during the request for proposals process for the program, according to Wisniewski.

“What we’re aiming for,” he said, “is that when you build a new program, or initiative, or department, that from the beginning you’re talking about how can the data be shared, at least internally, how can we capitalize on the data we’re producing here to maximize government efficiency, and also how are we going to share this with the public.”
