Frank Dravis | Keep your data fit enough to survive

 


As an agency amasses data, its IT architects are likely to find problems with consistency. Some data elements are formatted one way, others differently. Some information becomes outdated but is never erased. Some is wrong and never corrected. It's a headache that only grows worse as databases expand and are aggregated.

As the vice president of information quality at Firstlogic Inc. of La Crosse, Wis., Frank Dravis is something of a guru on these matters. Firstlogic sells software that analyzes and improves the quality of enterprise data. The company got its start working government contracts and still counts among its customers the Commerce, Homeland Security and Labor departments, as well as the General Services Administration, House of Representatives and Postal Service.

Dravis helps organizations work through their data quality problems. He is a member of the International Association of Information and Data Quality and writes a blog at weblogs.firstlogic.com/dravis. Dravis holds a bachelor's degree in computer science from National University in San Diego and is currently pursuing a master's in business at the University of Wisconsin.

He spoke to GCN associate writer Joab Jackson by phone.

Age: 46

Family: Two girls, ages 5 and 15

Hobbies: 'Lots, most involve the outdoors.'

Car currently driven: Audi A4; 'My next car will be a hybrid.'

Military service: Six years in the Navy as a sonar technician

Last books read: Who Moved My Cheese? by Spencer Johnson and Kenneth Blanchard; Angels and Demons by Dan Brown

Most played iPod music genre: Alternative

Personal hero: Ronald Reagan. 'But I vote independent.'

GCN: How do you define data quality?

Dravis: Data quality is fitness for use. It is how well your data supports your own business rules and operations.

GCN: How important is data formatting to sharing data?

Dravis: How can you expect an agency to share information internally or across other agencies if it doesn't meet some common formatting standards? It won't be immediately useful if it doesn't meet some standard.

The information gets thrown over the fence, and the people who catch it have to put in place their own [extract, transform and load] system, and that is [money spent on] a lot of nonvalue add. It is gumming up the whole information pipeline. The greater the formatting problems, the greater the likelihood you're not going to go back to the source and ask for that information again.

GCN: What are some common formatting errors?

Dravis: Dates have common formatting errors. There are so many ways to enter dates. Are you using dashes, slashes or periods? If you are merging data together and one data source uses slashes and another uses periods, it can be confusing to people using the data. While they may be able to decipher the dates, sooner or later it slows the whole process down.

Part numbers are inherently problematic. Again, some people want to use slashes and dashes, but maybe over time they replace them with spaces. Then later, they concatenate the fields together wherever there is a space. All of a sudden, there are nine-character part numbers where there should be 10-character part numbers. They took the dashes out, replaced them with spaces and then slammed them together. Classic stuff.

GCN: How did Firstlogic get started?

Dravis: We started with contracts with the Postal Service. We provided an address assignment technology that was loaded into multiline optical character recognition systems. That's how I got my start here; I was a ZIP-plus-four assignment engineer. I wrote address assignment algorithms and matching algorithms. As the mail pieces flew by, the little camera took a picture of each envelope and sent it to [our software, which] deciphered the characters. It looked the addresses up in our address database and then supplied the bar code to spray on the mail piece. The mail piece could then go into the automation mail stream.

Now that was a data quality application. A lot of times the address would be slightly askew, or radically askew, and it didn't match the address database. So you had to do some fuzzy matching logic to find out what was close, and once the confidence was above a certain threshold, you could say this is the real address.

That was the genesis. This was 20 years ago. I remember when my boss came up to me and said, 'Frank, we're doing address cleansing, and we need to do name cleansing. It should be a short step.' So we developed a name-cleansing, standardization and formatting algorithm.

Addresses took us to names, names took us to matching, matching took us to consolidation. Wherever our customer had a data quality problem, they dragged us into that field. And so that is why our solution works on operational data.

GCN: And how has the technology evolved?

Dravis: Early on, customers would come to us and say they need an address-cleansing solution. We'd sell them address-cleansing solutions, but it was a lot like someone going to the pharmacist and saying, 'I need high-blood-pressure medication.' 'Have you seen the doctor?' 'No.' 'How do you know you have high blood pressure?' 'I can feel it.'

Well, now Firstlogic offers data-profiling software that measures your data against your business rules. You can quantify the level of data quality against your own thresholds [of acceptable quality] to build a return-on-investment, so you can say, 'Here are our data defects. If we fix these data defects, we will gain these benefits.'

GCN: I noticed a research paper you co-authored on something called 'householding' was supported by the Naval Inventory Control Point. What is householding, and what was the Navy's interest in it?

Dravis: Householding is the act of [bundling] similar records. Let's use a retail example. You have [records on] Frank Dravis, Daniel Dravis, Kim Dravis and Drew Dravis, and they all have the same address. All four of them have the same phone number. The ages of two of them are over 40, and the ages of the other two are under 20. [This practice would] aggregate those four into a household to get a view of purchasing patterns.

The Navy is very interested in using household views to optimize their supply chain. Some of these weapon systems are kind of old. Over time, the manufacturer of a jet engine may stop supporting that engine. Maybe an aftermarket manufacturer supports that engine. Within that engine, there may be a generator, or the turbine blades, each made by a different manufacturer. So you need to get a hierarchical view of all the vendors for the engine so you can select the most cost-effective ones, the vendors closest to your re-engineering facilities, or whatever your criteria [are].

If it is the FBI, a form of household might be an associative network or actor network. Who are all the various people related to, or associate[d] with, this one person? The associations may be as tenuous as air flights. Were these two people on the same airplane, or did these two people fly to the same country at the same time?

GCN: Technically speaking, how would an agency create an enterprisewide data structure?

Dravis: This is not a simple thing you ask. It is a big project. I could give you a very short answer: It involves aggregating and integrating the various disparate data sources to a staging area, using an [extract, transform and load] application to load all of this data into the integrated data warehouse.

From there, extract the data from the data warehouse into various contextually rich data marts. Various applications will then either feed the data marts or feed from the data marts. That is the 60-second statement of a very, very big subject.

GCN: What is the most extreme example of poor data quality that you've seen?

Dravis: I worked with a client that had five or ten records for each customer in its customer relationship management system. They didn't have any practices to guard against duplicative customer entry. Most organizations would have found that system unusable, but because individual managers used their own little subsets of the data, they understood which records were defective and should be avoided.

The downside was that the information was in the heads of the practitioners. It was not organizational information. The marketing people couldn't run a report on who the top customers were, because there were too many duplicate records.

Have you ever gotten duplicate mailings from the same vendor, with one title on one piece and another title on another piece? [That vendor] doesn't reconcile these duplicative contacts, and over time the problem just gets bigger. Sooner or later, [the organization] realizes it must implement some sort of managing and consolidation solution. In order for that solution to work, it has to do address and name cleansing.

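Dravis's point about dates arriving with dashes, slashes or periods can be illustrated with a small normalization step applied before data sources are merged. What follows is a minimal Python sketch, not Firstlogic's method. The function name normalize_date, the list of accepted formats and the choice of ISO 8601 output are assumptions made for illustration, and the sketch assumes month-first dates.

```python
from datetime import datetime

# Assumed input formats: month, day, year written with slashes, dashes or periods.
# Real feeds may also contain day-first dates, which would need a per-source rule.
ACCEPTED_FORMATS = ("%m/%d/%Y", "%m-%d-%Y", "%m.%d.%Y")


def normalize_date(raw: str) -> str:
    """Return the date as YYYY-MM-DD, or raise ValueError if no format matches."""
    text = raw.strip()
    for fmt in ACCEPTED_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue  # try the next separator style
    raise ValueError(f"unrecognized date format: {raw!r}")


if __name__ == "__main__":
    for value in ("03/07/2005", "03-07-2005", "03.07.2005"):
        print(value, "->", normalize_date(value))
```

Rejecting unrecognized values with an error, rather than passing them through, keeps malformed dates from silently entering the merged data set.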

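The fuzzy address matching Dravis describes, where a scanned address is accepted once its confidence passes a threshold, can be sketched with Python's standard difflib module. This is an illustration only. The reference addresses, the match_address helper and the 0.85 threshold are hypothetical, and a similarity ratio over whole strings is a much simpler technique than a production address engine would use.

```python
from difflib import SequenceMatcher
from typing import Optional

# Hypothetical reference list standing in for an address database.
KNOWN_ADDRESSES = [
    "100 MAIN ST",
    "100 MAPLE ST",
    "250 OAK AVE",
]


def match_address(scanned: str, threshold: float = 0.85) -> Optional[str]:
    """Return the closest known address, or None if confidence is too low."""
    candidate = " ".join(scanned.upper().split())  # normalize case and spacing
    best, best_score = None, 0.0
    for known in KNOWN_ADDRESSES:
        # ratio() is a similarity score between 0.0 and 1.0
        score = SequenceMatcher(None, candidate, known).ratio()
        if score > best_score:
            best, best_score = known, score
    return best if best_score >= threshold else None
```

Under these assumptions, a slightly garbled scan such as "100 Main Stt" resolves to "100 MAIN ST", while an address with no close counterpart returns None and would be routed for manual handling.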

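Householding, as Dravis defines it, bundles records that share identifying attributes. A rough Python sketch follows, assuming records are dictionaries with hypothetical name, address and phone fields, and assuming that an exact match on a normalized address and phone number defines a household. The sample address and phone numbers are made up.

```python
from collections import defaultdict


def household_key(record: dict) -> tuple:
    """Build a grouping key from a normalized address and digits-only phone."""
    address = " ".join(record["address"].upper().split())
    phone = "".join(ch for ch in record["phone"] if ch.isdigit())
    return (address, phone)


def build_households(records: list) -> dict:
    """Group record names that share the same normalized address and phone."""
    households = defaultdict(list)
    for record in records:
        households[household_key(record)].append(record["name"])
    return dict(households)


# Illustrative data only; the address and phone values are invented.
records = [
    {"name": "Frank Dravis", "address": "12 Elm St", "phone": "555-0100"},
    {"name": "Kim Dravis", "address": "12  elm st", "phone": "(555) 0100"},
    {"name": "Pat Smith", "address": "98 Oak Ave", "phone": "555-0199"},
]
households = build_households(records)
```

Exact matching on normalized keys is the simplest possible grouping rule. The address and name cleansing Dravis mentions would have to run first for such keys to line up across real records.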
