Stop the fear mongering over ICD-10: It's just another taxonomy
The complaints about the upgrade to ICD-10 health care codes illustrate common misconceptions about the organization and characteristics of big data sets.
A recent Weekly Standard article shrilly announced that an Oct. 1, 2014, update of ICD medical codes would augur “a nightmare for doctors.” This was not the only source predicting doom and gloom for the forthcoming code switch from ICD-9 to ICD-10 -- so I decided to investigate.
It turns out many of these critics are falling victim to common misunderstandings about taxonomies and the characteristics and purposes of large, structured data sets. Here’s how.
ICD, or the International Classification of Diseases, is a taxonomy for diseases. As a classification scheme it is no different from any other hierarchical or drill-down scheme in which data items, or nodes, flow from parent to child (and each child is more specialized than its parent). Examples abound, including the Dewey Decimal System for libraries, Amazon’s product catalog, Netflix movie categories and iTunes music genres.
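To make the parent-to-child idea concrete, here is a minimal sketch of a taxonomy as a tree. The `Node` class, the `drill_down` helper and every category label are hypothetical and chosen only for illustration; none of them are real ICD-10 entries.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    """One category in a taxonomy; children are more specialized than their parent."""
    label: str
    children: Dict[str, "Node"] = field(default_factory=dict)

    def add(self, label: str) -> "Node":
        """Attach a more specific child category and return it."""
        child = Node(label)
        self.children[label] = child
        return child


def drill_down(root: Node, path: List[str]) -> Node:
    """Follow one label per level from the root to a more specific node."""
    node = root
    for label in path:
        node = node.children[label]
    return node


# Hypothetical labels for illustration only; these are not real ICD-10 entries.
root = Node("All conditions")
injuries = root.add("Injuries")
leg = injuries.add("Leg injuries")
leg.add("Fracture of the leg")
root.add("Infections")

found = drill_down(root, ["Injuries", "Leg injuries", "Fracture of the leg"])
print(found.label)
```

Each step of the path narrows the choice to the children of a single node, which is the same drill-down a library catalog or product catalog relies on.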
ICD-10 originated as an international standard with about 21 top-level nodes that drill down to about four or five levels. Specifically, the U.S. ICD-10 code set consists of two parts: ICD-10-CM for diagnosis coding and ICD-10-PCS for inpatient procedure coding. Basically, one taxonomy is for diseases and one is for procedures to treat diseases.
The headline of the Weekly Standard article is “Code Chaos,” which is a significant misnomer because a taxonomic structure is a well-proven, simple and effective information organization structure. Instead of chaos, it actually follows best practices for a code organization scheme.
There are three main areas of confusion about the great ICD code switch:
Misunderstanding of magnitude. The chief issue that naysayers harp on is the increase in the number of codes. The Weekly Standard describes this difference as “vast,” from 17,000 codes in ICD-9 to 155,000 in ICD-10.
The misunderstanding involves an incorrect assumption about complexity. An increase in the number of codes does not translate directly into an increase in complexity, especially when the additions are made to a taxonomy or tree structure. Each new level of a tree multiplies the number of nodes, so the total grows exponentially with depth; a large jump in the code count may therefore mean that only one or two new levels have been added to the tree.
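A back-of-the-envelope sketch shows how few levels a large code count can require. It assumes an idealized tree in which every node has exactly 10 children; that branching factor and the `levels_needed` helper are illustrative assumptions, and the real ICD hierarchies are not this regular. The code counts are the ones quoted from the Weekly Standard.

```python
def levels_needed(total_codes: int, branching: int) -> int:
    """Smallest depth d such that branching ** d >= total_codes."""
    depth, capacity = 0, 1
    while capacity < total_codes:
        capacity *= branching
        depth += 1
    return depth


# Code counts as quoted in the article; a branching factor of 10 is an assumption.
icd9_levels = levels_needed(17_000, branching=10)
icd10_levels = levels_needed(155_000, branching=10)
print(icd9_levels, icd10_levels)  # prints "5 6"
```

Under these assumptions, roughly nine times as many codes fit in a tree that is only one level deeper.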
When only the total number of new codes is emphasized, the implicit assumption is that some kind of linear search process is under way. But that is not true with taxonomies. The drill down is much more efficient than that. In fact, this sub-dividing of a tree structure is why trees are so prevalent as tools in computer science; they neatly execute a “divide and conquer” strategy for organizing information.
The total number of codes in ICD-10 should not be feared because they are divided into bite-sized groups within the taxonomy structure. In fact, the taxonomy structure drastically reduces the organization and learning complexity even when the number of total codes grows exponentially.
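The gap between a linear search and a drill-down can be sketched with simple worst-case counts. This again assumes a uniform tree with a hypothetical branching factor of 10, and it counts a "step" as looking at one candidate label; both function names are made up for this illustration.

```python
def linear_scan_cost(total_codes: int) -> int:
    """Worst case for a flat list: every code may have to be looked at."""
    return total_codes


def drill_down_cost(branching: int, depth: int) -> int:
    """Worst case for a drill-down: look at every child once per level on one path."""
    return branching * depth


BRANCHING = 10  # illustrative assumption, not a property of ICD-10

for depth in (4, 5, 6):
    total = BRANCHING ** depth  # number of leaf codes in the uniform tree
    print(depth, total, linear_scan_cost(total), drill_down_cost(BRANCHING, depth))
```

In this idealized model each added level multiplies the number of codes by 10 but adds only 10 steps to the drill-down, which is the "divide and conquer" effect at work.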
Misunderstanding of outliers. Many commenters poke fun at the level of detail in ICD-10 by citing a rare disease or procedure that is represented in ICD-10 that was not in ICD-9. Examples include an injury when water skiing or a bite from a venomous frog.
While some of these codes may need to be refined or possibly removed, the critics misunderstand the utility of such fine-grained detail. The assumption is that since several outlier codes may never be used, one can conclude that a large percentage of codes at this level of granularity will not be used.
Tom Coburn, the senator from Oklahoma who is also a doctor, is quoted as saying that 80 percent of the new codes won’t be used. I respectfully disagree. A more reasonable analysis is that the codes within each branch would follow a normal bell curve in terms of the rarity of their occurrence in the population. Again, linear assumptions based on outliers are not accurate for a hierarchical structure.
Misunderstanding of purpose. While some critics understand the inherent benefits of specificity, many others incorrectly assert that ICD-10 is mere bureaucratic overreach, which betrays a misunderstanding of the correlation between this type of information and successful big data analytics.
Every business in the country is clamoring for improved business analytics and better decision making. The way you get that insight is by adding fidelity to your data management practices and data collection. High-fidelity, granular data collection is the base of the pyramid, and predictive analytics is the top – you don’t get one without the other.
As they said in the movie Fame, “Fame costs, and right here is where you start paying.” Well, I’ll rephrase that and say, “Analytics costs, and fidelity is how you start paying.”
ICD-10 is a major change that will require resources and training to implement, and its structure and codes will be refined as it evolves. But it is also just a set of taxonomies to categorize diseases and procedures and should not be overblown.
Implementing ICD-10 is achievable by organizations of all sizes and will greatly improve the analysis of healthcare in the United States. Taxonomies are a proven metadata technique that is a best practice in information organization and discovery. In the case of ICD-10, the benefits really do outweigh the costs.
Michael C. Daconta (mdaconta@incadencecorp.com or @mdaconta) is the Vice President of Advanced Technology at InCadence Strategic Solutions and the former Metadata Program Manager for the Homeland Security Department. His new book is titled The Great Cloud Migration: Your Roadmap to Cloud Computing, Big Data and Linked Data.