Want to avoid software snafus? Here's a good place to start.

By William Jackson

November 28, 2011

NIST has greatly expanded its SAMATE dataset to help software developers identify and avoid known coding weaknesses.

The National Institute of Standards and Technology has dramatically expanded its public dataset of software flaws to help developers and analyzers avoid weaknesses in their programs.

The Software Assurance Metrics and Tool Evaluation (SAMATE) Reference Dataset contains examples of errors in a number of popular programming languages that could leave software vulnerable to exploits by hackers and criminals.

Version 4.0 of SAMATE contains 175 broad categories of weaknesses with more than 60,000 specific cases. This release more than doubles the number of categories and adds 30 times as many examples as the previous release contained.

“This is an enormous step toward bringing methodical science to the hard question of bugs in software,” said Paul E. Black, NIST computer scientist and SAMATE project leader. The dataset is used to build static analyzers that comb software for problems.

SAMATE, which began in 2004, is an umbrella project to improve software assurance by excluding known problems. The catalyst for the program was a Homeland Security Department project on software assurance tools, Black said.

“They wanted to understand what tools were available, measure their effectiveness and identify gaps,” he said. The tools analyze software, scanning it for known flaws and weaknesses. “We asked ourselves, does this tool catch all possible errors? We realized that to answer that we needed a list of all possible errors.”

NIST worked with DHS to establish a long-term program for creating such a list. The effort complements other programs, such as the Common Weakness Enumeration and the Common Vulnerabilities and Exposures databases maintained by Mitre Corp.

SAMATE contains specific examples of coding flaws in software written in Java, C and C++. Each case is about a page of computer code showing a problematic way of composing functions, loops or logic operations.
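To give a sense of what such a case might look like, here is a minimal sketch in C. It is a hypothetical illustration written for this article, not a file from the dataset; the names `copy_name_bad`, `copy_name_good` and `NAME_LEN` are invented. It shows the kind of loop flaw that the Common Weakness Enumeration lists as a classic buffer overflow (CWE-120), next to one possible repair.

```c
#include <stddef.h>

#define NAME_LEN 8  /* hypothetical fixed buffer size, for illustration only */

/* Flawed: copies until the terminator without checking the destination size,
   so any src longer than NAME_LEN - 1 characters writes past the end of dst. */
void copy_name_bad(char dst[NAME_LEN], const char *src)
{
    size_t i;
    for (i = 0; src[i] != '\0'; i++) {
        dst[i] = src[i];
    }
    dst[i] = '\0';
}

/* Repaired: stops at the last usable slot and always terminates dst.
   Returns 0 if src fit, -1 if it was truncated. */
int copy_name_good(char dst[NAME_LEN], const char *src)
{
    size_t i;
    for (i = 0; i < NAME_LEN - 1 && src[i] != '\0'; i++) {
        dst[i] = src[i];
    }
    dst[i] = '\0';
    return src[i] == '\0' ? 0 : -1;
}
```

A static analyzer of the kind the project evaluates would be expected to flag the first function and accept the second. Pairing a flawed version with a repaired one is one plausible way to check both that a tool catches the weakness and that it does not raise a false alarm on correct code.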

The current dataset is limited in the languages it includes and still does not cover all types of weaknesses. The Common Weakness Enumeration contains close to 500 types of weaknesses, Black said. “We’ve expanded enormously, but we could probably double our set again,” he said.

Industry is using SAMATE, Black said. Before the latest release, there had been 10,000 downloads of the dataset over a 10-month period.
