Steering clear of ‘sneakernet’ at big-data scale

 

To accommodate big data sharing and storage, agencies take to high-performance networks and clouds.

They don’t call it big data for nothing. A typical whole genome sample is about 150 GB, a day’s worth of HD surveillance video can run a terabyte or more, and yesterday’s weather constitutes some 20 terabytes of data from Doppler radar, weather satellites, buoy networks and other weather stations. And while there are plenty of tools that have scaled to collect and crunch such massive datasets, moving them around and making them available to researchers and the public is more problematic.

In the not-so-distant past, when even medium-sized files could tax a network, many organizations resorted to "sneakernet" -- manually walking a disk, or later a thumb drive, from one computer to the next. And while express mail and hard-drive arrays have entered the equation, some organizations are still “shuttling entire storage systems from one place to another just to be able to share data,” Sumit Sadana, executive vice president of flash memory provider SanDisk, wrote in an op-ed in Re/Code.

“Hyperscale data centers, software-defined networking and new storage technologies represent the first steps in what will be a tremendous cycle of innovation,” Sadana wrote. But government agencies, often the generators of massive datasets, have sometimes had to build their own network infrastructure to support their research and open their data for sharing and collaboration.

The Energy Science Network, for example, connects more than 40 Department of Energy research sites and 140 DOE-funded partners in industry and academia. ESnet directly connects a long list of national laboratories, institutions and research facilities. It currently carries about 20 petabytes of data a month, a figure expected to grow to 100 petabytes by 2016. ESnet launched its fifth-generation network in collaboration with Internet2 in 2012, making 100-gigabit/sec speed available and deploying a network with 8.8 terabits/sec of capacity.

Last year, NASA’s High End Computer Network reached a 91-gigabit/sec transfer between facilities in Denver and Greenbelt, Md., across ESnet -- the “fastest end-to-end data transfer ever conducted under ‘real world’ conditions,” according to Wired.

Another example is N-Wave, the 10-gigabit/sec network connecting the National Oceanic and Atmospheric Administration, Internet2 and its partners in the national research and education network community. NOAA makes heavy use of N-Wave; the agency's research and development high-performance computing system alone moves up to 60 terabytes of data a day. And according to a recent newsletter, N-Wave is currently deploying 100-gigabit/sec connections in the Washington, D.C., area, connecting three sites in Maryland and one in Virginia.
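To put those figures in perspective, the rough arithmetic below converts a daily data volume into the average line rate needed to move it, and a dataset size into transfer time at a given link speed. It is a back-of-the-envelope sketch, not a measurement: it assumes decimal units (1 terabyte = 10^12 bytes), a perfectly utilized link and no protocol overhead, and the function names are made up for illustration.

```python
SECONDS_PER_DAY = 86_400
BITS_PER_TERABYTE = 8 * 10**12  # assumption: decimal terabytes


def sustained_gbps(terabytes_per_day: float) -> float:
    """Average rate, in gigabits/sec, needed to move a daily volume."""
    return terabytes_per_day * BITS_PER_TERABYTE / SECONDS_PER_DAY / 1e9


def transfer_hours(terabytes: float, link_gbps: float) -> float:
    """Hours to move a dataset over an ideal link of the given speed."""
    return terabytes * BITS_PER_TERABYTE / (link_gbps * 1e9) / 3600


# 60 terabytes a day, the NOAA figure cited above
print(f"{sustained_gbps(60):.2f} Gbit/sec sustained")

# 20 terabytes (one day of weather data) over 10- and 100-gigabit links
for speed in (10, 100):
    print(f"{transfer_hours(20, speed):.2f} hours at {speed} Gbit/sec")
```

Under these idealized assumptions, 60 terabytes a day works out to an average of roughly 5.6 gigabits/sec, more than half the capacity of a 10-gigabit link, which helps explain the move to 100-gigabit connections.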

Another approach is to put big data directly into the cloud. NOAA is in the research and development stages of a new public-private Big Data Project that builds on collaboration with industry-leading cloud providers, said Maia Hansen, a presidential innovation fellow with the agency.

According to Hansen, who spoke at the General Services Administration's DigitalGov Citizen Services Summit on May 21, NOAA will have stored approximately 300 petabytes of data by 2030 -- a resource that is extremely difficult to share if housed in a traditional NOAA data center. The NOAA Big Data Project is a structured agreement with Amazon Web Services, Google Cloud Platform, IBM, Microsoft and the Open Cloud Consortium to leverage private industry’s infrastructure and create a sustainable, market-driven ecosystem that lowers the cost barrier to data publication.

The National Institutes of Health, meanwhile, has created the Commons -- a storage framework built using the cloud and high-performance computing to co-locate datasets with the analytical tools and workflow pipelines that use them, so that research results are accessible and shareable.

“The Commons is a scalable three-part environment that will provide computing and storage for sharing digital bio-medical research products,” George Komatsoulis, senior bioinformatics specialist at NIH’s National Center for Biotechnology Information, told the International Science Grid This Week.

The Commons runs on a computing platform designed for storage and computation at this scale. The data, or digital objects, come from individual investigators, existing databases and programs like Big Data to Knowledge. The objects follow specific guidelines for identification and citation so that they’re readily searchable.

“The Commons will make it easier to use (and reuse) the data and software contained within these well-curated resources,” Komatsoulis said.

The Commons will serve as a hub of public and private cloud providers, academic national labs and various computing centers willing to follow NIH requirements and help to create a democratized approach to these research datasets, all while keeping the cost of big data storage and sharing down, according to Komatsoulis.

The physical shuttling of storage systems may never disappear entirely -- as computer scientist Andrew Tanenbaum once put it, "never underestimate the bandwidth of a station wagon full of tapes hurtling down the highway" -- but cloud computing and high-speed connectivity are giving government more and more ways to leave the sneakernet behind.
