Super storage

 

Connecting state and local government leaders

Multiserver clusters are the new supercomputer stand-in. Here's how storage has adapted to its important new supporting role.

Storage was straightforward in supercomputing's monolithic days. Storage subsystems came with supercomputers and were fairly simple to integrate. Today, high-performance computing often involves clusters of computers. Now 72 percent of the world's 500 fastest supercomputers are clusters, according to Top500.org, a group that publishes an annual list. The increasingly dominant distributed computing model tests storage in ways centralized supercomputer systems never did. 'It's infinitely harder to connect storage in a cluster environment,' said Eric Seidman, storage systems manager at Verari Systems, a high-performance computing vendor. The overall objective, Seidman said, is to achieve balanced performance among processors, memory and input/output subsystems, which connect the cluster to storage. Many vendors now offer storage systems for clusters. Storage experts say, however, that no single solution on the market will fit every supercomputing task. Instead, tiers of solutions are emerging to address the input/output requirements of different high-performance computing environments. High-end clusters with a heavy input/output demand typically call for features such as parallel file systems and high-speed storage interconnects. But clusters of more modest size may use more familiar storage staples, such as network-attached storage (NAS) systems.High-performance computing storage differs from the mainstream, enterprise variety in that storage managers need to pursue a holistic approach, said Anne Vincenti, storage director at Linux Networx, which builds Linux-based supercomputing clusters.'They need to not only look at the storage but'get into the guts of the system architecture in order to optimize clusters,' Vincenti said. On the flip side, technology managers who concentrate on the computing side of clusters should also take a broader view, Seidman said. 'There's a lot of focus on the processor type and clock speed'and less focus on the overall architecture,' he added.The architecture's storage element has several critical parts, including the storage array, the link between the storage device and the computing cluster, and the file system that sits atop the storage architecture. Altogether, those elements supply data to the scientific and technical applications running on a cluster. Storage systems also house the data that those applications generate. Customers must tune those various components to maximize performance. Because clusters vary in size and performance and run different applications, they need individualized storage approaches. Lawrence Berkeley National Laboratory's Scientific Cluster Support program has more than 20 Linux clusters in production. The group 'specializes in building individual clusters for specific areas of scientific research,' said Gary Jung, SCS project manager at the lab's Information Technology Division. 'Since we are building [high-performance computing] systems for specific areas of science, we can address the unique storage considerations more directly.' Storage solutions at SCS and other high-performance computing centers tend to fall into three categories based on cluster size: small, medium and large. Linux Networx defines the value performance segment as small clusters with as many as 32 nodes. Applications may include crash simulations or smaller computational fluid dynamics jobs, Vincenti said. Customers in the value camp often can manage with a file server running the standard Network File Server (NFS) protocol or a commercial NAS box, she said. Value customers emphasize manageability and seek solutions 'that will deploy and work quickly without having to hack around,' Vincenti said.Smaller file sizes also characterize the value bracket. Vincenti said smaller files may be 50M or less, while a large file could be bigger than 50G.At SCS, smaller clusters with modest input/output requirements use some form of Linux-based NFS server with a direct-attached Redundant Array of Independent Disks solution, Jung said. NFS, commonly used in mainstream Unix and Linux environments, provides a distributed file system through which users share files on a network.Midsize clusters, meanwhile, gravitate toward NAS. The SCS program has deployed Network Appliance's NAS systems for clusters with moderate input/output requirements, Jung said. He said the systems are reliable and easy to integrate. The Army Research Laboratory also uses NAS storage. 'NAS systems with multiple Gigabit Ethernet connections have been successful for small- to moderate-sized clusters,' said Tom Kendall, chief engineer at the Army Research Laboratory's Major Shared Resource Center. The midsize market segment contains clusters with as many as 256 nodes, according to Linux Networx. Applications in this segment include process integration and design optimization and computational fluid dynamics. Clusters may handle small and large file-streaming applications. Using the latter, scientists can quickly load massive amounts of data into memory for faster execution of jobs.In the case of top-tier clusters, storage systems aim to address the problem of hundreds or thousands of data-hungry nodes. Vincenti said some high-performance computing systems don't get enough data to take advantage of their computing power. 'You have to have enough [input/output] built into the system to keep the [system] fully optimized,' she said.Large clusters with intensive input/output demands often call for a parallel file system. Parallel file systems are designed for supercomputing tasks in which multiple nodes access the same files simultaneously. In addition, parallel file systems let a cluster's nodes access data at speeds much faster than what NFS can achieve.'Parallel file systems are the greatest hope in this area,' Kendall said. The Army lab is deploying clusters with input/output requirements approaching 10 gigabytes/sec. The clusters have 'capacities of hundreds of terabytes, which are beyond the capability of nearly all single-server solutions,' he said. Parallel file systems don't suit every supercomputing challenge, however.'There are applications that are by their very nature serial and, therefore, unable to leverage existing parallel file systems beyond the capability' of a single node's input/output performance, Kendall said. 'Fortunately, only a small percentage of our applications fall into this regime,' he said.The San Diego Supercomputer Center's three largest clusters ' DataStar, TeraGrid IA-64 and BlueGene ' use IBM's General Parallel File System, said Richard Moore, director of production systems at the center. Similarly, SCS has used Panasas' ActiveScale Storage Cluster for its larger clusters with intensive input/output requirements, Jung said. ActiveScale Storage Cluster includes Panasas' parallel file system, PanFS.Cluster File Systems' Lustre is another example of a parallel file system. The Lawrence Livermore National Laboratory, the National Center for Supercomputing Applications and Sandia National Laboratories are among the government customers using this open-source file system.The high-end cluster segment also marks a shift in storage architecture. One approach incorporates the lower-level, block-based data access method of storage-area networks (SANs) alongside the file-based access techniques of NAS. In this approach, a large cluster may reserve a certain number of nodes for input/output. Those nodes feed data to the cluster's computing nodes. The ratio of input/output nodes to computing nodes varies. The San Diego Supercomputer Center's BlueGene cluster, for example, operates 1,024 computing nodes and 128 input/output nodes. Input/output nodes may connect to a cluster's computing nodes via proprietary interconnects, such as Myrinet and Quadrics, or standard Gigabit Ethernet, storage experts say. Reaching out from clusters to external storage devices, input/output nodes typically use Fibre Channel and attach to storage in a SAN-like manner. Supercomputer SANs may use storage devices from vendors such as DataDirect Networks, Engenio Information Technologies and Verari Systems, which markets Engenio products as an OEM partner. Vincenti cited interconnect bandwidth and fast and flexible disk controllers as attributes that make SANs a suitable environment for high-performance computing applications such as streaming.But storage technologists say such storage architectures aren't strictly SANs. Seidman said input/output nodes are typically NAS-attached to computing nodes and SAN-attached to storage systems. Some NAS-based approaches can also offer storage to high-end clusters. Vendors such as Panasas offer NAS solutions that let a cluster's input/output nodes access data in parallel. 'We are starting to see a big move toward parallelized NAS,' said Dan Smith, senior enterprise consultant at GTSI.Panasas' storage product deploys a small file system ' the company's DirectFlow client ' on all the nodes in a Linux cluster. The DirectFlow client lets the nodes access Panasas' storage devices in parallel.This parallel NAS solution has a SAN element, however. Panasas uses Internet SCSI over Ethernet as the underlying transport mechanism, said Larry Jones, vice president of marketing at Panasas. 'SAN and NAS get a bit muddy in clusters,' Seidman said.Buyers have multiple options when it comes to cluster storage. They must weigh factors such as cluster size, applications and throughput requirements before reaching a decision.











Special requirements















Storage classes























































X
This website uses cookies to enhance user experience and to analyze performance and traffic on our website. We also share information about your use of our site with our social media, advertising and analytics partners. Learn More / Do Not Sell My Personal Information
Accept Cookies
X
Cookie Preferences Cookie List

Do Not Sell My Personal Information

When you visit our website, we store cookies on your browser to collect information. The information collected might relate to you, your preferences or your device, and is mostly used to make the site work as you expect it to and to provide a more personalized web experience. However, you can choose not to allow certain types of cookies, which may impact your experience of the site and the services we are able to offer. Click on the different category headings to find out more and change our default settings according to your preference. You cannot opt-out of our First Party Strictly Necessary Cookies as they are deployed in order to ensure the proper functioning of our website (such as prompting the cookie banner and remembering your settings, to log into your account, to redirect you when you log out, etc.). For more information about the First and Third Party Cookies used please follow this link.

Allow All Cookies

Manage Consent Preferences

Strictly Necessary Cookies - Always Active

We do not allow you to opt-out of our certain cookies, as they are necessary to ensure the proper functioning of our website (such as prompting our cookie banner and remembering your privacy choices) and/or to monitor site performance. These cookies are not used in a way that constitutes a “sale” of your data under the CCPA. You can set your browser to block or alert you about these cookies, but some parts of the site will not work as intended if you do so. You can usually find these settings in the Options or Preferences menu of your browser. Visit www.allaboutcookies.org to learn more.

Sale of Personal Data, Targeting & Social Media Cookies

Under the California Consumer Privacy Act, you have the right to opt-out of the sale of your personal information to third parties. These cookies collect information for analytics and to personalize your experience with targeted ads. You may exercise your right to opt out of the sale of personal information by using this toggle switch. If you opt out we will not be able to offer you personalised ads and will not hand over your personal information to any third parties. Additionally, you may contact our legal department for further clarification about your rights as a California consumer by using this Exercise My Rights link

If you have enabled privacy controls on your browser (such as a plugin), we have to take that as a valid request to opt-out. Therefore we would not be able to track your activity through the web. This may affect our ability to personalize ads according to your preferences.

Targeting cookies may be set through our site by our advertising partners. They may be used by those companies to build a profile of your interests and show you relevant adverts on other sites. They do not store directly personal information, but are based on uniquely identifying your browser and internet device. If you do not allow these cookies, you will experience less targeted advertising.

Social media cookies are set by a range of social media services that we have added to the site to enable you to share our content with your friends and networks. They are capable of tracking your browser across other sites and building up a profile of your interests. This may impact the content and messages you see on other websites you visit. If you do not allow these cookies you may not be able to use or see these sharing tools.

If you want to opt out of all of our lead reports and lists, please submit a privacy request at our Do Not Sell page.

Save Settings
Cookie Preferences Cookie List

Cookie List

A cookie is a small piece of data (text file) that a website – when visited by a user – asks your browser to store on your device in order to remember information about you, such as your language preference or login information. Those cookies are set by us and called first-party cookies. We also use third-party cookies – which are cookies from a domain different than the domain of the website you are visiting – for our advertising and marketing efforts. More specifically, we use cookies and other tracking technologies for the following purposes:

Strictly Necessary Cookies

We do not allow you to opt-out of our certain cookies, as they are necessary to ensure the proper functioning of our website (such as prompting our cookie banner and remembering your privacy choices) and/or to monitor site performance. These cookies are not used in a way that constitutes a “sale” of your data under the CCPA. You can set your browser to block or alert you about these cookies, but some parts of the site will not work as intended if you do so. You can usually find these settings in the Options or Preferences menu of your browser. Visit www.allaboutcookies.org to learn more.

Functional Cookies

We do not allow you to opt-out of our certain cookies, as they are necessary to ensure the proper functioning of our website (such as prompting our cookie banner and remembering your privacy choices) and/or to monitor site performance. These cookies are not used in a way that constitutes a “sale” of your data under the CCPA. You can set your browser to block or alert you about these cookies, but some parts of the site will not work as intended if you do so. You can usually find these settings in the Options or Preferences menu of your browser. Visit www.allaboutcookies.org to learn more.

Performance Cookies

We do not allow you to opt-out of our certain cookies, as they are necessary to ensure the proper functioning of our website (such as prompting our cookie banner and remembering your privacy choices) and/or to monitor site performance. These cookies are not used in a way that constitutes a “sale” of your data under the CCPA. You can set your browser to block or alert you about these cookies, but some parts of the site will not work as intended if you do so. You can usually find these settings in the Options or Preferences menu of your browser. Visit www.allaboutcookies.org to learn more.

Sale of Personal Data

We also use cookies to personalize your experience on our websites, including by determining the most relevant content and advertisements to show you, and to monitor site traffic and performance, so that we may improve our websites and your experience. You may opt out of our use of such cookies (and the associated “sale” of your Personal Information) by using this toggle switch. You will still see some advertising, regardless of your selection. Because we do not track you across different devices, browsers and GEMG properties, your selection will take effect only on this browser, this device and this website.

Social Media Cookies

We also use cookies to personalize your experience on our websites, including by determining the most relevant content and advertisements to show you, and to monitor site traffic and performance, so that we may improve our websites and your experience. You may opt out of our use of such cookies (and the associated “sale” of your Personal Information) by using this toggle switch. You will still see some advertising, regardless of your selection. Because we do not track you across different devices, browsers and GEMG properties, your selection will take effect only on this browser, this device and this website.

Targeting Cookies

We also use cookies to personalize your experience on our websites, including by determining the most relevant content and advertisements to show you, and to monitor site traffic and performance, so that we may improve our websites and your experience. You may opt out of our use of such cookies (and the associated “sale” of your Personal Information) by using this toggle switch. You will still see some advertising, regardless of your selection. Because we do not track you across different devices, browsers and GEMG properties, your selection will take effect only on this browser, this device and this website.