System X designers beat the odds

 


Srinidhi Varadarajan, an assistant professor of computer science at Virginia Polytechnic Institute and State University, helped make supercomputing history last year.

The $10 million Mac OS X supercomputer he designed, nicknamed System X, outperformed systems costing hundreds of millions more.

His secret ingredient: 1,100 off-the-shelf Apple Power Mac dual-processor G5s. The design raised eyebrows, because Apple Computer Inc. had never considered itself a maker of supercomputing components.

System X is currently ranked by Top500.org as the third most powerful computer in the world. Virginia Tech uses it for systems modeling, computational chemistry, biochemistry, nanoscale electronics and other research problems. The G5s, meanwhile, have been replaced by Apple Xserve units, which require less cooling and electricity.

Varadarajan received a bachelor's degree in electronics and communications engineering from the Regional Engineering College of Warangal, India, and a Ph.D. in computer science from the State University of New York at Stony Brook. The National Science Foundation has given him a Faculty Early Career Development award.

GCN associate editor Joab Jackson interviewed Varadarajan at his Blacksburg, Va., office by phone.

What's more

Age: 31
Car currently driven: Porsche
Last book read: The Federalist Papers
Last movie seen: "Lost in Translation"
Favorite Web site: Google.com
Sports/leisure activities: Hiking and squash
Personal hero: Physicist Richard Feynman
Personal motto: "Think different."

Srinidhi Varadarajan: G5 supercomputer architect

GCN: You often speak at conferences about the increasing difficulty of testing new supercomputing technologies. Why is it harder when there are so many more supercomputers today?

VARADARAJAN: The large supercomputing facilities at the Energy Department, NASA and National Science Foundation centers must be perfectly stable as a production resource. If you want to improve performance or try new programming models or memory management techniques, the machine will necessarily become unstable. If it crashes, you are in trouble.

And you need to test on very-large-scale systems. Simulation alone is not sufficient. In the late 1980s and early 1990s, we had access to machines where we could do testing, but then they disappeared.

The scalability problems you see with systems of 2,000 processors are considerably different from those you see with 32 processors. When you are writing an algorithm today, you have to rely more on intuition with no experimental evidence to back it up. You don't know if the implementation is really scalable or not. So System X is intended to operate with both kinds of code.

GCN: When you were planning System X, how did you arrive at using Apple Power Mac G5s? Did you look at other 64-bit chips: the Intel Itanium 2 or Advanced Micro Devices' Opteron?

VARADARAJAN: We first looked at a Dell Inc. Itanium 2 solution. We got the architecture nailed down, but the machine would have cost about $12 million. We didn't have $12 million.

Dell pulled out at the last minute, less than a day before the memorandum of understanding ended. We tried to work with Hewlett-Packard Co. on another Itanium solution, but the same thing happened.

So the next thing we did was work with IBM Corp. to get a PowerPC 970 solution. The PowerPC 970 processor was considerably better for our applications, but the clock rates it could support were fairly low.

IBM was building the chassis for data centers, where you can put two of them back to back in a rack. The problem is, the amount of heat generated that way is phenomenal.

For us, space was not much of an issue, but heat definitely was. You could cool one rack by putting an air conditioner right in front of it and blasting air through. But if you have 50 racks, you cannot buy and operate 50 air conditioners; it is cost-prohibitive to do that.

So we worked with IBM trying to get an Opteron solution. The problem with the Opteron was its floating-point unit. Our applications are largely scientific, and they do a lot of multiplication and addition. The PowerPC 970 and the Itanium 2 each can do both a multiplication and an addition within one cycle. The Opteron can't do that.

So, essentially, the Opteron design became very large. One dual-Opteron machine is cheaper than a G5 platform. Two of them are not. That is what caused that particular solution's cost to go beyond $10 million.

Right about then, Apple announced a PowerPC 970 platform. We went to see what these nodes looked like. The motherboard was very well-designed. Everything matched. The memory management was excellent. There were two CPUs, the fastest 970 processors. The price point was perfect. It was a completely Unix-based system. Right then, we knew we could do this.

So we called up the local Apple representative to see if he could get an order placed.

GCN: What did the sales rep say when you said you wanted to order 1,100 of the G5s?

VARADARAJAN: His reaction was, "We need to talk to headquarters."

Cupertino said, "Look, you need to come here and talk to us." So we took a flight on Sunday night, met all day Monday and flew back Monday night. We spoke to senior management to convince them that it would work. They asked us for 24 hours to make up their minds.

GCN: What was their concern?

VARADARAJAN: The G5 was Apple's leading product. It had never been used in a project of this size, and a fairly high-profile one, so it could have led to marketing nightmares.

Everyone was telling us that such a system wouldn't work. There were too many risks.

We were trying out a communications fabric that had never been tried before at this size. We had a brand-new cooling system design. We had no real history of doing supercomputing before. Pretty much everything stacked up against success.

The plan made perfect sense to us, but it was very hard to convince others that this was doable.

GCN: So obviously Apple did decide to provide the computers for you.

VARADARAJAN: We got the first machines coming off the assembly line, and that was our other big risk factor. The first application we ran was a benchmark, which is very taxing. It uses all processors at full capacity for hours at a stretch.

The first benchmark result was a little disappointing. We optimized the system over a five-week period. We started picking up pace after Oct. 10 or so.

The first number we reported, on Oct. 1, was only 800 billion floating-point operations per second. By Oct. 11, the machine was at 7 trillion FLOPS, and every day it was producing another 10 percent more.

We had to do some fairly hectic work getting the optimizations correct: work inside the operating system, numerical libraries and system management groups.

GCN: You've said that one of the unforeseen problems with clusters is the uptime of individual machines.

VARADARAJAN: If you have a node that goes down even once a year, and there are 1,100 nodes, that means 1,100 failures a year; that's three a day. And that is fairly normal for systems of this size.

Systems on a big scale need fault tolerance, which means recovering from failures without applications ever being affected. The [management software we developed] is both operating system-independent and hardware-independent. It recovers transparently from a failure.

Here is how the fault tolerance operates: Say you have 1,100 nodes and 10 hot-spare nodes. When there is a failure, your application migrates to one of the hot spares. You never see the failure.

Later, when one job terminates, it returns a node back to the hot-spare pool. You always have 10 hot spares. When the system reaches a certain threshold of, say, five bad nodes, you call in a systems administrator to fix the bad nodes.

We're commercializing that software now. I am the chief technical officer of California Digital Corp., headquartered in Fremont, Calif. We have a software development center in Blacksburg.

GCN: How did you first get interested in IT?

VARADARAJAN: I was in Gujarat, India, at the time, in the eighth grade. I wanted to play video games but my parents wouldn't let me have any. So I learned Basic to program video games. I used a British computer called the BBC Micro [from Acorn Computers Ltd. of Cambridge, England].

To this day, I think the Micro is one of the best machines to teach programming on, because it's simple.

By the tenth grade, I knew what was going on inside the machine: what the processor was, what the memory hierarchy was and everything else associated with it.

But when you look at a modern PC, it is fairly intimidating. It doesn't seem like something you could learn to program in six months.
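
For readers who want to see the uptime arithmetic and the hot-spare scheme from the interview in concrete form, here is a minimal Python sketch. It is not the management software Varadarajan's team built, whose internals the interview does not describe. The class name `HotSparePool`, its methods and the node labels are hypothetical, chosen only for illustration. The figures (1,100 nodes, one failure per node per year, 10 hot spares, a repair threshold of five bad nodes) come from his answer.

```python
from dataclasses import dataclass, field


def expected_failures_per_day(nodes, failures_per_node_per_year, days_per_year=365):
    """Average node failures per day across the whole cluster."""
    return nodes * failures_per_node_per_year / days_per_year


@dataclass
class HotSparePool:
    """Hypothetical model of the hot-spare bookkeeping described in the interview."""
    spares: list                      # idle nodes ready to take over a job
    repair_threshold: int = 5         # bad nodes tolerated before calling an admin
    bad: list = field(default_factory=list)
    target: int = field(init=False)   # number of spares the pool tries to keep

    def __post_init__(self):
        self.target = len(self.spares)

    def handle_failure(self, failed_node):
        """Retire the failed node and return the spare the job migrates to."""
        if not self.spares:
            raise RuntimeError("no hot spares left")
        self.bad.append(failed_node)
        return self.spares.pop()

    def release(self, node):
        """A finished job offers its node; keep it only if the pool is short."""
        if len(self.spares) < self.target:
            self.spares.append(node)
            return True
        return False

    def needs_admin(self):
        """True once enough bad nodes have piled up to justify a repair visit."""
        return len(self.bad) >= self.repair_threshold


if __name__ == "__main__":
    # 1,100 nodes failing once a year each works out to roughly three a day.
    print(expected_failures_per_day(1100, 1))

    pool = HotSparePool(spares=[f"spare-{i}" for i in range(10)])
    replacement = pool.handle_failure("node-0042")
    print(replacement, len(pool.spares), pool.needs_admin())
```

The sketch only tracks which nodes are spare or bad. The hard part of the real system, moving a running application to another node without the user noticing, is outside what a few lines can show.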






