The top machine on top500.org is now Chinese, 5 times faster than the nearest DOE machine. That's not set to change for at least two years: http://nextbigfuture.
The Chinese system is also all non-US hardware. Somebody in the US Gov't thought an embargo might slow things down: http://www.theregister.co.uk/2015/04/10/us_intel_china_ban/, but it had no effect; China built their own. The author of the article pointed out: "If the ERC honestly thinks that the ban will put a significant dent in China's supercomputing plans, it is either very foolish or dangerously misinformed." Oops, that only took a year. I guess they were both foolish AND misinformed. The main effect of the embargo was to ensure Chinese independence from US hardware.
Of course, the US is responding. We've got Meetings, Programs, Timelines, and Plans. What we don't seem to have is a chance of catching up. The first Meetings began in 2007, and a Timeline was created almost 10 years later, for a system to be delivered in 2023, only 3 years behind China's projected exascale delivery in 2020.
Does anyone see any way to get DOE HPC back on the successful track of the 1994-2008 era? Or does it make more sense to call it a day and start buying HPC systems from China?
28 comments:
Ron,
I read this post with great interest and mixed feelings. In a nutshell, the sky isn't falling... yet. I agree that our current program is a tepid and poorly executed mess of meetings with nearly zero intellectual depth. The core of the problem is far deeper than anything you touch upon. We seem to think that having the fastest computer as measured by the Linpack benchmark is somehow important, or even remotely associated with our scientific prowess.
Here is the key idea: the important thing about computers is the modeling & simulation done with them. Modeling & simulation depends upon a whole bunch of science that is being poorly supported and receiving no emphasis in our program. This includes experiment, theory, applied math, and more. Our program, for all of its incompetence, is still almost completely focused on computer hardware. The real key is whether the Chinese are investing appropriately in all the other things that actually provide real value in modeling & simulation. If they are, then we have truly lost. If they just have this computer, it's really not a big deal.
Bill
Well said, Bill. In addition, very few applications need a supercomputer when you can buy a workstation with 36 cores for $15K. Some computing challenge records are even being set with workstation class machines, like the most digits of pi.
Just to be clear, I am not saying we don't need supercomputers, we do. We absolutely need fast capable supercomputers. We never have enough computing power, but chasing this power entails massive opportunity costs.
What we do need is a balanced program that approaches supercomputing in a holistically intelligent manner. Many important aspects of high performance computing receive far too little support, and the benchmark-chasing hardware focus (along with industry pork) receives far too much.
What we do NOT need is virtually unusable computers chasing meaningless benchmarks. We need powerful, usable computers that perform on the applications we buy them for.
HPC - getting the wrong answers faster and faster
Just having the highest petaflop number on its own is not a science feat, it is a procurement feat. Think Roadrunner, which briefly gave LANL and NNSA an excuse to brag but in the end produced no useful science. But the recent insufficient US investment in hardware could be a symptom of the general lack of funding for science. As Bill notes, in that case we have truly lost.
And how long can the Chinese machine run between hardware or operating system failures?
Face it, boys, the Chinese are systematically kicking our ass.
As long as it is more important to DOE that you get 4 signatures from management to take your work laptop home than that you get real work done, we will not improve this situation.
I agree that the number of petaflops is not the most important measure (if your code has bugs it is irrelevant); however, I find it a troubling sign.
30 years ago, high performance computing was done on supercomputers. 20 years ago, on clusters of PC-class processors. 10 years ago, bigger clusters of PC-class and special purpose processors. Now we can each have PCs with dozens of cores on our own desks, the fastest of which are 100s of times faster than the supercomputers of 30 years ago - and the scientists have the whole machine to themselves. These PCs have many times more memory and storage than those older supercomputers too. No timesharing needed, and the cost per computation is essentially zero.
A smaller and smaller fraction of our computing must be done on supercomputers. That's dead obvious, isn't it? Consider that a run-of-the-mill smartphone has more processing power than the computers used to design the current stockpile.
Think of it, how can anyone put a price tag on bragging rights?
Who cares if the platform is incompatible with the codes on hand, get that LINPACK number up, up, up.
I agree that today's supercomputer will not win next year's contest. However, to just ignore them in favor of multicore desktops seems a bit simplistic. While supercomputers are achieving higher performance, the calculations have become more demanding at the same time. What was impossible in lattice QCD or molecular dynamics only a few years ago has become routine today. And those calculations cannot run on multicore desktops.
I don't think anybody would invest a lot of money to develop supercomputers just to have bragging rights.
So is this site in any way formally associated with Bechtel, LLNS, LLNL Investigation, DOE, DHS, NNSA or any other of those colorful multi-letter organizations? Just curious, because there seems to be no definitive disclaimer other than that this is a blog to blow the whistle or identify other perceived misdeeds. But is that so? Do we track IP addresses? Is anonymous really anonymous? If not, it would be nice to say so up front. What say you?
Lattice QCD ran on clusters of PC processors a decade ago. The machines used were much faster than a multi-core PC is today, granted, but scientists rarely command the whole cluster full-time to do their calculations. Scientists only get a fraction of the cluster's processors and a fraction of the cluster's time as those machines are time-shared between many different calculations. Most scientists can afford only a small fraction of a cluster so a modern desktop with dozens of cores can often be a competitive option in wall-clock time and always a far, far cheaper option in the total cost of calculation.
If you buy 3 Chinese super computers, you get egg roll.
Getting the biggest baddest number cruncher does not necessarily translate to a more productive outcome.
LLNL went for the CDC STAR, which was a disaster. The hardware was flaky, as was the software we developed. At the end of its life it was productive, but it was only about twice as fast as the CDC 7600. But we learned how to do vector coding, and when the Cray came to town that painful learning experience paid dividends.
The IBM Purple was a logical extension of White; code movement was relatively painless. At the same time that Purple came to town, IBM delivered the Blue Gene/L machine. Quite a different technology, especially concerning I/O. But it was scaldingly fast. The machine was supposed to be used in the unclassified world, but its raw processing power was so enticing that the Weapons group finally grabbed it. In this instance the required code changes proved to be worthwhile, and we shifted technology from the Blue/White/Purple line to the Blue Gene/L line of computers.
As it happened, Blue Gene/L held the No. 1 rating for several years, lost it, and then regained it after an expansion of the machine that, in my view, was done mainly to retake the top spot.
The top ranking is bragging rights, eye candy or penis envy. Take your pick. Unfortunately that's what motivates those in Washington D.C. and they own the checkbook.
Lattice QCD ran on clusters of PC processors a decade ago.
Absolutely correct, and ENIAC did multiplication during WW2.
That is not the point. The point I was trying to make is that while computers advance in power, so does the complexity of the problems. QCD a decade ago was not run at anything like today's scale, and in MD the spacing has become much smaller.
"Scientists only get a fraction of the cluster's processors and a fraction of the cluster's time as those machines are time-shared between many different calculations."
I fail to see this as an argument against supercomputers. Are you saying that the rest of the time, when scientists do not "get their fraction," it is management which gets it? :)
The discussion gets at a number of misconceptions and inconsistencies that plague the field of supercomputing. The biggest issue is the disconnect between the needs of science and engineering and the definition of success in supercomputing. The success of the supercomputing programs is tied to being able to put an American machine at the top of the list. Increasingly, having the top computer on the now largely useless Top500 list is completely at odds with acquiring machines useful for conducting science.
The science and engineering needs vary all the way from QCD, MD and DNS to climate modeling and integrated weapons calculations. The pure science needs of QCD, MD and DNS are better met by the machines being built today, but even there the machines we buy to top the computing list are quite suboptimal for pure science. The degree of suboptimality for running our big integrated calculations has become absolutely massive over time, and the gap only grows larger with each passing year. Worse yet, the execution of the exascale program is acting to make this worse, not better.
Compounding the damaging execution of the supercomputing program is the systematic hollowing out of the science and engineering content from our programs. We are systematically diminishing our efforts in experimentation, theory, modeling, and mathematics despite their greater importance and impact on the entire enterprise.
We need supercomputing to be a fully complementary part of science. Instead we have turned supercomputing into a prop and a marketing stunt. There is a certain political correctness about how it contributes to our national security, and the increasingly compliant Labs offer no resistance to the misuse of taxpayer money. The current programs are ineffective and poorly executed (as Ron originally stated), and do a poor job of providing the sorts of capability claimed.
The biggest issue is the death of Moore's law and our impending failure to produce the results promised. Rather than reform our programs to achieve real benefits for science and national security, we will see a catastrophic failure. This will be viewed through the usual lens of scandal. It is totally foreseeable and predictable. It would be advisable to fix this before disaster, but my guess is we don't have the intellect or leadership to pull this off.
There is no doubt in my mind that the Chinese goal (and apparent achievement) of getting to the top of the supercomputer rankings is to draw the best international computing and engineering talent to work in Chinese institutions. How many of our children are going into science and/or engineering these days? And if they do, within a decade I'm guessing that they will be working in China. Folks, the opportunities are diminishing in the US and EU, and within a decade (or two, on the outside), Silicon Valley will close down. Apple, Google, Intel, HP, etc. will be moving entire operations to China (all fab and all science/engineering). All US aerospace will be consolidated under Lockheed-Martin (including Sandia as a subsidiary), all weapons work will go to LANL, and LLNL will be closed down when NIF funding dries up.
Oh, and what about the big Boeing plant being built in China? Trump has pounded on that one.
Say you have 1000 scientists who need access to high performance computers. You could buy 1000 high-end 36-core PCs for about $17 million. That's 10% of the cost of one Trinity, and that's only counting the $174 million destined for Cray.
Since those scientists need some sort of desktop anyway, these fast desktop machines serve that purpose too.
Now, of those 1000 scientists, it's likely that such a fast desktop dedicated to a single scientist would meet the needs of most - very few scientists run QCD calculations, very few run molecular dynamics, very few run Global Climate models. It's actually dishonest to claim those examples drive the general need for ever faster supercomputers because there are already dedicated supercomputers for many of the few scientists who do run those types of calculations.
Say those desktops meet the needs of half. That's a debatable fraction but it's probably conservative. You still have 90% of the cost of Trinity left over to make a supercomputer that could better serve the ~50% of the scientists that really need it. With this workstation option, you would obviate the need to port many codes over and over to every generation of unique supercomputer, you would offload half (or more) of all the users and the need to train them on each generation of supercomputer, and you would offload a large fraction of the CPU cycles to be run on the supercomputer. Knowing what codes really need to be run on the supercomputer, you could even end up with a supercomputer architecture that better suits the real needs too.
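A back-of-the-envelope sketch of that arithmetic, for anyone who wants to poke at it. The $17K-per-workstation figure, the $174 million Cray-contract figure, and the 50% offload fraction are the assumptions quoted above, not official procurement data; everything else just follows from them.

# Rough cost-split sketch for the workstation-vs-supercomputer argument above.
# All inputs are the figures assumed in the comment, not authoritative numbers.

NUM_SCIENTISTS = 1000
WORKSTATION_COST = 17_000            # ~$17K per high-end 36-core workstation
TRINITY_CRAY_CONTRACT = 174_000_000  # the portion of Trinity destined for Cray
OFFLOAD_FRACTION = 0.5               # assumed share of users a desktop satisfies

workstation_total = NUM_SCIENTISTS * WORKSTATION_COST
share_of_contract = workstation_total / TRINITY_CRAY_CONTRACT
remaining_budget = TRINITY_CRAY_CONTRACT - workstation_total
heavy_users = int(NUM_SCIENTISTS * (1 - OFFLOAD_FRACTION))

print(f"Workstations for all {NUM_SCIENTISTS} scientists: ${workstation_total:,} "
      f"(~{share_of_contract:.0%} of the Cray contract)")
print(f"Left over for a purpose-built machine: ${remaining_budget:,}, "
      f"serving ~{heavy_users} heavy users")

Run as written, this reproduces the ~10% / ~90% split claimed above; changing OFFLOAD_FRACTION shows how sensitive the argument is to that admittedly debatable 50% guess.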
I chose Trinity because it's a good example of the architecture not meeting anyone's real need - it has to be everything to every unknown code, whether or not that code could run on a workstation, and no machine can be designed for that. In the end, it's just another hammer looking for a nail.
Those who have been caught up in the insane teraflops race haven't given adequate consideration to how fast modern workstations have become.
Big computing and big data, that is the only game in town. It is easy to follow, it is a simple story, and it sells, and that is all that matters at the end of the day. The labs are all about customers and we need to sell something to the customer, so the labs are ultimately about selling, not making, not doing, only selling, and big computers and lasers sell. Be essential! No salesmen, no sales; no sales means no labs, so get out there and start hustling for sales. A potential customer could be anywhere: a friend, a relative, the sales clerk at the next shop you go to, the random person next to you on a flight. We live in a SUPERVUCA world so you need to think outside of the box to make the big sale.
you need to think outside of the box to make the big sale.
July 4, 2016 at 5:28 PM
What a sad, pathetic, colorless, uninteresting world you live in.
You have it utterly and completely wrong; it is just the opposite. One needs to be vibrant, filled with rainbows of unending color, to survive in the unreal, crazy SUPERVUCA world. In order for the labs to have a viable sales model which moves product they need to embrace the SUPERVUCA world, or their lunch will be eaten by the young people who live in the NOW, not the future or the past, but the NOW. You, Sir, are the one who is sad, pathetic and static, and NOW is not your time.
Slow and steady will not win this race or move product. The job of the leader is to reframe VUCA, to get above it, and the conditions for this are in place. superVUCA!
Vibrant – Life is loaded with hope, fun, mystery, disdain, and possibility.
Unreal – New ideas trump old wealth, scale and knowledge. Unreal things become real, real things become fiction. Anything goes, but only a few things sell, and remember: the big idea is dead.
Crazy – The crazies always win through. Who here is crazy? I know I'm crazy. And if you were all these things, then you'd just attack me right now, so some of you are still crazy. This thing doesn't want to show itself, it wants to hide inside an imitation. It'll fight if it has to, but it's vulnerable out in the open. If it takes us over, then it has no more enemies, nobody left to kill it. And then it's won. We cannot let it win.
Astounding – The Americans are astounding themselves – and the Middle East – with shale energy. There is an energy revolution going on which will reshape world economics. In what ways will the labs astound the world in the future? What is astounding today may be mundane tomorrow, or it could be even more astounding tomorrow. Crazy it is, and that is because the world is not only unreal but also vibrant!
One word is missing, but it is vibrant and vital, V-squared as they say, and that is "essential": be "essential" or get out of the way, sell or be sold. This is the world; it is the inverse of sad, pathetic and colorless; it is SUPERVUCA!
In order for the labs to have a viable sales model which moves product...
July 5, 2016 at 7:13 PM
Exactly what "product" do you think the labs are trying to "move" and to where? If you admit to knowing you are crazy, you probably are capable of recognizing sanity when you see it. I pray you don't own firearms, and can only hope DHS has you on one of their lists.
In case you have not noticed, reality has arrived in the NNSA complex and will be turned up a few notches in the next few years.
July 6, 2016 at 7:39 PM
It's always a few years away, isn't it? So is the end of your life. Are you sitting around waiting for that too?
July 4, 2016 at 9:46 PM makes some salient observations that buying 1000 scientists 1000 high-end workstations would save a considerable amount of money. There is a point that is overlooked in this train of thought. Quite often the very high-end DOE machines come in and run on the classified side. The calculations and simulations for nuclear work have driven high-end computing since the inception of the weapons labs. So let's put in those 1000 high-end workstations. Because they'll have disk drives, they will need to be put into vaulted rooms. The price tag just jumped. You'll also need offline data storage, and once again that system will require VTR housing. You'll need qualified sys admins to keep the machines running and to maintain security requirements, something apparently not done at the State Department.
As is often stated - the devil is in the details.
There is no way that nuclear simulations drive the need for next-generation, world-record types of supercomputers that can achieve many 10s of petaflops. The current stockpile was designed on a supercomputer with less processing power than a smart phone. While the simulations being done now require far more computing power than that, for sure, existing classified computer clusters are already fast enough to handle the current generation of classified simulations.
Besides, you *can* run classified on a PC behind the fence that's not in a vault during working hours. You just need the PC to have only volatile memory and removable hard drives which have to be locked up in a safe during non-working hours. It's done in many places already - I routinely did this at a previous job. The only downside is that after-hours processing does require the machine to be in a vault-type room.
Off-line data storage that resides in a VTR isn't a problem: you can buy an external 8 terabyte RAID drive for each machine for about $400. That's more storage than the whole world had in the early 1980s. Want more storage than that? Buy two. Want more than those two RAID backups? Buy two more. One large VTR can hold all the storage needed for an entire division.
We already have sys-admins to keep PCs, classified PCs, and thin-client machines running. They already know how to do this. That's not an issue.
Current high-end PCs exceed one teraflops. That's 3 times faster than the world's fastest supercomputer was in 1996, just 20 years ago. Seems to me that if cost and efficiency were factors, DOE would have figured out how to integrate these fast PCs into its computing environment long ago. Other organizations have figured it out; why can't DOE?
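For a sense of scale, here is a minimal sketch of that comparison. The ~1 teraflops desktop figure is the one claimed above; the 1996 figure is an assumed ballpark of roughly 0.37 teraflops Rmax for the late-1996 Top500 leader.

# Sanity check of the desktop-vs-1996-supercomputer comparison above.
# DESKTOP_TFLOPS is the claim made in the comment; TOP_1996_TFLOPS is an
# assumed ballpark for the late-1996 Top500 #1 machine.

DESKTOP_TFLOPS = 1.0    # claimed high-end PC performance, circa 2016
TOP_1996_TFLOPS = 0.37  # assumed Rmax of the late-1996 Top500 leader

ratio = DESKTOP_TFLOPS / TOP_1996_TFLOPS
print(f"A {DESKTOP_TFLOPS:.0f} TFLOPS desktop is roughly {ratio:.1f}x the "
      f"fastest supercomputer of 20 years earlier")

Under those assumptions the ratio comes out around 2.7x, consistent with the rough "3 times faster" claim.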
If you like living in the world of CREM and audits of same then good for you. I would venture to guess that most people do not. And lord help you if that CREM gets misplaced or has an accounting error - this type of thing leads to contract cancellation.
Cost and efficiency are nice goals, but with insane security rules and the penalties associated with those rules, management pushes instead to reduce the possibility of errors.
Near the end of my career my division was pushing me to give up my repo for fear that I might leave it unlocked and they would get the black eye. And this was not a repo with CREM.
I have had CREM and I was the manager of a VTR, and given my druthers, I say let DOE get bragging rights on a supercomputer that I use from a diskless workstation, and let someone else handle the heavy lifting on the security side of the matter.
What CREM? Wasn't that requirement eliminated?
I do agree that unnecessary security requirements drive us to make bad decisions in security, personnel, operations, AND buying ludicrously expensive unneeded record-setting supercomputers.