By Ron Minnich
The top machine on top500.org is now Chinese, 5 times faster than the nearest DOE machine. That's not set to change for at least two years: http://nextbigfuture.com/2016/06/ibm-promises-200-petaflop-supercomputer.html, by which time 200 PF will probably be another also-ran.
The Chinese system is also all non-US hardware. Somebody in the US Gov't thought an embargo might slow things down: http://www.theregister.co.uk/2015/04/10/us_intel_china_ban/, but it had no effect; China built their own. The author of the article pointed out: "If the ERC honestly thinks that the ban will put a significant dent in China's supercomputing plans, it is either very foolish or dangerously misinformed." Oops, that only took a year. I guess they were both foolish AND misinformed. The main effect of the embargo was to ensure Chinese independence of US hardware.
Of course, the US is responding. We've got Meetings, Programs, Timelines, and Plans. What we don't seem to have is a chance of catching up. The first Meetings began in 2007, and a Timeline was created almost 10 years later, for a system to be delivered in 2023, only 3 years behind China's projected exascale delivery in 2020.
Does anyone see any way to get DOE HPC back on the successful track of the 1994-2008 era? Or does it make more sense to call it a day and start buying HPC systems from China?
Comments
I read this post with great interest and mixed feelings. In a nutshell, the sky isn't falling... yet. I agree that our current program is a tepid and poorly executed mess of meetings with nearly zero intellectual depth. The core of the problem is far deeper than anything you touch upon. We seem to think that having the fastest computer as measured by the Linpack benchmark is somehow important or even remotely associated with our scientific prowess.
Here is the key idea: what matters about computers is the modeling & simulation done with them. Modeling & simulation depends upon a whole bunch of science that is being poorly supported and receiving no emphasis in our program. This includes experiment, theory, applied math, ... Our program, for all of its incompetence, is still almost completely focused on computer hardware. The real key is whether the Chinese are investing appropriately in all the other things that actually provide real value in modeling & simulation. If they are, then we have truly lost. If they just have this computer, it's really not a big deal.
Bill
What we do need is a balanced program that approaches supercomputing in a holistically intelligent manner. Many important aspects of high performance computing receive far too little support, and the benchmark-chasing hardware focus (along with industry pork) receives far too much.
What we do NOT need is virtually unusable computers chasing meaningless benchmarks. We need powerful, usable computers that perform on the applications we buy them for.
I agree that the number of petaflops is not the most important measure (if your code has bugs it is irrelevant); however, I find it a troubling sign.
A smaller and smaller fraction of our computing must be done on supercomputers. That's dead obvious, isn't it? Consider that a run-of-the-mill smartphone has more processing power than the computers used to design the current stockpile.
Who cares if the platform is incompatible with the codes on hand, get that LINPACK number up, up, up.
I don't think anybody would invest a lot of money to develop supercomputers just to have bragging rights.
Getting the biggest baddest number cruncher does not necessarily translate to a more productive outcome.
LLNL went for the CDC STAR, which was a disaster. The hardware was flaky, as was the software we developed. At the end of its life it was productive, but was only about twice as fast as the CDC 7600. But we learned how to do vector coding, and when the CRAY came into town that painful learning experience paid dividends.
The IBM Purple was a logical extension of the White; code movement was relatively painless. At the same time that Purple came into town, IBM delivered the Blue Gene/L machine. Quite a different technology, especially concerning I/O. But it was scalding fast. The machine was supposed to be used in the unclassified world, but its raw processing power was so enticing that the Weapons group finally grabbed it. In this instance the required code changes proved worthwhile, and we shifted technology from the Blue/White/Purple line to the Blue Gene/L line of computers.
By chance, Blue Gene/L grabbed the number 1 rating for several years, lost it, and then regained it when the machine was expanded, which in my view was done mainly to reclaim the No. 1 rating.
The top ranking is bragging rights, eye candy or penis envy. Take your pick. Unfortunately that's what motivates those in Washington D.C. and they own the checkbook.
Absolutely correct, and ENIAC did multiplication during WW2.
That is not the point. The point I was trying to make is that as computers advance in power, so does the complexity of the problems. QCD calculations a decade ago did not demand the power they do today. In MD the spacing has become much smaller.
"Scientists only get a fraction of the cluster's processors and a fraction of the cluster's time as those machines are time-shared between many different calculations."
I fail to see this as an argument against supercomputers. Are you saying that the rest of the time, when scientists do not "get their fraction," it is management which gets it? :)
The science and engineering needs are varied all the way from QCD, MD and DNS to climate modeling and integrated weapons calculations. The pure science needs of QCD, MD and DNS are better met by the machines being built today, but even there the machines we buy to top the computing list are quite suboptimal for pure science. The degree of suboptimality for running our big integrated calculations has become absolutely massive over time and the gap is only growing larger with each passing year. Worse yet the execution of the exascale program is acting to make this worse, not better.
Compounding the damaging execution of the supercomputing program is the systematic hollowing out of the science and engineering content from our programs. We are systematically diminishing our efforts in experimentation, theory, modeling, and mathematics despite their greater importance and impact on the entire enterprise.
We need supercomputing to be a fully complementary part of science. Instead we have created supercomputing as a prop and marketing stunt. There is a certain political correctness about how it contributes to our national security, and the increasingly compliant Labs offer no resistance to the misuse of taxpayer money. The current programs are ineffective and poorly executed (as Ron originally stated), and do a poor job of providing the sorts of capability claimed.
The biggest issue is the death of Moore's law and our impending failure to produce the results promised. Rather than reform our programs to achieve real benefits for science and national security, we will see a catastrophic failure. This will be viewed through the usual lens of scandal. It is totally foreseeable and predictable. It would be advisable to fix this before disaster, but my guess is we don't have the intellect or leadership to pull this off.
Since those scientists need some sort of desktop anyway, these fast desktop machines serve that purpose too.
Now, of those 1000 scientists, it's likely that such a fast desktop dedicated to a single scientist would meet the needs of most - very few scientists run QCD calculations, very few run molecular dynamics, very few run Global Climate models. It's actually dishonest to claim those examples drive the general need for ever faster supercomputers because there are already dedicated supercomputers for many of the few scientists who do run those types of calculations.
Say those desktops meet the needs of half. That's a debatable fraction, but it's probably conservative. You still have 90% of the cost of Trinity left over to make a supercomputer that could better serve the ~50% of the scientists who really need it. With this workstation option, you would obviate the need to port many codes over and over to every generation of unique supercomputer, you would offload half (or more) of all the users and the need to train them on each generation of supercomputer, and you would offload a large fraction of the CPU cycles that would otherwise run on the supercomputer. Knowing what codes really need to be run on the supercomputer, you could even end up with a supercomputer architecture that better suits the real needs too.
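Here is a back-of-the-envelope version of that split. Every number below is an assumption chosen for illustration (machine cost, user count, workstation price), not an actual budget figure:

# Back-of-the-envelope sketch of the workstation-offload argument.
# All numbers are illustrative assumptions, not actual budget figures.
supercomputer_cost = 175e6   # assumed cost of a Trinity-class machine, dollars
num_scientists = 1000        # assumed user population
workstation_cost = 15e3      # assumed price of a fast dedicated workstation
fraction_served = 0.5        # assumed fraction whose needs a workstation meets

workstation_budget = num_scientists * fraction_served * workstation_cost
remaining = supercomputer_cost - workstation_budget
print(f"Workstations for {int(num_scientists * fraction_served)} users: ${workstation_budget / 1e6:.1f}M")
print(f"Left for a right-sized supercomputer: ${remaining / 1e6:.1f}M "
      f"({100 * remaining / supercomputer_cost:.0f}% of the original budget)")

With those assumed numbers the workstations consume only a few percent of the machine budget, which is consistent with the rough "90% left over" figure above.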
I chose Trinity because it's a good example of the architecture not meeting anyone's real need - it has to be everything to every unknown code, whether or not that code could run on a workstation, and a machine like that can't be designed. In the end, it's just another hammer looking for a nail.
Those who have been caught up in the insane teraflops race haven't given adequate consideration to how fast modern workstations have become.
July 4, 2016 at 5:28 PM
July 4, 2016 at 10:25 PM
What a sad, pathetic, colorless, uninteresting world you live in.
July 5, 2016 at 10:03 AM
You have it utterly and completely wrong, it is just the opposite: one needs to be vibrant, filled with rainbows of unending color, to survive in the unreal crazy SUPERVUCA world. In order for the labs to have a viable sales model which moves product, they need to embrace the SUPERVUCA world or their lunch will be eaten by the young people who live in the NOW, not the future or the past, but the NOW. You Sir are the one who is sad, pathetic and static, and NOW is not your time.
Slow and steady will not win this race or move product. The job of the leader is to reframe VUCA, to get above it, and the conditions for this are in place. superVUCA!
Vibrant – Life is loaded with hope, fun, mystery, disdain, and possibility.
Unreal – New ideas trump old wealth, scale and knowledge. Unreal things become real, real things become fiction. Anything goes but only a few things sell, and remember the big idea is dead.
Crazy – The crazies always win through. Who here is crazy? I know I'm crazy. And if you were all these things, then you'd just attack me right now, so some of you are still crazy. This thing doesn't want to show itself, it wants to hide inside an imitation. It'll fight if it has to, but it's vulnerable out in the open. If it takes us over, then it has no more enemies, nobody left to kill it. And then it's won. We cannot let it win.
Astounding – The Americans are astounding themselves – and the Middle East – with shale energy. There is an energy revolution going on which will reshape world economics. In what ways will the labs astound the world in future? What is astounding today may be mundane tomorrow, or it could be even more astounding tomorrow. Crazy it is, and that is because the world is not only unreal but also vibrant!
One word is missing, but it is vibrant and vital, V-squared as they say, and that is "essential": be "essential" or get out of the way, sell or be sold. This is the world; it is the inverse of sad, pathetic and colorless. It is SUPERVUCA!
July 5, 2016 at 7:13 PM
Exactly what "product" do you think the labs are trying to "move" and to where? If you admit to knowing you are crazy, you probably are capable of recognizing sanity when you see it. I pray you don't own firearms, and can only hope DHS has you on one of their lists.
July 6, 2016 at 7:39 PM
It's always a few years away, isn't it? So is the end of your life. Are you sitting around waiting for that too?
As is often stated - the devil is in the details.
There is no way that nuclear simulations drive the need for next-generation, world-record types of supercomputers that can achieve many 10s of petaflops. The current stockpile was designed on a supercomputer with less processing power than a smart phone. While the simulations being done now require far more computing power than that, for sure, existing classified computer clusters are already fast enough to handle the current generation of classified simulations.
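As a rough sanity check on that claim, here is a sketch with two assumed figures: a design-era supercomputer at roughly Cray-1-class peak speed, and an ordinary 2016 smartphone at around 10 gigaflops. Both numbers are assumptions for illustration, not measurements.

# Rough comparison of a design-era supercomputer to a run-of-the-mill 2016 smartphone.
# Both figures are assumptions for illustration, not measured values.
design_era_flops = 160e6    # assumed ~160 megaflops peak, Cray-1 class
smartphone_flops = 10e9     # assumed ~10 gigaflops for an ordinary 2016 phone

print(f"Smartphone vs. design-era supercomputer: roughly {smartphone_flops / design_era_flops:.0f}x")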
Besides, you *can* run classified on a PC behind the fence that's not in a vault during working hours. You just need the PC to have only volatile memory and removable hard drives which have to be locked up in a safe during non-working hours. It's done in many places already - I routinely did this at a previous job. The only downside is that after-hours processing does require the machine to be in a vault-type room.
Off-line data storage that resides in a VTR isn't a problem: you can buy an external 8 terabyte RAID drive for each machine for about $400. That's more storage than the whole world had in the early 1980s. Want more storage than that? Buy two. Want more than the two RAID backups? Buy two more. One large VTR can hold all the storage needed for an entire division.
We already have sys-admins to keep PCs, classified PCs, and thin-client machines running. They already know how to do this. That's not an issue.
Current high-end PCs exceed one teraflop. That's 3 times faster than the world's fastest supercomputer was in 1996, just 20 years ago. Seems to me that if cost and efficiency were factors, the DOE would have figured out how to integrate these fast PCs into their computing environment long ago. Other organizations have figured it out; why can't DOE?
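For scale, here is a rough version of that comparison. The PC figure is an assumption; the 1996 figure is the approximate Linpack result of the Top500 leader late that year.

# Rough comparison of a ~1 teraflop PC to the 1996 Top500 leader.
# The PC figure is an assumption; the 1996 figure is approximate (CP-PACS, ~368 gigaflops Linpack).
pc_flops = 1.0e12            # assumed ~1 teraflop high-end PC
top500_leader_1996 = 368e9   # approximate Linpack of the November 1996 leader

print(f"A ~1 TF PC is roughly {pc_flops / top500_leader_1996:.1f}x the 1996 leader")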
Cost and efficiency are nice goals, but with insane security rules and the penalties associated with those rules, management pushes to reduce the possibility of errors.
Near the end of my career my division was pushing on me to give up my repo for fear that I might leave it unlocked and they would get the black eye. And this was not a repo with CREM.
I have had CREM and I was a manager of a VTR, and given my druthers, I say let DOE get bragging rights on a supercomputer I use through a diskless workstation, and let someone else handle the heavy lifting on the security side of the matter.
I do agree that unnecessary security requirements drive us to make bad decisions in security, personnel, operations, AND buying ludicrously expensive unneeded record-setting supercomputers.