77 Comments

  • ads295 - Tuesday, May 7, 2019 - link

    Happy for this. AMD gets a nice paycheck for developing cutting-edge hardware. However, at this scale I hope the DoE nails them to the wall on getting power consumption under control...
  • sgeocla - Tuesday, May 7, 2019 - link

    AMD will be using TSMC 6nm or 5nm by then while Intel is still unsure of 10nm for datacenter.
    That's why there are no power consumption figures for Aurora. At 30% less power than Frontier the Intel supercomputer will probably use 30%-50% more power.
  • sgeocla - Tuesday, May 7, 2019 - link

    *at 30% less performance than Frontier the Intel supercomputer will probably use 30%-50% more power.
  • Irata - Tuesday, May 7, 2019 - link

    According to a Cray press release, Aurora will use "more than 200 Shasta Cabinets" (vs. more than 100 for Frontier)

    This does not have to mean that it uses twice the power though, but if it's the same cabinet as used for Frontier, it would mean twice the space is needed.
  • HStewart - Tuesday, May 7, 2019 - link

    Just FYI, "more than 100" could be a lot more than 100 depending on how many there actually are. I don't believe these numbers until they're actually out.
  • Irata - Tuesday, May 7, 2019 - link

    Yes, more than a hundred can be any number between 101 and 199, just as more than 200 can be anywhere between 201 and 299.

    So in theory Frontier could have 199 cabinets and Aurora 201 but that is highly unlikely.
  • Jimbo Jones - Tuesday, May 7, 2019 - link

    Eh? What are you on about? AMD already has far better power consumption and efficiency than Intel ...

    Peruse this page to see how much power Intel 8th and 9th gen suck compared to Ryzen under various loads: https://www.tomshardware.com/reviews/intel-core-i9...

    It's a fancy little trick that Intel managed to fool everyone with ... their "95W" 9900K can easily pull over 200 watts ... That's how you make a CPU more efficient these days: you use a smaller TDP number in the marketing. lol. ;)
  • BigMamaInHouse - Wednesday, May 8, 2019 - link

    Look how the tables have turned: the 2 most powerful supercomputers will be based on AMD (CPU/GPU) and Intel (CPU/GPU). Don't you wonder why there are no Nvidia GPUs? IMO he's gonna sell his leather jacket soon.
  • jtd871 - Tuesday, May 7, 2019 - link

    "All told, Frontier should be able to deliver over 7x the performance of Summit, and is expected to be the fastest supercomputer in the world once it’s activated."

    "Which to put things in context, this is over twice the power consumption of the 13MW Summit. So while Frontier is a significantly faster system than the supercomputer it replaces, Cray, AMD, and the US DOE are all feeling the pinch of Moore’s Law slowing down, as power efficiency gains get harder to achieve."

    1) from Wikipedia: "Moore's law is the observation that the number of transistors in a dense integrated circuit doubles about every two years." Moore's Law has nothing explicitly to say about performance or power efficiency.
    2) 7x the performance for ~2.5x the power consumption is a pretty good ratio, in my book.
  • Ryan Smith - Tuesday, May 7, 2019 - link

    1) Thanks. I meant to put Dennard scaling there

    2) Compared to 7x the performance for little-to-no increase in power consumption, it's unfortunately not. For reference, Summit is 7.4x faster than Titan for ~1.5x the power consumption.
  • rpg1966 - Tuesday, May 7, 2019 - link

    Do you mean to say that an improvement in perf/watt of almost 3x isn't good? It's not as good as the Titan-to-Summit increase of nearly 5x, but surely that doesn't make it bad?
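    Those ratios are easy to reproduce from the figures quoted in this thread (7x performance at ~30 MW vs Summit's 13 MW; Summit at 7.4x Titan for ~1.5x the power):

```python
# Perf/watt gains implied by the figures quoted in this thread:
# Frontier ~7x Summit's performance at ~30 MW vs ~13 MW,
# Summit ~7.4x Titan's performance at ~1.5x the power.

def perf_per_watt_gain(perf_ratio, power_ratio):
    """Generation-over-generation improvement in performance per watt."""
    return perf_ratio / power_ratio

frontier_vs_summit = perf_per_watt_gain(7.0, 30 / 13)   # ~2.3x the power
summit_vs_titan = perf_per_watt_gain(7.4, 1.5)

print(f"Summit -> Frontier: {frontier_vs_summit:.1f}x perf/watt")  # ~3.0x
print(f"Titan  -> Summit:   {summit_vs_titan:.1f}x perf/watt")     # ~4.9x
```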
  • deil - Tuesday, May 7, 2019 - link

    Still, 30 MW of power is already in painful territory. We cannot increase power indefinitely, and keeping these supercomputers in check will not get any easier....
  • zmatt - Tuesday, May 7, 2019 - link

    It's a good ratio, but the point of the article is that it's slipping compared to the perf/watt of previous builds.
  • Yojimbo - Thursday, May 9, 2019 - link

    It's bad because the trend is to get less and less of a performance boost with each generation.
  • Irata - Tuesday, May 7, 2019 - link

    Ryan: Do you have any information on the expected power consumption for Aurora?

    All I could find on this was the mention in a Cray press release from March 18th, 2019 stating:
    "The Argonne system, named Aurora, will be comprised of more than 200 Shasta cabinets".

    So would this mean ~60 MW for Aurora, based on your calculation for Frontier (number of cabinets times rating per cabinet)?
  • Ryan Smith - Tuesday, May 7, 2019 - link

    No, I'm afraid I don't have any information on Aurora's power. The 300KW/cabinet figure was specifically for Frontier.
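    For what it's worth, the cabinet arithmetic in the question is simple to check; the big caveat is exactly the one above, i.e. the 300 KW/cabinet figure is only confirmed for Frontier, so the Aurora line below is pure extrapolation:

```python
# Cabinet-count power estimates. 300 kW/cabinet is confirmed only for
# Frontier; applying it to Aurora is a purely hypothetical extrapolation.

kw_per_cabinet = 300

frontier_mw = 100 * kw_per_cabinet / 1000   # "more than 100" cabinets
aurora_mw = 200 * kw_per_cabinet / 1000     # "more than 200" cabinets, IF identical

print(f"Frontier: >= {frontier_mw:.0f} MW")               # matches the ~30 MW in the article
print(f"Aurora:   >= {aurora_mw:.0f} MW (speculation)")
```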
  • Yojimbo - Thursday, May 9, 2019 - link

    Costs keep rising, too. Titan was $97 million, Summit was $214 million amid a high RAM price environment (doesn't affect the cost of the system so much, but it affects the specs of the system, possibly including the performance), and Frontier will be about $500 million.
  • del42sa - Tuesday, May 7, 2019 - link

    It's like Vega's claimed 4x power efficiency :-)

    http://i.imgur.com/WAeuoHz.jpg
  • jtd871 - Tuesday, May 7, 2019 - link

    I believe you meant to type "Shasta" rather than "Sashta".
  • PeachNCream - Tuesday, May 7, 2019 - link

    Interesting to see AMD taking the lead on this one rather than Intel, but unsurprising given the state of the competitive landscape.
  • HStewart - Tuesday, May 7, 2019 - link

    This is a PR move; Intel had Aurora planned way before this.
  • SarahKerrigan - Tuesday, May 7, 2019 - link

    You realize this was a competitive procurement process, right? This isn't "AMD decides to build an exascale system", it's "DoE selects an AMD/Cray system for their larger exascale machine."
  • HStewart - Tuesday, May 7, 2019 - link

    But there is also an Intel/Cray machine already planned for a different department - it's just the DoE making sure it has options. It is not intended as a replacement for the Intel system. My concern is that it sounds more like a PR move from AMD to say "me too".
  • Irata - Tuesday, May 7, 2019 - link

    How is that a PR move - like Sarah stated, the DoE selected the Cray / AMD based system.

    Also, Aurora was first announced around 4 years ago, but using Xeon Phi with 180 petaflops.
    It was later redesigned around Xeon CPUs + Xe GPUs. The announcement for this was in March of this year, so not too long before Frontier was announced.
  • HStewart - Tuesday, May 7, 2019 - link

    But Cray also has a future Intel Xe system, and AMD wanted part of the action to boost their image in the systems space. AMD wants to take some of the claim from Intel - this is not a replacement for the Intel system, just another system for Cray to deliver. Just because AMD's announcement comes after Intel's does not mean Cray switched systems. Also, don't expect Intel to cease their efforts.
  • SarahKerrigan - Tuesday, May 7, 2019 - link

    So when Intel bids on a large-system contract, it's not a PR move, but when AMD does it, it is?

    It really bothers you that the US's largest supercomputer installation is going to be an AMD system, doesn't it? Folks like you make me feel warm and fuzzy about my company's migration from Xeon and Itanium to Power.
  • HStewart - Tuesday, May 7, 2019 - link

    Please note from the chart that the AMD system possibly replaces the IBM/NVidia system, or at least supplements it, because it's at the same laboratory - but the Intel Xe system is at a different laboratory, so they likely have different purposes.
  • Korguz - Tuesday, May 7, 2019 - link

    SarahKerrigan, Irata, anyone else that replies to HStewart.. this person is the biggest Intel fan on this site, as sa666666 just below me has said.. he will ALWAYS bash AMD any way he can.. while praising and making Intel look good no matter what.. ANY time AMD has good news.. or a design win of some sort.. it upsets HStewart.. because Intel isn't involved... just treat him as the Intel fanboy that he is, as no matter what you say.. he will ALWAYS spin it, switch it around.. to make Intel look better...

    and HStewart.. it's architecture NOT architexture... the X should be a C
  • sa666666 - Tuesday, May 7, 2019 - link

    *Never* underestimate a shill's ability to twist the facts to suit their agenda. If AMD were to come out with 10GHz CPUs tomorrow and provide them free of charge to all non-profit organizations on the planet, you'd still find a way to say they were deficient in some way and that Intel is still hot shit.
  • Yojimbo - Thursday, May 9, 2019 - link

    Intel was not eligible for Frontier because they got Aurora. It is still very interesting to see AMD have both the CPU and GPU in the system. They aren't taking the lead, though; Cray is the lead contractor. On Aurora, I believe Intel is the lead contractor. For Perlmutter, Cray is the lead contractor. Cray is involved in a lot of the soon-to-be-built supercomputers. One has to imagine that IBM/NVIDIA are favored to win El Capitan.
  • Kidnova - Tuesday, May 7, 2019 - link

    But can it run Crysis?
  • PeachNCream - Tuesday, May 7, 2019 - link

    Yes, but only on the lowest settings.
  • Cullinaire - Tuesday, May 7, 2019 - link

    Yes, but only the player portion, North Korea needs another supercomputer to simulate.
  • GreenReaper - Saturday, May 11, 2019 - link

    Yes, but the simulation is so powerful it convinces Trump to start a real nuclear holocaust.
  • beginner99 - Tuesday, May 7, 2019 - link

    Is it pure coincidence that each system/hardware maker gets a piece of the cake one after another, or is it intentional? First IBM/NV got a piece, then Intel, and now AMD.
  • DanNeely - Tuesday, May 7, 2019 - link

    IBM/NV winning a previous generation computer isn't relevant.

    Intel and AMD each winning only one of the two 2021 DOE awards isn't a coincidence because diversity requirements meant that they intentionally picked different hardware for the two. Assuming IBM+NVidia made a bid for one or both of them, the DOE decided it didn't like their proposal as much as the other two.
  • ABR - Wednesday, May 8, 2019 - link

    In other words they would have preferred to go with two AMD systems but were forced to pull in Intel for multisourcing reasons. :-} Seriously, this is a clear statement that the performance crown, in practical, energy-considering contexts, is already AMD's.
  • Yojimbo - Sunday, May 12, 2019 - link

    No. Intel's chips were chosen for Aurora21 first, and then they were ineligible for Frontier because of it.

    This new system also will be delivered a year after Intel's. It costs more as well. And you can't just compare the FLOPS of the systems to decide which one is most effective. The systems' architectures are a bit different, so they will probably each be good at different things.
  • Irata - Tuesday, May 14, 2019 - link

    Both Aurora and Frontier are scheduled for 2021. One may still be ready before the other, but not by a year.
    Yes, Frontier costs more than Aurora - $600 million for 100+ Shasta cabinets and 1.5 exaflops vs. $500 million for 200+ Shasta cabinets and 1 exaflop - but it is also quite a bit faster, at least measured in exaflops.
  • Yojimbo - Sunday, May 12, 2019 - link

    No, it's not coincidence. But it's also not necessary. The DOE has a procurement philosophy where they want at least two different architectures, and they have restrictions for certain projects that they can't be exactly the same as a certain other project. For example, from what I remember, an architecture based on Xe GPUs would not have been eligible to win Frontier because it was being used in Aurora21. I am sure the fact that AMD's CPUs and GPUs offer a third alternative was a factor in the decision.
  • peevee - Tuesday, May 7, 2019 - link

    AMD? If it is on their EPYC CPUs and not GPUs, it is not a real 1.5 exaFLOPS. Unfortunately, on that architecture (same with other modern CPUs), peak performance while doing FMAs in AVX registers and real performance on big data that has to be retrieved from and put back to main memory can differ by 3-4 orders of magnitude.
    On GPUs, multiplication of dense matrices can be done efficiently, but not sparse matrices in their usual compressed representation (and in most HPC applications, it is all about giant matrix multiplications).
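    The gap is easy to illustrate with a back-of-envelope roofline calculation (all hardware figures below are hypothetical round numbers, not the specs of any announced chip): a streaming kernel that touches main memory for every FMA sits a couple of orders of magnitude below AVX peak, even before latency-bound access patterns make it worse.

```python
# Roofline back-of-envelope: AVX-FMA peak vs memory-bound streaming.
# All hardware figures are hypothetical round numbers for illustration.

cores = 64               # hypothetical server socket
clock_hz = 2.0e9
fma_units = 2            # 256-bit FMA pipes per core
lanes = 4                # doubles per 256-bit AVX register
flops_per_fma = 2        # multiply + add count as 2 flops

peak = cores * clock_hz * fma_units * lanes * flops_per_fma

bandwidth = 200e9        # bytes/s of main-memory bandwidth (hypothetical)
# Streaming y[i] += a * x[i]: 24 bytes of traffic (read x, read y,
# write y) buys just 2 flops.
streaming = bandwidth / 24 * 2

print(f"peak:      {peak / 1e12:.1f} TFLOPS")
print(f"streaming: {streaming / 1e9:.1f} GFLOPS")
print(f"gap:       {peak / streaming:.0f}x")
```

    With gather-heavy or latency-bound access the effective rate drops much further, which is presumably where the 3-4 orders of magnitude claim comes from.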
  • amrnuke - Tuesday, May 7, 2019 - link

    Good thing you brought this up. Can someone get ORNL on the line and let them know they made a bad decision based on peevee's speculation about AVX performance of a microarchitecture that is still under development with no details?

    (I'm fairly certain that when tasked with spending hundreds of millions of dollars on a mission-critical supercomputer that they themselves will be using, they've done the math already.)
  • peevee - Wednesday, May 8, 2019 - link

    amrnuke, if you thought less about your ad hominem attacks and more about the subject, you would understand that I am right about AVX peak vs real performance, regardless of MICROarchitecture. The very basic Von Neumann-derived architecture is the problem here.

    In terms (scale) of the vacuum-tube computers of Von Neumann's time, in a modern Intel/AMD/ARM computer the ALU is in New York and the memory is spread all over Europe. Is it easier to understand this way?
  • Irata - Wednesday, May 8, 2019 - link

    It's both their CPU and GPU - afaik it's based on a single-socket node with one CPU + 4 GPUs.
  • peevee - Wednesday, May 8, 2019 - link

    So the main compute is the GPU, as I expected. Of course GPUs cannot be deployed alone, as they are not general-purpose processors.
  • mode_13h - Wednesday, May 8, 2019 - link

    > On GPUs, multiplication of dense matrices can be done efficiently, but not sparse matrices in their usual compressed representation

    It would be interesting if they use the texture units to facilitate this.

    I expect we'll also start seeing texture units handling on-the-fly decompression of neural network weights.
  • peevee - Wednesday, May 8, 2019 - link

    The word "compressed" means a completely different thing when applied to sparse matrices vs textures. Sparse formats just skip all 0s (or most, in some schemes) and add a level of indirection through indices/offsets of non-zero elements or blocks of elements. That indirection is bad news for any kind of vector processor, including the SMs in GPUs.

    I guess for simulations of nuclear blasts it does not matter that much, as there should not be that many zeros there... but it is just a guess; my HPC tasks are different.
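    To make that indirection concrete, here's a minimal CSR (compressed sparse row) matrix-vector product in plain Python - note the gather through col_idx in the inner loop, which is the part that maps poorly onto wide vector units:

```python
# Minimal CSR (compressed sparse row) sparse matrix-vector product.
# Only non-zeros are stored; the inner loop reaches x through a level
# of indirection (a gather), which vector hardware handles poorly.

def csr_matvec(data, col_idx, row_ptr, x):
    y = [0.0] * (len(row_ptr) - 1)
    for i in range(len(y)):
        for j in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += data[j] * x[col_idx[j]]   # indirect access into x
    return y

# The 3x3 matrix [[5, 0, 0], [0, 0, 3], [2, 0, 1]] in CSR form:
data = [5.0, 3.0, 2.0, 1.0]    # non-zero values, row by row
col_idx = [0, 2, 0, 2]         # column index of each stored value
row_ptr = [0, 1, 2, 4]         # where each row starts in data

print(csr_matvec(data, col_idx, row_ptr, [1.0, 1.0, 1.0]))  # → [5.0, 3.0, 3.0]
```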
  • mode_13h - Thursday, May 9, 2019 - link

    You could put indirection and scatter/gather support down in the memory controllers. GDDR6 and HBM2 both have fairly narrow channel widths, presumably making them less dependent on large burst sizes for good efficiency.
  • HStewart - Tuesday, May 7, 2019 - link

    To me this sounds like AMD is playing the "me too" game. One confusing thing is the vendor: the Aurora platform is also Cray based, and some of the specs on its cabinets are not mentioned.

    I would not doubt that the final version of Aurora will match or beat this one. In reality it probably already has.
  • Irata - Tuesday, May 7, 2019 - link

    According to Cray's press release, Aurora will use "over 200 Shasta cabinets" vs. "over 100" for Frontier.
  • HStewart - Tuesday, May 7, 2019 - link

    200 is over 100 cabinets - it also depends on what is in the cabinets - it's still too early to know how valid the #'s are.
  • Jleppard - Tuesday, May 7, 2019 - link

    Not really, because everyone knows AMD can actually provide twice the cores per socket that Intel can. So the Intel system using twice the cabinet space adds up.
  • HStewart - Tuesday, May 7, 2019 - link

    But power per core in AMD is significantly less than Intel, and 32 vs 28 is not twice - that is using released technology; this is future tech, and it's anybody's guess how many cores will actually be there.
  • Irata - Tuesday, May 7, 2019 - link

    Rome is about to be released and it is 64C/128T max.
  • HStewart - Tuesday, May 7, 2019 - link

    Yes, and Intel has Sunny Cove too; you can't compare future AMD vs existing Intel.
  • fallaha56 - Tuesday, May 7, 2019 - link

    Er no you big shill lol

    AMD is shipping 64 cores now vs some fantasy future chip from Intel

    -or Intel’s ‘watercooling as standard’ 48 core 400W chip...watt a disaster
  • Irata - Tuesday, May 7, 2019 - link

    Rome is future as in Q3 this year. However, Frontier may even use the next-gen Epyc - same for Intel and Aurora, given when they will be deployed.
  • Korguz - Tuesday, May 7, 2019 - link

    HStewart.. yes you can, and we are.. but YOU don't want to, because your beloved Intel is not getting the design win.. AMD is.. therefore.. this is a LOSS for Intel.. and a WIN for AMD.. and you can't handle it, being an Intel fanboy... you HATE it when something like this happens.. and do EVERYTHING you can to try to put a positive spin on this toward Intel... guess what HStewart.. Intel has played the me too game too...
  • jospoortvliet - Tuesday, May 7, 2019 - link

    > Frontier (...) will become the second and most powerful of the US DOE’s two planned 2021 exascale systems

    In other words, they selected Intel for the slow one and AMD for the faster one they are building. The reason is obvious and also in the text: AMD can deliver a faster system with 100 cabinets than Intel with 200 and I bet power and cost of those 200 cabinets adds up to more than for those 100 with AMD. I bet that they would have gone for two AMD systems if they were not forced to pick different vendors.

    All of this is completely in line with what we see today from Intel vs AMD in the high end server market and in line with roadmaps: AMD delivers higher density and better performance per watt than Intel on many big scale work loads, with a better roadmap. So it should be no surprise to anyone in the industry.

    I'm sure Intel will get their act together and catch up but not before 2021.
  • Yojimbo - Thursday, May 9, 2019 - link

    It doesn't work that way. Aurora was not originally supposed to be an exascale system. It was renegotiated and pushed back. Intel was not eligible to win Frontier because they had Aurora. I believe that was part of the RFP (request for proposals). At least they were not eligible to win it with the same type of system they won Aurora with (Xe GPUs). That basically means it would have had to be a CPU-only system, which couldn't very well get to exascale in that time period for necessary price and budget constraints.

    AMD put in a bid and won. The DOE isn't playing politics with the way they hand out the winners to the bids, they are looking at the submitted proposals and selecting the systems they think give them the best value and follow their principles of procurement. I am guessing that AMD is building this system on a razor thin margin compared to Intel, however. I only say that because of the margins they get on their commercial CPUs and GPUs. NVIDIA will be between GPU generations at the time the DOE wants this system to be delivered. Their post-Volta data center chip will go into Perlmutter in 2020 and their generation after that probably wouldn't be available in time for the delivery schedule of Frontier. El Capitan will be delivered a year later and that will probably include NVIDIA's post-post-Volta data center GPU, assuming they manage to win the contract.
  • Yojimbo - Thursday, May 9, 2019 - link

    I meant "for the necessary power and budget constraints".
  • sa666666 - Tuesday, May 7, 2019 - link

    Of course, since Intel can do no wrong, and are _always_ best in _all_ situations. No exceptions. And AMD are _always_ crap. That's your outlook, right? Don't analyze anything at all. Just see Intel -> good, AMD -> bad. And never look deeper. Must be nice to live in a black and white world, devoid of all logic and requirement to actually think about anything.
  • Koenig168 - Tuesday, May 7, 2019 - link

    7X is quite an amazing performance jump. Summit is about 50% more powerful than Sunway TaihuLight, the previous leading supercomputer from 2016 which is in turn 50% more powerful than Tianhe-2A from 2013.
  • Kevin G - Tuesday, May 7, 2019 - link

    This is the one system that has the funding to really flesh out advanced packaging options, i.e. massive interposers. The big thing would be placing numerous HBM stacks, the massive IO controller, an external fabric controller, and lots of CPU dies and GPU dies into one modular package. Such a scheme would likely show some platform-level performance/watt gain, because much of the platform needed to link chips together would simply disappear by going on-package. Things can scale up rather well, but it then becomes a thermal-density issue, and anything resembling this would require liquid cooling.

    Intel could do that same with Xeon and Xe but AMD appears to have a head start based upon what they've announced and shipped thus far.
  • amrnuke - Tuesday, May 7, 2019 - link

    The cooling solution is liquid-based, thousands of gallons a minute. It's incredible to think about the scale of it!
  • Jleppard - Tuesday, May 7, 2019 - link

    Ya, Intel can use the paper-launched card that they have not even manufactured as of yet. Wonder if that would be on 10nm too!
  • HStewart - Tuesday, May 7, 2019 - link

    Intel is based on Sunny Cove or newer, which is 10nm and possibly lower. But keep in mind it's not the process that is most important but the architexture on the chip. What I really like about Sunny Cove is they added an additional store unit, and now store/get are on two separate ports - sounds like twice the internal speed for read/store operations, which is significant.
  • Thunder 57 - Tuesday, May 7, 2019 - link

    "...and possibly lower". Hahaha that's a good one. "What I really like about Sunny Cove is..." that it is Intel. AMD is really bad and touched me as a child. Intel is good like rele good. They solve all the world's problems one at a time.
  • Kevin G - Tuesday, May 7, 2019 - link

    Intel has stated that they have fully decoupled their core designs from their manufacturing side. The 10 nm troubles have forced their hand on this. They have stated that the Sunny Cove core design is up on 10 nm and is ready for 7 nm when it arrives. What they haven't said but I suspect is that Sunny Cove has a 14 nm version as a fall back.
  • Kevin G - Tuesday, May 7, 2019 - link

    More important is that Skylake-SP introduced the on-die tile-based coherent fabric. Scaling that upward with interposers/EMIB should be straightforward. Things like package power distribution become critical.

    If Xe could sit on the same tile-based on-die fabric, Intel could pull off some magic here. While underperforming, Intel's graphics have been rather feature-complete and are coherent with the CPU side of things. That is the big feature AMD is touting with this deal, just with performant parts.
  • mode_13h - Tuesday, May 7, 2019 - link

    AI workloads? So, I guess AMD might finally have tensor cores by 2021?
  • fallaha56 - Tuesday, May 7, 2019 - link

    Not necessarily

    In any case if nVidia tensor cores were ‘so good’ they would have been selected again...
  • mode_13h - Wednesday, May 8, 2019 - link

    Actually, I think the DoE procurement process might have influenced them to go with a different vendor. Also, is Nvidia still partnering with Cray? If not, maybe Cray had some pull in getting this deal through.

    And, by 2021, AMD had better get an answer to Tensor cores, or they can basically forget about the Deep Learning market. Their current solution is pretty uncompetitive.
  • mode_13h - Tuesday, May 7, 2019 - link

    > with each cabinet rated for 300KW

    mind = blown. How many racks per cabinet? That's like 500 upper mid-range gaming PCs' worth.

    > it sounds like Oak Ridge will be installing a total of 40MW of capacity for Frontier

    It might be cheaper just to build a small power plant on site. How many acres of photovoltaics would you need for that?

    I wonder how this compares to Google's Stadia capacity. I imagine they'll have lower interconnect bandwidth, but the compute capability of larger Stadia hosting facilities should probably rank fairly high up on the Top 500 list.
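    (The "500 gaming PCs" comparison above checks out if you assume ~600 W per PC under load, which is itself just a round guess:)

```python
# Sanity check: how many ~600 W gaming PCs fit in one 300 kW cabinet's
# power budget? (600 W per PC is an assumed round number.)

cabinet_w = 300_000
gaming_pc_w = 600

print(cabinet_w // gaming_pc_w)   # → 500
```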
  • Lord of the Bored - Wednesday, May 8, 2019 - link

    Super computer power demands are crazy. Fortunately Oak Ridge knows a thing or two about nuclear power, because their computer department might need a dedicated reactor.
  • zodiacfml - Friday, May 10, 2019 - link

    Looks like AMD will have server (and more likely desktop too) CPUs that are more compelling than the competition by 2021. TSMC seems to be on schedule with their 5nm+ by 2021.
    I want an AMD system by next year; not sure if I can wait.
