Comments Locked

58 Comments

Back to Article

  • eastcoast_pete - Friday, June 25, 2021 - link

    Interesting CPUs. Regarding the 72F3, the part for highest per-core performance: Dating myself here, but I recall working with an Oracle DB that was licensed/billed on a per-core or per-thread basis (forgot which one it was). Is that still the case, and what other programs (still) use that licensing approach?
    And, just to add perspective: the annual payments for that Oracle license dwarfed the price of the CPU it ran on, so yes, such processors can have their place.
  • flgt - Friday, June 25, 2021 - link

    More workstation than server, but companies like Ansys still require additional license packs to go beyond 4 cores with some of their tools, and they often come with hefty 5-figure price tags depending on the program and your organizations bargaining ability.
  • RollingCamel - Friday, June 25, 2021 - link

    It was refreshing to see Midas NFX running without core limitations.

    The core limitations archaic and doesn't represent the current development. Unless the license policies has evolved in the past years.
  • realbabilu - Monday, June 28, 2021 - link

    Interesting to see The fea implementation with latest math kernel available like midas nfx bench in anandtech. Hopefullly anandteam got demos from midas korea for testing. Abaqus, msc nastran, inventor fea anything will do.
    However i dont think midas improved their math kernel, i had midas civil and gts licensed, but cant use all threads and all memory i had On my pc, like others fea dis.
  • mrvco - Friday, June 25, 2021 - link

    Power unit pricing! LOL, the dreaded Oracle audit when they needed to find a way to make their quarterly numbers!?!?!
  • eek2121 - Friday, June 25, 2021 - link

    It blows my mind that people still use Oracle products.
  • Urbanfox - Sunday, June 27, 2021 - link

    For a lot of things there isn't a viable alternative. The hotel industry is a great example of this with Opera and Micros.
  • phr3dly - Friday, June 25, 2021 - link

    In the EDA world we pay per-process licensing. As with your scenario, the license cost dwarfs the CPU cost, particularly over the 3-year lifespan of a server. You might easily spend 50x the cost of the server on the licenses, depending on the number of cores. Trying to optimize core speed/core count/eventual server load against license cost is a fun optimization problem.

    So yeah, the CPU cost is irrelevant. I'm happy to pay an extra several thousand dollars for a moderate performance improvement.
  • eldakka - Saturday, June 26, 2021 - link

    > Oracle DB that was licensed/billed on a per-core or per-thread basis (forgot which one it was). Is that still the case, and what other programs (still) use that licensing approach?

    Lots of Enterprise applications still use that approach, Oracle (not just DB), IBM products - WebSphere Application Server (all flavours, standalone, ND, Liberty, etc.), messaging products like WebSphere MQ, I believe SAP uses it, many RedHat 'middleware' products (e.g. JBOSS web server, EAS, etc.) use it as well.

    In the enterprise space, it is basically the expected licensing model.

    And the licensing cost per-core is usually 'generation' dependant. So you usually just can't upgrade from, say, a 20-core Xeon 6th-gen to a 20-core 8th gen and expect to pay the same.

    The typical model is a 'PVU', Processor Value Unit (different companies may give it a different label - and value different processors differently, but it usually boils down to the same thing). Each platform-generation (as decided by the software vendor, i.e. Oracle, or IBM, etc.) has a certain PVU per core. E.g., (making up numbers here) a POWER8 (2014ish release) might have a PVU of 4000, and an Intel Haswell/Ivy Bridge - E5/7 v2/3 I think (2014/15ish)- might be given 3500. So if using an 8-core P8 LPAR that'd be 32000 PVU, while an 8 core VI on E7 v3 would be 28000. And P9 might be 7000 PVU, a Milan might be 6000 PVU, so for 8-cores it'd be 56000 or 48000 respectively. Then there will be a doller per PVU multiplier based on the software, so software bob could be x1 per year, so in the P9 case $56k/year license, whereas software fred mught be a x4 multiplier, so $224k/year. And yes, there can be instances where software being run on a single server (not even necessarily a beefy server) can be $millions/year licensing. If some piece of software is critical, but low CPU/memory requirements, such as an API-layer running on x86 hardware that allows midrange-based apps to interface directly to mainframe apps, they can charge millions for it even if its only using combined 8-cores across 4 2-core server instances (for HA) - though in this case, where they know it requires tiny resources, they'll switch to a per-instance rather than a per-core pricing model, and charge $500k per instance for example.

    The per-core PVU can even change within a generation depending on specific CPU feature sets, e.g. the 2TB per-socket limited Xeon might be 5000 PVU, but the 4TB per-socket SKU of the same generation might be 6000 PVU, just because the vendor thinks "they need lots of memory, therefore they are working on big data sets, well, obviously, that's more money!" they want their tithe.

    Oh, and nearly forgot to mention, the PVU can even change depending on whether the software vendor is tring to push their own hardware. IBM might give a 'PVU' discount if you run their IBM products on IBM Power hardware versus another vendors hardware, to try and push Power sales. So even though, in theory, a P9 might be more PVU than a Xeon, but since you are running IBM DB2 on the P9, they'll charge you less money for what is conceptually a higher PVU than on say a Xeon, to nudge you towards IBM hardware. Oracle has been known to do this with their aquired SPARC (Sun)-based hardware too.
  • eastcoast_pete - Saturday, June 26, 2021 - link

    Thanks! I must say that I am glad I don't have to deal with that aspect of computing anymore. I also wonder if people have analyzed just how severely these pricing schemes have blunted or prevented advancements in hardware capabilities?
  • Threska - Sunday, June 27, 2021 - link

    Seems the only thing blunted is the economics of throwing more hardware at the problem. Actual technical development has taken off because all the chip-makers have multiple customers across many domains. That's why Anandtech and others are able to have articles like they have.
  • tygrus - Sunday, June 27, 2021 - link

    Reminds me of the inn keeper from Les Miserables. Nice to your face with lots of good promises but then tries to squeeze more money out of the customer at every turn.
  • tygrus - Sunday, June 27, 2021 - link

    I was ofcourse referring to the SW not the CPU.
  • 130rne - Tuesday, September 14, 2021 - link

    What the hell did I just read? Just came across this, I had no idea the enterprise side was this fucked. They are scalping the ungodly dog shit out of their own customers. So you obviously can't duplicate their software in house meaning you're forced to use their software to be competitive, that seems to be the gist. So I buy a stronger cpu, usually a newer model, yeah? And it's more power efficient, and I restrict the software to a certain number of threads on those cpus, they'll just switch the pricing model because I have a better processor. This would incentivize me to buy cheaper processors with less threads, yeah? Buy only what I need.
  • 130rne - Tuesday, September 14, 2021 - link

    Continued- basically gimping my own business, do I have that right? Yes? Ok cool, just making sure.
  • eachus - Thursday, July 15, 2021 - link

    There is a compelling use case that builders of military systems will be aware of. If you have an in-memory database and need real-time performance, this is your chip. Real-time doesn't mean really fast, it means that the performance of any command will finish within a specified time. So copy the database on initialization into the L3 cache, and assuming the process is handing the data to another computer for further processing, the data will stay in the cache. (Writes, of course, will go to main memory as well, but that's fine. You shouldn't be doing many writes, and again the time will be predictable--just longer.)

    I've been retired for over a decade now, so I don't have any knowledge of systems currently being developed.

    Who would use a system like this? A good example would be a radar recognition and countermeasures database. The fighter (or other aircraft) needs that data within milliseconds, microseconds is better.
  • hobbified - Thursday, August 19, 2021 - link

    At the time I was involved in that (~2010) it was per-core, with multiple cores on a package counting as "half a CPU" — that is, 1 core = 1CPU license, two 1-core packages = 2CPU license, one 2-core package = 1CPU license, 4 cores total = 2CPU license, etc.

    I'm told they do things in a completely different (but no less money-hungry) way these days.
  • lemurbutton - Friday, June 25, 2021 - link

    Can we get some metrics on $/performance as well as power/performance? I think the Altra part would be better value there.
  • schujj07 - Friday, June 25, 2021 - link

    "Database workloads are admittedly still AMD’s weakness here, but in every other scenario, it’s clear which is the better value proposition." I find this conclusion a bit odd. In MultiJVM max-jOPS the 2S 24c 7443 has ~70% the performance of the 2S 40c 8380 (SNC1 best result) despite having 60% the cores of the 8380. In the critical-jOPS the 7443's performance is between the 8380's SNC1 & SNC2 results despite the core disadvantage. To me that means that the DB performance of the Epyc isn't a weakness.

    I have personally run the SAP HANA PRD performance test on Epyc 7302's & 7401's. Both CPUs passed the SAP HANA PRD performance test requirements on ESXi 6.7 U3. However, I do not have scores from Intel based hosts for comparison of scores.
  • schujj07 - Friday, June 25, 2021 - link

    The DB conclusion also contradicts what I have read on other sites. https://www.servethehome.com/amd-epyc-7763-review-... Look at the MariaDB numbers for explanation of what is being analyzed. Their 32c Epyc &543p vs Xeon 6314U is also a nice core count vs core count comparison. https://www.servethehome.com/intel-xeon-gold-6314u... In that the Epyc is ~20%+ faster in Maria than the Xeon.
  • Andrei Frumusanu - Friday, June 25, 2021 - link

    Those results don't contradict anything I'm saying. Given a normalised throughput performance of the socket, for example here where the 16- and 24- core Milan equals or beats the 28-core ICL-SP in many workloads, the Xeon still handily beats those Milan parts in transactional workloads. The 40-core Xeon has 77% of the jbb performance of the 64-core EPYC even though in the int suite it's only at 60%. Those particular STH results work out because the 7543P is $1000 cheaper than the 7543, but for the SKUs we had in today, Intel still is on equal footing in terms of DB performance value.
  • Cllaymenn - Friday, June 25, 2021 - link

    Whatever one says about some insignificant single anomaly in some DB test... The fact is that ANY company, from small to large, any corporation needing power, any data centre, hosting, cloud computing, research institutes, universities, will choose EPYC on ZEN3 over even the 8320, because it will allow them to compute faster, make more money per month, and less stress for administrators when there are higher network loads, clouds because AMD will "grind" / process faster the requests/needs of thousands of of thousands of clients simultaneously using servers, because in addition to more compute power has more bandwidth AMD platform especially with 256 threads and 8 channel memory and fast Infinity Fabric and many of the ZEN3 optimizations... and is more flexible (harder to clog or jam Zen2/Zen3 from what I've noticed. ) These processors grind through anything you throw at them without any breathlessness.
  • schujj07 - Friday, June 25, 2021 - link

    While Spec is an "industry standard" benchmark, vendors spend hours optimizing for their servers to look better. Therefore as an administrator and designer of a high performance data-center I personally look at Spec results with a grain of salt. For example, Super Micro submitted data for 2 of their A+ AS-1124US-TNRP with dual 75F3 on April 26, 2021. One system has max-jOPS of 276,317 and critial-jOPS of 116,628. The other has a score of 211,179 max-jOPS & 191,813 critical-jOPS. They also have 2 X12DPG-QT6 with dual 8380's and one has scores of 272,500 for max-jOPS & 147,409 for critical-jOPS. The other has scores of 258,368 for max-jOPS & 201,334 for critical-jOPS. In these cases the 75F3 with few cores and threads ends up in a virtual tie with the 8380 in the transactional workload for one of the results, but the second result in the database is a 22-30% lower based on comparison systems. https://www.spec.org/jbb2015/results/res2021q2/

    Depending on the results you want, the 75F3 is a much better value or of equal value to the 8380. I think now you can see why I take Spec with a grain of salt on their results. Globally saying that Milan has issues in transactional DBs based solely on Spec results isn't a good idea. While I know it is the benchmarks that you choose as they are "industry standard," I think it would be worth while to invest in creating an actual real world scenario DB benchmark that doesn't use Spec.
  • Andrei Frumusanu - Friday, June 25, 2021 - link

    > One system has max-jOPS of 276,317 and critial-jOPS of 116,628. The other has a score of 211,179 max-jOPS & 191,813 critical-jOPS.

    Which generally makes submitted scores not very useful, we're using apples-to-apples runs here, and while you can argue they're not as optimised, they're comparable to each other.

    And I also never said that Milan has *issues*, I'm simply saying that compared to other workloads where there's a massive performance lead for AMD, Intel is still competitive, a view that falls in line with many industry customers.
  • Cllaymenn - Friday, June 25, 2021 - link

    We know that Intel watches the Anandtech website, and that you are aware of this, they also send you expensive hardware for testing, and hope that the results will be more favourable to their new development (e.g. 8320) which they have been working on for a long time. I think it would be unpleasant and uncomfortable to criticise their new products harshly if I were writing a review, but I would rather gently point out which is good at what, which is leading and which still needs to catch up. Because of the awareness of the efforts of hundreds or even thousands of Intel engineers I would not have the heart to criticize their new product, or sharply, clearly say who wins everything and the rest can hide. I know that even the engineers, designers and CPU architects like to read about their new baby after work, and they go to sites like Anandtech with enthusiasm and quiet hope that they have made a better impression on the reviewer and readers, than their previous older products, that we have noticed a significant difference, jump in performance and that it has been appreciated and maybe there will be some nice, positive comments, feedback. It probably gives them a lot of happiness to see people out there enjoying the results of their hard work and another success for the company. Because the 8320 was a huge challenge for these people, it's a brand new fresh 10nm SuperFin technology and a mega monolithic 40 core big piece of silicon. And it works! It may not catch up with the 64 core competition but it's still a huge step forward for them, reaching a significant milestone. Once they mastered this SuperFin 10nm technology to create monolithic 40 core chips they now have a lot of experience and know how to do it even better, especially in a modular architecture where the silicon pieces will be smaller. Many of the threads stem from the creation of the Xeon 8320, so I understand the reviewer's attitude of appreciating the level of technology, sophistication, and performance of their new design. (sorry for some grammatical errors, I'm still improving)
  • bwhitty - Friday, June 25, 2021 - link

    Can't tell if you're very subtly implying Andrei is coloring the results in favor of Intel? Perhaps you're not, but anyways it doesn't seem he is. Other than that, I agree that

    Small correction: Ice Lake is on 10nm+, not Super Fin. Tiger Lake is 10SF (10++), and Sapphire Rapids will be on 10 Enhance Super Fin, so 10nm+++.

    Tangent: I think that Ice Lake being on the non-SF process actually bodes extremely well for Sapphire Rapids because Ice Lake even in laptops is just not that good from a mfg perspective. It's basically Intel 10nm's first shippable and salvaged process. Super Fin appears far, far better in Tiger Lake versus Ice Lake, and so an improvement on top of that thusly should perhaps finally bring Intel's mfg in line with TSMC 7nm. That gives Sapphire Rapids a good place to be in the first half of 2022 until Genoa rolls out on TSMC 5nm is late 2022 / early 2023.
  • Cllaymenn - Friday, June 25, 2021 - link

    bwhitty. I did not mean favoring Intel products, but a more subdued way of speaking about their performance in relation to ZEN3, a way other than the popular Linus on YT, which is sharply pressing Intel with each premiere of new AMD products.

    As for Super Fin, I read about it recently in one of the popular IT websites. I typed in google and found a quote

    "Intel Xeon Scalable Ice Lake-SP processors were announced some time ago, but we had to wait a while for their premiere. We finally got it - we got to know the technical details of the units, as well as their performance results. Intel Xeon Scalable units (Ice Lake-SP) use the new Sunny Cove microarchitecture, which is expected to translate into up to a 20% increase in IPC over the previous generation Skylake. The chipsets are manufactured using a new 10nm SuperFin process.

    As I checked with a few other sources, I now know that this site was wrong about the 83xx series.
  • Ian Cutress - Friday, June 25, 2021 - link

    On 10nm naming, Intel has changed it twice. There are no + or ++ any more.

    https://www.anandtech.com/show/16107/what-products...
  • bwhitty - Monday, June 28, 2021 - link

    Oh yes, Dr Cutress, I know all these Intel mfg node specifics purely from Anandtech’s breakdowns
  • outsideloop - Friday, June 25, 2021 - link

    Far, far better? Tiger Lake H still sucks power like an anebriated Cleopatra.
  • DannyH246 - Thursday, July 1, 2021 - link

    lol - no need to be subtle about it. www.IntelTech.com has been doing this for years.
  • Qasar - Thursday, July 1, 2021 - link

    hilarious, go back to wccftech then danny
  • whatthe123 - Friday, June 25, 2021 - link

    good god man, the review quite literally posts hard numbers of epyc simply thrashing xeon in performance even on a per core basis, and you think they're worried that intel will retaliate against them if they don't say something nice about one corner of a segment of performance?

    what is it about technology that attracts cultists?
  • Threska - Saturday, June 26, 2021 - link

    There's a reason the PCMasterRace forum exists.
  • msroadkill612 - Sunday, June 27, 2021 - link

    "Cultists" - I like it :)

    So true, & on many levels. I see strangely neurotic behaviour from such an allegedly smart & rational demographic.

    As a group, they are prone to be great at rattling off streams of presumably accurate numbers and jargon, but arrive at childishly naive conclusions, & ask the wrong questions based plain wrong premises.
  • devione - Sunday, June 27, 2021 - link

    Jesus Christ man. Grow some fucking balls. If you're going to call out Anandtech for being biased at least be straightforward and frank. No need to write an essay trying to couch and justify and be obtuse about it
  • Makste - Monday, July 5, 2021 - link

    Lmao
  • Oxford Guy - Friday, June 25, 2021 - link

    What are the RAM timings? I don’t see that information in the charts.
  • Andrei Frumusanu - Saturday, June 26, 2021 - link

    These are standardised PC4-3200AA-RB2-12 sticks, running at JEDEC timings.

    https://www.micron.com/products/dram-modules/rdimm...
  • Oxford Guy - Monday, June 28, 2021 - link

    Thank you. And the other systems tested?
  • mode_13h - Sunday, June 27, 2021 - link

    Thanks for this update. Exciting findings!
  • Gondalf - Sunday, June 27, 2021 - link

    SPECint2017 is good but....SPECint2017 Rate to estimate the per-core performance, no no absolutely no. SPECint2017 Rate have a very small dataset and it can not be utilized to estimate the single core performance, we need of the full SPECint2017 workload, the only manner to bypass the crazy L3 of Ryzen. Half the article have a so so sense ( obviously SPEC Rate is very criticized by many and very likely means less than nothing, expeciallly if you rise the bar on L3 ), the other half nope, without sense.
    In fact Intel claim a new 10nm 32 cores superior than a 32 cores Milan, after all the two cores ( Zen 3 and Willow Cove) have around the same IPC, more or less, and being chiplets, 32 cores Milan is out of the games.
    Obviously in this article the world "latency" is hidden or so. A single die solution is always better than chiplet design under load with the same number of cores.
  • Qasar - Sunday, June 27, 2021 - link

    and there is the highly biased anti amd post from gondalf that he is known for.

    " In fact Intel claim a new 10nm 32 cores superior than a 32 cores Milan, after all the two cores ( Zen 3 and Willow Cove) have around the same IPC, more or less, and being chiplets, 32 cores Milan is out of the games. "
    yea ok, more pr bs from intel that you blindly believe ? post a link to this. the fact that you start with " in fact intel claim" kind of point to it being bs.
  • schujj07 - Monday, June 28, 2021 - link

    Gandalf missed a link I posed that has a 32c Intel vs 32c AMD. In that the AMD averages 20% better performance than the Intel across the entire test suite. https://www.servethehome.com/intel-xeon-gold-6314u...
  • iAPX - Sunday, June 27, 2021 - link

    There's a lot to read and understand on the last chart (per-Thread score / Socket Perf), about usefulness of SMT (or not), about who is the per-Thread performance leader and also the per-Socket performance leader, with a notable exception, the Altra Q80-33.

    I would like to see these kind of chart more often, it sum-up things very clearly, while naturally you have to understand that it is just a long-story short, and have to read about specific performance depending on the payload (ie: DB as stated).

    Kudos!
  • nordform - Thursday, July 1, 2021 - link

    Too bad Apple's M1 was left out ... it clearly would have smoked the "competition". Everything with a TDP higher than 25W is inappropriate, not to say obscene.

    Apple rules hands down
  • Qasar - Friday, July 2, 2021 - link

    " Everything with a TDP higher than 25W is inappropriate, not to say obscene. " and why would that be ?
  • mode_13h - Friday, July 2, 2021 - link

    That would be like drag racing a Tesla car against some 18-wheeled diesel trucks.

    Server CPUs are not optimized for low-thread performance. They're designed to scale, and have data fabrics to handle massive amounts of I/O that the M1 can't. It wouldn't be a fair (or relevant) comparison.

    Now, try running that Tesla car in a tractor pull and we'll see who's laughing!
  • Oxford Guy - Thursday, July 8, 2021 - link

    Happy to have won another debate in which my suggestion was aggressively attacked.

    I said having dual channel DDR4 for Zen 3 was unfortunate, as DDR4 is so long in the tooth — a fact that dual channel configuration makes more salient. I said it would have been good for the company to add more value by giving it quad channel RAM or, if possible, a support for both DDR4 and DDR5 — something some mainstream Intel quads had (support for DDR3 and DDR4).

    My remark was derided mainly on the basis of the claim that dual channel is plenty. This new set of parts demonstrate the benefit of having more RAM and cache.

    Considering how high the core counts are for Zen 3 desktop CPUs and how much Apple has set people on notice about what’s possible in CPU performance...

    Also, part of the rebuttal was citing the existence of TR. That’s still Zen 2, eh? Can’t really go out and buy that rebuttal.

    Is the benefit of being able to stay with the AM4 socket bigger than having less starvation of the CPU, particularly given the very high core counts of CPUs like the 5950? TR may be everyone’s segmentation dream (particularly when it’s being laughingly sold with obsolete Zen 2 and subjected to rapid expensive motherboard orphaning) but I think having five motherboard specs is a bridge too far. Let the low-end have dual channel and no overclocking, dump TR, and consolidate the enthusiast boards to a single (not two) chipset. But... that’s me. I like more value versus little crumbs and redundancies. When a whopping two companies is the state of the competition, though, people become trained to celebrate banality.
  • mode_13h - Thursday, July 8, 2021 - link

    > Zen 3 was unfortunate, as DDR4 is so long in the tooth ...
    > it would have been good for the company to add more value by giving it quad channel RAM

    Agreed. Would've been nice. In spite of that, the 5950X manages to show gains over the 5900X, but we can still wonder how much better it might be with more memory bandwidth.

    I wouldn't have an issue with quad-channel being reserved for their TR platform if:

    * they were more affordable

    * they brought Zen3 to the platform more promptly

    An interesting counter-point to consider is how little 8-channel RAM benefitted TR Pro:

    "In the tests that matter, most noticeably the 3D rendering tests, we’re seeing a 3% speed-up on the Threadripper Pro compared to the regular Threadripper at the same memory frequency and sub-timings."

    https://www.anandtech.com/show/16478/64-cores-of-r...

    That's much less benefit than I'd have expected, as a 64-core TR on quad-channel should be far more bandwidth-starved than a 16-core Ryzen on dual-channel. However, that same article features a micro-benchmark which shows the full potential of 8-channel. So, it's obviously workload-dependent.

Log in

Don't have an account? Sign up now