4 Comments

  • ikjadoon - Monday, August 17, 2020

    >Even though on a per-GHz TX3 is faster [than Intel]

    >TX3 does better than AMD on single thread.

    So, better IPC than Intel, and higher IPC and single-thread performance than AMD (but not higher throughput). I presume these are Cascade-X vs Rome comparisons.
  • ksec - Tuesday, August 18, 2020

    Cloud vendors would love this, as it allows them to sell 240 vCPU cores per system. Would love to see how it performs in the real world.
  • name99 - Tuesday, August 18, 2020

    "01:56PM EDT - (that's ~1.5x ?)"

    You can ask different questions.

    One question is:
    - run a single CPU single-threaded and measure your MySQL performance, then run the same single CPU 2- or 4-way multithreaded. That was MY assumption for the 1.79/2.21x speedups.
    This tells you something about how a single core behaves.

    Alternatively, you can do what the second graph shows: run 1, 2, 3, ... 60 CPUs at a single thread each, then start turning on a second thread for each CPU, and so on.
    This now tells you something about how the uncore behaves, as you start stressing the NoC, the L3, and the memory system.

    But I share your skepticism (about SMT in general). Sure, you are getting 1.5x faster performance for MySQL (and there is a class of similar codes that are important), for only (supposedly) 5% higher area. Sounds like a good deal. But is this REALLY a sensible design policy?

    (a) That extra 5% also takes a whole lot of extra engineering design and verification time. It opens you up to god knows what possible security issues. And it can result in customers complaining about variability in performance, so now you start adding extra epicycles to try to make things fairer (Intel has gone through a few iterations of this).

    (b) The alternative could have been to just design a lighter version of the core (much the same design, just strip out the 30% of area that's least helpful to low-IPC, throughput-dominant codes like MySQL -- maybe halve the FPU facilities, shrink the L1 and L2 caches, generally reduce the OoO structures?) and offer a version of that design with 90 cores. Same throughput, less engineering effort.
    The extent to which this is feasible depends, of course, on the extent to which this design is parameterized and can be easily "recompiled" with different parameters. The result may not be an optimal throughput chip compared to a blank slate design, but good enough.

    And there's value to Marvell in having both a big and a small core that they own, for the purposes of all the other chips they produce, beyond just TX3...

    The value of SMT is optionality -- you can run the same chip as either a latency engine in SMT1 mode, or a throughput engine in SMT4 mode. But that optionality is of little value to data centers, which have racks dedicated to doing just one job; they're not like personal machines that constantly switch between different types of tasks.
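
The area trade-off in name99's comment can be checked with a few lines of arithmetic. The sketch below is only an illustration: the function names are hypothetical, the figures (60 cores, 1.5x SMT speedup, 5% SMT area overhead, 30% area stripped from a lighter core) are the ones quoted in the comment, and it assumes a lighter core delivers the same throughput on such workloads as a full core running one thread, which the comment implies but does not demonstrate.

```python
def light_cores_in_same_area(base_cores, smt_area_overhead, light_core_area_fraction):
    """How many lighter cores fit in the area of `base_cores` SMT-enabled cores.

    Area is measured in units of one full core without SMT.
    """
    total_area = base_cores * (1.0 + smt_area_overhead)
    return total_area / light_core_area_fraction


def smt_throughput_in_core_equivalents(base_cores, smt_speedup):
    """Throughput of the SMT chip, in units of one full core running one thread."""
    return base_cores * smt_speedup


# Figures quoted in the comment (the equal-throughput assumption is ours).
light_cores = light_cores_in_same_area(
    base_cores=60,
    smt_area_overhead=0.05,         # "(supposedly) 5% higher area"
    light_core_area_fraction=0.70,  # "strip out 30% area"
)
smt_equivalent = smt_throughput_in_core_equivalents(base_cores=60, smt_speedup=1.5)

print(f"lighter cores in the same area: {light_cores:.1f}")
print(f"SMT chip throughput in core-equivalents: {smt_equivalent:.1f}")
```

Under those assumptions the two designs land at roughly the same throughput for the same area, which is the point of the "90 cores" figure; the result is sensitive to how much per-core throughput the stripped-down core actually loses.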
  • rahvin - Tuesday, August 18, 2020

    Based on the previous claims for I and II, it won't be better than either AMD or Intel in either single-threaded or multithreaded performance.
