18 Comments

  • SarahKerrigan - Monday, June 28, 2021 - link

    That looks like a really, really well-balanced accelerator.

    (I also appreciate that they didn't skimp on L3 like a lot of Neoverse designs have.)
  • Wilco1 - Monday, June 28, 2021 - link

    The top model beats Graviton 2 on SPECINT due to having *twice* the performance per core. It also has twice as much cache and memory bandwidth per core. And all that at 60W...

    It's a monster and likely outperforms most servers it will be offloading!
  • SarahKerrigan - Monday, June 28, 2021 - link

    At least for LLC, it's 4x the cache per core. Octeon10 is 2MB/core, Grav2 is 512KB/core.
  • mode_13h - Monday, June 28, 2021 - link

    For stateful packet processing, they really need as much cache as possible. The amount of context they can hold on chip can become a serious limiting factor. Because there's no SMT, the core is just sitting idle if you have to go off-chip for some connection-specific state.
  • mode_13h - Monday, June 28, 2021 - link

    ...and the usual trick of masking it with hardware prefetchers won't work at all, because they can't know which connection the next packet belongs to.
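
    A toy model of that effect, not a measurement: the Python sketch below treats the on-chip cache as an LRU store of per-connection state and assumes each packet belongs to a uniformly random connection (real traffic is usually more skewed). The function name, flow counts, and cache sizes are all made up for illustration.

```python
import random
from collections import OrderedDict


def flow_state_hit_rate(num_flows, cache_entries, num_packets, seed=0):
    """Fraction of packets whose connection state is already cached (LRU model)."""
    if num_flows < 1 or num_packets < 1 or cache_entries < 0:
        raise ValueError("need at least one flow and one packet")
    rng = random.Random(seed)
    cache = OrderedDict()  # flow id -> placeholder state, least recently used first
    hits = 0
    for _ in range(num_packets):
        # Assumption: the next packet's connection cannot be predicted.
        flow = rng.randrange(num_flows)
        if flow in cache:
            hits += 1
            cache.move_to_end(flow)  # mark as most recently used
        else:
            cache[flow] = None  # "fetch" the state from off-chip memory
            if len(cache) > cache_entries:
                cache.popitem(last=False)  # evict the least recently used flow
    return hits / num_packets


if __name__ == "__main__":
    # Hypothetical sizes: 1000 live connections, varying on-chip capacity.
    for entries in (100, 500, 1000):
        print(entries, flow_state_hit_rate(1000, entries, 100_000))
```

    Under these assumptions the hit rate tracks roughly cache_entries / num_flows, and every miss is an off-chip trip that a prefetcher had no way to anticipate.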
  • brucethemoose - Friday, July 2, 2021 - link

    Isn't Marvell the one who made a 4 or 8-way SMT ARM core, then abandoned it, presumably because it was too niche?

    This seems like a perfect use case. A shame it had to go...
  • mode_13h - Saturday, July 3, 2021 - link

    Seems that Cavium's Thunder X2 and X3 had SMT-4. And yes, they did get bought by Marvell.

    At that point, you could really reduce your OoO window and still probably get good utilization. Packet processing is one of those embarrassingly parallel problems, so there should be no problem with workload scaling (or side-channel attacks, for that matter).

    Oh well. Perhaps somebody else might have a go at it. Maybe we'll start to see some SMT RISC-V cores, especially now that Linux has "core scheduling" to better manage SMT side-channel vulnerabilities.

    Since this appears to be in a proprietary core, even Marvell could reverse course and drop in an SMT-based solution, at some point.
  • mode_13h - Monday, June 28, 2021 - link

    > Nvidia’s BlueField3 DPU design that still “only” features Cortex-A78 cores

    I didn't believe this, given its projected launch date, but it turns out to be right in the GTC 2021 keynote slides!

    https://images.anandtech.com/doci/16611/17056937.j...
  • eastcoast_pete - Tuesday, June 29, 2021 - link

    Interesting units, especially for 5G base stations and networking. Notice how they emphasize "fanless" operation in the slides! Curious how those compare (if at all) with Intel's x86-based offerings with some ML thrown in? Also, how do these compare with whatever Huawei has or had before they were booted from TSMC's advanced nodes?
  • mode_13h - Tuesday, June 29, 2021 - link

    Should be: https://en.wikipedia.org/wiki/HiSilicon#Kunpeng_93...

    For more, see: https://fuse.wikichip.org/news/2274/huawei-expands...

    I don't know if any of this stuff is (still) accurate.
  • mode_13h - Friday, July 2, 2021 - link

    I'm not entirely sold on the concept of vector packet processing. I wonder if they really wouldn't just be better off with >= 4-way SMT.
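
    One rough way to frame the SMT side of that question is an idealised latency-hiding model, sketched below in Python. It assumes every hardware thread alternates between a fixed burst of compute and a fixed off-chip stall, and that threads never contend for anything but issue slots. The function name and the cycle counts are hypothetical, chosen only to show the shape of the curve.

```python
def core_utilization(threads, compute_cycles, stall_cycles):
    """Idealised fraction of time the core is busy with `threads` SMT threads.

    Each thread is busy for compute_cycles, then stalled for stall_cycles.
    While one thread is stalled, the others can use the core, so utilisation
    grows linearly with thread count until the core saturates at 1.0.
    """
    if threads < 1 or compute_cycles <= 0 or stall_cycles < 0:
        raise ValueError("threads >= 1, compute_cycles > 0, stall_cycles >= 0")
    per_thread = compute_cycles / (compute_cycles + stall_cycles)
    return min(1.0, threads * per_thread)


if __name__ == "__main__":
    # Made-up numbers: 50 cycles of work per packet, 150 cycles waiting on DRAM.
    for n in (1, 2, 4, 8):
        print(f"SMT-{n}: {core_utilization(n, 50, 150):.0%} busy")
```

    With those made-up numbers a single thread keeps the core 25% busy and four threads saturate it. The model ignores cache pressure from the extra threads, so treat it as an upper bound at best.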
  • brucethemoose - Friday, July 2, 2021 - link

    Would security be a concern with SMT?

    For whatever reason, SMT seems to be unpopular in the ARM ecosystem, as even Marvell themselves abandoned the SMT-heavy ThunderX3.

    In fact, wasn't the TX2 processor based on ThunderX2, which was also an SMT4 design?
  • mode_13h - Saturday, July 3, 2021 - link

    > Would security be a concern with SMT?

    I was just thinking about this. For some applications, no. This would tend to be running a highly-managed software stack. However, the nice thing about such an architecture is that you could run guest VMs and other sorts of software with higher likelihood of being malicious or exploitable to behave maliciously.

    To help manage these risks, Linux now offers better policy control over which threads can share cores. So, you could limit core-sharing to threads of the same process or VM, for instance.
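
    As a minimal sketch of what that policy control looks like from userspace, the Python below asks the kernel for a fresh core-scheduling "cookie" for the calling process via prctl(PR_SCHED_CORE, ...), so that only its own threads may share an SMT core with each other. The constants are copied from linux/prctl.h as I understand them (the interface landed around Linux 5.14); the helper name is hypothetical, and the call is expected to fail with an errno on kernels built without core scheduling or on machines without SMT.

```python
import ctypes
import errno

# Values as defined in linux/prctl.h (assumed, check your kernel headers).
PR_SCHED_CORE = 62
PR_SCHED_CORE_CREATE = 1              # create a new unique cookie
PR_SCHED_CORE_SCOPE_THREAD_GROUP = 1  # apply it to every thread of the process


def isolate_process_on_core():
    """Try to give the calling process its own core-scheduling cookie.

    Returns (True, 0) on success, or (False, errno_value) if the kernel or
    platform does not support the request.
    """
    try:
        libc = ctypes.CDLL(None, use_errno=True)
        prctl = libc.prctl
    except (OSError, AttributeError, TypeError):
        return False, errno.ENOSYS  # no prctl here (for example, not Linux)
    rc = prctl(
        ctypes.c_int(PR_SCHED_CORE),
        ctypes.c_ulong(PR_SCHED_CORE_CREATE),
        ctypes.c_ulong(0),  # pid 0 means the calling task
        ctypes.c_ulong(PR_SCHED_CORE_SCOPE_THREAD_GROUP),
        ctypes.c_ulong(0),  # cookie argument is unused for CREATE
    )
    if rc == 0:
        return True, 0
    return False, ctypes.get_errno() or errno.EINVAL


if __name__ == "__main__":
    ok, err = isolate_process_on_core()
    print("cookie created" if ok else f"not available: {errno.errorcode.get(err, err)}")
```

    Child processes inherit the cookie across fork, so a launcher could call this once before starting a VM or an untrusted workload.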

    > For whatever reason, SMT seems to be unpopular in the ARM ecosystem,

    Because ARM cores are traditionally comparatively small, the area-efficiency of SMT has been less.

    ARM, itself, makes two SMT-2 cores (A65AE & A76AE), for 64-bit embedded applications. This is an implicit acknowledgement of the technical advantages of SMT. Embedded use-cases tend to be the ones with the least risk from side-channel attacks.

    > as even Marvell themselves abandoned the SMT heavy ThunderX3.

    I think that was simply because they weren't competitive with ARM's N2 cores.
  • mode_13h - Saturday, July 3, 2021 - link

    > ... the area-efficiency of SMT has been less.

    I meant the benefit in area-efficiency vs. simply adding more cores.

    Also, I think the raft of recent side-channel vulnerabilities has given SMT an image problem and reduced customer demand for the feature.
  • ChrisGX - Sunday, July 4, 2021 - link

    I don't think Split-Lock capability in the Cortex-A76AE relies on SMT. Dual Core Lock-Step, as the name suggests, is a way of engaging two cores to raise the reliability of operations running on these specialised computing and control units.
  • mode_13h - Sunday, July 4, 2021 - link

    The split-lock functionality seems distinct from the SMT capability.

    https://www.anandtech.com/show/13727/arm-announces...

    I'm not certain the A76AE is SMT-capable, however. That might've been some bad info I found.
  • ChrisGX - Monday, July 5, 2021 - link

    Actually, I recall there was a second core besides the Cortex-A65AE from ARM with SMT - the Neoverse E1. Andrei pointed out that the E1 was derived from the Cortex-A65AE. At the time of the release of the E1 core, ARM had thought it would be used for "throughput workloads that largely are...about shifting large amounts of data around" and that "are predominantly in the data plane". The Cortex-A65AE was said to be suited to streaming data from sensors, whereas the E1 could support the streaming of data from the network in the case of infrastructure workloads. Evidently, with compute capability having become essential to DPUs - that is shown clearly by the Octeon 10 - the E1 may have been eclipsed in the role it was expected to play by ARM's N series Neoverse silicon.

    The SMT-capable A65 core still seems interesting to me. It wouldn't shock me to see it (or something very much like it) put to good use beyond Automotive applications in more mainstream Cortex parts.

    https://www.anandtech.com/show/13959/arm-announces...
  • mode_13h - Wednesday, July 7, 2021 - link

    Cool, I had forgotten about the E1. Thanks for the follow-up.

    This page includes a roadmap slide showing the E1, N2, and V1 all falling off a cliff labeled "Poseidon Generation", in 2022+. So, who knows if there'll be an E2 or whether it'll have any relation to the E1...

    https://www.anandtech.com/show/16640/arm-announces...
