Oxford Guy - Monday, December 6, 2021 - link
'The current in-order core design supports up to 8 cores in a single cluster.'

In what workloads would an in-order design benefit from having so many cores? Or is this more of a marketing bit?
vladx - Monday, December 6, 2021 - link
Both Cortex-A53 and Cortex-A55 are in-order designs with multicore support. It's all about starting small, with plans for big core designs focusing on performance coming after 2024.
mode_13h - Monday, December 6, 2021 - link
Not only those, but even ARM's new Cortex-A510 remains in-order, when it really probably shouldn't!
https://www.anandtech.com/show/16693/arm-announces...
Most seem to agree that they should leave the in-order stuff to their Cortex-A3xx series. However, they have yet to announce one of those for ARMv9, and I guess the X-series cores sort of demote the 7xx-series to mid-range, which would mean the 5xx-series could effectively now replace the 3xx-series?
nandnandnand - Monday, December 6, 2021 - link
I'm waiting for an ARM SoC with Cortex-X2, Cortex-A710, Cortex-A510, Cortex-A310, and Cortex-M. F*** everything, we're doing five tiers!
mode_13h - Monday, December 6, 2021 - link
:D
Oxford Guy - Tuesday, December 7, 2021 - link
‘It's all about starting small, with plans for big core designs focusing on performance coming after 2024.’

No, it's about explaining why it makes sense to build an 8-core in-order architecture within the given die area and power budget — versus an out-of-order design with fewer cores. What is/are the workloads where a high core count in-order design is more efficient?
vladx - Tuesday, December 7, 2021 - link
Do you think designing a good out-of-order processor requires the same effort as designing an in-order one? If they could've managed to design one by 2021/2022 they would've done it; the point of starting small is beginning with stuff that's easier to design. Not even ARM uses out-of-order execution in their efficient/small cores, and they have much more experience in the area.
Oxford Guy - Thursday, December 9, 2021 - link
Out-of-order CPUs are not fusion reactors. It is old tech.
dotjaz - Friday, December 10, 2021 - link
It's also less efficient and much bigger.
movax2 - Saturday, December 11, 2021 - link
Since 2018, Apple small cores are OoO and they are the most efficient right now.
https://images.anandtech.com/doci/16983/SPECint-en...
mode_13h - Saturday, December 11, 2021 - link
> Apple small cores are OoO and they are the most efficient right now.

It's not quick or easy to make them that efficient.
Also, you're only looking at efficiency in terms of perf/W. There are other metrics that count, such as area-efficiency, which roughly translates into perf/$.
Making an OoO core as fast and efficient as Apple's uses quite a lot more transistors, which could blow the price targets for whatever applications Imagination has in mind. Think embedded, IoT-style uses, which tend to be very cost-sensitive.
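To put rough numbers on that trade-off, here's a back-of-envelope sketch (Python; every figure below is an invented assumption for illustration, not a measurement of any real core):

```python
# Hypothetical comparison: one OoO core vs. several small in-order cores
# occupying the same die area. All numbers are made up for illustration.
ooo_area_mm2 = 1.00   # assumed area of one out-of-order core
io_area_mm2  = 0.15   # assumed area of one small in-order core
ooo_perf     = 3.0    # assumed single-thread throughput (normalized)
io_perf      = 1.0

n_io = int(ooo_area_mm2 // io_area_mm2)  # ~6 in-order cores in the same area

print(f"OoO:      {ooo_perf / ooo_area_mm2:.1f} perf/mm^2")
print(f"in-order: {n_io * io_perf / (n_io * io_area_mm2):.1f} perf/mm^2 "
      "(only if the workload can keep all the cores busy)")
```

Under those made-up ratios the in-order cluster wins on perf per area (and thus perf/$), but only when the workload has enough threads to fill it.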
Kangal - Sunday, December 12, 2021 - link
You're right, but Apple has always envisioned and strived for innovation in performance, power, and area. Apple's small cores are actually not that large, but they are low-power and have lots of performance on tap. ARM will possibly follow their lead by designing a better low-power core for the rest of the industry: Qualcomm, Samsung, HiSilicon, MediaTek, RockChip, Unisoc, Allwinner, AMLogic, etc.

Talking about Huawei... wasn't it Imagination Technologies (a UK company) that had a hostile takeover by a Chinese firm? Well, a lot of senior engineers ended up leaving the company; some went directly to Apple, others into other industries. So I think Anandtech is wrong here: they don't have the talent, but they sure do have the IP.
They are more than likely being pumped with cash to get these solutions out there, via state subsidies like Huawei has. So they have the IP, and now they have the money; they have RISC-V as a CPU, there is SMIC lithography, Huawei can build the micro-modem, and lastly they have open-source software such as JingOS. So all the necessary parts are there to build a fully functioning SoC for the Chinese market, without interference from the USA.
So if they continue on this path, they will achieve a SoC equivalent to the QSD 835 in the next 3 years: probably equivalent to the Cortex-A73, on a 16nm node, with JingOS or similar software. And going further, after 5 years they will possibly match the likes of the QSD 888. That seems like a lifetime away in the tech industry. But another way of looking at it is that it's rapid innovation, since they are almost starting from scratch.
dotjaz - Sunday, January 2, 2022 - link
Maybe you are too dumb to understand OOO can't do sub-100mW. Where's the efficiency?
dotjaz - Friday, December 10, 2021 - link
How do you justify the inefficiency and big die areas? One OOO core would be as big as 6-8 in-order cores, especially stripped-down microcontroller cores. And it's not even matching the performance.

OOO design has awful performance density and also much worse efficiency at the sub-1GHz mark, end of story.
And you want justification? Simple: packet processors. You need dirt-cheap parallel processing power. OOO is simply useless.
mode_13h - Friday, December 10, 2021 - link
simply nailed it :)
Oxford Guy - Monday, December 27, 2021 - link
Yes, obviously Apple doesn't know what it's doing.
mode_13h - Monday, December 6, 2021 - link
If your workload has enough concurrency (i.e. can be split up across enough threads), then having a large number of in-order cores is a far more area-efficient and power-efficient way to scale performance. This is exactly what GPUs do.
Oxford Guy - Tuesday, December 7, 2021 - link
Then why not a GPU?
mode_13h - Wednesday, December 8, 2021 - link
GPUs tend to have wide SIMD units and many-way SMT. If your code involves mostly scalar computations, then the SIMD registers + pipeline(s) are a waste of die space (see the sketch at the end of this comment).

As for SMT, each thread requires its own copy of architectural state, which means more registers and other structures (e.g. additional tags in things like caches and branch-predictors). They could certainly go SMT with these cores, as discussed below, but likely not to the same extent as GPUs.
Lastly, GPU ISAs tend to lack critical features and instructions needed to run a modern OS. Often, GPUs can't even do instruction-level traps. If you added some of those things to a GPU, they'd add overhead, making it less efficient at its main task.
It's a bit like asking why you can't use a chainsaw to carve a turkey. Both a chainsaw and a carving knife are sharp and made for cutting, but the chainsaw wouldn't be terribly "efficient" with the meat, while burning a lot of power and making a lot of noise.
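To make the SIMD point concrete, here's a toy example (Python/NumPy, purely illustrative):

```python
import numpy as np

a = np.random.rand(1 << 20)

# Data-parallel work: the same operation over a million independent
# elements maps cleanly onto wide SIMD lanes -- the GPU sweet spot.
y = a * 2.0 + 1.0

# Scalar, serially-dependent work: each step needs the previous result,
# so extra SIMD lanes sit idle no matter how wide the unit is.
x = 0.0
for v in a[:1000]:
    x = 0.5 * x + v  # loop-carried dependency defeats vectorization
```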
CPU_junkie - Wednesday, December 8, 2021 - link
simply nailed it
Oxford Guy - Thursday, December 9, 2021 - link
So not so much ‘exactly what GPUs do’ after all.
mode_13h - Friday, December 10, 2021 - link
The part about "having a large number of in-order cores" is indeed "exactly what GPUs do".
Oxford Guy - Wednesday, December 15, 2021 - link
Which is it?
dwillmore - Monday, December 6, 2021 - link
Would anyone trust devices based on Imagination IP? There would never be stable drivers for it.
mode_13h - Monday, December 6, 2021 - link
I find that surprising to hear, given how many years Apple used them. I wouldn't expect Apple to tolerate unstable drivers for long.

Anyway, the software stack for standard-ISA CPU cores is very different from what's needed for graphics. I'd like to say the vendor-supplied parts for it are vastly simpler than an entire graphics stack, but I know a great deal more about the complexities involved in a full graphics stack.
vladx - Monday, December 6, 2021 - link
MediaTek and Unisoc continue to integrate Imagination GPUs successfully; the driver issues were only on Windows, at a time when basically only Nvidia had decent drivers.
Spunjji - Tuesday, December 7, 2021 - link
If we're talking the Windows Vista era, not even Nvidia had decent drivers!
vladx - Tuesday, December 7, 2021 - link
Imagination Tech was already out of the desktop PC market by that time, so no, I meant around 2002.
lmcd - Saturday, December 11, 2021 - link
Briefly reentered on intermediate Atom releases.
mode_13h - Monday, December 6, 2021 - link
I like their vision of being a RISC-V-based competitor to ARM (at least, in the full IP sense), but it does seem like they're a bit late to the starting line. Aside from SiFive, there are plenty of other performance-oriented RISC-V cores at various stages of development.
https://www.nextplatform.com/2020/08/21/alibaba-on...
Also, how much money do they have to support these ambitions? Given their private ownership, I think that's probably now unanswerable.
vladx - Monday, December 6, 2021 - link
I can definitely see Huawei and other Chinese hardware makers using their designs for networking and other embedded devices, due to not being under US influence. As for consumer devices, it would be interesting to see Huawei porting HarmonyOS for RISC-V.
mode_13h - Monday, December 6, 2021 - link
> I can definitely see Huawei and other Chinese hardware makers
> using their designs for networking and other embedded devices
I'm sure they'll have plenty of homegrown RISC-V cores to use.
> it would be interesting to see Huawei porting HarmonyOS for RISC-V.
According to this, it's just a rebranded fork of Android 10. In that case, they can mostly just merge in the patches when Google ports Android to RISC-V.
https://en.wikipedia.org/wiki/HarmonyOS#Criticism
vladx - Monday, December 6, 2021 - link
There are plenty of changes to HarmonyOS from Android, especially at the kernel level, which is quite different and would need serious work to port. But yeah, as a Mate 40 user running HarmonyOS 2.0, it would be really interesting to see a phone or tablet running HarmonyOS on a RISC-V device.
Small Bison - Tuesday, December 7, 2021 - link
Imagination is wholly owned by Canyon Bridge Capital Partners, a Chinese equity fund, so while not technically homegrown, these cores are at least "homeowned".
mode_13h - Wednesday, December 8, 2021 - link
Yes, and perhaps that has something to do with this development!
https://www.techpowerup.com/289123/innosilicons-fe...
easp - Monday, December 6, 2021 - link
Imagination has (had?) customers for their various wares, which counts for a lot.
eastcoast_pete - Monday, December 6, 2021 - link
So, in-order cores that are, however, multithreaded? Somebody please explain to me how that makes sense for a CPU design based on normally lean and area efficient RISC-V cores. Other implementations of RISC-V employed single-thread, out-of-order designs, and that made a lot of sense to me. This strikes me as a confusing core design that has the worst of both worlds; in order like the small ARM cores, but multi-threaded like big (large area) older style cores like x86 or Power. However, those are also typically out-of-order designs. So, what's up with your design, Imagination?
mode_13h - Tuesday, December 7, 2021 - link
> So, in-order cores that are, however, multithreaded?
> Somebody please explain to me how that makes sense
You had me confused, until I noticed one of the slides mentioned the CPU is "multithreaded". Are you sure that means SMT? It could be that RISC-V doesn't mandate context save/restore instructions, in which case they're just saying it'll have them.
And if they *do* mean SMT, then it's probably worth noting that ARM's only SMT cores are some of their automotive ones.
> worst of both worlds; in order like the small ARM cores, but multi-threaded
GPUs tend to be in-order and make heavy use of SMT. In that domain, it's certainly been a winning formula!
RandomStyuf - Tuesday, December 7, 2021 - link
I would wager that it's not SMT but one of the similar multithreading capabilities, like interleaved multithreading. All you need for interleaved multithreading is two register files and keeping track of which instruction/pipeline stage is working on what (which is much simpler in non-superscalar, in-order designs). The core front end just takes an instruction from each thread in turn.

The reason it's useful is that it effectively gives you constant(ish) performance for each thread, like a multi-core CPU, and it helps hide some of the penalties for branch mispredictions and other pipeline stalls (and even reduces the need to forward dependencies to avoid bubbles).
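A toy model of that round-robin issue (Python; the instruction streams and stall counts are invented purely for illustration):

```python
from collections import deque

# 2-way interleaved ("barrel") multithreading: the front end alternates
# issue between two threads each cycle, so cycles where one thread is
# stalled (e.g. on a load) can be filled from the other thread.
threads = {
    "T0": deque([("add", 0), ("load", 2), ("add", 0)]),  # (op, stall cycles)
    "T1": deque([("mul", 0), ("add", 0), ("mul", 0)]),
}
ready_at = {"T0": 0, "T1": 0}

cycle = 0
while any(threads.values()):
    tid = ("T0", "T1")[cycle % 2]  # strict round-robin front end
    if threads[tid] and cycle >= ready_at[tid]:
        op, stall = threads[tid].popleft()
        ready_at[tid] = cycle + 1 + stall
        print(f"cycle {cycle}: {tid} issues {op}")
    else:
        print(f"cycle {cycle}: {tid} not ready -> bubble")
    cycle += 1
```

Running it shows T0's load stall being partly hidden by T1's instructions, which is the whole point.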
eastcoast_pete - Tuesday, December 7, 2021 - link
Both you and mode_13h make some good points; yes, for real-time and embedded microcontroller use, in-order makes a lot of sense. And, if the MT refers to interleaved MT, that wouldn't "bite" with the nature of these in-order designs and would keep the area use small and efficient.

I also re-read Ryan's article, and he did mention that Imagination stated they're working on a not-yet-specified out-of-order design for their "Catapult" line of RISC-V CPUs. In the meantime, the Chinese Academy of Sciences has apparently announced that they intend to roll out new RISC-V based CPU designs twice a year, including tape-outs of actual chips.
TeXWiller - Tuesday, December 7, 2021 - link
Take a look at IBM's A2 cores and the RS64-II for example, or Sun's T-series. I think it's good to think of SMT as a larger concept than what Intel's HT was.
mode_13h - Wednesday, December 8, 2021 - link
> what Intel's HT was.

You mean "is"? AFAIK, the widest Intel ever went with SMT (in x86) is 4-way, in the Knights Landing generation of Xeon Phi.
In their GPUs, they went 7-way for a while (including Gen 9, introduced with Skylake). I forget if it's still 7-way in Gen11, but I'm pretty sure they haven't disclosed how many in Xe.
MrHorizontal - Thursday, December 9, 2021 - link
I think the intention is to get to an OOO design. However, in-order vs. out-of-order isn't as important to a RISC ISA vs. a CISC ISA - but it will give a small increase in a chip's efficiency. One of the basic paradigms of an OOO design is it effectively acts like a hardware thread scheduler on the CPU so it uses resources more efficiently - therefore far more important to a CISC design like x86. With RISC, it's more a situation of getting the data to the processing pipeline and being done with it, more like in a GPU design, so it's better to parallelize a lot of RISC cores than make them bigger.

It goes back to the old comparisons between CISC and RISC. With CISC, fewer, bigger cores are necessary. With RISC, more, simpler cores spread the load. With RISC designs it's mostly a question of keeping the pipeline as short as possible (it's already magnitudes shorter than CISC, but feature bloat, particularly with variable-width instructions...) and the fact that RISC is much more sensitive to clockspeeds than CISC... there's an argument, however, that we're in the age where we're at the limits of clockspeeds on silicon irrespective of ISA, so the value of CISC vs. RISC is mooted. What's far more important is small, simple cores and lots and lots of them.
On that basis, out- vs. in-order rather takes a back seat.
Oxford Guy - Thursday, December 9, 2021 - link
‘What's far more important is small, simple cores and lots and lots of them.’

Isn't that quite workload-specific?
mode_13h - Friday, December 10, 2021 - link
Yeah, it depends on what market Imagination has in mind for these. As you can see from slides shown in the article, not all cores are advertised as supporting a "Rich OS". Others are labelled as "real-time processors".

In an RTOS, threads are often used as a basic construct for prioritization and service guarantees. That implies applications using them are likely to be more heavily-threaded. Nvidia's latest self-driving SoC (Orin) has 12 ARM cores.
FunBunny2 - Tuesday, December 14, 2021 - link
"Isn’t that quite workload-specific?"whatever happened that machine that would parallelize sequential on the fly, and ship it as many cores as available? fact is, there still aren't all that many embarrassingly parallel problems. yes, in the innterTubes world servicing scads of independent connections is a worthy goal. just not the same thing.
mode_13h - Saturday, December 18, 2021 - link
> Whatever happened to that machine that would parallelize sequential code on the fly

You mean Soft Machines / VISC?
https://www.theregister.com/2016/09/09/intel_soft_...
More: https://www.anandtech.com/show/10025/examining-sof...
> there still aren't all that many embarrassingly parallel problems.
Well, the HTML rendering engines in web browsers seem to have been parallelized for quite a while. I think a lot of apps parallelize rather well, if you can just reduce the communication & synchronization overhead by enough.
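As a rough sketch of how that overhead bites, an Amdahl's-law style estimate with a crude per-thread synchronization cost bolted on (Python; the numbers are arbitrary):

```python
# Speedup model: serial fraction + parallel fraction spread over n threads,
# plus a crude per-thread synchronization cost. Illustrative only.
def speedup(parallel_fraction: float, n_threads: int, sync_cost: float = 0.0) -> float:
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / n_threads + sync_cost * n_threads)

print(speedup(0.95, 8))        # ~5.9x if synchronization were free
print(speedup(0.95, 8, 0.01))  # ~4.0x once sync overhead creeps in
```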
mode_13h - Friday, December 10, 2021 - link
> One of the basic paradigms of an OOO design is
> it effectively acts like a hardware thread scheduler
Um... a little bit, I suppose. However, unless you're talking about scheduling instructions from different SMT threads, they operate at *completely* different scales.
> It goes back to the old comparisons between CISC and RISC.
I think you're right about one thing, which is that RISC simplifies the decoder, enabling a smaller minimum core size. And yes, that means perf/area should scale better, as you add more of them.
> RISC is much more sensitive to clockspeeds than CISC
Simple RISC uArchs are. When they get sufficiently complex, that stops being true.
> With RISC designs it's mostly a question of keeping the pipeline as short as possible
> (it's already magnitudes shorter than CISC
A few stages shorter, at most. You really shouldn't base so many assumptions on 30+-year-old orthodoxy -- it will only lead you astray. Please compare the specifics of some more modern CPUs.
Zoolook - Monday, December 13, 2021 - link
Back when there was a RISC vs CISC discussion we didn't have multicores, so yeah, it feels a bit strange.
mode_13h - Friday, December 10, 2021 - link
This week, the Linux Foundation held a RISC-V Summit. Since it's not being reported in Anandtech's news feed, you can see for yourself what it included:
https://events.linuxfoundation.org/riscv-summit/pr...
mode_13h - Friday, December 10, 2021 - link
...something is going on with Anandtech. Financial problems?
Thud2 - Saturday, December 11, 2021 - link
Holidays?
lmcd - Saturday, December 11, 2021 - link
Andrei got poached.
mode_13h - Saturday, December 11, 2021 - link
Uh, poached? Sounds like you're saying he's at another publication, but I think not.

He previously held (presumably an engineering) position at Imagination Tech. I don't know if he was doing any comparable work since then, and only doing these articles as a side gig... but, whatever is now his "real" job, apparently conflict of interest is keeping him from writing any more articles in this space.
Anyway, it seems like Andrei's beat was pretty much all things mobile. With RISC-V not being strictly mobile, you'd think somebody else could've potentially covered it.
Furthermore, just *look* at the lack of newsfeed articles! Check out other tech sites and you'll see what sorts of things they *could* be covering.
Maybe Anandtech is losing support/budget vs. Tom's Hardware. Tom's tends not to go into quite as much depth as the better articles on this site, but they have had some good writers. Paul Alcorn is usually pretty solid, and Lucian Armasu was one of their better newsfeed contributors (though he seems to have been inactive since the pandemic hit). I've not been reading it for a while, but maybe I'll have to start following it again.
TeXWiller - Saturday, December 11, 2021 - link
Thanks for posting this.
eastcoast_pete - Saturday, December 11, 2021 - link
Wondering about this, too. AWS's new Graviton (3) wasn't covered here, but several other sites covered it. Was wondering why Andrei or Ian didn't have an article about it here at AT? Yes, it's not something one can buy (it's strictly in-house), but the tech is still important.
mode_13h - Saturday, December 11, 2021 - link
Yeah, it's a good point. I also read about that elsewhere.

Seems to be using V1 cores, which came as a bit of a surprise (Graviton 2 used N1 and most people expected the next one to use N2). It also uses MCM, but the compute cores are all on a single die! Lots to talk about, really, including its performance estimates and power budget. Here's one place you can read about it:
https://semianalysis.substack.com/p/amazon-gravito...
Getting back to RISC-V news, imagine my surprise when I saw Ryan Smith retweet this, with nary a mention on this site:
https://twitter.com/andreif7/status/14614410876047...
That's right. Ryan and Andrei both thought it noteworthy enough to tweet about, but not even mention in a quick newsfeed blurb? Even after their extensive Tenstorrent coverage?
eastcoast_pete - Sunday, December 12, 2021 - link
Thanks for those two links! BTW, the Twitter feed by Andrei also states that his December 1st piece would be his last one for Anandtech; if that's true, it'll be a huge loss for this site. I certainly also came here to read his reviews and deep dives in all things mobile and ARM/RISC.

Ryan, if you read this, any comments or statements?
mode_13h - Monday, December 13, 2021 - link
> Ryan, if you read this, any comments or statements?

What's there to say, besides "thanks" and that he'll be missed? That's about all you can ever do, when someone good decides to chart another path in their career/life.
ABR - Thursday, December 16, 2021 - link
It still means something to say it. And he WILL be missed.
Thud2 - Saturday, December 11, 2021 - link
Holidays?
mode_13h - Monday, December 13, 2021 - link
Heh, there's been a fair bit of Intel news since this article was published.

The tech industry doesn't stop for the holidays. Maybe companies hold off new consumer product announcements until CES, but plenty of tech news is still happening.