26 Comments
SarahKerrigan - Thursday, September 3, 2020 - link
Cortex-A has lockstep execution and Cortex-R has an MMU now? Wonder how long it's going to be worth maintaining them as separate product families.
Peter Greenhalgh - Thursday, September 3, 2020 - link
Merging the Cortex-A and Cortex-R profiles is quite unlikely, as Cortex-R has a host of features around Tightly Coupled Memories and multiple low-latency interfaces for peripherals and external devices that we would not add to the Cortex-A profile. There are various other low-level feature differences required by the Cortex-R real-time CPUs which would add unnecessary "weight" to Cortex-A application CPUs.
skavi - Friday, September 4, 2020 - link
(https://www.linkedin.com/in/peter-greenhalgh-89007...
PaulHoule - Thursday, September 3, 2020 - link
Looks like RISC-V has some competition.
FreckledTrout - Thursday, September 3, 2020 - link
I'm a fan of RISC-V, but it has never really caught up to ARM as far as actual core implementations go.
Desierz - Thursday, September 3, 2020 - link
I wasn't aware of "real-time" processors. I've heard of real-time operating systems, though. AFAIK, they run on regular CPUs. So is this simply marketing speak, or a real special feature?
Peter Greenhalgh - Thursday, September 3, 2020 - link
Not marketing speak. For example, interrupt latency and latency to peripherals need to be low and deterministic in real-time CPUs, less so for applications CPUs like Cortex-A. These points, along with Tightly Coupled Memories and a number of other features, are what drove the creation of the Cortex-R profile back in 2004 and the roadmap ever since.
Desierz - Thursday, September 3, 2020 - link
Aha. I wasn't aware. Thanks for clearing that up for me!
quiksilvr - Thursday, September 3, 2020 - link
I never appreciated how dumb I was until I read through this explanation three times and I still don't grasp this concept of "real-time" processing.
eddman - Thursday, September 3, 2020 - link
I'm not an expert on this matter, far from it, but the way I understand it is this:
Real-time tasks: "Hey, processor, calculate X. No, you cannot do it after something else. I need it RIGHT NOW. No time to lose; get a move on."
Regular applications: "Ok, I need you to process X, Y and Z. Oh, you think doing Y first would get the whole thing done sooner? Ok, that's fine, go ahead."
Peter Greenhalgh - Thursday, September 3, 2020 - link
A real-time processor doesn't intrinsically process any differently from a normal applications processor. What differentiates it is that it bounds latencies and behaves deterministically. For example, rather than interrupt latency on an applications CPU taking anywhere from 50-1,000 cycles (or more), the interrupt latency can be bounded to under 40 cycles.
Tightly Coupled Memories allow certain routines and data to be stored within the CPU, so there's never any chance that a cache eviction has taken place and forces a fetch from DDR (or flash). In a phone or laptop, you don't have routines that absolutely must be accessed in 5 cycles and can't wait 250 cycles for DDR. If you're controlling an HDD, you can't have the read head crashing off the spinning disk! Or, in automotive, the spark plug firing at the wrong time.
Latency and determinism can be very important!
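To make the Tightly Coupled Memory point concrete, here is a minimal bare-metal sketch, assuming a GCC-style toolchain. The section names ".itcm"/".dtcm" and the handler are made up for illustration; in a real project the linker script has to map those sections onto the core's TCM address ranges.

```c
#include <stdint.h>

/* Hypothetical example: pin a time-critical handler and its data into
 * TCM using GCC section attributes. ".dtcm"/".itcm" are assumed names;
 * the linker script must place them in the TCM address ranges. */

/* Data kept in data TCM, so a load never waits on a DDR/flash fetch. */
__attribute__((section(".dtcm")))
static volatile uint32_t crank_angle = 0;

/* Handler kept in instruction TCM, so instruction fetches can never
 * miss in the cache and stall on external memory. */
__attribute__((section(".itcm")))
void spark_timing_handler(void)
{
    /* Both the code and the data it touches live inside the CPU,
     * so the access latency is the same on every invocation. */
    crank_angle++;
}
```

Because nothing the handler touches can ever be evicted, its worst-case latency equals its typical latency, which is exactly the property a real-time system wants.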
JACK4888 - Sunday, September 6, 2020 - link
I've had phones with "fast" processors and still had to wait for a stored program to load from onboard storage. It is frustrating that so much of the code is bloatware. I have seen too many newbie "programmers" cobble together something that works but is slow, while management needs code, any code, to push out the door. Cheap labor gets a poor-quality product, especially when there is little oversight. There has to be more oversight of software quality control / peer review.
AntonErtl - Friday, September 4, 2020 - link
In safety-critical hard-real-time systems, you want to prove that the system satisfies its time constraints under all circumstances. So you don't want the good typical performance of mainstream CPUs; instead you want predictable worst-case performance. Features that improve typical performance, like caches, branch prediction, and out-of-order execution, are not guaranteed to improve worst-case performance, in particular in statically provable ways. I have heard that caches with true LRU replacement are pretty OK to analyze (but the usual pseudo-LRU is much worse). Given that someone else posted that the R82 has a branch predictor, maybe someone found a way to analyze that, too, but again I expect that the branch predictor must have special properties to support the static analysis.
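As a toy software-level illustration of that typical-case versus worst-case mindset (both functions below are hypothetical examples, not tied to any particular core): an early-exit loop has a great typical case but data-dependent timing, while a fixed-trip-count loop gives a static analyzer a tight, data-independent bound.

```c
#include <stddef.h>

/* Great typical case, but the timing depends on the data: a WCET
 * analyzer must assume all n iterations happen anyway, and the early
 * exit makes every run take a different amount of time. */
int find_first_set_early(const int *a, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (a[i] != 0)
            return (int)i;
    return -1;
}

/* Always executes exactly n iterations, so a static analyzer can put
 * a tight bound on it regardless of the data. */
int find_first_set_fixed(const int *a, size_t n)
{
    int idx = -1;
    for (size_t i = 0; i < n; i++)
        if (a[i] != 0 && idx < 0)
            idx = (int)i;
    return idx;
}
```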
michael2k - Friday, September 4, 2020 - link
Evidently the R7 had both dynamic and static branch prediction, so somehow they were able to satisfy the real-time constraint back in 2012. This says even the R8 had dynamic branch prediction:
https://community.arm.com/developer/ip-products/pr...
Not entirely sure how a real-time processor needs to behave, but maybe the worst-case scenario of a mispredict had its penalty reduced in this design. The R8 was an 11-stage design and already had dynamic branch prediction, but that branch prediction isn't on by default, so maybe there is a tradeoff in play in enabling it:
However, the Cortex-R7 and Cortex-R8 do not enable branch prediction automatically at reset. This means software must enable branch prediction to get the maximum hardware performance.
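For the curious, enabling it is a one-register affair. Here is a minimal sketch assuming the ARMv7 SCTLR.Z bit (bit 11, the program flow prediction enable) is the relevant control; the exact bit for a given R-series core should be confirmed against its TRM.

```c
/* Hypothetical bare-metal sketch: turn on branch prediction after
 * reset by setting SCTLR.Z (bit 11), the ARMv7 program flow
 * prediction enable. Check the specific core's TRM before relying
 * on this -- some cores tie prediction to other controls. */
static inline void enable_branch_prediction(void)
{
    unsigned int sctlr;

    /* Read the System Control Register (CP15 c1, c0, 0). */
    __asm__ volatile ("mrc p15, 0, %0, c1, c0, 0" : "=r"(sctlr));

    sctlr |= (1u << 11); /* Z bit: enable program flow prediction. */

    /* Write it back and synchronize the context. */
    __asm__ volatile ("mcr p15, 0, %0, c1, c0, 0" :: "r"(sctlr) : "memory");
    __asm__ volatile ("isb" ::: "memory");
}
```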
domih - Tuesday, September 15, 2020 - link
Real-time is not about speed, it's about guaranteeing that a specific task will always be executed within a specific execution time, for example 10 milliseconds, not 15, not 5.
Analogy: your heart is a biological real-time piece of hardware; it beats at regular intervals and each beat lasts a specific time. The rate may vary if you exert yourself, but it is still real-time. If one's heart shows arrhythmia, a pacemaker is often the solution. The OS on the pacemaker is real-time too, for obvious reasons: the task of sending electric impulses must occur in "real time". The same goes for monitoring cyclic events, like a car engine, for instance.
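The heartbeat analogy maps directly onto how periodic real-time tasks are typically written. Below is a minimal sketch using the POSIX clock_nanosleep call with an absolute deadline so the period cannot drift; the 10 ms period and the empty work function are made-up placeholders.

```c
#include <time.h>

/* Hypothetical periodic task: fire every 10 ms against an absolute
 * deadline. Sleeping to an absolute time (TIMER_ABSTIME) rather than
 * a relative one keeps the period from drifting when the body's
 * runtime varies -- the software equivalent of a steady heartbeat. */
#define PERIOD_NS (10L * 1000 * 1000) /* 10 milliseconds */

static void do_work(void) { /* placeholder for the real task */ }

void periodic_task(void)
{
    struct timespec next;
    clock_gettime(CLOCK_MONOTONIC, &next);

    for (;;) {
        do_work();

        /* Advance the deadline by exactly one period. */
        next.tv_nsec += PERIOD_NS;
        if (next.tv_nsec >= 1000000000L) {
            next.tv_nsec -= 1000000000L;
            next.tv_sec += 1;
        }
        clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, NULL);
    }
}
```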
ZeDestructor - Thursday, September 3, 2020 - link
Is this core a derivative of an existing Cortex-A design (like an A72?) or is it something completely different?
If it is a derivative, is there any public information on what the changes are? As a µArch enthusiast, I'd love to be given some more information on the changes made to the core and platform in general.
(yes, this is very much directed squarely at Peter Greenhalgh)
Andrei Frumusanu - Thursday, September 3, 2020 - link
I would assume they'd make µarch disclosures in the future. If it's a derivative, then it would be of the R8.
Peter Greenhalgh - Thursday, September 3, 2020 - link
Not an R8 derivative. You'll have to wait to find out more though!
Alistair Symonds - Friday, September 4, 2020 - link
For those who don't know, and assuming this isn't just someone using his name, Peter Greenhalgh is VP of Tech at ARM, so he probably knows his stuff and can be believed here!
michael2k - Friday, September 4, 2020 - link
Then the name R82 is kind of misleading :)
arnd - Thursday, September 3, 2020 - link
The datasheet on Arm's website mentions this has an "Eight-stage, in-order, superscalar pipeline with direct and indirect branch prediction", a description that also fits the Cortex-A55, which suggests they are at least part of the same family tree.
It also mentions Armv8.4-A as the baseline architecture rather than the Armv8.2-A that all of the latest application cores use, so there is more to it than just a Cortex-A55 with small modifications.
dotjaz - Thursday, September 3, 2020 - link
I find it unrealistic that they list PPA figures assuming 5nm will be used.
Hul8 - Sunday, September 6, 2020 - link
Couldn't find the quote myself, but unlike regular processors, wouldn't this design be used for many, many years? (Like the R8 before it.) 5nm isn't that far off.
ballsystemlord - Thursday, September 3, 2020 - link
@Andrei, what's the power efficiency of the core compared to the R8?
ksec - Friday, September 4, 2020 - link
So we will end up having a Real Time Linux Sub Storage System within our system?
Anyway, can't wait to see more details. I am guessing this will be the key piece as we progress and push from PCI-E 4.0 to future PCI-E 6.0 SSDs.
JACK4888 - Sunday, September 6, 2020 - link
Now we need boards where we can develop and deploy Linux to do the demanding tasks of today's demanding applications: facial recognition, more robust AI, and a potful of things we haven't even thought of yet! Powering prosthetic limbs through neural supplementation, virus mapping through huge cluster compute with lower power consumption. Low-power 64-bit compute is the next hot tech.