27 Comments
psychobriggsy - Wednesday, November 7, 2018 - link
That improves the transistor density measurement somewhat to 93M/mm^2 - very dense for a real-world design.
Valantar - Wednesday, November 7, 2018 - link
I'm always fascinated by just how great a portion of the die is taken up by uncore components. The CPU and GPU clusters here look like they take up ... 30% of the non-I/O parts of the design? A bit more? I guess various interconnects take up a good deal of space, as well as the NPU (even if it can't be easily identified). But the rest? How much space does a video encode/decode block need, even supporting all kinds of codecs at 4k60? And what are those giant caches (I think?) spread across the die for? If the L3 blocks outlined are 4MB, there's at least as much in the blotchy lumps directly below the CPU cluster.
It would be incredibly cool if some SoC vendor gave an in-depth explanation of the layout of one of their chips, even an older one like the SD820 or similar, detailing all the various parts and how they are arranged and connected.
Andrei Frumusanu - Wednesday, November 7, 2018 - link
Off the top of my head:
- CPU complexes
- GPU
- NPU
- Modem
- ISP (multiple blocks, multiple cameras)
- DSP
- Video enc & dec
- Image scalers and encoders/decoders
- Sensor hub/microcontroller
- Display controller
- UFS/eMMC controllers
- Peripherals (USB, HSIC, etc)
- A ton of random microcontrollers
- Audio subsystem (CPU and all)
- Secure subsystem (CPU and all)
- Memory controllers
- Interconnects for it all
You can't really hope to identify most of these, especially if they're smaller blocks.
nullington - Wednesday, November 7, 2018 - link
What's that in the center of the upper edge? NPU?
DanNeely - Wednesday, November 7, 2018 - link
Stuff on the edge is almost always I/O of some sort.
nullington - Wednesday, November 7, 2018 - link
It already has the GPU core Layout 1 near the edge.
name99 - Wednesday, November 7, 2018 - link
Those are 16 Ethernet ports. Did you not know that this thing is also a kick-ass switch? :-)
eastcoast_pete - Wednesday, November 7, 2018 - link
@Andrei: Thanks! Always enjoy your reviews of SoC architecture.
Two wishes/requests:
1. Please do a deep(er) dive into the current state of NPUs on current and upcoming mobile SoCs. I was really struck by the size of the NPUs on Apple's A12, but don't know how much the architecture, die area and capabilities (known or rumored) of the NPU circuitry differ between the A12, the Kirin 980 and the Snapdragon 845. Don't know if Samsung's newest Mongoose has any NPUs on it, but I've been wrong before.
2. You'll probably address this as part of your Mate 20/Mate 20 Pro review, but this Kirin may, right now, have the most powerful big core in the mobile space outside Apple's Vortex (big A12 core). Look forward to a side-by-side comparison, as much as this is even possible. Thanks!
Andrei Frumusanu - Wednesday, November 7, 2018 - link
Both will be in the review.
skavi - Wednesday, November 7, 2018 - link
Based on this die shot and the A12's, the Vortex and A76 appear fairly similar in size. Are they annotated differently, or are they actually that close?
Dragonstongue - Wednesday, November 7, 2018 - link
power optimised would be like 900MHz and the 2.6 for the "performance" cores
"They got the technology gentleman"
ZolaIII - Wednesday, November 7, 2018 - link
The bright side is that the A76 looks quite a bit smaller than I expected, but on the other hand it doesn't actually meet the performance increase promised by ARM (I calculated a 50% gain clock for clock compared to the A73/A72, while I expected about 66%; even so, it's ahead of my expectation for performance per gate). All in all it really looks like the A76 is a remarkable design. Hopefully we will see more in-depth analysis soon.
Wilco1 - Wednesday, November 7, 2018 - link
Well, according to Andrei's post here https://www.realworldtech.com/forum/?threadid=1812... it is about 80% IPC improvement on SPECINT and 100% on SPECFP over Cortex-A73. I don't see how that is less than promised, it looks far more. Combined with a small frequency increase we're looking at a solid doubling of performance in less than 2 years!
ZolaIII - Thursday, November 8, 2018 - link
First of all, it's in five years, as the A73 didn't bring any performance uplift compared to the A72; it was, however, a more power-efficient design. I base my projection on publicly available Geekbench scores, not on a server industry benchmark. I specifically said a clock-for-clock comparison. If you think the sustainable leakage threshold changed with smaller FinFET nodes, well, think again. You either get a big reduction in power consumption or the opportunity for a modest clock bump at the same consumption; that's why wider OoO designs are being made in the first place. By the looks of things, the A76 being less than 2x the size of the A73 while achieving 50% better performance is actually a huge design win, because it will use significantly less power at 2GHz than the A73 at 3GHz, regardless of the node used. To put things into the right perspective, the initial TSMC 7nm node has 2x the density & 50% lower power consumption of the first-generation 14 nm on which the A73s were built. On the other hand, Samsung's second-gen 7nm HD node, the first one with EUV, had 66/60% higher density than their 10 nm LPE/14 nm UHD third generation (11nm). As things go, you won't be getting frequency bumps, neither now nor in the future. We will see & discuss the A76 after Andrei is finished; before that it simply doesn't make sense.
NICOXIS - Thursday, November 8, 2018 - link
Why would anyone take Geekbench over a standardized server benchmark to compare CPU designs? AnandTech already proved its inconsistencies during the Exynos 9810 vs SD845 S9 review.
ZolaIII - Friday, November 9, 2018 - link
Because this ain't a server CPU, nor will it be used like one. While far from optimal, Geekbench is a comprehensive set of benchmarks. You can't get consistent results on something like Android at all. What you can get are normalised ones over a large sample.
Wilco1 - Thursday, November 8, 2018 - link
5 years?!? Even Cortex-A57 is less than 4 years old, first Exynos using it was late 2014... Cortex-A72 was never popular in high-end phones, both QC and Samsung went straight to Cortex-A73 after Cortex-A57.
Even if GB4 shows 50% IPC gain overall, it looks like IPC on floating point is also up 2x, like SPECFP. It's not clear why, maybe FP and SPEC benefit more from the better memory system and larger caches.
ZolaIII - Friday, November 9, 2018 - link
Neither of them went straight to A73... when I say 5 years I mean it!
Take a look at the references on the wiki.
https://en.m.wikipedia.org/wiki/ARM_Cortex-A72
SIMD (NEON) does indeed benefit a lot from faster access & it was almost at the point of being useless on the A72 and A73 because it was only a little bit faster than VFP. So 2x on SIMD still ain't all that much. It's still less than two A55s, which use half the power. Until SSE kicks in, & on small cores too, you won't see anything remarkable regarding FPU SIMD performance on ARM. Even if the precompiled version of Geekbench is pretty bad regarding optimisations for any platform, it's still a good enough pointer to what you will be getting in the real world.
Wilco1 - Saturday, November 10, 2018 - link
You seem to have an issue with counting: 5 years ago Cortex-A72 didn't exist and phones were all 32-bit. First Cortex-A72 in a high-end mobile was early 2016, less than 3 years ago: https://www.anandtech.com/show/9878/the-huawei-mat...
ZolaIII - Sunday, November 11, 2018 - link
Technically it's 4 years & 9 months since the Cortex-A72 was publicly announced by ARM. Your intelligence was nonexistent then, as it is now, not even on tau level (two bit).
Wilco1 - Sunday, November 11, 2018 - link
You can only compare with the first time Cortex-A72 was available in high-end phones, i.e. only 3 years ago. Using the A72 announcement is lying, especially if you use it to claim phone SoC performance hasn't increased dramatically. Can you understand that?
ZolaIII - Sunday, November 11, 2018 - link
It wasn't only announced, the IP was ready for licensing. It took almost two years to see the first implementation in phones. ARM worked a lot with foundries in the meantime & developed POP IP for newer cores, shortening the time to silicon to eight to nine months now. We are seeing the first A76 in silicon less than six months after ARM announced it publicly, meaning it was announced to crucial partners at least six months before that & the IP has probably been available for 10~11 months. Performance in general use only got a minor bump; the cores doing most of the work are still small in-order ones. The A55 is only about 12% faster than the A53; when you add into the mix the transition from planar to FinFET for the midrange segment, with a 15~20% clock bump, you end up with 25~30% combined in those 3+ years.
NICOXIS - Thursday, November 8, 2018 - link
Can the chiplet approach be implemented in ARM designs?
Wilco1 - Thursday, November 8, 2018 - link
Splitting the SoC into smaller parts does not make sense since it is less than 100mm^2 - it is already a chiplet. Note a phone SoC already combines multiple dies made using different processes: DRAM and flash dies are stacked vertically in the same package.
ZolaIII - Sunday, November 11, 2018 - link
Sure it can & much, much more.
https://fuse.wikichip.org/wp-content/uploads/2018/...
TechDeal - Saturday, November 10, 2018 - link
The size is a bit disappointing. I really hope single core performance improves as much as expected.
malisajason - Monday, November 26, 2018 - link
This article is informative.