35 Comments
sherlockwing - Tuesday, February 25, 2014
Do you have info on whether these are paired with Intel LTE chips?
webmastir - Wednesday, February 26, 2014
I don't think they typically answer questions in comments.
xaueious - Tuesday, April 29, 2014
And they did it. Galaxy K Zoom.
sherlockwing - Tuesday, February 25, 2014
Nvm, skipped the part where you mentioned Intel LTE.
deltatux - Tuesday, February 25, 2014
If the Cortex A7s are meant to be low-power cores, wouldn't it make more sense to have 4 Cortex A15s and then 2 Cortex A7s? Most tasks that don't need a lot of power would be the ones needing a dual core or less, not a quad core. Usually it's the demanding tasks that take full advantage of a higher core count. The 5260 doesn't really make sense to me.
hamsteyr - Tuesday, February 25, 2014
As I understand it, the A7s are perfectly fine for day-to-day tasks; the Cortex A15s only kick in to ramp up and speed up load times. They're kind of like nitro for the processor. A7s perform perfectly fine, as you'll see from the Motorola Moto G.
jjj - Tuesday, February 25, 2014
You activate individual cores, not the entire cluster lol.
coder543 - Tuesday, February 25, 2014
Having two high-power cores means that you can concentrate on getting the hard work done quickly with the least amount of power expenditure. With only two cores running the show, they can stay clocked up for longer than a comparable quad-core processor, which makes them far more effective in the lightly threaded workloads pervasive on mobile platforms. Having four low-power cores means that you can run as many background tasks as necessary with minimal power draw. Ideally, you'll probably be using only one or two of those cores, but you can still get a significant amount done with all four before needing to switch up to the two Cortex A15s and really amp up your power draw. I'd imagine that all four Cortex A7s together fit in a smaller power envelope than the two Cortex A15s.
kpb321 - Tuesday, February 25, 2014
I think the 2x A15 + 4x A7 split makes sense. Everything I've seen on phone workloads says it's very uncommon to have more than 2 heavily loaded cores, with the 3rd and 4th cores usually used less frequently and often at lower speeds. That makes sense given how hard it is to multithread most things well, and doing so across 4+ cores is even harder. Usually the only things that can consistently and easily max out multiple cores like that are things that either aren't done on a phone (software rendering in Max, Maya, etc.) or are done with dedicated hardware, like compression. On the other hand, a phone is full of low-performance background tasks like checking for updates, sending and receiving messages, streaming music, etc. that are perfect for an A7.
fteoath64 - Wednesday, February 26, 2014
The hexa-core seems to have one A7 core too many. I would go for a penta-core config with three A7 cores, one of them dedicated solely to I/O processing, with the scheduler aware of it. This way each of the twin A15s has a perfect A7 shadow core to offload to, minimising cache snoops. With I/O tuned to a dedicated A7 core, it becomes a simpler SoC.
name99 - Wednesday, February 26, 2014
The whole POINT of the slow background tasks you describe is that they are SLOW. You do realize you can multitask multiple such tasks on a single CPU? I ask this in all seriousness because EVERY FREAKING TIME this issue comes up, 90% of the commenters act as though they do not understand and are completely unaware of this.
If the total CPU requirements of all your background tasks are less than the performance of a single A7 (and they damn well should be, otherwise you have a very crappy OS) then the optimal solution is a single core running all the tasks.
I'm with deltatux, only I'd be more extreme. Much like I stated when big.LITTLE was first announced, I STILL believe that for all realistic scenarios the only config that makes sense is a SINGLE slow companion core. nVidia don't get much right, but this was one they did get right.
The big.LITTLE matching-core scheme is justifiable if you have a crappy OS (which WAS the case back then) that can't do a decent job of handling heterogeneous cores, so you're using two cores to fake a single high-dynamic-range core. But it makes no sense when you have a heterogeneous-aware OS.
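name99's argument is essentially a utilization check: if the summed demand of the background tasks fits within one LITTLE core, one core suffices. Below is a minimal Python sketch of that check using first-fit-decreasing bin packing. The task names and load figures are hypothetical. The model only looks at average utilization, not at the latency a task sees while it waits its turn.

```python
# Hypothetical background tasks and their average CPU demand, expressed as
# a fraction of one LITTLE (A7) core's capacity. The numbers are made up.
BACKGROUND_TASKS = {
    "mail_sync": 0.05,
    "music_decode": 0.10,
    "update_check": 0.02,
    "messaging": 0.08,
}


def cores_needed(loads, core_capacity=1.0):
    """Return how many cores of `core_capacity` are needed so that no core
    is over-subscribed, using first-fit-decreasing bin packing."""
    free = []  # remaining capacity of each core switched on so far
    for load in sorted(loads, reverse=True):
        if load > core_capacity:
            raise ValueError(f"task load {load} exceeds one core")
        for i, remaining in enumerate(free):
            if load <= remaining:
                free[i] = remaining - load  # fits on an already-active core
                break
        else:
            free.append(core_capacity - load)  # wake another core
    return len(free)


total = sum(BACKGROUND_TASKS.values())
print(f"total demand: {total:.2f} of one core")
print("cores needed:", cores_needed(BACKGROUND_TASKS.values()))
```

With these made-up loads the total demand is 0.25 of a core, so the packing needs a single core, which is name99's point. A second core only appears once the combined demand no longer fits.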
MrSpadge - Wednesday, February 26, 2014
You can't use 4 A15's at the same time in a smartphone thermal envelope anyway, despite manufacturers putting them in. So running 2 of them at maximum frequency is hardly slower than running 4 of them at a significantly reduced frequency.
lagokc - Wednesday, February 26, 2014
"You can't use 4 A15's at the same time in a smartphone thermal envelope anyway, despite manufacturers putting them in."
Nonsense. There just hasn't been a phone manufacturer that's been willing to pack in a big enough heatsink and fan yet.
jimjamjamie - Wednesday, February 26, 2014
Nvidia Shield 2 w/ closed loop water cooling! woop
name99 - Wednesday, February 26, 2014
Do you know this for a fact?
There are three possible pain points:
- the temperature gets too high
- the power draw gets too high
- the power draw becomes "undesirable", i.e. short battery life.
The first two are non-negotiable; the third is a matter of opinion, and some people might be happy with it, especially if connected to power.
My understanding was that the issue is the second and third points more than the first.
The first is an interesting point because if you allow for thermal inertia (i.e. like modern Intel) you get a few milliseconds to run everything at high speed, and that may be valuable for snappiness, although not for ongoing performance.
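The thermal-inertia point can be illustrated with a single-lump RC thermal model. In this model, temperature rises toward a steady state of ambient plus power times thermal resistance. The sketch below uses made-up parameters and is not a model of any real phone. Actual devices have several thermal stages (die, package, chassis) with very different time constants, so the usable burst window depends heavily on which stage hits its limit first.

```python
import math


def time_to_limit(power_w, r_th, c_th, t_ambient, t_max):
    """Seconds until a single-lump RC thermal model, starting at ambient
    temperature, reaches t_max under constant power.

    T(t) = t_ambient + power_w * r_th * (1 - exp(-t / (r_th * c_th)))

    r_th is thermal resistance (K/W), c_th thermal capacitance (J/K).
    Returns math.inf if the steady-state temperature stays at or below t_max.
    """
    steady_rise = power_w * r_th     # temperature rise once heat flow balances
    allowed_rise = t_max - t_ambient
    if steady_rise <= allowed_rise:
        return math.inf              # this power level is sustainable
    tau = r_th * c_th                # thermal time constant in seconds
    return -tau * math.log(1.0 - allowed_rise / steady_rise)


# Hypothetical parameters: 10 K/W, 5 J/K (tau = 50 s), 25 C ambient, 45 C limit.
for watts in (1.5, 4.0, 8.0):
    t = time_to_limit(watts, r_th=10.0, c_th=5.0, t_ambient=25.0, t_max=45.0)
    print(f"{watts:.1f} W -> {t:.1f} s before throttling")
```

Under these assumptions, 1.5 W is sustainable indefinitely, 4 W lasts about 35 s and 8 W about 14 s. The burst is real but finite, which matches name99's "snappiness, not ongoing performance" distinction.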
jjj - Tuesday, February 25, 2014
Odd to announce the 6-core SoC after announcing a phone that uses it, the Samsung Galaxy Note 3 Neo.
Gam3sTr - Tuesday, February 25, 2014
Is the 5422 their flagship SoC, or are they going to announce an even better one this year?
dylan522p - Tuesday, February 25, 2014
They'll probably have an A53/A57 one on 20nm at the end of the year for the Note 3.
coder543 - Wednesday, February 26, 2014
I think you mean Note 4?
teiglin - Tuesday, February 25, 2014
I suppose the 5260 will be a good test, but I'm still not sold on big.LITTLE. The problem remains largely a software one, because DVFS is handled by the kernel, so it isn't insurmountable, but there needs to be some very serious vertical cooperation to make big.LITTLE both as power-efficient and as fast as it theoretically can be. I'm sure there's more room for simple kernel-only optimizations based on phone state (power-saving mode, screen on, perhaps even connectivity). Perhaps there could be an application-level switch indicating whether a task is time-sensitive and should run on a big core (e.g. a UI event handler) or is lazier background work that can run on LITTLE cores (e.g. file transfers, synchronization). Still, that would put an additional onus on developers to write good, power-efficient mobile apps, which is already far too uncommon.
Hopefully Samsung can get their CPU governors to a state where the UI never drops frames due to a core swap, but anecdotally (just comparing 5420 devices to their MSM8974 equivalents) they've got a long way to go.
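teiglin's application-level switch could look something like the Python sketch below. `TaskHint`, `allowed_cpus` and the CPU numbering are all hypothetical, and no existing Android API is implied. On a real 2 big + 4 LITTLE part, the CPU IDs would come from the device's own topology.

```python
from enum import Enum


class TaskHint(Enum):
    """Hypothetical hint an app could attach to a piece of work."""
    LATENCY_SENSITIVE = "latency_sensitive"  # e.g. a UI event handler
    BACKGROUND = "background"                # e.g. file transfer, sync
    UNSPECIFIED = "unspecified"


# Hypothetical CPU numbering for a 2 big + 4 LITTLE chip; a real device
# exposes its own topology.
LITTLE_CPUS = frozenset({0, 1, 2, 3})
BIG_CPUS = frozenset({4, 5})


def allowed_cpus(hint, screen_on=True, power_saving=False):
    """Combine an app hint with coarse phone state into a permitted CPU set."""
    if power_saving or not screen_on:
        # Nobody is waiting on the UI: keep everything on the LITTLE cluster.
        return LITTLE_CPUS
    if hint is TaskHint.LATENCY_SENSITIVE:
        return BIG_CPUS
    if hint is TaskHint.BACKGROUND:
        return LITTLE_CPUS
    # No hint: leave the choice to the kernel across all cores.
    return LITTLE_CPUS | BIG_CPUS


print(sorted(allowed_cpus(TaskHint.LATENCY_SENSITIVE)))                   # [4, 5]
print(sorted(allowed_cpus(TaskHint.LATENCY_SENSITIVE, screen_on=False)))  # [0, 1, 2, 3]
```

On Linux, a set like this could be applied to the calling process with `os.sched_setaffinity(0, cpus)`. Whether hard-pinning is wise on a phone, versus leaving placement to the kernel's scheduler, is exactly the open question raised above.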
porphyr - Tuesday, February 25, 2014
I'm really happy that we have a 2 big/4 LITTLE configuration around now. Hopefully someone gets an implementation into your hands.
ArmanUV - Wednesday, February 26, 2014
Could someone explain why Samsung is still using Qualcomm SoCs for their LTE-enabled devices? Surely Samsung could have produced their own LTE-compatible Exynos SoC by now, no?
aryonoco - Wednesday, February 26, 2014
Exynos is LTE compatible. There are many variants of the SGS4 and Note 3 with Exynos and LTE. It's just that in certain markets (US, Canada, Australia) carriers are very picky and require a lot of testing and validation of new platforms. Qualcomm already does that heavy lifting with its SoCs, so if you use a Qualcomm platform you can skip a lot of that grunt work. If Samsung wanted to use its own SoC, it would need a lengthy validation and testing process with these carriers, and time to market is very important in this industry.
lilmoe - Friday, February 28, 2014
+1
I would like to add one point to that. Samsung, like any other OEM, is a business first and last. They'll go with whatever is profitable, especially since Qualcomm's chips are very popular (and reliable) ATM.
darkich - Wednesday, February 26, 2014
Yeah, I agree that 2+4 is probably the best combination. 4 A7s at up to 2 GHz and 2 A15s (r3p3) at 2.1 GHz would be better than the 8-core Exynos 5422, imo.
darkich - Wednesday, February 26, 2014
The dream SoC, as I see it:
4 A53 cores at up to ~2 GHz, 2 A57 cores at up to ~3 GHz, Maxwell ULP GPU, all done on 14-16 nm.
darkich - Wednesday, February 26, 2014
Anand, here's a test of the Note 3 Neo with the Exynos Hexa:
http://www.gsmarena.com/samsung_galaxy_note_3_neo-...
The chip doesn't seem to beat the Snapdragon 800; on the contrary, actually.
darkich - Wednesday, February 26, 2014
...beat it efficiency-wise, that is.
Jon Tseng - Thursday, February 27, 2014
5422 HMP sounds a bit pointless: do they really think you're going to be able to run 8 cores simultaneously in that thermal envelope? Sounds unlikely to me!
Wilco1 - Thursday, February 27, 2014
HMP doesn't imply that all 8 cores must be running simultaneously, or at their maximum frequency. HMP means that if you have, say, 1 high-performance task and 3 low-performance tasks, it may use 3 little cores and 1 big core. Without HMP you'd be forced to use 2 or 3 big cores, which is less efficient.
Jon Tseng - Thursday, February 27, 2014
But surely if you want 1 big and 3 small cores you could just use CPU migration (the second variant of big.LITTLE) without having to go full-fat HMP, no?
HMP only adds value if you plan to have 5+ cores firing at once, right? But I reckon once you have more than a couple of A15s firing at once you'll be getting into thermal-throttling territory anyway in a typical smartphone? (May be different for tablets.)
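Here is a toy comparison of the two big.LITTLE modes in this exchange. In CPU migration (the in-kernel switcher), each big core is paired with a LITTLE core, and only one core of each pair runs at a time. Under HMP (global task scheduling), every core is schedulable. The sketch assumes one task per core and a hypothetical load threshold for "heavy". Real schedulers time-share cores and track load over time.

```python
HEAVY_THRESHOLD = 0.6  # hypothetical: tracked load above this counts as heavy


def place_hmp(loads, n_big, n_little, heavy=HEAVY_THRESHOLD):
    """Toy HMP (global task scheduling): every core is usable at once.
    One task per core; heavy tasks prefer big cores, light tasks prefer
    LITTLE cores, and either spills to the other cluster when full.
    Returns (tasks_on_big, tasks_on_little, tasks_waiting)."""
    free = {"big": n_big, "little": n_little}
    used = {"big": 0, "little": 0}
    waiting = 0
    for load in sorted(loads, reverse=True):  # place heavy tasks first
        order = ("big", "little") if load > heavy else ("little", "big")
        for cluster in order:
            if free[cluster]:
                free[cluster] -= 1
                used[cluster] += 1
                break
        else:
            waiting += 1
    return used["big"], used["little"], waiting


def place_cpu_migration(loads, n_pairs, heavy=HEAVY_THRESHOLD):
    """Toy CPU migration (in-kernel switcher): each big core is paired with
    a LITTLE core and only one core per pair runs, so at most n_pairs tasks
    run together. Each pair picks big or LITTLE for its own task."""
    running = sorted(loads, reverse=True)[:n_pairs]
    on_big = sum(1 for load in running if load > heavy)
    return on_big, len(running) - on_big, len(loads) - len(running)


one_heavy = [0.9, 0.1, 0.1, 0.1]
print(place_hmp(one_heavy, 4, 4), place_cpu_migration(one_heavy, 4))
# (1, 3, 0) (1, 3, 0)
six_tasks = [0.9, 0.9, 0.1, 0.1, 0.1, 0.1]
print(place_hmp(six_tasks, 4, 4), place_cpu_migration(six_tasks, 4))
# (2, 4, 0) (2, 2, 2)
```

Take Wilco1's example of one heavy and three light tasks on a 4+4 part. Both modes give one big core and three LITTLE cores, which supports Jon Tseng's point. The difference appears only when there are more runnable tasks than pairs, or on a layout like the 5260's 2+4, where the cores cannot all be paired one-to-one.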
lilmoe - Friday, February 28, 2014
"HMP only adds value if you plan to have 5+ cores firing at once, right?"
Nope. "Ideal" HMP adapts more efficiently, and optimally, to any type of workload. HMP is more dynamic than other platforms. If you only need 1 little core, then only one little core is fired up, which is more efficient than, for example, having 1 Krait core fired up. If you need 1 big core, then only one is loaded, which (again) is faster than a Krait core. That's what I mean by "dynamic". There's only so much optimization you can do to an individual core to handle both light and heavy workloads; having 2 core types (one optimized for each) gives you a wider range of power draw.
This, of course, holds only IF a chip can power-gate each core individually, not just at the cluster level. Don't forget that power draw doesn't depend only on the CPU cores, but on the platform as a whole (cores, cache, interconnect, bus, RAM, etc.).
Having 4 little cores makes a LOT of sense, since most workloads aren't heavy (most users are on messaging apps and light games); big cores are needed less than 10% of the time (for most consumers), generally to increase the responsiveness of a platform. As far as I understand, 2 A7 cores draw less power than 1 A15, so loading multiple smaller tasks onto MORE little cores should be more efficient than loading multiple tasks onto a big core. That said, there are times when loading tasks onto big cores is more efficient, since they'll be processed significantly faster. This all depends on the type of workload and user habits; that's why designing a kernel that handles all of that isn't an easy task.
Throttling is an issue for any/every high-performance core on 28nm, of course, but (again) don't forget that the big cores do more "work" before they get to that point, and the more of them you have, the better when needed, IF the power-gating conditions above are met, even if only 1 or 2 of them are needed most of the time during heavy workloads.
There isn't an easy way to describe/explain this. But a less ideal setup (in MOBILE) would be having fewer small cores than big ones. Exynos 5422 is the better chip with ideal HMP. It's faster than the 5260, but the 5260 isn't necessarily more power efficient ;) And yes, I don't agree with the author on which is the more "ideal" setup. At this point it all depends on the efficiency of the implementation and the software/firmware behind all of that.
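lilmoe's point that big cores are sometimes more efficient comes down to energy = power * time. The rest of the platform (cache, interconnect, RAM) keeps drawing power for as long as the work is in flight. The sketch below uses purely hypothetical numbers, not measurements. The A15-like core draws four times the power of the A7-like core, consistent with "2 A7 cores draw less power than 1 A15", but it retires twice as many instructions per clock at a higher frequency.

```python
def task_energy_mj(work_minstr, freq_mhz, ipc, core_mw, platform_mw):
    """Energy in millijoules to finish a fixed amount of work on one core.

    runtime (s) = work (million instructions) / (freq (MHz) * IPC)
    energy (mJ) = (core power + platform power) (mW) * runtime (s)

    platform_mw stands in for cache, interconnect, RAM, etc. that must
    stay awake until the work is done.
    """
    runtime_s = work_minstr / (freq_mhz * ipc)
    return (core_mw + platform_mw) * runtime_s


# Purely hypothetical figures chosen for illustration, not measurements.
LITTLE = dict(freq_mhz=1300, ipc=1.0, core_mw=200)
BIG = dict(freq_mhz=1700, ipc=2.0, core_mw=800)
WORK = 1000  # million instructions

for platform in (0, 400):
    e_little = task_energy_mj(WORK, platform_mw=platform, **LITTLE)
    e_big = task_energy_mj(WORK, platform_mw=platform, **BIG)
    print(f"platform {platform} mW: LITTLE {e_little:.0f} mJ, big {e_big:.0f} mJ")
```

With no platform power, the LITTLE core wins (about 154 mJ vs 235 mJ). With 400 mW of uncore power that stays on until the task finishes, the faster big core comes out ahead (about 353 mJ vs 462 mJ). Where the crossover sits depends entirely on the real figures, which is part of why the kernel's job is hard.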
Wilco1 - Saturday, March 1, 2014
I agree. 2+4 cores is not by definition "ideal" like Anand suggests. If you have 4+4 with HMP and the workload requires only 2 big cores, it works exactly like 2+4. However, you can also fire up 2 extra big cores *if* you need to. Also, due to voltage scaling, running a workload on 4 big cores at a reduced frequency is far more power efficient than running it on 2 cores at their maximum frequency.
So it is obvious that 4+4 is always better than 2+4, both in terms of performance and power efficiency. However, it is also true that 2+4 is good enough (since the A15 is fast), and the main reason for it is cost/area optimization for mid-range devices.
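Wilco1's voltage-scaling argument follows from the usual approximation for dynamic power, P ≈ C·V²·f. Below is a rough Python sketch with hypothetical operating points: 1.8 GHz at 1.20 V versus 0.9 GHz at 0.90 V. It ignores leakage and assumes the work splits perfectly across four cores. As kpb321 notes above, that kind of splitting is uncommon on phones.

```python
def dynamic_power(n_cores, freq_ghz, volts, c_eff=1.0):
    """Relative dynamic (switching) power, P = n * C_eff * V^2 * f.
    Leakage is ignored and c_eff is an arbitrary per-core constant."""
    return n_cores * c_eff * volts ** 2 * freq_ghz


def throughput(n_cores, freq_ghz):
    """Aggregate clock rate, assuming the work splits perfectly across cores."""
    return n_cores * freq_ghz


# Hypothetical DVFS operating points for a big core.
two_fast = dict(n_cores=2, freq_ghz=1.8, volts=1.20)
four_slow = dict(n_cores=4, freq_ghz=0.9, volts=0.90)

p_fast, p_slow = dynamic_power(**two_fast), dynamic_power(**four_slow)
print("throughput:", throughput(2, 1.8), "vs", throughput(4, 0.9))
print(f"power ratio (4 slow / 2 fast): {p_slow / p_fast:.4f}")
```

Under these assumptions, both configurations deliver the same aggregate throughput, and the four slower cores use about 56% of the dynamic power. If the workload only has two threads, the extra cores add nothing, which is closer to MrSpadge's and kpb321's view.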
gliatiotis - Thursday, February 27, 2014
The 5260 variant is already available inside the Galaxy Note Neo, strangely enough in the LTE one.
Systems Analyst - Friday, February 28, 2014
I think that ARM have explained their design philosophy for smartphones: 4 A53s and 2 A57s plus a Mali-T600. The latter gives hundreds of GFLOPS. No more A57s are needed, because the GPGPU is so powerful. Most benchmarks used, e.g. by AnandTech, appear to be futile, because they do not recognise the use of the GPGPU. According to ARM, Android from 4.2 on has supported RenderScript, e.g. on the Google Nexus 10. The Cache Coherent Interconnect supports the GPGPU. This was demonstrated on a stock Nexus 10 in June 2013. ARM and their partners give dozens of use cases for a GPGPU. One use is to provide ISP functionality in software, with no ISP hardware required. Another is H.265 in software, again with no special hardware required. This can provide up to 8K resolution, with a 2:1 speed-up in compression compared with H.264. There are hundreds of companies working in the ARM ecosphere, and it must be very difficult to follow all developments. Products will gradually appear on the market, and then we will all become more aware of the technical details. Intel do not seem to be competitive. ARM state that they will have 100% penetration of the mobile phone market; we shall see if this is true. The Intel processors for PCs may become redundant because of these developments. The colossal mobile phone market is driving processor developments, and they will ripple up the processing hierarchy to server farms and HPC.
Comparing CPUs is not the point. It is the GPGPUs that count.