87 Comments
III-V - Friday, June 30, 2017 - link
That's a beefy L2.
Kevin G - Friday, June 30, 2017 - link
Helpful to keep the memory bus powered down more often, not explicitly for raw performance gains (though it helps there too). 8 MB on a mobile SoC is still a lot of cache.
thesandbenders - Saturday, July 1, 2017 - link
The RAM is DDR; if you power it down, you lose your data. A larger L2 does let you run the RAM at a lower clock speed with less perceived performance impact for the user. A lower clock speed will generally lower power consumption.
Nokiya Cheruhone - Tuesday, July 4, 2017 - link
I see you have no clue about how DDR-SDRAM works.
thesandbenders - Tuesday, July 4, 2017 - link
I didn't realize "powered down" was commonly used to refer to idle and LP states; I stand corrected.
name99 - Friday, June 30, 2017 - link
Not really. It only seems large compared to recent Intel designs, which have focused on having a large L3 and a small (but low-latency and high-throughput) L2.
Compare with Penryn, for example, in 2007, which gave 3MiB to each core. Apple is giving 2.67MiB to each core --- basically the same sort of capacity.
The main thing to take away, I think, is that the exact details of a cache system (even at the most basic level of the sizes and the inclusivity) don't have a single correct answer --- the space of "good design" is fairly voluminous, and it doesn't take much of a change in exactly what you're trying to optimize for to shift the design in a way that looks substantial, but is still only a percent or so different in performance.
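As a quick back-of-envelope check of those per-core figures (a sketch only, assuming the common 6 MB dual-core Penryn part and the A10X's 8 MB L2 shared across its three big cores):

```swift
// Rough per-core L2 capacity; the Penryn figure assumes the 6 MB dual-core variant.
let a10xL2PerCore = 8.0 / 3.0     // ≈ 2.67 MB per big core on the A10X
let penrynL2PerCore = 6.0 / 2.0   // = 3.0 MB per core on dual-core Penryn
print(a10xL2PerCore, penrynL2PerCore)
```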
Santoval - Sunday, July 2, 2017 - link
I believe he implicitly (yet very obviously) meant "... for a mobile SoC". Your comparison is from an entirely different product category, so it really makes no sense.
Eug - Friday, June 30, 2017 - link
First!...to 10 nm FF
StevoLincolnite - Friday, June 30, 2017 - link
It is not a real 10nm process.
RPE33 - Friday, June 30, 2017 - link
Fake news confirmed by StevoLincolnite!!!
kfishy - Friday, June 30, 2017 - link
According to the article it is a full node scaling, which is pretty impressive these days.
StevoLincolnite - Friday, June 30, 2017 - link
TSMC is using a 14nm BEOL for its 10nm process.
It might be a "full node" scaling from its 16nm FF process, which actually used a 20nm BEOL. But a true 10nm process it is not.
***
I'll assume RPE33 is a sarcastic troll.
Morawka - Saturday, July 1, 2017 - link
Welcome to chip fab 101. Even Intel's 14nm BEOL is using a larger metal interconnect. None of them are proper.
Santoval - Sunday, July 2, 2017 - link
Actually, no BEOL is, or can possibly be, 10 or 14nm for a 10 or 14nm process: if you made the copper wires of all the BEOL layers that thin, you would both increase their resistance and risk them simply sublimating from the heat. BEOLs have multiple layers, and the real problem with them is not the upper layers (where the copper wires get progressively bigger) but the lowest couple of layers that interface with the FEOL part (aka the transistors). Only these one or two bottom layers need to be very thin, because they interface with the multitude of tiny transistors, and only these are the layer(s) that can potentially be almost as small as the lithography process implies (the part Intel calls its "14nm BEOL" is actually only the bottom BEOL layer). These one or two layers are also the weakest part of a CPU, because their very thin copper wires have very high resistance. The bottom BEOL layer can also be viewed as the top BEOL layer, depending on how you look at a CPU stack. But it is always the one that directly interfaces with the FEOL segment.
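To make the resistance point concrete, here is a rough illustration of R = ρL/A with hypothetical wire dimensions (not TSMC's or Intel's actual geometry); shrinking width and height together scales resistance roughly with the inverse square of the scale:

```swift
// Hypothetical wire dimensions, chosen only to illustrate the scaling;
// real BEOL stacks use different geometries and materials (liners, barriers).
let rhoCopper = 1.68e-8  // ohm·m, bulk copper; very thin wires behave even worse

func wireResistance(lengthUm: Double, widthNm: Double, heightNm: Double) -> Double {
    let area = (widthNm * 1e-9) * (heightNm * 1e-9)   // cross-section in m²
    return rhoCopper * (lengthUm * 1e-6) / area        // R = rho * L / A
}

print(wireResistance(lengthUm: 10, widthNm: 40, heightNm: 80))    // bottom layers: tens of ohms
print(wireResistance(lengthUm: 10, widthNm: 200, heightNm: 400))  // upper layers: a few ohms
```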
EasyListening - Monday, July 3, 2017 - link
Does this have something to do with delivering electricity throughout the chip? Maybe the wires are larger on purpose to allow for higher voltages on the BEOL layers. (?)
omf - Friday, June 30, 2017 - link
They certainly spent some of the power budget in those new iPads on higher screen refresh rates.
name99 - Friday, June 30, 2017 - link
Unclear. The screen refresh rate is adaptive. It may well be a net win under almost all circumstances. Most of the time when you're just reading something the refresh rate can be lower, likewise for movies; it only has to kick in for UI (animations+tracking).
My experience (which is of course only anecdotal, not scientific) is that my iPad Pro 12.9" is drawing down battery substantially slower than my iPad Air, even though I mostly use it for reading technical PDFs.
kfishy - Friday, June 30, 2017 - link
The 12.9" Pro also has a substantially larger battery, which would also help when reading static documents.jjj - Friday, June 30, 2017 - link
So much ink on Apple going 10nm for iPad first when the simple fact is that the timing for this node allowed it, that's all.I do find it interesting that a medium volume SoC like this one gets coverage while when Techinsights took a look at SD835 , there was silence and that SoC is many times more relevant.
ws3 - Friday, June 30, 2017 - link
The SD835 is far less interesting technologically.
Its single core performance is so far behind Intel and Apple that it is uninteresting.
It runs 8 cores at a time to reach reasonable peak scores in multicore benchmarks, but that 8-core score is mostly irrelevant to end users, as they won't be running software that takes full advantage of all 8 cores simultaneously.
Spunjji - Friday, June 30, 2017 - link
...except for the browser, which is a huge part of what people use and takes full advantage of as many cores as you throw at it.
WinterCharm - Friday, June 30, 2017 - link
People spend more time in apps than they do on the web browser of a mobile device.
Solandri - Friday, June 30, 2017 - link
That's actually the problem with mobile devices. You have to install a hundred apps to do the same thing you can do with a single browser on a PC. Because every website out there tries to get you to install their app instead of use their website (probably so they can harvest your data and track everything you do). Heck, some of the forum websites I visit on my phone or tablet spam me with a popup to install their app. If I did that for every site I visit, I'd need 200+ GB of storage on my phone.
Can you imagine how horrible the Internet would be if each website you visited required you to install a new program to access the site? That's what mobile is like. Programs/apps are for when you're doing stuff locally. Browsers and remote desktops are for when you're accessing data remotely.
melgross - Friday, June 30, 2017 - link
That's not true either. Where are you getting this from?
lefty2 - Friday, June 30, 2017 - link
Yeah, but app developers need to make a living!
RPE33 - Friday, June 30, 2017 - link
I'm sorry, but what you said is complete tripe.
MonkeyPaw - Friday, June 30, 2017 - link
Nonsense. You can pretty much use the website for many things, but it's nice to have the app for the places you visit often. For me, I've installed eBay and Amazon apps, but I just use the websites for things like PayPal or B&H. Sometimes the apps are a better option, like when notifications matter.
nsandersen - Thursday, January 25, 2018 - link
What Solandri says seems true to me - lots of apps being pushed, which don't do much a good mobile website couldn't do.
asendra - Friday, June 30, 2017 - link
WAT?
Browsers may be the best single use case for the necessity of faster cores, because Javascript is not multi-threaded.
melgross - Friday, June 30, 2017 - link
Browsers don't use that many cores.
Ppietra - Friday, June 30, 2017 - link
The browser rarely makes use of all 8 cores; there aren't that many multithreaded processes created by the browser, and to take full advantage of 8 cores it really needs a good multithreaded process. Browsers do run more than one process at a time, but the main processes are single-threaded and there aren't that many concurrent processes happening.
blackcrayon - Friday, June 30, 2017 - link
The browser uses all 8 cores? Then you would think it would outperform Safari on the new iPads. Unless the browser code is also extremely inefficient.
DarrenR - Friday, June 30, 2017 - link
Javascript is far and away the biggest CPU drain in any browser, and it's single threaded. Due to the nature of JS and browsers, it's also generally blocking - meaning nothing else the browser wants to do can happen until JS has been processed, including processing other JS. Single core performance is far and away the biggest indicator of browser speed...
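To make the blocking point concrete, here is a minimal sketch using Apple's JavaScriptCore framework (a stand-in for illustration, not how Safari's engine is actually structured): evaluateScript runs synchronously, so the calling thread can do nothing else until the script returns.

```swift
import JavaScriptCore
import Foundation

// A JSContext evaluates scripts synchronously on whatever thread calls it.
let context = JSContext()
let start = Date()
context?.evaluateScript("var t = 0; for (var i = 0; i < 1e7; i++) { t += i; }")
print("JS loop finished in \(Date().timeIntervalSince(start))s on a single thread")
```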
name99 - Friday, June 30, 2017 - link
Unclear. What do you define as "the browser" (ie what sort of workload), and what evidence do you have that this workload is substantially accelerated by having 8 cores available?
It is true that JS CAN be written to use threads; but it's also true that large amounts of the browser experience (basic layout, CSS, DOM manipulation, running JIT and networking on a separate thread, etc) run out of steam after more than two CPUs.
It is instructive to compare the browser benchmark results for the iPad Pro vs the iPhone 7. We have essentially the same micro-architecture and frequency. The iPad has a larger L2, but the iPhone has an exclusive L3, so not THAT different overall. The iPad has a wider memory bus, so same DRAM latency but twice the bandwidth. And, most important, the iPad has three cores, the iPhone two.
You can see the results here:
https://arstechnica.com/apple/2017/06/review-10-5-...
Basically, across all three browser benchmarks, the iPad results are not THAT much larger than the iPhone's --- the sort of improvement you'd expect from the caches and memory subsystem, but not 50% extra from an extra core.
techconc - Friday, June 30, 2017 - link
Javascript is run in the browser, and it largely only leverages a single core. That's part of the reason why iOS devices trounce anything on Android with Javascript benchmarks.
kfishy - Friday, June 30, 2017 - link
Can you imagine the throttling if browsers used 8 cores all the time...
StrangerGuy - Saturday, July 1, 2017 - link
Besides, Apple is already beating every ARM design with one hand tied behind their back, with just 3 cores, and the cores collectively only take a small piece of the 96.4mm2 real estate. If they drop the kid gloves and throw 8 of them in there, I bet Apple would even give desktop Ryzen a run for its MT money.
nikaldro - Friday, June 30, 2017 - link
This is BS. Mobile apps have been getting very well threaded in recent years.
Ppietra - Friday, June 30, 2017 - link
What you are usually getting are apps that have more than one process, to separate rendering from app logic, something that a 2-core CPU will do just as well. 8-core multithreaded processes are something that you will only find in a few apps for image, video and audio processing, and some games, and they need good code.
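A minimal sketch of that pattern in Swift (hypothetical helper names, not any particular app's code): the heavy work runs on a background queue and only the result hops back to the main/UI thread, a split that two cores already cover well.

```swift
import Dispatch

// Hypothetical image-loading helper: decode off the main thread,
// then hop back to the main queue for the UI update.
func loadThumbnail(completion: @escaping (String) -> Void) {
    DispatchQueue.global(qos: .userInitiated).async {
        let decoded = "decoded-thumbnail"   // stand-in for the real decode work
        DispatchQueue.main.async {
            completion(decoded)             // UI code consumes the result here
        }
    }
}
```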
serendip - Saturday, July 1, 2017 - link
What about background multitasking? With a few gigs of RAM, it's conceivable that a rendering process could run at the same time a web page was being loaded.
Does iOS run more on the timeslicing multitasking model, with a fast CPU constantly switching between tasks, or is it more like the Android throw-more-cores model?
Ppietra - Saturday, July 1, 2017 - link
What you are describing is what I described: apps have more than one process, to separate rendering from app logic. A 2-core CPU can do that just as well, and if the cores are faster you will probably see better performance.
If you want to talk about running another app in the background, or system background tasks, I would think there would be a small responsiveness advantage to having more than a 2-core CPU in a smartphone, but there isn't much need for 8 cores in something like that either. But since there isn't a public benchmark to compare background app multitasking performance between the iPhone and Android phones, it is uncertain which CPU solution has a background performance advantage.
iOS uses every resource it has available; it has basically the same multicore and multiprocessor support as macOS. Apple-designed SoCs have 2 to 6 cores (3 cores available per app).
Nokiya Cheruhone - Tuesday, July 4, 2017 - link
People still don't understand that background multitasking (implemented in a way that makes it possible to run user code in the background without restrictions) is useless on mobile devices. Use the APIs for background tasks (like fetching new info or doing push notifs) instead. Thanks for the troll.
metafor - Friday, July 7, 2017 - link
The OS itself runs many processes in the background though. In theory, with a small enough and efficient enough core, it makes sense to run that on a core that's separate from the large performance core. This lets the performance core somewhat dedicate itself to a heavy javascript process without context switching.
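A small hedged example of how an app expresses that split on iOS: quality-of-service classes on dispatch queues are the hint the scheduler uses; which physical core actually runs the work is up to the OS and the performance controller, not the app.

```swift
import Dispatch

// Low-QoS housekeeping is a natural candidate for a small/efficiency core...
DispatchQueue.global(qos: .background).async {
    // e.g. indexing, log cleanup, prefetching
}

// ...while user-interactive work keeps its claim on the fast core.
DispatchQueue.global(qos: .userInteractive).async {
    // e.g. preparing the next frame of an animation
}
```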
kfishy - Friday, June 30, 2017 - link
Snapdragon 835 was manufactured by Samsung; as the article states, this is the first TSMC 10nm SoC shipping in actual devices.
pav1 - Friday, June 30, 2017 - link
Moral of the story - wait till the A11 to see more. The iPad Pro flies... so be happy with what you have.
gigathlete - Friday, June 30, 2017 - link
Thanks for this article, Ryan; hopefully you guys will be able to give us a performance preview of this A10X. Seems like a true beast.
lefty2 - Friday, June 30, 2017 - link
"One of the more intriguing mysteries in the Apple ecosystem has been the question over what process the company would use for the A10X SoC"
Hardly a mystery though; there were several rumours that it was on 10nm.
melgross - Friday, June 30, 2017 - link
That's why it was a mystery. They were rumors.
MonkeyPaw - Friday, June 30, 2017 - link
Perhaps the reason Apple went to 10nm with this SoC was to also be able to use it in the next-generation iPhone as well? If we're looking at them launching the 7S and an ultra-premium anniversary edition, they might be planning to use this SoC for one of those 2 models.
Anticipate - Friday, June 30, 2017 - link
I don't understand. If the chip has the same number of GPU cores as the A9X, and they are the same cores, and they are clocked similarly, how is the GPU benchmarking so much faster than the A9X?
tipoo - Friday, June 30, 2017 - link
10nm allows higher clocks at the same power.
melgross - Friday, June 30, 2017 - link
The clock is only about 5% higher for a 40% improvement in performance.
kfishy - Friday, June 30, 2017 - link
Might be much faster memory performance; mobile GPUs nowadays are pretty bandwidth hungry.
StrangerGuy - Friday, June 30, 2017 - link
Because Apple's SoC design team has been far and away the best in the entire industry. They also just killed ImgTec in GPU; the rumor is that Qualcomm and Dialog are next on the Apple custom-design chopping block, for baseband and power management ICs respectively.
melgross - Friday, June 30, 2017 - link
It's a later series.
Ppietra - Friday, June 30, 2017 - link
They aren't the same GPU cores as the A9X's. It should use a GPU core similar to the A10's, which was a tweaked version of the previous A9 GPU.
We don't know if they have a similar clock speed; the numbers that are shown are for the CPU, not the GPU.
blackcrayon - Friday, June 30, 2017 - link
Does it say the cores are clocked similarly though? I thought they could have a different relative clock speed to the overall SoC clock. Also, are we sure they are the exact same cores? Or just that there are the same number of them and they have no new "features" from Apple's software standpoint.
Nokiya Cheruhone - Tuesday, July 4, 2017 - link
Apple doesn't use the same GPU; its (now designed in-house) design is evolving constantly.
tipoo - Friday, June 30, 2017 - link
Soo, any chance of a deep dive? Merged into the A10 one?
melgross - Friday, June 30, 2017 - link
What's really interesting here is that such a major shrinkage gives Apple a chance to add a lot more to the chip. I imagine that the soon-to-appear A11 is taking advantage of the same process. Since it's going to be the second chip using the 10nm process, possibly Apple will feel that they can advance it even more.
Generally, we find that their next-generation phone SoC has GPU performance about equal to the previous generation's iPad GPU, which had double the cores.
It will be interesting to see whether the A11 has CPU performance exceeding the A10X's, and GPU performance at almost the same level. Of course, it's not likely to have 3/3 CPU cores - or will it?
Kevin G - Friday, June 30, 2017 - link
This makes me wonder what they have in store for the A11 in the next iPhone due later this year. I think this sets up the expectation that Apple will use 10 nm there as well. I'd still expect a dual big + dual little design. The change may be that Apple could enable all four cores simultaneously under heavy load. More cache, as we've seen on the A10X, is probably a given; I'd guess 6 MB. The GPU side is where I'd see the big changes happening for the A11, with a new cluster design. I don't think they'll have their custom GPU ready by then, but Apple has been known to surprise. I see Apple adopting the latest PowerVR design and increasing the cluster count.
name99 - Friday, June 30, 2017 - link
"I'd still expect a dual big + dual little design."
This is not a useful way to look at it; it reflects ARM thinking, not Apple thinking.
Apple, as far as we can tell, does not design or think of these as "dual big" and "dual little"; they think of them as a "flexi-core" that consists of a big and a little very tightly coupled. The difference is that the unit of construction is the "big+little"; it's not clusters of big and clusters of little.
We appear to know that switching between a big and its companion little is done by HW. (Apple talks about a "HW performance controller" doing this job). It also seems to be the case that the two can't run independently (big and little running simultaneously) though it's not clear if this is a HW limitation, an OS limitation, or just a policy decision (Apple experimented and could find no circumstances under which it really made sense).
If I had to bet, my bet would be that as we move forward the big and little cores will become ever closer, ever more like two sides of a single "flexicore", perhaps even moving to sharing L1 cache, for example. We'll see...
(ARM has STARTED down this path with DynamIQ --- at least now big and little cores can have a tighter association rather than being forced into separate clusters using separate L2s. Not clear yet if DynamIQ allows for HW to control the toggling between big and little rather than software.)
Kevin G - Friday, June 30, 2017 - link
I'm not disagreeing with you, but there isn't much in terms of terminology to quickly describe that arrangement. It is an implementation distinctly different from what ARM is doing, but they both do the same thing at a high level.
Sharing L1 cache would be nice, as swapping between the two designs wouldn't have to move data to warm the caches. However, I can see it being difficult to keep L1 latencies low in such a scenario. Perhaps just a shared L1 data cache and dedicated L1 instruction caches?
kfishy - Friday, June 30, 2017 - link
Heck, since it's on the same silicon and using the same process, you can theoretically even share the registers and just swap the big/little pipelines and execution units.
name99 - Friday, June 30, 2017 - link
One problem is that you want the little core not just to have a simpler micro-architecture but ALSO to be built of slower transistors. That reality would seem to constrain how aggressively you can push sharing.
But there are academic designs (Univ North Carolina Chapel Hill has done a lot of work in this) that share almost everything and do the big/little transition by shutting down parts of the big microarchitecture. They utilize counters to predict regions of code that will not benefit from the wide micro-architecture (maybe lots of misses to memory, maybe lots of sequentially dependent instructions, maybe lots of hard-to-predict misses), and switch between the wide and the narrower configs around every thousand instructions or so. In theory these give substantial energy savings at a performance loss of 3-5% (which you can easily make up, and more, just by cranking the frequency higher).
But I'm guessing it will be some time before the commercial world gets there! Let's see if they're at least headed that way by seeing whether Apple's next config pulls the two CPUs tighter together.
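For illustration only, a toy model of that counter-driven decision (my own sketch of the idea, with made-up thresholds; not the UNC design or any shipping hardware) could look like this:

```swift
// Sample simple counters over a ~1000-instruction window and pick a config.
struct WindowCounters {
    var instructions = 0
    var cacheMisses = 0      // misses to memory
    var dependentPairs = 0   // back-to-back dependent instructions
}

enum CoreConfig { case wide, narrow }

func chooseConfig(for window: WindowCounters) -> CoreConfig {
    guard window.instructions > 0 else { return .wide }
    let missRate = Double(window.cacheMisses) / Double(window.instructions)
    let serialFraction = Double(window.dependentPairs) / Double(window.instructions)
    // Memory-bound or highly serial windows won't benefit from the wide machine.
    return (missRate > 0.05 || serialFraction > 0.6) ? .narrow : .wide
}
```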
kfishy - Sunday, July 2, 2017 - link
Oh yeah, slower, less leaky transistors definitely help, but even just switching fewer transistors with a simpler pipeline would yield non-trivial power savings.
iwod - Friday, June 30, 2017 - link
Actually, this got me thinking: maybe there won't be an A11. Apple will use the same A10X for the iPhone 8. Since 10nm is a short node, it is only a stepping stone to 7nm. Maybe the innovation (an Apple-made GPU, a new CPU architecture) will only come next year?
Nullify - Friday, June 30, 2017 - link
Now where are Samsung and Qualcomm with their higher-end SoCs for tablet use? Seems ridiculous to use the same one as your phone when a tablet is where you want the extra power.
Araa - Friday, June 30, 2017 - link
The thing is they don't have anything better than what they put in their phones.
blackcrayon - Friday, June 30, 2017 - link
Seems like Apple is the only one with the profit margin and tablet sales to justify developing customized higher-end chips. For Samsung, their high-end phone chips are "good enough". Of course Apple does this too, but only in their lower-end tablets at this point (the lone exception was the iPad Air).
1_rick - Friday, June 30, 2017 - link
What the heck is a pipecleaner, in this context?
artk2219 - Friday, June 30, 2017 - link
It cleans up the fabrication process: you will take a higher loss on the production of these chips because the process isn't completely mature, but it paves the way for better yields on your more profitable or more numerous later products. It cleans out the gunk in the pipes.
1_rick - Friday, June 30, 2017 - link
"It cleans out the gunk in the pipes."
Ok, that makes sense. Would've been nice to have had the term explained--I don't think I've ever seen it used here before, although I admittedly don't read every article.
name99 - Friday, June 30, 2017 - link
That's because you don't know enough about the internet. Same thing happens there.
Once a month VZW, Global Crossing, China Telecom and so on, all the big ISPs, pour pipe cleaner into the internet to clean out the pipes. That's why you get occasional hiccups in the speed.
Has to be done carefully and synchronized around the whole world so that the cleaner poured into China, for example, can flow out in time and doesn't collide with the cleaner poured into the US.
Notmyusualid - Friday, June 30, 2017 - link
@ name99
Thanks!
Icehawk - Friday, June 30, 2017 - link
I hope this increases battery life significantly; I have the 9.7" Pro and its battery life is much worse than the prior iPads I've owned.
blackcrayon - Friday, June 30, 2017 - link
Probably any savings are eaten up by this sweet 120Hz (when it needs it) screen.
Ej24 - Friday, June 30, 2017 - link
Holy crap, that memory bandwidth is nuts. Why can't we have that on desktops?!
tipoo - Sunday, August 20, 2017 - link
Bear in mind that's feeding the GPU at the same time. CPUs don't generally need as much bandwidth, and performant GPUs come with their own faster memory.
SydneyBlue120d - Saturday, July 1, 2017 - link
Is the CPU 64-bit only?
NetMage - Sunday, July 2, 2017 - link
The iPad Pro 10.5 runs iOS 10, so obviously not.
darkich - Monday, July 3, 2017 - link
The crazy thing is, the iPad Pro uses 50% less power than the Surface Pro 5, while crushing it in raw performance benchmarks.
tipoo - Sunday, August 20, 2017 - link
Is there a source that showed its power use?
MrJBlacked - Monday, July 24, 2017 - link
10nm gets me all tingly.
hapeid - Friday, August 25, 2017 - link
The cores are bigger because Apple's Ax cores use a ton of cache. You can run an entire program from Apple's cache. However, Apple's GPUs are slow. The A9X is rated at 350 gigaflops. The SD 820 is rated at 498 gigaflops. The SD 821 is sl...
coolrock2008 - Wednesday, November 8, 2017 - link
It's been a while, but I can't seem to find the reviews of the iPad Pros listed above on AnandTech. Did the reviews not get tagged as iPad/iPad Pro?