Original Link: https://www.anandtech.com/show/9837/snapdragon-820-preview
The Qualcomm Snapdragon 820 Performance Preview: Meet Kryo
by Ryan Smith & Andrei Frumusanu on December 10, 2015 11:00 AM EST
![](https://images.anandtech.com/doci/9837/Front2_678x452.jpg)
I don’t think there’s any way to sugarcoat this, but 2015 has not been a particularly great year for Qualcomm in the high-end SoC business. The company remains a leading SoC developer, but Snapdragon 810, the company’s first ARMv8 AArch64-capable SoC, did not live up to expectations. Seemingly held back by design matters and a rough 20nm planar manufacturing process – a problem shared by many vendors in the last year – Snapdragon 810 couldn’t make good use of its highly clocked ARM Cortex-A57 cores, and ultimately struggled in the face of SoCs built on better processes such as Samsung’s surprisingly early Exynos 7420.
But the purpose of today’s article isn’t to reminisce about the past; rather, it’s to look towards the future. Qualcomm knows all too well what has happened in the past year and the cost to the company that has come from it, so now they need to dust themselves off and try again. With Samsung’s more advanced 14nm FinFET process in hand, a new CPU core, a new GPU, and a number of other advancements, Qualcomm is ready to do just that, and to try to recapture the good old days of 28nm and their Krait CPU architecture.
To that end Qualcomm started talking about Snapdragon 820 early and doing so loudly. Last month the company held their first press demonstration of the SoC, showcasing early demonstrations in action and going into more detail than ever before on their performance and power projections for their next-generation SoC.
If there is any unfortunate aspect to any of this, it’s that while Qualcomm is showing off Snapdragon 820 today, it won’t be ready for the holidays; instead it lines up with what we expect will be the typical spring smartphone refreshes. But some of this is clearly driven by Qualcomm’s business needs and the aforementioned effort at Qualcomm to quickly pick themselves up and try again.
Meanwhile after last month’s demonstrations, this month Qualcomm is ready to move on to the next phase in what has become their traditional roll-out process for a new SoC: giving the press access to the company’s Mobile Development Platform (MDP) devices. Designed for software developers to begin building apps and (for lack of a better word) experiences around the new SoC, the MDP is something of the home-stretch in SoC development, as it means Qualcomm is ready to let the press and developers see the hardware and near-final software stack. We’ve previously previewed the Snapdragon 800, 805, and 810 via their MDPs, and for Snapdragon 820 Qualcomm has once again opted to do the same. So without further ado, let’s take our first look at Snapdragon 820.
Qualcomm Snapdragon S810 Specifications

| SoC | Snapdragon 820 | Snapdragon 810 | Snapdragon 800 |
|---|---|---|---|
| CPU | 2x Kryo @ 1.593GHz, 512KB(?) L2 cache; 2x Kryo @ 2.150GHz, 1MB(?) L2 cache | 4x [email protected], 512KB L2 cache; 4x Cortex-A57 @ 1.958GHz, 2MB L2 cache | 4x Krait [email protected], 4x512KB L2 cache |
| Memory Controller | 2x 32-bit LPDDR4 @ 1803MHz, 28.8GB/s b/w | 2x 32-bit LPDDR4 @ 1555MHz, 24.8GB/s b/w | 2x 32-bit LPDDR3 @ 933MHz, 14.9GB/s b/w |
| GPU | Adreno 530 @ 624MHz | Adreno 430 @ 600MHz | Adreno 330 @ 600MHz |
| Mfc. Process | Samsung 14nm LPP | TSMC 20nm SoC | TSMC 28nm HPm |

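
As a quick sanity check on the bandwidth row above, peak theoretical bandwidth follows from the channel count, bus width, and memory clock. The sketch below assumes double data rate signalling (two transfers per clock) and decimal gigabytes; the function name and its parameters are ours for illustration and do not come from Qualcomm.

```python
def peak_bandwidth_gbs(channels, bus_width_bits, clock_mhz, transfers_per_clock=2):
    """Peak theoretical DRAM bandwidth in GB/s (decimal gigabytes).

    Assumes a double-data-rate interface unless transfers_per_clock is overridden.
    """
    bytes_per_transfer = channels * bus_width_bits / 8
    return bytes_per_transfer * clock_mhz * 1e6 * transfers_per_clock / 1e9


# Memory configurations as listed in the specification table.
configs = {
    "Snapdragon 820": (2, 32, 1803),
    "Snapdragon 810": (2, 32, 1555),
    "Snapdragon 800": (2, 32, 933),
}

results = {name: peak_bandwidth_gbs(*cfg) for name, cfg in configs.items()}
for name, gbs in results.items():
    print(f"{name}: {gbs:.2f} GB/s")
```

The 820 and 800 results match the table once rounded to one decimal place. The 810 works out to 24.88GB/s, which the table appears to truncate to 24.8GB/s rather than round.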
Taking a trek down to sunny San Diego, Qualcomm handed to us the Snapdragon 820 MDP/S. A 6.2” phablet, the MDP/S is a development kit designed for function over form, containing a full system implementation (sans cellular) in an otherwise utilitarian design. Along with the Snapdragon 820 SoC, the 820 MDP/S also includes a 6.2” 2560x1600 display, 3GB of LPDDR4 memory running at 1804MHz (slightly higher than the 1555MHz we’ve seen on the Snapdragon 810 and Exynos 7420), a 64GB Universal Flash Storage package, a 21MP rear camera, 802.11ac WiFi, and a Sense ID ultrasonic fingerprint scanner. Overall the aesthetics of the MDP/S differ significantly from what retail phones will go for, but internally the MDP/S won’t be far removed from the kinds of configurations we’ll see in 2016 smartphones.
Overall there’s little to report on the MDP/S experience itself. Qualcomm is still sorting out some driver bugs – only one device in our group was ready to run PCMark – and to be sure like past Qualcomm MDP previews this is very much a preview. However the experience was otherwise unremarkable (in a good way) with our unit completing all of our tests bar part of SPEC CPU 2000, which will require further analysis.
More interesting from a testing perspective is that Qualcomm opted to demonstrate Snapdragon 820 using the MDP/S smartphone development kit, instead of a larger MDP/T tablet development kit. Qualcomm has used MDP/T for the press demonstrations on both Snapdragon 800 and Snapdragon 810, so the fact that they are using the MDP/S this time around is very notable. From a pure performance perspective the MDP/T allowed Qualcomm to show off previous Snapdragon designs at their best (these are just performance previews, after all), but after Snapdragon 810 I don’t doubt that, had this been another MDP/T, the 820’s thermals and power consumption would have been called into question. So instead we are looking at 820 in a phablet, and while this may not put 820 in the best possible light, the end result is that we get to see what performance in a large phone looks like, and for Qualcomm there isn’t any doubt about 820’s suitability for a smartphone.
As for Snapdragon 820 itself, we’ve already covered the SoC in some depth in past articles – and this week’s preview doesn’t come with much in the way of new architectural information – but here’s a quick recap of what we know so far. 820 uses a new Qualcomm developed CPU core called Kryo. The quad core CPU is best described as an HMP solution with two high-performance cores clocked at 2150 MHz and two low-power cores clocked at 1593MHz. The CPU architectures of both clusters are identical, but with differences in cache configuration and their power/frequency tuning.
Meanwhile the GPU inside 820 is the Adreno 530. This is a next-generation design from Qualcomm and includes functionality that until now has only been found in PC desktops, such as shared virtual memory with the CPU, which allows an OpenCL host program and a device's kernel to share a virtual address space so access to data structures like lists and trees can be easily shared between the host and GPU. The underlying architecture is capable of Renderscript and OpenCL 2.0 on the compute side – a significant step up from Adreno 400 – and on the graphics side supports OpenGL ES 3.1 + AEP and Vulkan. We know the 530 should be powerful, but like past Qualcomm designs the company is saying virtually nothing about the underlying architecture.
Finally, while it’s not something that can be covered in our brief testing, the 820 contains a new DSP block, the Hexagon 680. Hexagon 680 and its Hexagon Vector Extensions (HVX) are designed to handle significant compute workloads for applications such as virtual reality, augmented reality, image processing, video processing, and computer vision. This means that tasks that might otherwise be running on a relatively power hungry CPU or GPU can run on a comparatively efficient DSP instead. The HVX has 1024-bit vector data registers, with the ability to address up to four of these slots per instruction, which allows for up to 4096 bits (4 x 1024) per cycle.
CPU Performance: Meet Kryo
To dive right into the heart of matters then, after getting our standard benchmarks out of the way we had enough time left to load up some of our more advanced analysis tools to run on the 820 MDP/S. While Qualcomm has been somewhat forthcoming about the Kryo CPU architecture, they have never been as open as, say, ARM (who is in the business of licensing the IP), so there are still some unanswered questions about what Kryo is like under the hood.
Qualcomm CPU Core Comparison

| | Snapdragon 800 | Snapdragon 810 | Snapdragon 820 |
|---|---|---|---|
| CPU Codename | Krait | ARM Cortex-A57 | Kryo |
| ARM ISA | ARMv7-A (32-bit) | ARMv8-A (32/64-bit) | ARMv8-A (32/64-bit) |
| Integer Add | 1 | 2 | 1 |
| Integer Mul | 1 | 1 | 1 |
| Shifter ALUs | 1 | 2 | 1 |
| Addition (FP32) Latency | 3 cycles | 5 cycles | 3 cycles |
| Multiplication (FP32) Latency | 6 cycles | 5 cycles | 5 cycles |
| Addition (INT) Latency | 1.5 cycles | 1 cycle | 1 cycle |
| Multiplication (INT) Latency | 4 cycles | 3 cycles | 4 cycles |
| L1 Cache | 16KB I$ + 16KB D$ | 48KB I$ + 32KB D$ | 32KB I$ + 32KB D$? |
| L3 Cache | N/A | N/A | N/A |

One thing that immediately jumps out is how similar some of our results are to Krait. According to our initial tests, the number of integer and FP ALUs would appear to be unchanged. Similarly the latency for a lot of operations is similar as well. This isn’t wholly surprising as Krait was a solid architecture for Qualcomm, and there is a good chance they agreed and decided to use it as their starting point. At the same time however I do want to note that these are our initial results done rather quickly on what’s essentially a beta device; further poking later on may reveal more differences than what we’ve seen so far.
But with the above said, there’s a big difference between how many execution units a CPU design has and how well it can fill them, which is why even similar designs can have wildly different IPC. We’ll investigate this a bit more in a moment, however it’s worth noting that this is exactly the philosophy ARM has adopted with Cortex-A72, so it is neither unprecedented nor even unexpected.
Looking at the memory hierarchy and latency, our results point to a 32KB L1 data cache. For the moment I’m assuming the instruction cache is identical, as is the case on most designs, but this test is purely a data test. Meanwhile L2 cache size is a bit harder to pin down; we know that the different CPU clusters on 820 will be using different L2 cache sizes. Ultimately it's pretty much impossible to pin down the exact L2 cache size from this test alone, especially since we can't see the amount of L2 attached to the lower clocked Kryo cluster.
According to our colleague Matt Humrick over at Tom’s Hardware, who has been investigating the matter, Qualcomm has disclosed that we’re looking at a 1MB L2 for the performance cluster and a 512KB L2 for the power cluster. We’re still looking into independently confirming this bit of information with Qualcomm.
However what you won’t find – and much to our surprise – is an L3 cache. Our test results indicate (and Qualcomm confirms) that Snapdragon 820 does not have an L3 cache as we initially expected, with the L2 cache being the highest cache level on the chip. We initially reported there to be an L3 due to the fact that we found evidence and references to this cache block in Qualcomm's resources, but it seems the latest revision of the SoC doesn't actually employ such a piece in actual silicon, as demonstrated by the latency graph. This means that there isn’t any kind of cache back-stopping interactions between the two CPU clusters, or between the CPU and GPU. Only simple coherency, and then beyond that main memory.
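
For readers unfamiliar with how a latency test can expose cache sizes: a common approach is a pointer chase, where every load depends on the result of the previous one and the working set is grown until the time per load steps up at each cache boundary. The following is a minimal sketch of that technique only, not necessarily the tool behind the results discussed here. `build_cycle` and `chase` are hypothetical names, and in Python the interpreter overhead largely swamps the cache effects, so a real test would be written in C or assembly.

```python
import random
import time


def build_cycle(n, seed=0):
    """Return a list nxt where following nxt[i] from any slot visits all n slots
    exactly once before returning to the start (one random n-cycle)."""
    rng = random.Random(seed)
    order = list(range(n))
    rng.shuffle(order)
    nxt = [0] * n
    for i in range(n):
        # Link each slot in the shuffled order to the one after it, wrapping around.
        nxt[order[i]] = order[(i + 1) % n]
    return nxt


def chase(nxt, steps):
    """Perform `steps` dependent loads starting at slot 0.

    Returns the final slot index and the average time per step in nanoseconds.
    """
    idx = 0
    start = time.perf_counter()
    for _ in range(steps):
        idx = nxt[idx]  # the next address depends on this load's result
    elapsed = time.perf_counter() - start
    return idx, elapsed / steps * 1e9


# Sweep a few working-set sizes; on real hardware with a native implementation,
# latency would jump as the set outgrows L1, then L2, then falls out to DRAM.
for slots in (1 << 10, 1 << 14, 1 << 18):
    chain = build_cycle(slots)
    _, ns_per_step = chase(chain, 200_000)
    print(f"{slots:>7} slots: {ns_per_step:.1f} ns/step")
```

The random ordering matters: a sequential walk would let hardware prefetchers hide the latency, whereas a shuffled single cycle forces each access to wait on the previous one.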
Geekbench 3 Memory Bandwidth Comparison (1 thread)

| | Stream Copy | Stream Scale | Stream Add | Stream Triad |
|---|---|---|---|---|
| SD 801 (2458MHz) | 7.6 GB/s | 4.6 GB/s | 4.6 GB/s | 5.2 GB/s |
| SD 810 (1958MHz) | 7.5 GB/s | 7.4 GB/s | 6.4 GB/s | 6.6 GB/s |
| SD 820 (2150MHz) | 17.4 GB/s | 11.5 GB/s | 13.1 GB/s | 12.8 GB/s |
| SD 820 > 810 Advantage | 131% | 55% | 103% | 94% |

Meanwhile looking at Geekbench 3 memory performance, one can see that memory bandwidth is greatly improved over both Snapdragon 800/801 and 810. Stream copy in particular is through the roof, increasing by 131% (over double 810’s performance). Even the other tests, though not as great, are between 55% and 103%. The Snapdragon 820 also shows improved latency to main memory when compared to the Snapdragon 810, so it seems that Qualcomm made definite improvements in the memory controller and general memory architecture of the chipset, allowing the CPUs to get nearer to the theoretical total memory bandwidth offered by the memory controllers.
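
The advantage row can be approximately reproduced from the published figures. The short sketch below recomputes it from the rounded GB/s values in the table; the helper name `advantage_pct` is ours. Because the inputs are already rounded, the results can differ from the published percentages by a point or two, which were presumably derived from unrounded data.

```python
def advantage_pct(new, old):
    """Percentage by which `new` exceeds `old` (negative means a regression)."""
    return (new / old - 1.0) * 100.0


# Single-thread Geekbench 3 STREAM results in GB/s, as listed in the table.
sd810 = {"copy": 7.5, "scale": 7.4, "add": 6.4, "triad": 6.6}
sd820 = {"copy": 17.4, "scale": 11.5, "add": 13.1, "triad": 12.8}

advantage = {test: advantage_pct(sd820[test], sd810[test]) for test in sd810}
for test, pct in advantage.items():
    print(f"Stream {test}: {pct:+.1f}%")
```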
Moving on, let’s shift to some benchmarks that take a more comprehensive look at performance, starting with SPECint2000. Developed by the Standard Performance Evaluation Corporation, SPECint2000 is the integer component of their larger SPEC CPU2000 benchmark. Designed around the turn of the century, officially SPEC CPU2000 has been retired for PC processors, but with mobile processors roughly a decade behind their PC counterparts in performance, SPEC CPU2000 is currently a very good fit for the capabilities of contemporary SoCs.
SPECint2000 - Estimated Scores

| Benchmark | Snapdragon 810 | Snapdragon 820 | % Advantage |
|---|---|---|---|
| 164.gzip | 823 | 1176 | 43% |
| 175.vpr | 2456 | 1707 | -30% |
| 176.gcc | 1341 | 1641 | 22% |
| 181.mcf | 789 | 593 | -25% |
| 186.crafty | 1492 | 1449 | -3% |
| 197.parser | 753 | 962 | 28% |
| 252.eon | 2321 | 3333 | 44% |
| 253.perlbmk | 1090 | 1384 | 27% |
| 254.gap | 1325 | 1447 | 9% |
| 255.vortex | 1043 | 1583 | 52% |
| 256.bzip2 | 867 | 1041 | 20% |
| 300.twolf | DNC | DNC | N/A |

Even though this early preview means we don’t have the luxury of building a binary with a compiler aware of Kryo, using our A57 binaries produces some preliminary results on the 820 MDP/S. Performance does regress in a few places (vpr, mcf, and marginally crafty), but in other places we see performance increase by up to 52%. 820 does have a slight 10% frequency advantage over 810, so when taking into account the clock difference the IPC improvements are slightly lower. This is also showcased when comparing the Snapdragon 820 to a more similarly clocked Exynos 7420 (A57 @ 2100MHz), where the maximum advantage drops to 33% and, much as against a clock-normalized Snapdragon 810, the overall average advantage comes in at only 5-6%. Once we get the opportunity to have more time with a Snapdragon 820 device we’ll be able to verify how much the compiler settings affect the score on the Kryo architecture.
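
To make the clock normalization concrete, the sketch below divides the overall Snapdragon 820 advantage by the ratio of the two SoCs’ peak big-core clocks. Two things here are assumptions rather than anything stated above: that a geometric mean is a reasonable way to average the per-test ratios (SPEC’s usual convention, though our own averaging method is not spelled out), and that both devices held their peak clocks of 2150MHz and 1958MHz throughout the run.

```python
import math

# Estimated SPECint2000 scores from the table above: (Snapdragon 810, Snapdragon 820).
# 300.twolf is omitted because it did not complete on either device.
SCORES = {
    "164.gzip": (823, 1176),
    "175.vpr": (2456, 1707),
    "176.gcc": (1341, 1641),
    "181.mcf": (789, 593),
    "186.crafty": (1492, 1449),
    "197.parser": (753, 962),
    "252.eon": (2321, 3333),
    "253.perlbmk": (1090, 1384),
    "254.gap": (1325, 1447),
    "255.vortex": (1043, 1583),
    "256.bzip2": (867, 1041),
}
CLOCK_810_MHZ = 1958  # assumed sustained A57 clock
CLOCK_820_MHZ = 2150  # assumed sustained Kryo performance-cluster clock


def geomean(values):
    """Geometric mean of an iterable of positive numbers."""
    values = list(values)
    return math.exp(sum(math.log(v) for v in values) / len(values))


ratios = {name: s820 / s810 for name, (s810, s820) in SCORES.items()}
overall = geomean(ratios.values())            # raw 820 / 810 speedup
clock_ratio = CLOCK_820_MHZ / CLOCK_810_MHZ   # roughly a 10% clock advantage
per_clock = overall / clock_ratio             # speedup with the clock gap removed

print(f"overall speedup:   {overall:.3f}x")
print(f"clock ratio:       {clock_ratio:.3f}x")
print(f"per-clock speedup: {per_clock:.3f}x")
```

Either way the per-clock gain ends up as a small single-digit percentage, in the same ballpark as the 5-6% quoted above; the exact figure depends on whether the ratios are averaged geometrically or arithmetically.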
Our other set of comparison benchmarks comes from Geekbench 3. Unlike SPECint2000, Geekbench 3 is a mix of integer and floating point workloads, so it will give us a second set of eyes on the integer results along with a take on floating point improvements.
Geekbench 3 - Integer Performance

| Test | Snapdragon 810 | Snapdragon 820 | % Advantage |
|---|---|---|---|
| AES ST | 739.7 MB/s | 700.7 MB/s | -5% |
| AES MT | 3.05 GB/s | 1.99 GB/s | -35% |
| Twofish ST | 89.8 MB/s | 102.7 MB/s | 14% |
| Twofish MT | 448.5 MB/s | 345.5 MB/s | -23% |
| SHA1 ST | 628.9 MB/s | 983 MB/s | 56% |
| SHA1 MT | 3.02 GB/s | 2.84 GB/s | -6% |
| SHA2 ST | 83.5 MB/s | 134.9 MB/s | 61% |
| SHA2 MT | 393.4 MB/s | 374.6 MB/s | -5% |
| BZip2Comp ST | 5.01 MB/s | 7.29 MB/s | 45% |
| BZip2Comp MT | 20.5 MB/s | 20.5 MB/s | 0% |
| Bzip2Decomp ST | 7.99 MB/s | 9.76 MB/s | 24% |
| Bzip2Decomp MT | 30.8 MB/s | 24.9 MB/s | -19% |
| JPG Comp ST | 18.9 MP/s | 23.3 MP/s | 23% |
| JPG Comp MT | 88.9 MP/s | 76.7 MP/s | -14% |
| JPG Decomp ST | 41.5 MP/s | 62.2 MP/s | 49% |
| JPG Decomp MT | 182.7 MP/s | 176.6 MP/s | -3% |
| PNG Comp ST | 1.11 MP/s | 1.56 MP/s | 43% |
| PNG Comp MT | 4.78 MP/s | 4.61 MP/s | -4% |
| PNG Decomp ST | 17.9 MP/s | 24.2 MP/s | 35% |
| PNG Decomp MT | 94.1 MP/s | 64.3 MP/s | -32% |
| Sobel ST | 53.3 MP/s | 86.3 MP/s | 62% |
| Sobel MT | 248.4 MP/s | 244.8 MP/s | -1% |
| Lua ST | 1.30 MB/s | 1.59 MB/s | 22% |
| Lua MT | 5.93 MB/s | 4.5 MB/s | -24% |
| Dijkstra ST | 3.38 Mpairs/s | 5.52 Mpairs/s | 63% |
| Dijkstra MT | 13.7 Mpairs/s | 13.7 Mpairs/s | 0% |

The actual integer performance gains with GeekBench 3 are rather varied. Single-threaded results show gains in all but one test, ranging from a minor 5% regression for AES up to a 63% improvement for Dijkstra. Given the architecture shift involved here, this is a bit surprising (and in Qualcomm’s favor) since you wouldn’t necessarily expect Kryo to beat Cortex-A57 on nearly everything. On the other hand MT results typically show a regression: Snapdragon 810’s 4+4 big.LITTLE configuration meant that it had 4 Cortex-A53 cores contributing to the task alongside big cores all running at near-full clockspeed, while Snapdragon 820 has only four cores in total and Kryo’s second cluster runs at a reduced clockrate. And though one could have a spirited argument about whether single-threaded or multi-threaded performance is more important, I’m firmly on the side of ST for most use cases.
Geekbench 3 - Floating Point Performance

| Test | Snapdragon 810 | Snapdragon 820 | % Advantage |
|---|---|---|---|
| BlackScholes ST | 5.46 Mnodes/s | 12.3 Mnodes/s | 125% |
| BlackScholes MT | 25.5 Mnodes/s | 32.1 Mnodes/s | 26% |
| Mandelbrot ST | 1.2 GFLOPS | 2 GFLOPS | 67% |
| Mandelbrot MT | 6.41 GFLOPS | 6.23 GFLOPS | -3% |
| Sharpen Filter ST | 1.07 GFLOPS | 2.15 GFLOPS | 100% |
| Sharpen Filter MT | 5.02 GFLOPS | 6.11 GFLOPS | 22% |
| Blur Filter ST | 1.27 GFLOPS | 3.14 GFLOPS | 147% |
| Blur Filter MT | 6.14 GFLOPS | 8.84 GFLOPS | 44% |
| SGEMM ST | 2.29 GFLOPS | 4.09 GFLOPS | 79% |
| SGEMM MT | 6.12 GFLOPS | 9.19 GFLOPS | 50% |
| DGEMM ST | 1.05 GFLOPS | 1.95 GFLOPS | 85% |
| DGEMM MT | 2.81 GFLOPS | 4.53 GFLOPS | 61% |
| SFFT ST | 1.25 GFLOPS | 1.98 GFLOPS | 58% |
| SFFT MT | 4.11 GFLOPS | 5.65 GFLOPS | 37% |
| DFFT ST | 1.03 GFLOPS | 1.68 GFLOPS | 63% |
| DFFT MT | 2.97 GFLOPS | 4.76 GFLOPS | 60% |
| N-Body ST | 486.6 Kpairs/s | 841 Kpairs/s | 73% |
| N-Body MT | 1.72 Mpairs/s | 2.34 Mpairs/s | 36% |
| Ray Trace ST | 1.84 MP/s | 2.86 MP/s | 55% |
| Ray Trace MT | 8.16 MP/s | 8.46 MP/s | 4% |

GeekBench 3’s floating point results are even more positive for Snapdragon 820. There is only a single performance regression, a 3% drop in multi-threaded Mandelbrot. Otherwise in both MT and ST workloads, performance is significantly up. This is a prime example of where Kryo is taking better advantage of its execution units than any high-end Qualcomm SoC before it: even with its execution resources holding steady (or on paper at a slight deficit), in practice it comes out significantly ahead.
CPU Performance, Cont
Having taken a look at Snapdragon 820 and the Kryo CPU from an architectural perspective, let’s look at our higher level benchmarks. We’ll start as always with the web benchmarks.
There are two things we can immediately take away from these results. The first is that currently Google Chrome is incredibly unoptimized for Kryo, and this is something Qualcomm was also quick to mention. We won’t wax on about this as there’s nothing to say we haven’t said before, but Chrome could certainly stand to implement optimized JS engines sooner.
Otherwise if we look at Qualcomm’s native browser, things are greatly improved. Relative to both the Exynos 7420 (A57) powered Note 5 and the Snapdragon 810 (A57) powered Mi Note Pro, the MDP/S shows a significant lead. In fact it pretty much blows past those devices in Kraken. However while it easily takes the top spot for an Android device, even with Qualcomm’s native browser the 820 isn’t going to be able to catch up to the iPhone 6s Plus and its A9 SoC.
Basemark OS II 2.0 on the other hand is less consistent. The overall score again pegs the MDP/S as the best Android device, and by over 20%. However for reasons yet to be determined, the system score is still below the latest Samsung devices. Instead where the 820 shows a clear lead is with the storage (memory) score and the graphics score. In some cases it’s even beating the iPhone 6s Plus, though overall it will fall short.
Our final system benchmark, PCMark, once again puts the MDP/S in a good light overall, while the individual sub-tests are more widely varied. Likely owing to the same optimization issues that dogged Chrome performance, web browsing performance trails the A57 devices. Meanwhile video playback closely trails the Snapdragon 810 powered HTC One M9, and writing performance won’t quite surpass the Galaxy S6. Where the 820 MDP/S makes up for it is in the photo editing score, which is through the roof. Here Qualcomm’s development device holds a 34% performance lead over the next-fastest device, the 810/A57 based Mi Note Pro.
GPU Performance
Shifting gears, let’s take a look at GPU performance. As we mentioned earlier, Qualcomm isn’t disclosing much about this GPU other than that it packs quite a bit more computational power than its predecessor and should be quite a bit faster in the process. This points to a potentially significant architectural shift, but that determination will have to wait for another time.
Starting with 3DMark Ice Storm Unlimited, the performance honestly doesn’t start out great. The overall score is significantly influenced by the physics score, which in turn is more concerned with the number of cores and their throughput on simple code than the ability to extract complex IPC. As a result the 4 CPU core 820 simply can’t catch up with the likes of the Samsung devices and their high-clocked big.LITTLE configurations. On the other hand the graphics score makes this the fastest Android phone to date, though relative to the 810 Mi Note Pro, perhaps not by a ton. Ultimately as this is an OpenGL ES 2.x test it’s not the most strenuous of tests these days, and comments from Qualcomm indicate that it may be a CPU-limited test on 820.
GFXBench on the other hand shows some massive gains for the 820 relative to any other Android device. In offscreen rendering mode, all 3 game tests – Manhattan ES 3.1, Manhattan ES 3.0, and T-Rex HD – put the 820 MDP/S as being 52% (or more) faster than the next-fastest Android device, either the 810 based Mi Note Pro or the Exynos 7420 based Samsung Galaxy Note 5. The single biggest jump we see is with Manhattan ES 3.0 at 72%, while the ES 3.1 version dials that back down to 52%. Even the iPhone 6s Plus, well known for its powerful GPU, is handily and consistently surpassed by the 820 here. Only due to the 6s Plus’s lower rendering resolution of 2208x1242 does it surpass the MDP/S in onscreen tests, as the latter needs to render at 2560x1600 (~50% more pixels). Qualcomm was aiming for some big GPU performance gains here and so far they are delivering.
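
The resolution gap behind the onscreen results is easy to quantify. This small sketch simply counts pixels for the two panels quoted above; it assumes rendering cost scales with pixel count, which is only a first-order approximation.

```python
def pixel_count(width, height):
    """Total number of pixels rendered per frame at a given resolution."""
    return width * height


mdp_s = pixel_count(2560, 1600)           # Snapdragon 820 MDP/S panel
iphone_6s_plus = pixel_count(2208, 1242)  # iPhone 6s Plus render resolution

extra_pct = (mdp_s / iphone_6s_plus - 1) * 100
print(f"MDP/S: {mdp_s:,} px, iPhone 6s Plus: {iphone_6s_plus:,} px")
print(f"The MDP/S pushes {extra_pct:.1f}% more pixels per frame")
```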
Curiously, GFXBench’s synthetic feature tests don’t show the same gains. Offscreen ALU performance is only slightly improved over the 810 (10%), and texturing performance is an outright regression. Nonetheless full gaming performance is clearly in the 820’s favor. I’ve long suspected that the Adreno 430 GPU in the 810 had some kind of architectural bottleneck, perhaps an ALU/texture array that was difficult to fully utilize, and what we’re seeing here would back up that suspicion: if that were the case, correcting it would have allowed Qualcomm to significantly boost their rendering performance while only barely changing their synthetic performance. Otherwise I find it a bit surprising that the driver overhead score is a bit worse on 820 than 810, which may be a result of the immature GPU drivers on this early device.
Closing Thoughts
Wrapping things up, after Qualcomm’s experiences with the Snapdragon 810 (and to a lesser extent the 808), the company has a lot to do if they wish to recapture their grip on the high-end SoC market, and less time than they’d like to do it. What has happened with the 810 is now in the past, but to recover Qualcomm needs to show they can correct their mistakes and produce a new generation of chips as well designed as the 800/801. And they need to do so at a particularly sensitive time when customer/competitor/supplier Samsung has fully ramped up their own SoC CPU design team, which presents yet more of a challenge to Qualcomm.
As is always the case with these MDP previews, it’s critical to note that we’re looking at an early device with unoptimized software, and at the same time a device and scenario where Qualcomm is looking to show off their new SoC in the best light possible. Which is to say that between now and retail devices there’s room for performance to grow and performance to shrink depending on what happens with software, thermal management, and more. However at least in the case of the Snapdragon 820 MDP/S preview, I am hopeful that our experiences here will more closely mirror retail devices since we’re looking at a phablet form factor device and not a full-size tablet as was the case in the past couple of generations.
To that end, then, Snapdragon 820 looks like Qualcomm has regained their orientation. Performance is improved over 810, usually greatly so, at both the CPU and GPU level. And for what it’s worth, while we don’t have extensive temperature/clockspeed logs from the MDP/S, at no point did the device get hot to the touch or leave us with the impression that it was heavily throttling to avoid getting hot to the touch. Power consumption and especially efficiency (Performance/W) are clearly going to be an important consideration on 820 after everyone’s experiences with 810, and while we’ll have to see what the retail devices are like, after what Samsung was able to do in their own transition from 20nm to 14nm FinFET, I feel it bodes well for Qualcomm as well.
Meanwhile more broadly speaking, our initial data doesn’t paint Snapdragon 820 as the SoC that is going to dethrone Apple’s commanding lead in ARM CPU performance. Even if retail devices improve performance, Apple A9/Twister’s performance lead in CPU-bound scenarios is extensive (particularly in lightly-threaded scenarios), more so than I’d expect any kind of software refinements to close. What seems to be rather concerning is the performance of existing software that isn’t yet optimized for the new architecture; we’ll have to see how much compilers that target Kryo will be able to improve scores in that regard. The Adreno 530 on the other hand looks to perform very well for a smartphone SoC, besting Apple’s latest, and I think there’s a good chance for retail devices to hold their edge here.
Otherwise within the Android SoC space, the big wildcards right now are ARM’s Cortex-A72 and Samsung’s forthcoming M1 CPU. Initial performance estimates of the A72 don’t put it very far from Kryo, and given that we’ll be seeing some very high clocked SoCs such as the Kirin 950 at 2.3GHz or MediaTek’s X20 at 2.5GHz, Qualcomm looks to have some competition in terms of CPU performance. With the A72, ARM is striving for performance gains rather similar to what we’ve seen with Snapdragon 820, while Samsung’s CPU is still a complete mystery at the moment. Even with their significant gains over the Snapdragon 810, if Kryo is to beat A72 and M1, then I don’t expect it will be an easy win for Qualcomm.