Original Link: https://www.anandtech.com/show/8035/qualcomm-snapdragon-805-performance-preview
Qualcomm Snapdragon 805 Performance Preview
by Anand Lal Shimpi on May 21, 2014 8:00 PM EST
Last year Qualcomm announced a new tier in its high end SoC roadmap with the Snapdragon 805. Priced somewhat above the current Snapdragon 800/801, the 805 would be the last 32-bit high-end SoC from Qualcomm. It would be the grand finale in Krait's lineage, which started back in 2012 with Krait 200 and MSM8960 and saw iterative improvements over the years. The Snapdragon 805 was not only designed to drive CPU performance higher but also be the launch vehicle for Qualcomm's brand new Adreno 4xx GPU architecture.
The Snapdragon 805 SoC is a beast. It features four Krait 450 cores, each a mild tweak of the Krait 400 design used in the S800/801. These cores can now run at up to 2.7GHz compared to 2.5GHz in the Snapdragon 801 (Krait 400). As always, Qualcomm advertises customer-friendly frequencies rounded up to the nearest 100MHz; the actual max frequency of each Krait 450 core is 2.65GHz (compared to 2.45GHz in Krait 400).
The 8% increase in max frequency comes from tuning at the circuit level; there's no impact on IPC. All four cores sit behind a shared 2MB L2 cache. As is the case with all multi-core Krait SoCs, each CPU core can be power gated, clock gated and even clocked independently of the rest.
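For reference, the 8% figure follows directly from the actual (not marketing) peak clocks quoted above. The snippet below is just a minimal sketch of that arithmetic; the helper name `percent_increase` is my own and not anything from Qualcomm.

```python
def percent_increase(old: float, new: float) -> float:
    """Relative increase of `new` over `old`, in percent."""
    return (new - old) / old * 100.0


# Actual peak CPU clocks quoted in the text, in GHz.
KRAIT_400_GHZ = 2.45  # Snapdragon 801
KRAIT_450_GHZ = 2.65  # Snapdragon 805

uplift = percent_increase(KRAIT_400_GHZ, KRAIT_450_GHZ)
print(f"Peak clock uplift: {uplift:.1f}%")
```

Using the rounded marketing clocks (2.5GHz and 2.7GHz) instead gives almost exactly the same answer, which is why the 8% figure holds either way.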
The S805 features Qualcomm's Adreno 420 GPU with full support for OpenGL ES 3.1 (with some extensions), OpenCL 1.2 and Direct3D feature level 11_2 (with a hardware tessellation engine). In typical Qualcomm fashion, it isn't disclosing any material details on the underlying Adreno 420 architecture so we'll have to guess based on what the benchmarks tell us. Adreno 420 includes support for Adaptive Scalable Texture Compression (ASTC), a texture compression format first introduced by ARM in 2011.
There are other architectural improvements including better texturing performance and faster depth rejection. The architecture should be more efficient than Adreno 3xx as well, making better use of the underlying hardware.
The GPU runs at a max frequency of 600MHz.
Qualcomm claims a 20% reduction in power consumption compared to Adreno 330 (Snapdragon 800) when running the T-Rex HD test from GFXBench at 1080p (onscreen).
For the first time, the GPU now gets its own direct path to the SoC's memory interface. In the past the GPU shared a bus with the ISP and video engines, but in order to feed the beast that had to change. The memory interface on S805 features two 64-bit LPDDR3-800 partitions (4 x 32-bit external interfaces), each capable of supporting 1600MHz datarate LPDDR3 for an aggregate peak theoretical bandwidth figure of 25.6GB/s. The Krait 450 cores themselves aren't big enough to use all of that memory bandwidth. The wide memory interface is really there for the GPU and video engines. We haven't seen a memory interface this wide on a mobile SoC since Apple's A5X/A6X designs.
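The 25.6GB/s figure is simply bus width multiplied by data rate. The sketch below reproduces it and, for comparison, the peak figures for the 2 x 32-bit Snapdragon 800/801 interface at LPDDR3-800 and LPDDR3-933. The function name is hypothetical, and I'm assuming decimal gigabytes (10^9 bytes) and that LPDDR3-933 means an 1866MT/s data rate.

```python
def peak_bandwidth_gb_s(bus_width_bits: int, data_rate_mt_s: float) -> float:
    """Peak theoretical bandwidth in GB/s (1 GB = 10**9 bytes).

    bus_width_bits: total width of the external memory interface.
    data_rate_mt_s: transfers per second per pin, in millions (MT/s).
    """
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * data_rate_mt_s * 1e6 / 1e9


# Snapdragon 805: 4 x 32-bit LPDDR3-800 (1600MT/s data rate).
s805 = peak_bandwidth_gb_s(4 * 32, 1600)

# Snapdragon 800/801: 2 x 32-bit, at LPDDR3-800 or LPDDR3-933.
s801_lpddr3_800 = peak_bandwidth_gb_s(2 * 32, 1600)
s801_lpddr3_933 = peak_bandwidth_gb_s(2 * 32, 1866)  # assumed 1866MT/s

print(f"S805: {s805:.1f} GB/s")
print(f"S801 (LPDDR3-800): {s801_lpddr3_800:.1f} GB/s")
print(f"S801 (LPDDR3-933): {s801_lpddr3_933:.1f} GB/s")
```

At the same memory speed, doubling the interface width doubles the peak figure; nothing else about the calculation changes.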
In order to accommodate the wider memory interface but still make Snapdragon 805 suitable for use in a smartphone as well as a tablet, Qualcomm turned to a different packaging technology. Since the Snapdragon 805 is an APQ part, it lacks the integrated modem of the MSM SoCs we've found in most of Qualcomm's recent flagships. S805 uses a Moulded Embedded Package (MEP) that allows Qualcomm to route its 128-bit wide memory interface to on-package DRAM, giving it all of the benefits of a PoP stack as well as the wider memory interface. Qualcomm wouldn't provide me with a ton of details on MEP other than to say that rather than using the perimeter of the SoC's package to connect to memory stacked above it, MEP uses a substrate layer on top of the SoC to connect to the memory, giving the SoC more surface area to route lines to the DRAM. Qualcomm also claims the amount of metal it uses in the DRAM's substrate layer has some small impact on improving thermals on the overall package. The result is that Snapdragon 805 is still compact enough to go into a smartphone as long as the design can accommodate a discrete modem.
The Snapdragon 805 also marks Qualcomm's first SoC with a hardware H.265/HEVC video decode engine. There's no hardware H.265 encode acceleration, however; that won't come until Snapdragon 810 in 2015.
The S805's ISP sees an increase in performance as well. The SoC retains Qualcomm's dual-ISP design, now capable of pushing up to 1.2 Gigapixels/s through the engine. If Qualcomm arrives at that number the same way as it has in the past, that would imply a 600MHz ISP operating frequency (up from 465MHz in the Snapdragon 801). The new ISP supports up to four MIPI camera inputs (TrioCam + FF anyone?). The ISP can support 4k30 and 1080p120 video capture.
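The implied 600MHz clock comes from dividing the quoted throughput across the two ISPs. The sketch below makes that explicit; it assumes each ISP processes one pixel per clock, which is my reading of how the earlier numbers line up rather than anything Qualcomm has confirmed, and the function name is hypothetical.

```python
def implied_isp_clock_mhz(
    throughput_mp_s: float,
    num_isps: int = 2,
    pixels_per_clock: float = 1.0,  # assumption: one pixel per clock per ISP
) -> float:
    """ISP clock (MHz) implied by an aggregate throughput in megapixels/s."""
    return throughput_mp_s / (num_isps * pixels_per_clock)


s805_clock = implied_isp_clock_mhz(1200)  # 1.2 Gigapixels/s
s801_clock = implied_isp_clock_mhz(930)   # 930 MP/s

print(f"S805 implied ISP clock: {s805_clock:.0f} MHz")
print(f"S801 implied ISP clock: {s801_clock:.0f} MHz")
```

The same assumption recovers the 465MHz figure for Snapdragon 801 from its 930 MP/s rating, which is why the 600MHz estimate seems reasonable.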
Qualcomm also claims improved autofocus performance and better noise reduction.
Just as in years past, Qualcomm invited us out to a benchmarking workshop to get some hands-on time with its Snapdragon 805 Mobile Development Platform (MDP) ahead of actual device availability. And just like we saw with the Snapdragon 800 benchmarking workshop, the S805's MDP comes in tablet form. The Snapdragon 805 MDP/T features a 10.6" 2560 x 1440 display, 3GB of LPDDR3 memory and 64GB of internal storage (eMMC 5.0). The chassis looks very similar to previous MDP/T designs.
Just as before, the benchmarks that follow are of a pre-production device that isn't shipping hardware. Although Qualcomm has significantly improved the delta we've seen between MDPs and shipping devices, there's always the caveat that performance could be different once we are looking at a shipping device, running on battery power. Although Qualcomm gave us access to the MDP/T, the devices were running on AC power with no power instrumentation connected. Qualcomm's own data shows a reduction in power consumption for Snapdragon 805 vs. 800, but once again we'll have to wait for shipping devices to really understand the impact of the SoC on battery life. What follows is exactly what the title of this piece indicates: a preview of Snapdragon 805 performance. Although Qualcomm pre-loaded the MDP/T with some commonly used benchmarks, we installed our own copies of everything we ran.
Qualcomm's Snapdragon 8xx Lineup
Specification | Snapdragon 810 | Snapdragon 808 | Snapdragon 805 | Snapdragon 801 | Snapdragon 800
Internal Model Number | MSM8994 | MSM8992 | APQ8084 | MSM8974 v3 | MSM8974 v2
Manufacturing Process | 20nm | 20nm | 28nm HPm | 28nm HPm | 28nm HPm
CPU | 4 x ARM Cortex A57 + 4 x ARM Cortex A53 (big.LITTLE) | 2 x ARM Cortex A57 + 4 x ARM Cortex A53 (big.LITTLE) | 4 x Qualcomm Krait 450 | 4 x Qualcomm Krait 400 | 4 x Qualcomm Krait 400
ISA | 32/64-bit ARMv8-A | 32/64-bit ARMv8-A | 32-bit ARMv7-A | 32-bit ARMv7-A | 32-bit ARMv7-A
GPU | Adreno 430 | Adreno 418 | Adreno 420 | Adreno 330 | Adreno 330
H.265 Decode | Yes | Yes | Yes | No | No
H.265 Encode | Yes | No | No | No | No
Memory Interface | 2 x 32-bit LPDDR4-1600 | 2 x 32-bit LPDDR3-933 | 4 x 32-bit LPDDR3-800 | 2 x 32-bit LPDDR3-800/933 | 2 x 32-bit LPDDR3-800/933
Integrated Modem | 9x35 core, LTE Category 6/7, DC-HSPA+, DS-DA | 9x35 core, LTE Category 6/7, DC-HSPA+, DS-DA | - | 9x25 core, LTE Category 4, DC-HSPA+, DS-DA | 9x25 core, LTE Category 4, DC-HSPA+, DS-DA
Integrated WiFi | - | - | - | - | -
eMMC Interface | 5.0 | 5.0 | 5.0 | 5.0 | 4.5
Camera ISP | 14-bit dual-ISP | 12-bit dual-ISP | 1.2 GP/s | 930 MP/s | 640 MP/s
Shipping in Devices | 1H 2015 | 1H 2015 | 2H 2014 | Now | Now
I pulled comparison results from our new combined Phone/Tablet 2014 category in Bench. The key comparisons here are the iPad Air (for obvious reasons), ASUS' Transformer Pad TF701T (Tegra 4 in a tablet), ASUS' Transformer Book T100 (Intel's Bay Trail in a tablet) and the HTC One (M8)/Samsung Galaxy S 5 (both are Snapdragon 801 devices). With the exception of the Bay Trail based T100, everything else runs iOS or Android.
CPU Performance
As always we'll start out our performance investigation with a handful of CPU bound web browser based tests. In all cases we used Chrome on the MDP/T. Remember there's only an 8% increase in peak CPU frequency here, so I wouldn't expect a huge difference vs. Snapdragon 801.
Here the MDP/T scales pretty well, showing a 6% improvement in performance over the Snapdragon 801 based Galaxy S 5. In the case of the GS5 we are looking at a 2.5GHz Snapdragon 801 implementation, so the improvement makes sense. Both the Cortex A15 (TF701T/Shield) and Apple's Cyclone (in the iPad Air) are higher performing designs here. Since there's no fundamental change to Krait's IPC, the only gains we see here are from the higher clock speed.
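With IPC unchanged, performance should scale at best linearly with clock speed, so the clock uplift acts as a ceiling on the expected gain. The sketch below compares the observed 6% improvement against that ceiling using the actual peak clocks (2.45GHz vs. 2.65GHz). The function name is my own, and this is a rough model that ignores memory, thermal and browser effects.

```python
def clock_gain_realized(perf_gain_pct: float, old_ghz: float, new_ghz: float) -> float:
    """Fraction of the clock uplift that shows up as measured performance.

    Assumes unchanged IPC, so performance is modeled as proportional to
    frequency and the clock uplift is the best-case gain.
    """
    clock_gain_pct = (new_ghz - old_ghz) / old_ghz * 100.0
    return perf_gain_pct / clock_gain_pct


# Observed: ~6% faster than the Snapdragon 801 based Galaxy S 5.
fraction = clock_gain_realized(6.0, old_ghz=2.45, new_ghz=2.65)
print(f"Share of clock uplift realized: {fraction:.3f}")
```

A value close to 1.0 would indicate near-perfect frequency scaling; anything well below that points to some other bottleneck in the test.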
Kraken appears to be at its limit when it comes to Krait 400/450; there's effectively no additional frequency scaling beyond 2.3GHz. We're either running into an architectural limitation or limits of the software/browser combination itself.
Similarly we don't see any real progress in the Google Octane test either. Snapdragon 805's CPU cores may run at a higher peak frequency but that's definitely not the story here.
Basemark OS II
Basemark OS II gives us a look at native application performance across a variety of metrics. There are tests that hit the CPU, GPU as well as storage subsystems here. The gains here are exclusively on the graphics side, which makes sense given what we've just seen. Snapdragon 805's biggest gains will be GPU facing.
Geekbench 3.0
Although I don't typically use Geekbench, I wanted to include some numbers here to highlight that the increase in memory bandwidth for S805 over S801 doesn't really benefit the CPU cores:
Geekbench 3.0
Test | Snapdragon 801 2.3GHz (HTC M8) | Snapdragon 805 2.7GHz (MDP/T) | % Increase for S805
Overall (Single thread) | 1001 | 1049 | 4.8%
Overall (Multi-threaded) | 2622 | 2878 | 9.7%
Integer (Single thread) | 956 | 996 | 4.2%
Integer (Multi-threaded) | 2999 | 3037 | 1.3%
FP (Single thread) | 843 | 925 | 9.7%
FP (Multi-threaded) | 2636 | 3155 | 19.7%
Memory (Single thread) | 1411 | 1406 | 0%
Memory (Multi-threaded) | 1841 | 1949 | 6%
I wouldn't read too much into the multithreaded FP results; I suspect we're mostly seeing differences in thermal dissipation of the two test units. A closer look at the memory bandwidth numbers confirms that while the 805 has more memory bandwidth, most of it is reserved for GPU use:
Geekbench 3.0 - Memory Bandwidth
Test | Snapdragon 801 2.3GHz (HTC M8) | Snapdragon 805 2.7GHz (MDP/T) | % Increase for S805
Stream Copy (Single thread) | 7.89 GB/s | 8.04 GB/s | 1.9%
Stream Copy (Multi-threaded) | 9.53 GB/s | 10.1 GB/s | 5.9%
Stream Scale (Single thread) | 5.36 GB/s | 5.06 GB/s | -
Stream Scale (Multi-threaded) | 7.31 GB/s | 7.63 GB/s | 4.3%
Stream Add (Single thread) | 5.27 GB/s | 5.2 GB/s | -
Stream Add (Multi-threaded) | 6.84 GB/s | 7.51 GB/s | 9.8%
Stream Triad (Single thread) | 5.64 GB/s | 5.85 GB/s | 3.7%
Stream Triad (Multi-threaded) | 7.65 GB/s | 7.89 GB/s | 3.1%
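To put these numbers in context, it helps to compare the best CPU-side result against the 25.6GB/s theoretical peak of the S805's memory interface. The sketch below does that for the multi-threaded Stream Copy result; the function name is hypothetical, and theoretical peak is never fully achievable in practice, so treat the ratio as a rough indicator only.

```python
def bandwidth_utilization(measured_gb_s: float, peak_gb_s: float) -> float:
    """Measured bandwidth as a fraction of theoretical peak."""
    return measured_gb_s / peak_gb_s


S805_PEAK_GB_S = 25.6        # 128-bit interface at 1600MT/s
S805_STREAM_COPY_MT = 10.1   # Geekbench 3.0 Stream Copy, multi-threaded

utilization = bandwidth_utilization(S805_STREAM_COPY_MT, S805_PEAK_GB_S)
print(f"CPU-visible share of peak bandwidth: {utilization:.1%}")
```

Even the highest CPU result in the table uses well under half of the theoretical peak, which is consistent with the wider interface mainly serving the GPU and video engines.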
GPU Performance
3DMark
Although it's our first GPU test, 3DMark doesn't do much to show Adreno 420 in a good light. 3DMark isn't the most GPU intensive test we have, but here we see marginal increases over Snapdragon 800/Adreno 330. I would be interested in seeing if there are any improvements on the power consumption front since performance doesn't really change.
Basemark X 1.1
Basemark X 1.1 starts to show a difference between Adreno 420 and 330. At medium quality settings we see a 25% increase in performance over the Snapdragon 801 based Adreno 330 devices. Move to higher quality settings and the performance advantage increases to over 50%. Here even NVIDIA's Shield with Tegra 4 cooled by a fan can't outperform the Adreno 420 GPU.
GFXBench 3.0
Manhattan continues to be a very stressful test but the onscreen results are pretty interesting. Adreno 420 can drive a 2560 x 1440 display at the same frame rate that Adreno 330 could drive a 1080p display.
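Matching frame rates at a higher resolution means pushing proportionally more pixels per frame. The sketch below works out how many more pixels a 2560 x 1440 panel has than a 1080p one; the function name is my own, and pixel count is only a rough proxy for GPU load since not all work scales with resolution.

```python
def pixel_count(width: int, height: int) -> int:
    """Total pixels in a frame of the given dimensions."""
    return width * height


mdp_t_pixels = pixel_count(2560, 1440)   # Snapdragon 805 MDP/T panel
full_hd_pixels = pixel_count(1920, 1080)  # typical Adreno 330 device

ratio = mdp_t_pixels / full_hd_pixels
print(f"2560 x 1440 has {ratio:.2f}x the pixels of 1080p")
```

So holding the same onscreen frame rate at 2560 x 1440 means rendering roughly 78% more pixels per frame than at 1080p.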
In an apples-to-apples comparison at the same resolution, Adreno 420 is over 50% faster than Adreno 330. It's also faster than the PowerVR G6430 in the iPad Air.
Once again we see an example where Adreno 420 is able to drive the MDP/T's panel at 2560 x 1440 at the same performance that Adreno 330 can deliver at 1080p.
At 1080p, the Adreno 420/S805 advantage grows to 45%.
I've included all of the low level GFXBench tests below if you're interested in digging any deeper. It's interesting that we don't see a big increase in the ALU test but far larger increases in the alpha blending and fill rate tests.
Final Words
Qualcomm tends to stagger the introduction of new CPU and GPU IP. Snapdragon 805 ultimately serves as Qualcomm's introduction vehicle for its Adreno 420 GPU. The performance gains there over Adreno 330/Snapdragon 801 can be substantial, particularly at high resolutions and/or higher quality settings. Excluding 3DMark, we saw a 20 - 50% increase in GPU performance compared to Snapdragon 801. Adreno 420 is a must have if you want to drive a higher resolution display at the same performance as an Adreno 330/1080p display combination. With OEMs contemplating moving to higher-than-1080p resolution screens in the near term, leveraging Snapdragon 805 may make sense there.
The gains on the CPU side are far more subtle. At best we noted a 6% increase in performance compared to a 2.5GHz Snapdragon 801, but depending on thermal/chassis limitations of shipping devices you may see even less of a difference.
Qualcomm tells us that some of its customers will choose to stay on Snapdragon 801 until the 810 arrives next year, while some will choose to release products based on 805 in the interim. Based on our results here, if an OEM is looking to specifically target the gaming market I can see Snapdragon 805 making a lot of sense. For most of those OEMs that just launched Snapdragon 801 based designs however, I don't know that there's a huge reason to release a refresh in the interim.
I am curious to evaluate the impact of ISP changes as well as dive deeper into 4K capture and H.265 decode, but that will have to wait until we see shipping designs. The other big question is just how power efficient Adreno 420 is compared to Adreno 330. Qualcomm's internal numbers are promising, citing a 20% reduction in power consumption at effectively the same performance in GFXBench's T-Rex HD onscreen test.