Original Link: https://www.anandtech.com/show/8706/imagination-announces-powervr-series7-gpus-series7xt-series7xe



Though the first PowerVR Series 6XT-equipped products have only recently launched – including the unexpectedly powerful iPad Air 2 – the development cycle for SoCs and the realities of IP licensing mean that Imagination is already focusing on GPU designs for late 2015 and beyond. Just as one design reaches consumer hands the next generation gets completed, and the SoC integration work begins.

Taking place this week are Imagination Technologies’ Chinese idc14 and Imagination Summits developer events. While Imagination holds these events in multiple countries over the year, the Chinese event is in many respects the most important from a hardware standpoint. With the bulk of SoC GPU licensees headquartered in the Asia-Pacific region – firms such as Allwinner, Rockchip, and Samsung – Imagination’s Chinese event is perhaps their biggest customer event and consequently an important venue for product announcements. Against this backdrop Imagination will be using this week’s events to announce the next generation of PowerVR GPUs, PowerVR Series7.

PowerVR Series7 is the successor to Imagination’s current PowerVR Series 6XT lineup of GPUs. Like Series 6XT, Series7 is composed of two variants, Series7XT for the high end and Series7XE for the low end, and in turn each contains a number of individual configurations. With configurations ranging from half a shader cluster (USC) to 16 clusters, Imagination is seeking to cover virtually the entire range of SoC-equipped devices, from high-end IoT/wearables to tablets, set top boxes, and even HPC servers.

From an architectural standpoint Series7 will be a further iteration on Imagination’s Rogue architecture, which was first used in 2012’s Series6. With each generation Imagination has further tweaked and expanded their designs to improve performance/efficiency and to cover new use cases, and for Series7 the story is much the same. This year Imagination has sat down with us to give us an overview of what’s new and changed in their architecture for Series7, so let’s dive right in.

PowerVR GPU Comparison

| | Series7XT | Series7XE | Series6XT | Series6XE |
|---|---|---|---|---|
| Clusters | 2 - 16 | 0.5 - 1 | 2 - 8 | 0.5 - 1 |
| FP32 FLOPs/Clock | 128 - 1024 | 32 - 64 | 128 - 512 | 32 - 64 |
| FP16 FLOPs/Clock | 256 - 2048 | 64 - 128 | 256 - 1024 | 64 - 128 |
| Pixels/Clock (ROPs) | 4 - 32? | 2 - 4? | 4 - 16 | 2 - 4 |
| Texels/Clock | 4 - 32 | 1 - 2 | 4 - 16 | 1 - 2 |
| OpenGL ES | 3.1 | 3.1 | 3.1 | 3.1 |
| Android Extension Pack / Tessellation | Yes | Optional | Optional | No |
| Direct3D | Base: FL 10_0; Optional: FL 11_1 | FL 9_3 | FL 10_0 | FL 9_3 |
| OpenCL | Base: 1.2 EB; Optional: 1.2 FP | 1.2 EB | 1.2 EB | 1.2 EB |
| Architecture | Rogue | Rogue | Rogue | Rogue |
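
The per-clock figures in the comparison above scale linearly with cluster count, which makes them easy to reproduce. The following is a minimal Python sketch of that arithmetic. The per-cluster constants are inferred by dividing the table's figures by the cluster counts; they are not an official Imagination formula, and the function name is purely illustrative. Pixel throughput is left out because the Series7 ROP figures are marked as uncertain.

```python
# Per-cluster, per-clock rates inferred from the comparison table
# (e.g. 1024 FP32 FLOPs/clock at 16 clusters -> 64 per cluster).
# These constants are derived from the table, not quoted from Imagination.
FP32_FLOPS_PER_CLUSTER = 64
FP16_FLOPS_PER_CLUSTER = 128
TEXELS_PER_CLUSTER = 2


def per_clock_throughput(clusters: float) -> dict:
    """Return the per-clock throughput for a given cluster count.

    `clusters` may be fractional: the smallest Series7XE design is
    half a cluster.
    """
    return {
        "fp32_flops": int(clusters * FP32_FLOPS_PER_CLUSTER),
        "fp16_flops": int(clusters * FP16_FLOPS_PER_CLUSTER),
        "texels": int(clusters * TEXELS_PER_CLUSTER),
    }


if __name__ == "__main__":
    # Cluster counts mentioned in the article for Series7XE and Series7XT.
    for clusters in (0.5, 1, 2, 4, 6, 8, 16):
        print(clusters, per_clock_throughput(clusters))
```

Running this for 0.5 and 16 clusters reproduces the two ends of the Series7 range shown in the table.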

 

PowerVR Series7 Architecture

From an architectural standpoint Imagination is already starting in a strong position for Series7 with the Rogue architecture. With the Rogue USCs implementing a modern shader pipeline, there’s no innate weakness to the design that requires correction. However as is the case for all SoC GPUs, there is a constant need to deliver better power efficiency and space efficiency, as these are the primary factors limiting overall performance and fabrication improvements alone can’t deliver all of the necessary gains. For this reason Imagination has continued to iterate on the Rogue architecture for Series7 to further improve its efficiency and resulting performance.

Outside of the underlying architecture however, there is also the need to deliver new features to keep up with modern APIs, developer demands, and of course the competitive landscape. In that respect Series6XT is a bit more dated; while it supports OpenGL ES 3.1 its base configuration (by far the most common) does not have the hardware features to support the more extensive Android Extension Pack, and for that matter it also lacks the features necessary to support Direct3D feature level 11. For these reasons Series7 will also be responsible for delivering feature improvements to Imagination’s GPU lineup to keep it up to date with the latest standards.

Looking at the overall architecture then (with an emphasis on 7XT), what we find is still very much Rogue in nature and is called as much by Imagination. While various blocks have been upgraded or overhauled in some manner, there is only a single new block. Available on the base configuration of the 7XT and as an option for the 7XE, it is the Tessellation Co-Processor. Exactly as the name describes it, the Tessellation Co-Processor is a hardware block that works in conjunction with the Vertex Data Master to implement full tessellation support. The tessellator itself is fixed function for power efficiency reasons, with hull and domain shading handled through shading hardware. The addition of tessellation hardware along with the standard inclusion of ASTC support are the major functional changes that enable Android Extension Pack support on the base Series7XT over the base Series6XT.

For the other blocks, Imagination has implemented improvements all throughout the architecture. The geometry performance of the Vertex Data Master (geometry frontend) has been doubled to alleviate bottlenecking there. Meanwhile the Compute Data Master has been upgraded as well to allow it to set up wavefronts more quickly (up to 300% faster), which is especially helpful for quickly processing large numbers of small kernels, something Imagination tells us was more common than expected.

Finally the Coarse Grain Scheduler has also been upgraded in conjunction with the USCs. Primarily focusing on reducing inter-tile dependencies, Series7 can now more frequently issue work to idle USCs that in Series6/XT/XE were waiting on other USCs to finish their work before the whole block moved on. With fewer dependencies, idle USCs can now be issued work from other sources or move on to their next tile in a larger number of circumstances.

Diving into the Series7 USC, what we find is again largely similar to Series6XT. The number of FP16 and FP32 ALUs and resulting floating point operation throughput is unchanged, however the Special Function Unit (SFU) has received a pair of changes. First and foremost, the SFU can now natively handle FP16 operations along with FP32 operations, whereas the 6XT SFU would promote everything to FP32. By offering native FP16 execution, Imagination is able to avoid wasting power by not doing unnecessary higher precision work on FP16 data sets. Keep in mind that SFU operations are already relatively expensive, so native FP16 special functions should have a tangible impact on power consumption. Meanwhile though it’s drawn as a single SFU in Imagination’s logical diagrams, I suspect that part of this change is that Imagination has implemented separate FP16 and FP32 SFUs as part of the existing FP16 and FP32 ALU blocks, in which case there are actually 2 SFUs (though just like the ALUs you can only use one at a time).

Speaking of utilization, the second SFU enhancement has to do with when it can be used. Starting with Series7, SFU operations can now be co-issued with ALU operations, allowing for both blocks to be used at once as opposed to one or the other on 6XT. Now to be clear here only SFUs can be co-issued, and wavefronts can only use either the FP16 or FP32 ALUs (and not both at once), but there is now a degree of co-issue capability within a USC that was not available before. Imagination tells us that SFUs were coming up in code more than expected, and as a result adding co-issue capabilities would improve performance.

To accomplish this, Imagination has expanded their instruction set to enable co-issue functionality along with further improving performance. New bundled/fused instructions have been added, which are what trigger the co-issued SFU. These fused instructions also allow for certain common sequences that are issued over multiple instructions to instead be issued as a single fused instruction, which in turn reduces code size slightly and potentially allows for these operations to be performed in fewer cycles.

Meanwhile, exclusive to Series7XT is optional support for FP64 operations. If FP64 support is included in the specific 7XT core being licensed, each pipeline gets a single FP64 ALU, which allows them to process up to 2 FLOPs/USC/clock.

Finally, while not a graphics feature per se, Series7 will be introducing one more feature to the family. A base feature in 7XT and optional to 7XE will be GPU support for hardware security zones, which uses virtualization technology to create up to 8 zones that are fully isolated from each other.

Within the mobile space application sandboxing is already common, and indeed this functionality is already present on a number of CPUs. However in the case of security zones that are only supported on the CPU, the zone separation essentially has to be emulated on the GPU, requiring a full task flush and reload of the entire GPU in order to switch between tasks. Besides not being performant, software enforced security is functionally less robust than hardware enforced security and in turn means the GPU can in theory be used to attack other zones.

Consequently for Series7 Imagination is adding security zone support to their hardware to go along with the security zones already supported with CPUs. From a practical standpoint what we’re looking at is the capability to do better application sandboxing to keep applications from getting out and touching other parts of the system. This is something of a mixed bag for users since sandboxing can be used for both good and evil. Hardware zones can be used to secure certain high-profile applications (banking, health, Apple Pay, etc), but said zones are also responsible for enabling stronger DRM on video content and hardening the system against jailbreaks in cases where direct root access is not allowed by the manufacturer.



Series7XT In Detail

Now that we’ve had a chance to look at the common Series7 architecture, let’s take a look at the features and properties of Series7XT in particular.

Series7XT will be offered with 2 optional feature additions. The first is the aforementioned FP64 ALU, which is being offered as part of what Imagination is calling the HPC Feature Pack. As FP64 operations are not necessary for graphics work (and often even FP16 will do), the FP64 functionality is being offered to customers who want to build HPC hardware out of Series7XT. PowerVR hardware has up until now not been a competitor in the HPC space, so this marks a significant turning point for Imagination and would have them challenging frontrunner NVIDIA in this space. Also of note here, as the base 7XT configuration only supports OpenCL 1.2 Embedded Profile, the HPC pack upgrades 7XT’s OpenCL capabilities to 1.2 Full Profile.

Meanwhile the other optional feature pack for Series7XT is the Direct3D 11 pack, which is primarily geared towards customers who would be building Windows Phone and Windows RT devices. Imagination made Direct3D 11 an optional feature on 6XT, and is doing the same on 7XT. In the case of 6XT the D3D option would have added the necessary tessellation capabilities that are now default on 7XT, so for 7XT this is likely more about D3D features such as S3TC that require additional licensing.

Moving on, for as much as Imagination’s various enhancements ultimately improve performance, really it’s power efficiency that’s driving most of Imagination’s performance gains, and 7XT in turn is designed to further improve on Imagination’s power efficiency. Unfortunately Imagination isn’t throwing out any numbers here – just that 7XT can offer similar performance to 6XT for less power – but on the subject of power efficiency they have documented their efforts to deal with throttling.

To be clear here this is a matter ultimately in the hands of SoC integrators and is not something Imagination can directly control, but as a supplier they can offer advice and suggestions to their customers to improve the experience. Short of making PowerVR GPUs low power in the first place (and this is something everyone in this space tries to do), the next best thing they can do is to encourage customers to be mindful of throttling and to discourage designing their clockspeed governors to be bursty. While the “hurry up and go to sleep” motto makes a lot of sense for CPUs, it makes less sense for GPUs due to the fact that most workloads are sustained. By providing good real-time power usage data to the OS and by discouraging high maximum clockspeeds that lead to burst-and-throttle behavior from governors, for Series7 Imagination is at least trying to ensure that throttling is minimized.
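
To make the burst-and-throttle argument concrete, here is a rough Python sketch comparing a steady governor against a bursty one under the same average power budget. Everything in it is an assumption for illustration: it uses the common textbook approximation that dynamic power scales with roughly the cube of clockspeed (frequency times voltage squared, with voltage tracking frequency), and the clock values are normalized, made-up numbers. It is not Imagination's power model and does not describe any real governor.

```python
def average_clock_under_budget(f_hi: float, f_lo: float,
                               power_budget: float = 1.0) -> float:
    """Average clock of a governor that alternates between f_hi and f_lo
    while its average power equals power_budget.

    Assumption (not from the article): power ~ clock ** 3, with all
    values normalized so that clock 1.0 draws power 1.0.
    """
    p_hi, p_lo = f_hi ** 3, f_lo ** 3
    if not (p_lo <= power_budget <= p_hi):
        raise ValueError("budget must lie between the low and high power states")
    if p_hi == p_lo:
        # Steady governor: a single clock that exactly meets the budget.
        return f_hi
    # Fraction of time that can be spent in the high state without
    # exceeding the average power budget.
    duty = (power_budget - p_lo) / (p_hi - p_lo)
    return duty * f_hi + (1.0 - duty) * f_lo


# Steady governor: sits at the clock the budget can sustain.
steady = average_clock_under_budget(1.0, 1.0)

# Bursty governor: bursts to 1.5x, then throttles to 0.5x to recover.
bursty = average_clock_under_budget(1.5, 0.5)

if __name__ == "__main__":
    print(f"steady average clock: {steady:.3f}")
    print(f"bursty average clock: {bursty:.3f}")
```

Under these assumptions the bursty governor averages a lower clock than the steady one for the same power, because the time spent at the expensive high clock has to be paid back with a longer stretch at the low clock. For a sustained workload such as a game, average clock is what determines the experience, which is the intuition behind discouraging bursty governors.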

Finally, Imagination has outlined the different configurations that Series7XT will be available in. Starting in 2 cluster configurations, 7XT scales up to 16 cluster configurations, or twice as large as 6XT. 2-4 cluster configurations are expected to be used in phones and TVs, while 6-8 cluster configurations are expected to be used in tablets, automotive, and ultrabooks. Finally the 16 cluster configuration would be targeted at non-traditional spaces for PowerVR products, such as full notebooks, dedicated (set-top) gaming devices, and servers. With a hefty 512 FP32 ALUs Imagination expects that the 16 cluster configuration should rival lower-end discrete GPUs, which would certainly be the competition for the device categories that Imagination is chasing.
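
For a sense of what 512 FP32 ALUs could mean in absolute terms, the short Python sketch below converts an ALU count into peak FP32 throughput. The factor of 2 FLOPs per ALU per clock follows from the figures in this article (512 ALUs against 1024 FP32 FLOPs/clock for 16 clusters). The clockspeeds are hypothetical placeholders, since Imagination has not quoted any, and actual clocks will be up to each SoC integrator.

```python
def peak_fp32_gflops(fp32_alus: int, clock_mhz: float) -> float:
    """Peak FP32 throughput in GFLOPS.

    Each ALU is counted as 2 FLOPs per clock, matching the article's
    figures of 512 ALUs and 1024 FP32 FLOPs/clock for 16 clusters.
    """
    return fp32_alus * 2 * clock_mhz / 1000.0


if __name__ == "__main__":
    # Hypothetical clockspeeds, chosen only for illustration.
    for clock_mhz in (400, 600, 800):
        gflops = peak_fp32_gflops(512, clock_mhz)
        print(f"16 clusters @ {clock_mhz} MHz: {gflops:.1f} GFLOPS peak FP32")
```

These are theoretical peaks only; sustained performance would depend on utilization, memory bandwidth, and the power limits of the device.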

Series7XE In Detail

Moving on, at the other end of the spectrum we have the Series7XE GPUs. These products are the successors to the Series6XE GPUs, and like their predecessors are focused on a narrower feature set for low cost devices, with an emphasis on area efficiency over power efficiency.

Of the Series7 features we’ve covered so far, Series7XE gains access to virtually all of those features. However a larger number of those features are optional and are not in the base configuration. Of note, all of the general enhancements for the frontends and the USC are carried over for the base configuration. However the tessellation block (and hence AEP support) is optional.

As a result 7XE has 4 optional feature packs to build on top of its base OpenGL ES 3.1 functionality. The AEP adds the tessellator and other AEP-centric functionality from 7XT that isn’t in 7XE’s default configuration. Meanwhile the Compression Pack segregates certain compression features from 7XE so that they’re only included in designs that need them (since SoC manufacturers may want to use 3rd party compression technology). HEVC and 10-bit YUV support is also optional for 7XE, and finally the virtualization features we discussed earlier are optional as well.

Since 7XT is targeted at 2 cluster and above configurations, 7XE is designed to cover the 1 and ½ cluster configurations. This results in 2 configurations, the GE7800 which implements a full cluster, and the GE7400 which implements a half-cluster. With the 1 cluster configuration targeted at low-end phones and TVs, the half-cluster configuration will be in the cheapest and simplest devices, along with being a candidate for high-end wearable devices.

Imagination says that at the low clockspeeds they’re envisioning for 7XE wearables, the full load power for the GPU would be under 1W, with low/idle power consumption of course being much lower yet. Any kind of power consumption approaching 1W definitely also approaches a “high-end” niche for wearables, but nonetheless it is viable if for any reason someone needed to build a wearable device that could handle OpenGL ES 3.1 graphics.



Performance Estimates

Wrapping things up, as Imagination is an IP licensor there isn’t any specific hardware to talk about or benchmark today, but Imagination has provided some performance estimates. As always these should be taken with a grain of salt, but until SoCs are released using the Series7 designs later next year, these are the best estimates we are going to see.

Ultimately the overall performance gains from 6XT to 7XT will depend on the application and the specific design goals of the SoC integrator, with Imagination’s internal data looking positive. At equal clockspeeds and cluster configurations, Imagination is showing performance gains of 30-60% for Series7XT. It should be noted that these numbers do not hold power consumption equal – higher utilization means transistors doing more work and burning energy more often – so real-world performance gains in power/heat limited scenarios would not be as great. But we are told to expect that at equal power levels performance should still be greatly improved over Series6XT, and even more over older Series6 devices.

Meanwhile from a performance perspective Imagination’s numbers paint 7XE as looking even better than 7XT on a relative gain basis. Once more holding core configurations and clockspeeds equal, performance is improved over 6XE by anywhere between 40% and 100%. No official explanation is provided for why 7XE benefits more than 7XT, but as 7XE has fewer USCs it also faces fewer bottlenecks from scaling, which can certainly be an advantage.

Closing Thoughts

All told, from top to bottom Series7 is designed to scale by a factor of 32, from ½ of a cluster up to 16 of them. With such a large range of configurations, Imagination is anticipating being able to cover the entire market with Series7, from wearable devices up to notebooks and even server compute clusters.

Meanwhile Imagination estimates that we should start seeing Series7 appear in retail roughly a year from now. This would be consistent with Series6XT/6XE, which were formally introduced at CES 2014 and hit the consumer market late in Q3 in devices such as the iPhone 6. We should see Series7 devices in roughly the same timeframe and consequently at this point it’s not unreasonable to expect Series7 to appear in Apple’s next SoC, judging from their history with Series6 and Series6XT.

With that said, it will be interesting to see how the SoC GPU market evolves over the next year leading up to Series7. Though very successful in supplying GPU IP to Apple, Imagination has been locked out of a number of potential Android devices due to Qualcomm’s strong position in that market and their own vertical integration. The launch of Android 5.0 “Lollipop” and its 64-bit support serve as a potential catalyst for change in the Android hardware ecosystem, and while this is a battle that will initially be fought by Series6XT (as Cortex-A57 and A53 are already available), the transition time and the lead-in time required for SoC development mean that Series7 will still play a big part in that.

Ultimately Series7 will be going up against stiff competition from a variety of competitors, so design wins will be hard-fought. In the GPU licensing space Imagination’s principal competition will be ARM and the recently-announced Mali 800 series (which will be available at roughly the same time). Meanwhile integrated SoC designers like Qualcomm and NVIDIA will also have competitive products such as the Adreno 400 series and Maxwell-based Erista SoC respectively. So for 2015, as has been the case through 2013 and 2014, the SoC GPU space continues to be a busy and crowded market for Imagination, PowerVR Series7, and its competitors.
