Original Link: https://www.anandtech.com/show/16210/intels-discrete-gpu-era-begins-intel-launches-xe-max-for-entrylevel-laptops
Intel’s Discrete GPU Era Begins: Intel Launches Iris Xe MAX For Entry-Level Laptops
by Ryan Smith on October 31, 2020 12:01 PM EST
Today may be Halloween, but what Intel is up to is no trick. Almost a year after showing off their alpha silicon, Intel’s first discrete GPU in over two decades has been released and is now shipping in OEM laptops. The first of several planned products using the DG1 GPU, Intel’s initial outing in their new era of discrete graphics is in the laptop space, where today they are launching their Iris Xe MAX graphics solution. Designed to complement Intel’s Xe-LP integrated graphics in their new Tiger Lake CPUs, Xe MAX will be showing up in thin-and-light laptops as an upgraded graphics option, and with a focus on mobile creation.
We’ve been talking about DG1 off and on since CES 2020, where Intel first showed off the chip in laptops and in a stand-alone development card. The company has continuously been coy about the product, but at a high level it’s been clear for some time that this was going to be an entry-level graphics solution suitable for use in smaller laptops. Based heavily on the integrated graphics in Intel’s Tiger Lake-U CPU, the Xe-LP architecture GPU is a decidedly entry-level affair. Nonetheless, it’s an important milestone for Intel: by launching their first DG1-based product, Intel has completed the first step in their plans to establish themselves as a major competitor in the discrete GPU space.
Sizing up Intel’s first dGPU in a generation, Intel has certainly made some interesting choices with the chip and what markets to pursue. As previously mentioned, the chip is based heavily on Tiger Lake-U’s iGPU – so much so that it has virtually the same hardware, from EUs to media encoder blocks. As a result, Xe MAX isn’t so much a bigger Xe-LP graphics solution as it is an additional, discrete version of the Tiger Lake iGPU. This in turn has significant ramifications for the performance expectations for the chip, and for how Intel is going about positioning it.
To cut right to the chase on an important question for our more technical readers, Intel has not developed any kind of multi-GPU rendering technology that allows for multiple GPUs to be used together for a single graphics task (à la NVIDIA’s SLI or AMD’s CrossFire). So there is no way to combine a Tiger Lake-U iGPU with Xe MAX and double your DOTA framerate, for example. Functionally, Xe MAX is closer to a graphics co-processor – literally a second GPU in the system.
As a result, Intel isn’t seriously positioning Xe MAX as a gaming solution – in fact I’m a little hesitant to even attach the word “graphics” to Xe MAX, since Intel’s ideal use cases don’t involve traditional rendering tasks. Instead, Intel is primarily pitching Xe MAX as an upgrade option for mobile content creation; an additional processor to help with video encoding and other tasks that leverage GPU-accelerated computing. This would be things like Handbrake, Topaz’s Gigapixel AI image upsampling software, and other such productivity/creation tasks. This is a very different tack than I suspect a lot of people were envisioning, but as we’ll see, it’s the route that makes the most sense for Intel given what Xe MAX can (and can’t) do.
At any rate, as an entry-level solution Xe MAX is being set up to compete with NVIDIA’s last-generation entry-level solution, the MX350. Competing with the MX350 is a decidedly unglamorous task for Intel’s first discrete graphics accelerator, but it’s an accurate reflection of Xe MAX’s performance capabilities as an entry-level part, as well as a ripe target since MX350 is based on last-generation NVIDIA technology. NVIDIA shouldn’t feel too threatened since they also have the more powerful MX450, but Xe MAX has a chance to at least dent NVIDIA’s near-absolute mobile marketshare by going after the very bottom of it. And, looking at the bigger picture here for Intel’s dGPU efforts, Intel needs to walk before they can run.
Finally, as mentioned previously, today is Xe MAX’s official launch. Intel has partnered with Acer, ASUS, and Dell for the first three laptops, most of which were revealed early by their respective manufacturers. These laptops will go on sale this month, and the fact that today’s launch was timed to align with midnight on November 1st in China offers a big hint of what to expect. Intel’s partners will be offering Xe MAX laptops in China and North America, but given China’s traditional status as the larger, more important market for entry-level hardware, don’t be too surprised if that’s where most Xe MAX laptops end up selling, and where Intel puts its significant marketing muscle.
Intel GPU Specification Comparison
| | Iris Xe MAX dGPU | Tiger Lake iGPU | Ice Lake iGPU | Kaby Lake iGPU |
|---|---|---|---|---|
| ALUs | 768 (96 EUs) | 768 (96 EUs) | 512 (64 EUs) | 192 (24 EUs) |
| Texture Units | 48 | 48 | 32 | 12 |
| ROPs | 24 | 24 | 16 | 8 |
| Peak Clock | 1650MHz | 1350MHz | 1100MHz | 1150MHz |
| Throughput (FP32) | 2.46 TFLOPs | 2.1 TFLOPs | 1.13 TFLOPs | 0.44 TFLOPs |
| Geometry Rate (Prim/Clock) | 2 | 2 | 1 | 1 |
| Memory Clock | LPDDR4X-4266 | LPDDR4X-4266 | LPDDR4X-3733 | DDR4-2133 |
| Memory Bus Width | 128-bit | 128-bit (IMC) | 128-bit (IMC) | 128-bit (IMC) |
| VRAM | 4GB | Shared | Shared | Shared |
| TDP | ~25W | Shared | Shared | Shared |
| Manufacturing Process | Intel 10nm SuperFin | Intel 10nm SuperFin | Intel 10nm | Intel 14nm+ |
| Architecture | Xe-LP | Xe-LP | Gen11 | Gen9.5 |
| GPU | DG1 | Tiger Lake Integrated | Ice Lake Integrated | Kaby Lake Integrated |
| Launch Date | 11/2020 | 09/2020 | 09/2019 | 01/2017 |
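For readers who want to sanity-check the throughput row, here is a minimal sketch of the usual peak-FP32 arithmetic. It assumes each ALU retires one fused multiply-add per clock, counted as two operations; that factor and the function name are our own assumptions for illustration, while the ALU counts and peak clocks come straight from the table.

```python
# Peak FP32 throughput = ALUs x FLOPs per ALU per clock x clock speed.
# ALU counts and peak clocks are the table's values. The factor of 2
# (one fused multiply-add counted as two operations) is an assumption.

def peak_fp32_tflops(alus: int, clock_mhz: float) -> float:
    """Theoretical peak FP32 throughput in TFLOPS."""
    flops_per_alu_per_clock = 2  # assumption: one FMA per ALU per clock
    return alus * flops_per_alu_per_clock * clock_mhz * 1e6 / 1e12


# (ALUs, peak clock in MHz) as listed in the specification table.
GPUS = {
    "Iris Xe MAX dGPU": (768, 1650),
    "Tiger Lake iGPU": (768, 1350),
    "Ice Lake iGPU": (512, 1100),
    "Kaby Lake iGPU": (192, 1150),
}

for name, (alus, mhz) in GPUS.items():
    print(f"{name}: {peak_fp32_tflops(alus, mhz):.2f} TFLOPS")
```

Under these assumptions the iGPU rows land on the table's figures, while Xe MAX at 1650MHz comes out at about 2.53 TFLOPS rather than the listed 2.46. The listed figure is what this formula gives at 1600MHz, though the numbers at hand don't explain the difference.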
Kicking off the deep dive portion of today’s launch, let’s take a look at the specs for the Xe MAX. As previously mentioned, Xe MAX is derived from Tiger Lake’s iGPU, and this is especially obvious when looking at the GPUs side-by-side. Xe-LP as an architecture was designed to scale up to 96 EUs; Intel put 96 EUs in Tiger Lake, and so a full DG1 GPU (and thus Xe MAX) gets 96 EUs as well.
In fact Xe MAX is pretty much Tiger Lake’s iGPU in almost every way. On top of the identical graphics/compute hardware, the underlying DG1 GPU contains the same two Xe-LP media encode blocks, the same 128-bit memory controller, and the same display controller. Intel didn’t even bother to take out the video decode blocks, so DG1/Xe MAX can do H.264/H.265/AV1 decoding, which admittedly is handy for doing on-chip video transcoding.
And, to be sure, Intel has confirmed that DG1 is a real, purpose-built discrete GPU. So Xe MAX is not based on salvaged Tiger Lake CPUs or the like; Intel is minting discrete GPUs just for the task. As is usually the case, Intel is not disclosing die sizes or transistor counts for DG1. Our own best guess for the die size is an incredibly rough 72mm2, and this is based on looking at how much of Tiger Lake-U’s 144mm2 die is estimated to be occupied by GPU blocks. In reality, this is probably an underestimate, but even so, it’s clear that DG1 is a rather petite GPU, thanks in part to the fact that it’s made on Intel’s 10nm SuperFin process.
Overall, given the hardware similarities, the performance advantage that Xe MAX has over Tiger Lake’s iGPU is that the discrete adapter gets a higher clockspeed. Xe MAX can boost to 1.65GHz, whereas the fastest Tiger Lake-U SKUs can only turbo to 1.35GHz. That means all things held equal, the discrete adapter has a 22% compute and rasterization throughput advantage on paper. But since we’re talking about laptops, TDPs and heat management are going to play a huge role in how things actually work.
Meanwhile the fact that Xe MAX gets Tiger Lake’s memory controller makes for an interesting first for a discrete GPU: this is the first stand-alone GPU with LPDDR4X support. Intel’s partners will be hooking up 4GB of LPDDR4X-4266 to the GPU, which with its 128-bit memory bus will give it a total memory bandwidth of 68GB/sec. Traditionally, entry-level mobile dGPUs use regular DDR or GDDR memory, with the latter offering a lot of bandwidth even on narrow memory buses, but neither being very energy efficient. So it will be interesting to see how Xe MAX’s total memory power consumption compares to the likes of the GDDR5/64-bit MX350.
As an added bonus on the memory front, because this is a discrete GPU, Xe MAX doesn’t have to share its memory bandwidth with other devices. The GPU gets all 68GB/sec to itself, which should improve real-world performance.
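The 68GB/sec figure falls straight out of the memory's transfer rate and the bus width. As a quick sketch (the helper name is ours, and this is peak theoretical bandwidth rather than anything measured):

```python
def memory_bandwidth_gb_per_sec(transfer_rate_mtps: float, bus_width_bits: int) -> float:
    """Peak theoretical memory bandwidth in GB/sec (1 GB = 1e9 bytes)."""
    bytes_per_transfer = bus_width_bits / 8  # the whole bus moves this much per transfer
    return transfer_rate_mtps * 1e6 * bytes_per_transfer / 1e9


# LPDDR4X-4266 on a 128-bit bus, as used by Xe MAX.
print(f"{memory_bandwidth_gb_per_sec(4266, 128):.1f} GB/sec")
```

The same helper works for any other memory type, so long as the effective transfer rate and bus width are known.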
And since Xe MAX is a discrete adapter, it also gets its own power budget. The part is nominally 25W, but like TDPs for Intel’s Tiger Lake CPUs, it’s something of an arbitrary value; in reality the chip has as much power and thermal headroom to play with as the OEMs grant it. So an especially thin-and-light device may not have the cooling capacity to support a sustained 25W, and other devices may exceed that during turbo time. Overall we’re not expecting any more clarity here than Intel and its OEMs have offered with Tiger Lake TDPs.
Last but not least, let’s talk about I/O. As an entry-level discrete GPU, Xe MAX connects to its host processor over the PCIe bus; Intel isn’t using any kind of proprietary solution here. The GPU’s PCIe controller is fairly narrow with just a x4 connection, but it supports PCIe 4.0, so on the whole it should have more than enough PCIe bandwidth for its performance level.
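To put the x4 link in perspective, here is a rough sketch of per-direction PCIe bandwidth. It assumes the standard signaling rates (8 GT/s per lane for PCIe 3.0, 16 GT/s for PCIe 4.0) and 128b/130b line encoding, and it ignores packet and protocol overhead, so real-world throughput will be lower.

```python
# Raw per-lane signaling rates in GT/s for the two generations compared here.
PCIE_RATE_GTPS = {3: 8.0, 4: 16.0}

# PCIe 3.0 and 4.0 both use 128b/130b encoding: 128 payload bits per 130 sent.
ENCODING_EFFICIENCY = 128 / 130


def pcie_bandwidth_gb_per_sec(gen: int, lanes: int) -> float:
    """Approximate one-direction link bandwidth in GB/sec, before protocol overhead."""
    gigabits_per_sec = PCIE_RATE_GTPS[gen] * ENCODING_EFFICIENCY * lanes
    return gigabits_per_sec / 8  # bits -> bytes


print(f"PCIe 4.0 x4: {pcie_bandwidth_gb_per_sec(4, 4):.2f} GB/sec")
print(f"PCIe 3.0 x8: {pcie_bandwidth_gb_per_sec(3, 8):.2f} GB/sec")
```

In other words, a PCIe 4.0 x4 link carries about as much as a PCIe 3.0 x8 link, a bit under 8GB/sec in each direction.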
Meanwhile the part offers a full display controller block as well, meaning it can drive 4 displays over HDMI 2.0b and DisplayPort 1.4a at up to 8K resolutions. That said, based on Intel’s descriptions it sounds like most (if not all) laptops are going to be taking an Optimus route, and using Tiger Lake’s iGPU to handle driving any displays. So I’m not expecting to see any laptops where Xe MAX’s display outputs are directly wired up.
Intel’s Vision for Sharing Work: Deep Link & Additive AI
As mentioned towards the start of this article, Intel is taking an interesting tack with Xe MAX. From a graphics standpoint, the company has not developed a multi-GPU solution to combine the rendering power of Xe MAX with Tiger Lake’s iGPU. As a result, Xe MAX is not significantly more powerful than Tiger Lake’s iGPU for 3D rendering/gaming tasks, greatly limiting the utility of Xe MAX for gaming purposes.
Although this wipes out the obvious route for using Xe MAX to augment Intel’s iGPU – and thus make Xe MAX a significant upgrade for graphics purposes – on the whole it’s a decision that makes sense for Intel. Multi-GPU graphics is hard, and it’s only getting harder. Even NVIDIA, with all of its experience in the field, has essentially pulled out as of their latest generation of hardware, thanks to rendering techniques getting less and less multi-GPU friendly. So what chance would Intel have, especially with such low-end hardware? Probably not much.
But that doesn’t mean that Xe MAX doesn’t have a purpose. Even if it can’t be used to help with any single task/thread, it can still handle additional tasks/threads, essentially having it function as a co-processor to offload tasks to or to spin up extra tasks on. This is a use case that professional-grade video cards and associated software have supported for a number of years, and it’s the same route Intel is taking with Xe MAX.
This kind of functionality is a core part of what Intel is terming its Deep Link technology, which is their umbrella name for all of the technologies backing and abilities sprouting from using Intel’s CPU and dGPU together. In practice, Deep Link is Intel’s software and firmware stack for Xe MAX, ranging from how they’re balancing TDP allocations between the CPU and GPU, out to how they present the additional processing blocks from an Xe MAX GPU to software so that it can easily use them. There is no real hardware magic here – as previously mentioned, Intel is using a standard PCIe 4.0 x4 link back to the CPU – but the company sees the synergy between their CPUs and Xe MAX as being a defining factor of the graphics solution – and why customers would want it.
Arguably the most critical part of Deep Link is what Intel is terming “Additive AI”, which is the ability to use the iGPU and dGPU together within a program. As previously mentioned, Intel’s focus here is on enabling developers to use Xe MAX for additional workloads. Among other things, Intel’s examples have included using Xe MAX’s compute resources for batch processing images in Gigapixel AI, and using the chip’s video encode blocks to increase the number of video streams that can be simultaneously encoded.
This sort of batch-focused software is the ideal use case for Xe MAX. If a task can be broken down into multiple independent pieces, then it can easily be farmed out to both GPUs simultaneously, which justifies adding Xe MAX to the mix rather than just relying on Tiger Lake’s iGPU.
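To make the batch idea concrete, here is a minimal sketch of farming independent work items out to two devices from a shared queue. Everything in it is hypothetical: the device labels, the `process_on_device` stand-in, and the file names are ours, and no real GPU API is involved. It shows the scheduling pattern only, not how Deep Link exposes the hardware.

```python
import queue
import threading


def process_on_device(device: str, item: str) -> str:
    """Hypothetical stand-in for real GPU work, e.g. upscaling one image."""
    return f"{item} upscaled"


def run_batch(items, devices=("iGPU", "dGPU")):
    """Process independent items, letting each device pull work as it frees up."""
    work = queue.Queue()
    for item in items:
        work.put(item)

    results = {}
    lock = threading.Lock()

    def worker(device: str) -> None:
        while True:
            try:
                item = work.get_nowait()
            except queue.Empty:
                return  # nothing left for this device to do
            output = process_on_device(device, item)
            with lock:
                results[item] = (device, output)

    # One worker thread per device; both drain the same queue.
    threads = [threading.Thread(target=worker, args=(d,)) for d in devices]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    batch = [f"photo_{i}.png" for i in range(8)]
    for item, (device, _) in sorted(run_batch(batch).items()):
        print(f"{item} -> {device}")
```

Because each device pulls its next item only when it is free, a faster device naturally ends up taking more of the batch. That only works because the items don't depend on one another.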
As for what software can use these capabilities, conceptually any software that can handle issuing work to multiple GPUs is in a good place. Even if it can’t handle Xe MAX out of the box, it should take very little work to get it seeing multiple Intel GPUs. Otherwise, this is where Intel’s control of the software stack should be an advantage, as it gives them opportunities to abstract certain parts of the multi-GPU equation from software developers. Though at the end of the day, that software still needs to be able to issue independent workloads to properly make use of Xe MAX.
The need for independent workloads and batch processing, in turn, is why Intel is focusing on what they term “mobile creation” workloads. These tasks aren’t typically processed in real-time, and broadly speaking have the greatest overlap with what the Xe MAX hardware can do. So although Xe MAX isn’t especially useful as a stand-alone graphics adapter, Intel sees it as an excellent accelerator.
Overall, Intel is still in the early days of software support for Deep Link and Xe MAX. The company is working with software developers to get multi-GPU support added to more software down the line, so that more programs can take advantage of farming work out to Xe MAX. Along with getting more batch-style software enabled, the company is also working to enable Xe MAX to help with large, single-stream video encoding. Since video encoding is not hard-bound to being a serial task, Intel is looking at ways to split up a large encoding job so that each Xe encode block gets a chunk of the video to work on, a similar process to how multi-core CPU encoding works today. For now, Intel is targeting the first half of next year for this single-stream encoding support.
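As a rough illustration of the chunking idea, the sketch below splits a video’s frame range into contiguous segments, one per encode block. It assumes four blocks in total (two in the iGPU plus two in DG1), and it is not Intel’s method: the function name is ours, and a real encoder would also need to align the cuts to keyframe boundaries and stitch the outputs back together.

```python
from typing import List


def split_frames(total_frames: int, encoders: int) -> List[range]:
    """Split frames 0..total_frames-1 into near-equal contiguous chunks."""
    base, extra = divmod(total_frames, encoders)
    chunks = []
    start = 0
    for i in range(encoders):
        # The first `extra` chunks take one additional frame each.
        length = base + (1 if i < extra else 0)
        chunks.append(range(start, start + length))
        start += length
    return chunks


if __name__ == "__main__":
    # Assumption: four encode blocks across the iGPU and dGPU combined.
    for i, chunk in enumerate(split_frames(total_frames=1000, encoders=4)):
        print(f"encode block {i}: frames {chunk.start}-{chunk.stop - 1}")
```

Each chunk can then be encoded independently, which is what makes the job parallel rather than serial.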
Sharing Power: Extending Adaptix to dGPUs
Along with sharing work, Deep Link also encompasses Intel’s technology for sharing/allocating power between their CPUs and Xe MAX. Intel calls this Dynamic Power Share, and it’s an extension of their Adaptix power management technology, which the company has offered since Ice Lake.
Intel’s Adaptix is a suite of technologies that includes Dynamic Tuning 2.0, which implements DVFS feedback loops on top of supposedly AI-trained algorithms to help the system deliver power to the parts of the processor that need it most, such as CPU, GPU, interconnect, or accelerators. With Adaptix enabled, the idea is that the power can be more intelligently managed, giving a longer turbo profile, as well as a better all-core extended turbo where the chassis is capable.
Intel already uses Adaptix to allocate power between their CPU cores and iGPU, among other blocks, so extending it to include Xe MAX is a natural (and necessary) extension of the technology. According to Intel, the company has also learned a great deal from their previous dGPU-style effort, Kaby Lake-G and its on-package AMD dGPU, which they have taken into account when extending Adaptix for Xe MAX.
Like Adaptix for CPUs, just how well this feature is used is going to be largely in OEM hands. Intel provides the tools, but it’s up to OEMs to set their various tuning values and plan for how that interacts with the power delivery and cooling capabilities of a laptop. But with Intel starting small on Xe MAX’s rollout – there are only 3 laptops shipping this year – hopefully it means Intel has been able to give the OEMs and the devices an appropriate level of attention.
Ultimately, Intel considers Adaptix/Dynamic Power Share to be another software-driven advantage for their gear. From a competitive standpoint the company believes that their tech does a better job of power management than how MX350-enabled laptops handle power allocations – particularly, that Xe MAX laptops don't have to permanently and continually reserve thermal and power headroom for the dGPU – and thus can unlock more performance even in CPU-limited workloads. That said, it's a bit of a dubious (or at least, non-intuitive) claim, as laptops have been able to shut off dGPUs for years now. But, as is often the case with power-saving features, how well any of this is tuned in shipping systems is up to the OEMs – and Intel says that they've found that most systems in this class with (rival) dGPUs aren't allocating the CPU its full headroom.
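To illustrate the distinction Intel is drawing, here is a toy comparison between a fixed dGPU power reservation and a demand-driven split of a shared budget. This is purely a sketch of the concept: the wattages, function names, and the proportional scaling rule are all our own assumptions, not Intel’s Dynamic Power Share algorithm or any OEM’s actual tuning.

```python
def static_split(total_w, gpu_reserve_w, cpu_demand_w, gpu_demand_w):
    """Fixed reservation: the CPU can never use the dGPU's reserved share."""
    cpu = min(cpu_demand_w, total_w - gpu_reserve_w)
    gpu = min(gpu_demand_w, gpu_reserve_w)
    return cpu, gpu


def dynamic_split(total_w, cpu_demand_w, gpu_demand_w):
    """Demand-driven: grant what is asked for, scaling down only when over budget."""
    demand = cpu_demand_w + gpu_demand_w
    if demand <= total_w:
        return cpu_demand_w, gpu_demand_w
    scale = total_w / demand  # hypothetical policy: shrink both proportionally
    return cpu_demand_w * scale, gpu_demand_w * scale


if __name__ == "__main__":
    # Hypothetical CPU-limited workload: 40W chassis budget, dGPU sitting idle.
    print("static :", static_split(40, 25, cpu_demand_w=35, gpu_demand_w=0))
    print("dynamic:", dynamic_split(40, cpu_demand_w=35, gpu_demand_w=0))
```

In this made-up scenario the fixed reservation caps the CPU at 15W even though the dGPU is drawing nothing, while the demand-driven split hands the CPU its full 35W. Whether shipping laptops behave like either model is, as noted, up to how the OEMs tune them.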
A Word on Gaming Performance
Since Intel lacks a way to combine multiple GPUs for a single rendering/gaming task, the company is not really pushing Xe MAX as a gaming solution for obvious reasons. Nonetheless, on paper Xe MAX should be faster than Tiger Lake-U integrated graphics by around 20% thanks to the discrete adapter’s higher clockspeeds, so there are potential advantages to gaming on Xe MAX. So it’s something that Intel is making sure to support all the same.
The final pillar for Intel’s software stack, Xe MAX’s drivers include an arbiter of sorts to help direct games to use the correct GPU. The “correct” GPU in this case is often – but not always – the Xe MAX GPU. But in a surprising (and welcome) bit of transparency from Intel, the company admits that in some scenarios Tiger Lake’s iGPU may outperform Xe MAX, and as a result those games shouldn’t run on Xe MAX. So the arbiter’s job is to direct a game to use whatever Intel’s software has deemed the best choice for a given game, be it the iGPU or the dGPU.
This is another case where Intel will be providing a degree of abstraction, ideally hiding all of this from a game developer. Unless a game specifically goes ahead and implements support to detect and select from multiple GPUs, then Intel’s drivers should pick the right GPU for a game.
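Conceptually, the arbiter behaves like a per-game lookup with a default. The sketch below is a hypothetical model of that behavior, not Intel’s driver logic: the override table, executable names, and function name are all invented for illustration.

```python
# Hypothetical defaults: most games go to the dGPU unless listed otherwise.
DEFAULT_GPU = "dGPU"

# Hypothetical per-game overrides for titles that run better on the iGPU.
PER_GAME_OVERRIDES = {
    "example_game_a.exe": "iGPU",
}


def pick_gpu(executable: str, app_selected: str = None) -> str:
    """Choose a GPU for a game, deferring to the game's own explicit choice."""
    if app_selected is not None:
        # The game detected multiple GPUs and picked one itself.
        return app_selected
    return PER_GAME_OVERRIDES.get(executable.lower(), DEFAULT_GPU)


if __name__ == "__main__":
    print(pick_gpu("Example_Game_A.exe"))                       # override applies
    print(pick_gpu("example_game_b.exe"))                       # falls back to default
    print(pick_gpu("example_game_b.exe", app_selected="iGPU"))  # game's choice wins
```

The point is that a game developer who does nothing still gets a sensible choice, while a game that explicitly selects a GPU keeps control.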
Functionally, all this sounds very close to how NVIDIA’s Optimus technology works, just with an added wrinkle of purposely sending some games to the iGPU rather than favoring the dGPU for all games. Now that Intel has mobile dGPUs they need a way to manage their use, and this is it. Plus Intel’s long-term plans of course call for more powerful Xe-HPG GPUs, so getting their GPU switching tech out and debugged now is going to benefit them in the long run.
As for performance expectations, with Xe MAX’s higher clockspeeds, Intel is promoting Xe MAX as being generally performance competitive with MX350. Mind you, Intel isn’t aiming to set a very high bar here, but Xe MAX should at least be good for 1080p gaming (most of the time).
Launch Laptops: Acer, ASUS, & Dell
Last but not least, let’s take a look at the first laptops that will be shipping with Xe MAX graphics. Intel is starting things off with a relatively small number of laptops, with Acer, ASUS, and Dell all set to release their Xe MAX-equipped notebooks in November. These are the Acer Swift 3x, the ASUS VivoBook Flip TP470, and the Dell Inspiron 15 7000 2-in-1.
All three laptops generally fit the thin-and-light paradigm that Intel is pushing with Xe MAX. The Swift 3x is a 14-inch laptop at 3lbs, and the VivoBook Flip TP470 is 14 inches as well at a slightly heavier 3.3lbs. Finally, Dell’s Inspiron is a 15-inch convertible notebook that weighs around 4lbs. All of these notebooks come with high-end versions of Intel’s Tiger Lake-U SoCs using G7-class iGPUs.
At this point we’re still waiting for pricing info on the complete set of laptops. With this being a major Intel launch, Intel is going to want to put their best foot forward – and will likely eat most of the marketing costs in the process – though at the same time the company is looking to sell Xe MAX-equipped laptops as premium notebooks, so there is a careful balance to be had.
Officially, Xe MAX is launching today. However it’s not immediately clear whether any of these laptops are actually going to be available right away, or if they’re going to show up later in the month. So it may be a couple of weeks until there’s actual retail availability. On which note, as far as regional distribution goes, the Acer laptop will be China-only, the ASUS laptop will be sold in both China and North America, while the Dell will be North America-only (sold via Best Buy).
Overall, the launch of Intel’s Xe MAX graphics and the DG1 GPU is an important day for Intel, but this is also a launch where Intel strikes me as having modest expectations. Xe MAX is only being launched in a small number of laptops for now, and Intel is not seriously chasing the gaming market with their first discrete laptop part. Nonetheless, it will be interesting to see what kind of traction Intel can get as a new player in the market, especially with their focus on mobile creation and selling Xe MAX as an accelerator for productivity and content creation tasks. No matter what, Xe MAX will be something that bears keeping an eye on.