ikjadoon - Monday, August 17, 2020 - link
I hope someone asks about LPDDR5 in the Q&A. How critical is that to Xe-LP performance? Have Intel's internal benchmarks been on LP5 or LP4X?

If you look at Comet Lake, there was very slow LPDDR4X uptake. Why does Intel think partners will re-refresh their laptops for LPDDR5, when most refused to do so for Comet Lake? Or are we going to see a delay in Tiger Lake launches until LPDDR5 variants are released?
IntelUser2000 - Monday, August 17, 2020 - link
Isn't it obvious? Comet Lake uses HD 620 graphics, which works fine with LPDDR3; LPDDR4X just raises cost and complexity. Tiger Lake, with 4x+ the graphics performance, can take advantage of it.
ikjadoon - Monday, August 17, 2020 - link
>LPDDR4X just raises cost and complexity.

I didn't tell Intel to backport LPDDR4X. That was Intel's own foolish choice.
KimGitz - Monday, August 17, 2020 - link
I'm hoping LPDDR5 will show up in Q1 2021 along with the 6-8 core, 45-65W Tiger Lake H...
ikjadoon - Tuesday, August 18, 2020 - link
Fair enough. I can't find many announcements, but Samsung is supposed to begin their 3rd-gen 1z node for LPDDR5-6400, whereas Tiger Lake only supports LPDDR5-5500. Perhaps laptops will get the 2nd-gen 1y node's LPDDR5-5500?

AnandTech article from a few months ago: https://www.anandtech.com/show/15547/samsung-start...
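For reference, a rough sketch of what those speeds mean for peak bandwidth, assuming a 128-bit LPDDR bus (typical for this class of chip); these are theoretical peaks, not measured figures:

```python
# Peak theoretical memory bandwidth, assuming a 128-bit LPDDR bus.
def peak_bandwidth_gbs(data_rate_mts: int, bus_width_bits: int = 128) -> float:
    # bytes per second = transfers/s * bytes per transfer
    return data_rate_mts * 1e6 * (bus_width_bits / 8) / 1e9

for name, rate in [("LPDDR4X-4266", 4266), ("LPDDR5-5500", 5500), ("LPDDR5-6400", 6400)]:
    print(f"{name}: ~{peak_bandwidth_gbs(rate):.1f} GB/s")
# LPDDR4X-4266: ~68.3 GB/s, LPDDR5-5500: ~88.0 GB/s, LPDDR5-6400: ~102.4 GB/s
```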
trivik12 - Monday, August 17, 2020 - link
The date for TGL-U/Y is in 2 weeks; don't think that will change. I think LPDDR5 will be niche and will probably be used only in high-end ultrabooks like the Surface Book, XPS, Spectre, or ThinkPad. Otherwise we will see laptops with LPDDR4X, or DDR4 at the low end.
ikjadoon - Tuesday, August 18, 2020 - link
That's true, but I do recall some Ice Lake laptops not being announced until CES 2020, i.e., four months later.

But I agree w/ your assessment: perhaps the higher-end lines will just delay their TGL models until LPDDR5 capacity is high enough.
Spunjji - Tuesday, August 18, 2020 - link
Most Ice Lake laptops came later, IIRC. The original launch was with a limited number of devices.

I'd be more prepared to wager that the majority of devices won't have it at all, and once again we'll be left in that situation where you don't know how a given device will perform until you see a review.
KimGitz - Monday, August 17, 2020 - link
Xe-HPG is on an external process, apparently TSMC's 6nm. Since Intel delayed their 7nm process, I was expecting them to use TSMC's 7nm. I feel like Intel have been working on Xe-HPG in secret and were planning to release it on their own 7nm in 2021, but when that got delayed they went to TSMC instead; and with everyone turning to TSMC's 7nm, TSMC could only offer decent capacity at 6nm. This is why Intel disclosed Xe-HPG now: they have secured a deal with TSMC. The Xe GPU architecture and the micro-architecture IP are not tied to a particular process node and can be ported to any internal or external process.
JayNor - Tuesday, August 18, 2020 - link
HPG is also back in the lab already, so the arrangement with the external fab must have been made some time ago.
RedOnlyFan - Tuesday, August 18, 2020 - link
Can we expect RT in Tiger Lake Xe-LP or DG1?

Question to Ian: does the 16MiB L3 cache include the 12MiB of core cache? If it's exclusive to the GPU, that would be crazy.
yeeeeman - Tuesday, August 18, 2020 - link
There is no point in supporting RT on lower end cards...
Cooe - Tuesday, August 18, 2020 - link
No. There is no RT on ANY Xe LP product.
Cooe - Tuesday, August 18, 2020 - link
So much die space required for marginally better performance than AMD's absolutely freaking MINUSCULE Vega 8 7nm iGPU block... What a disappointment... Are we REALLY supposed to be impressed that they finally caught up to AMD's years-old GPU IP while needing 2-3x the die space to get there?
I make out the die to be ~30% GPU in that on-slide die shot. Tiger Lake is 146mm², so the GPU is ~44mm².

Using the WikiChip die shot for Raven Ridge, the GPU also occupies ~30% of the 210mm² die, or ~63mm².
So, in summary: No.
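The same estimate as a quick sketch; both percentages are eyeballed from die shots, so treat the outputs as rough:

```python
# GPU block area from an eyeballed die-shot fraction; rough figures only.
def gpu_block_mm2(die_mm2: float, gpu_fraction: float) -> float:
    return die_mm2 * gpu_fraction

print(gpu_block_mm2(146, 0.30))  # Tiger Lake (Xe-LP): ~44 mm^2
print(gpu_block_mm2(210, 0.30))  # Raven Ridge (11 CU Vega): ~63 mm^2
```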
Fulljack - Tuesday, August 18, 2020 - link
Comparing Tiger Lake to Raven Ridge doesn't make the slightest sense; it should be compared to Renoir, its direct competitor.

Renoir's die size is measured at around 156 mm², of which the GPU takes up only a small part, but I couldn't find information on how big the GPU block is. You can check it here: https://i.imgur.com/fxRNWcM.jpg
Spunjji - Friday, August 21, 2020 - link
I measured it at around 16% of the die (that particular shot seems to leave out a lot of the GPU in its markup, suggesting it's only ~9%, which is silly) - that works out at ~25mm².

As a quick sanity check, based on AMD's own statement about "-61% area" from Raven Ridge to Renoir, if we go with edzieba's estimate of around 63mm² for the 11CU 14nm GPU then we get a result of 24.57mm² for the 7nm 8CU GPU - so it seems in the right ballpark.
Ice Lake G7 averages about 66% of Renoir + Vega 8's performance in real titles. If Intel's prediction of a 2x performance increase from Ice Lake G11 to Tiger Lake Xe is borne out, then they'll have a 33% performance advantage over Renoir at the cost of 55% greater GPU die area.
I'm confident in assuming that the numbers Intel are giving out are based on an LPDDR5 design too (why wouldn't they be?), so they're also factoring in a bandwidth advantage to get that result. It's not dismal by any means, but it's not especially thrilling.
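A minimal sketch of that sanity check and the performance projection, using only the figures quoted above (all of which are rough estimates rather than measurements):

```python
# Sanity check on the Renoir GPU-block estimate, using the numbers quoted above.
raven_ridge_gpu_mm2 = 63.0                          # edzieba's 14nm, 11 CU Vega estimate
renoir_gpu_mm2 = raven_ridge_gpu_mm2 * (1 - 0.61)   # AMD's "-61% area" claim
print(f"Renoir GPU block: ~{renoir_gpu_mm2:.2f} mm^2")   # -> ~24.57 mm^2

icl_vs_renoir = 0.66                 # Ice Lake G7 at ~66% of Renoir/Vega 8 in real titles
tgl_vs_renoir = icl_vs_renoir * 2.0  # only if Intel's "2x over Ice Lake G11" claim holds
print(f"Projected Tiger Lake vs Renoir: ~{(tgl_vs_renoir - 1) * 100:.0f}% faster")  # -> ~32%
```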
Cooe - Tuesday, August 18, 2020 - link
We're talking about Renoir, dumbass, not Raven Ridge. They shrunk their iGPU MASSIVELY for Renoir (and not just by cutting the CU count, either).
Arbie - Wednesday, August 19, 2020 - link
Gratuitous insults don't help the forum.
Spunjji - Friday, August 21, 2020 - link
I wouldn't call it gratuitous per se - you have to be either not paying attention at all or deliberately trying to stink up the thread to compare completely the wrong things just to "prove" someone else wrong. Quite personal, though.
Smell This - Tuesday, August 18, 2020 - link
It is not fair to Chipzilla, but graphics progress using bolt-on EUs has been a losing proposition for a long time. Xe is simply faster, re-ordered and expanded EUs in an HD Graphics container. Xe is something like 768:96:16 --- see GT1 to GT4!
It is still a 128-bit-wide FPU per EU executing 8 16-bit or 4 32-bit operations per clock cycle, with the re-ordered SIMD blocks. It may be functional, but it is still a bolt-on (with a new double ring bus!)
mikk - Saturday, August 22, 2020 - link
Marginally better performance? Did you see the reviews? And there is no 2-3x space difference; that's nonsense. You seem unaware that (initial) Tiger Lake and Renoir both use LPDDR4X-4266 at best, which makes it super hard to scale iGPU performance up linearly - it's almost impossible. Furthermore, AMD's design has fewer units but at the same time a much higher iGPU clock speed: the i7-1165G7's Xe can go up to 1.3 GHz, while the Ryzen 7 4800U can go up to 1.75 GHz. At 1.75 GHz Vega 8 can do 1.79 TFLOPS FP32, while a theoretical Xe-LP at 1.75 GHz could do 2.69 TFLOPS, so the compute units on Tiger Lake are collectively much more powerful. Even if the Vega iGPU had more units, it would struggle to hold those clock speeds in a low-power environment and would struggle with the bandwidth to feed the additional units.
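A minimal sketch of that arithmetic (theoretical peaks only, assuming 64 FP32 lanes per Vega CU and 8 FP32 lanes per Xe-LP EU; neither iGPU sustains its maximum boost clock at 15 W):

```python
# Theoretical peak FP32 throughput: lanes * 2 ops per FMA * clock.
def fp32_tflops(lanes: int, clock_ghz: float) -> float:
    return lanes * 2 * clock_ghz / 1000

print(fp32_tflops(8 * 64, 1.75))   # Vega 8 (8 CU) @ 1.75 GHz        -> ~1.79 TFLOPS
print(fp32_tflops(96 * 8, 1.30))   # Xe-LP (96 EU) @ 1.30 GHz        -> ~2.00 TFLOPS
print(fp32_tflops(96 * 8, 1.75))   # hypothetical Xe-LP @ 1.75 GHz   -> ~2.69 TFLOPS
```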
AbRASiON - Tuesday, August 18, 2020 - link
I just want a laptop with a dock that can do 2x 4K at a good, reliable Windows frame rate: playing video, moving windows around, etc.
"08:38PM EDT - Add new capabilities - matrix tensors, ray tracing, virtualization, etc"Any extra info on virtualization? What was added to support that?
JayNor - Tuesday, August 18, 2020 - link
Is the Xe memory fabric PCIe 5.0?

The XeLink description makes it appear to be a central point for communicating with the system. Is it also being used for the connection to the two CPUs per node?
JayNor - Wednesday, August 19, 2020 - link
"08:38PM EDT - Goals: increase SIMD lanes from 10s to 1000s""08:42PM EDT - 16 EUs = 128 SIMD lanes"
So, are they extending SIMD width across multiple tiles?
Yojimbo - Wednesday, August 19, 2020 - link
I assume what you call a tile and what they call a tile are two different things. There are 16 EUs per compute subslice. The number of compute subslices per compute slice is variable, but at least one of their chips has 96 EUs in a compute slice. So perhaps they are extending SIMD width across subslices within a compute slice, perhaps to the entire compute slice, which would be 768 SIMD lanes for that 96-EU compute slice. All of that happens within one tile, since what they call a tile is an element connected to other elements by their EMIB technology. Although I didn't notice it stated explicitly, perhaps one tile can be composed of more than one compute slice.
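The lane math under that reading, as a quick sketch (the 8-lanes-per-EU figure follows from the "16 EUs = 128 SIMD lanes" quote above; the 96-EU slice is the example part mentioned):

```python
# SIMD lane counting under the reading above: 8 FP32 lanes per Xe EU,
# 16 EUs per compute subslice, 96 EUs in the quoted compute slice.
LANES_PER_EU = 8

def simd_lanes(eus: int) -> int:
    return eus * LANES_PER_EU

print(simd_lanes(16))   # one subslice            -> 128 lanes (matches the live-blog quote)
print(simd_lanes(96))   # one 96-EU compute slice -> 768 lanes
```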
JayNor - Wednesday, August 19, 2020 - link
It's not explained what "single GPU" implies, but perhaps it means a SIMD instruction can be executed across multiple tiles.

"08:47PM EDT - Multiple tiles work as separate GPUs or a single GPU"