Original Link: https://www.anandtech.com/show/3887/nvidia-400m-dx11-top-to-bottom
NVIDIA 400M: DX11 Top to Bottom Solutions Now Available
by Jarred Walton on September 3, 2010 12:02 AM EST
Introducing the GeForce 400M Family
Back in May, NVIDIA surprised us by announcing their first mobile DX11 GPU, the GTX 480M. What was surprising is that they were using a full GF100 chip, only harvested and downclocked relative to the desktop GPUs. In fact, GTX 465M would have been a more accurate name, as the 480M shipped with the same number of cores as the desktop GTX 465. Power requirements were understandably quite high (100W), but there's no arguing that the 480M is now the fastest mobile GPU on the block. Whether it's worth the price of admission is another story, of course, which segues nicely into today's announcement.
NVIDIA is filling out the rest of their mobile lineup with a slew of new chips. What they're not telling us is precisely which core the chips are using, so potentially there will be some overlap with harvesting going on (the 445M in particular looks like it will use two different chips). NVIDIA also didn't give us any figures for power requirements, though Optimus Technology means that when paired with an IGP-enabled CPU they can "idle" at 0W. Anyway, here's what we do know, starting with the high-end offerings. (We've split the other parts out on the next page to keep our tables manageable.)
NVIDIA High-End 400M Specifications
| | GeForce GTX 480M | GeForce GTX 470M | GeForce GTX 460M |
| --- | --- | --- | --- |
| Codename | GF100 | GF104 | GF106 |
| CUDA Cores | 352 | 288 | 192 |
| Graphics Clock (MHz) | 425 | 535 | 675 |
| Processor Clock (MHz) | 850 | 1070 | 1350 |
| Memory Clock (MHz) | 1200 | 1250 | 1250 |
| Standard Memory Configuration | GDDR5 | GDDR5 | GDDR5 |
| Memory Interface Width | 256-bit | 192-bit | 192-bit |
| Memory Bandwidth (GB/sec) | 76.8 | 60 | 60 |
| SLI Ready | Yes | Yes | Yes |
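The bandwidth row in the table can be reproduced from the memory clock and interface width rows. Here is a minimal Python sketch, assuming the listed memory clock moves data twice per cycle; that factor is our inference from the table's own numbers rather than something NVIDIA spelled out, and the function name is ours.

```python
def memory_bandwidth_gbs(memory_clock_mhz: float, bus_width_bits: int) -> float:
    """Peak bandwidth in GB/s, assuming two transfers per listed clock cycle."""
    transfers_per_second = memory_clock_mhz * 1e6 * 2  # assumed double data rate
    bytes_per_transfer = bus_width_bits / 8
    return transfers_per_second * bytes_per_transfer / 1e9


# (memory clock in MHz, bus width in bits), copied from the table above
high_end = {
    "GTX 480M": (1200, 256),
    "GTX 470M": (1250, 192),
    "GTX 460M": (1250, 192),
}

for name, (clock, width) in high_end.items():
    print(f"{name}: {memory_bandwidth_gbs(clock, width):.1f} GB/s")
```

Under that assumption the sketch lands on the table's 76.8, 60, and 60 GB/s figures.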
We eliminated several rows of supported features, which we'll summarize here: all of the 400M GPUs, from the lowly 415M up to the top 480M, include support for DX11, OpenGL 4.0, PhysX, Optimus, CUDA, DirectCompute, OpenCL, H.264/VC1/MPEG2 1080p video decoding, and full spec Blu-ray decode. They also support the HDMI 1.4a spec, so hopefully that means all the new cards will include 1.4a ports; now we just need 1.4a HDMI displays to go along with the GPUs.
The more interesting specs are the number of CUDA cores in the various models, which allow us to make guesses as to the base chip. (Update: NVIDIA also included images of the chips, though it looks like they used the same image for many chips and just changed the logo via Photoshop, so we have a pretty good idea of what's going on. We have updated the tables after looking at the images, as one reader suggested we do.) We already know 480M uses a harvested GF100. The GF104 was introduced on the desktop with the GTX 460, and it contains up to 384 CUDA cores, which is enough to cover the 480M's 352. The 470M will use GF104, and perhaps a new revision of the 480M will make the switch as well. In the past, NVIDIA has chopped off about half of their halo product for the next level GPUs, then half of that again for the lower midrange parts, and finally a third or a fourth of that for the entry-level parts. Thus, GT200 had up to 240 cores, GT215 had 128, GT216 48, and GT218 came with a lowly 16 cores. Right now, it looks like we don't have that final cut yet, so perhaps we'll see a G 410M at some point in the future.
The good news is that with 400M, we get roughly twice as many cores at every level compared to the previous generation 200M/300M parts, but typically slightly lower clocks. Theoretical computational power is nearly double, but the catch is that our testing of the desktop GTX 480 suggests that clock-for-clock, GF100 cores aren't as potent as GT200 cores. So looking at clocks and core counts, GTX 480 has 90% more computational power available relative to GTX 285, but in actual games it's more like 50% faster—though memory bandwidth and other areas also come into play. Even with that said, here's how things break down in the various performance segments.
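To put a rough number on "not as potent", we can divide the measured speedup by the theoretical one. The Python sketch below uses only the 90% and 50% figures quoted above; the function name is ours, and the result ignores memory bandwidth and every other factor, so treat it as a ballpark rather than a measurement.

```python
def realized_fraction(theoretical_gain_pct: float, measured_gain_pct: float) -> float:
    """Share of the theoretical speedup that shows up in measured results."""
    theoretical = 1 + theoretical_gain_pct / 100
    measured = 1 + measured_gain_pct / 100
    return measured / theoretical


# GTX 480 vs. GTX 285: 90% more theoretical compute, about 50% faster in games
print(f"{realized_fraction(90, 50):.2f}")
```

By this crude yardstick, each unit of theoretical compute on GTX 480 delivers a bit under 80% of what it did on GTX 285.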
At the very top, we've gone from 285M with 128 cores at 1500MHz to 480M with 352 cores at 850MHz. That represents a computational power increase of about 55%, but memory bandwidth is relatively close—only 18% higher. In our testing 480M beat 285M by around 20%, so the computational power isn't likely the bottleneck and memory bandwidth is playing a major role. What we'd like to see is a shift to the smaller (and presumably less power hungry) GF104 while still keeping the same specs, but perhaps that's not possible. Either way, 480M is the mobile performance champion but with a 100W TDP it's also very hot and will only be found in larger notebooks.
The next step down gives us 470M, which replaces GTX 260M. The 260M had a TDP of around 55W (75W max, but that was more for the 285M), so presumably the 470M will target a similar power envelope. The top spec goes from 112 cores at 1375MHz up to 288 cores at 1070MHz, an increase of 100% in theoretical compute. As we saw with 285M and 480M, however, memory bandwidth may be the bigger factor; here the 260M and 470M are nearly equal (60.8GB/s vs. 60GB/s), so it will be interesting to see how performance plays out. It's also very possible that future games will be able to stress shaders more than memory bandwidth and thus show greater performance improvements.
The 460M replaces the GTS 360M and GTS 350M, neither of which saw much use in notebooks. (We'll actually look at our first GTS 350M notebook in the near future, just in time for replacements to arrive.) GTS 360M has 96 cores at 1325MHz with 57.6GB/s of bandwidth; GTS 350M has slightly lower shader and RAM clocks. The new 460M checks in with 192 cores at 1350MHz, and slightly more memory bandwidth. Again, computationally we're looking at roughly double the performance potential. If TDP is similar, we're also looking at around 40W for the 460M.
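The "computational power" comparisons in this section boil down to core count times shader clock. A short Python sketch of that arithmetic follows, using only the core counts and clocks quoted above; the function names are ours, and the product is a crude proxy that says nothing about architectural differences or real game performance.

```python
def theoretical_compute(cores: int, shader_mhz: int) -> int:
    """Cores times shader clock: a crude proxy, not a measured result."""
    return cores * shader_mhz


def gain_pct(old: tuple[int, int], new: tuple[int, int]) -> float:
    """Percentage increase in the proxy going from the old part to the new."""
    return (theoretical_compute(*new) / theoretical_compute(*old) - 1) * 100


# (cores, shader clock in MHz), all taken from the text above
replacements = {
    "GTX 285M -> GTX 480M": ((128, 1500), (352, 850)),
    "GTX 260M -> GTX 470M": ((112, 1375), (288, 1070)),
    "GTS 360M -> GTX 460M": ((96, 1325), (192, 1350)),
}

for label, (old, new) in replacements.items():
    print(f"{label}: {gain_pct(old, new):+.0f}%")
```

The three results come out at roughly 56%, 100%, and 104%, in line with the "about 55%", "100%", and "roughly double" figures in the text.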
Performance and Mainstream 400M
After the high-end parts, the drop in performance can become precipitous. This has been particularly bad for AMD GPUs, where the drop from Mobility 5800 series down to the 5700 and 5600 parts often means less than half the performance. NVIDIA has had a few more upper-midrange parts floating around, though, and that looks to continue.
NVIDIA Performance and Mainstream 400M Specifications
| | GeForce GT 445M | GeForce GT 435M | GeForce GT 425M | GeForce GT 420M | GeForce GT 415M |
| --- | --- | --- | --- | --- | --- |
| Codename | GF106 | GF108 | GF108 | GF108 | GF108 |
| CUDA Cores | 144 | 96 | 96 | 96 | 48 |
| Graphics Clock (MHz) | 590 | 650 | 560 | 500 | 500 |
| Processor Clock (MHz) | 1180 | 1300 | 1120 | 1000 | 1000 |
| Memory Clock (MHz) | 800/1250 | 800 | 800 | 800 | 800 |
| Standard Memory Configuration | DDR3/GDDR5 | DDR3 | DDR3 | DDR3 | DDR3 |
| Memory Interface Width | 128/192-bit | 128-bit | 128-bit | 128-bit | 128-bit |
| Memory Bandwidth (GB/sec) | 25.6/60.0 | 25.6 | 25.6 | 25.6 | 25.6 |
| SLI Ready | No | No | No | No | No |
First, you'll notice that none of these "Performance and Mainstream" parts supports SLI. That's hardly surprising, as SLI with lower-end mobile GPUs has never been our recommended approach. First get to the high-end for performance reasons, and then worry about SLI. Other than that limitation, all of these parts have the same features as the faster parts on the previous page.
The new GT 445M is the first part to come with split specifications. Given the option for 128-bit and 192-bit bus widths, it appears the 445M will use the full GF106 memory controller for the higher bandwidth version and cut off one of the 64-bit interfaces for the low bandwidth model. Many of our gaming results have looked bandwidth limited, so we'd definitely recommend going for the GDDR5 192-bit model if possible, but that will be up to the notebook manufacturers. 445M looks to compete in a similar space as 460M with the higher bandwidth model, but it cuts computational power quite a bit at roughly two-thirds of the 460M. The difficulty here is that 445M can be either substantially faster than some of the older parts, or if you get the 128-bit DDR3 model you're suddenly cut down to less than half the bandwidth. Heavy use of shaders, tessellation, etc. might make the lack of bandwidth less painful, but without hardware and future games it's difficult to say how things will play out.
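To make the gap between the two 445M variants concrete, here is a small Python sketch that models each configuration and compares it with the 460M. All figures come from the two spec tables; the `GpuConfig` class and its property names are ours, the compute figure is just cores times shader clock, and the bandwidth formula assumes two transfers per listed memory clock (which matches the table values).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GpuConfig:
    name: str
    cores: int
    shader_mhz: int
    memory_mhz: int
    bus_bits: int

    @property
    def compute(self) -> int:
        """Cores times shader clock: a crude proxy for shader throughput."""
        return self.cores * self.shader_mhz

    @property
    def bandwidth_gbs(self) -> float:
        """Peak GB/s, assuming two transfers per listed memory clock."""
        return self.memory_mhz * 2 * self.bus_bits / 8 / 1000


GTX_460M = GpuConfig("GTX 460M", 192, 1350, 1250, 192)
GT_445M_GDDR5 = GpuConfig("GT 445M (GDDR5, 192-bit)", 144, 1180, 1250, 192)
GT_445M_DDR3 = GpuConfig("GT 445M (DDR3, 128-bit)", 144, 1180, 800, 128)

for gpu in (GT_445M_GDDR5, GT_445M_DDR3):
    compute_ratio = gpu.compute / GTX_460M.compute
    bandwidth_ratio = gpu.bandwidth_gbs / GTX_460M.bandwidth_gbs
    print(f"{gpu.name}: {compute_ratio:.0%} of 460M compute, "
          f"{bandwidth_ratio:.0%} of 460M bandwidth")
```

Both variants sit at about two-thirds of the 460M's theoretical compute, but the DDR3 model drops to roughly 43% of its bandwidth, which is the "less than half" scenario described above.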
The 435M is a more straightforward replacement of GT 335M. (Did someone ask for a remake of M11x with a DX11 GPU? Hopefully they can do something about the LCD this time around….) 335M has 72 cores at 1080MHz, with 34.1GB/s of bandwidth. Unless something changes, 435M will actually have less bandwidth but substantially more computational power—60% more to be exact (plus architectural changes, obviously). This is a pattern that holds throughout the 400M lineup, so NVIDIA appears to be betting heavily that shader performance rather than bandwidth will become important.
Along with the 435M come several more GPUs; the 425M and 420M have the same bandwidth and core counts, but lower core/shader clocks. This is similar to the current 325M/330M, which have 48 cores but the same amount of bandwidth as the 335M. Even the lowest 420M has around 25% more compute power than 335M, but they all have less bandwidth. It would have been nice to see a move to GDDR5 on more of the Performance and Mainstream parts, as that would have improved overall performance substantially.
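The same cores-times-clock arithmetic shows how the three GF108 parts stack up against the GT 335M. This Python sketch uses only numbers from the text and the table; the names are ours, and the proxy ignores the architectural changes between generations.

```python
def compute_proxy(cores: int, shader_mhz: int) -> int:
    """Cores times shader clock: a crude proxy for shader throughput."""
    return cores * shader_mhz


def pct_change(new: float, old: float) -> float:
    """Percentage change going from old to new."""
    return (new / old - 1) * 100


GT_335M_COMPUTE = compute_proxy(72, 1080)
GT_335M_BANDWIDTH_GBS = 34.1
GF108_BANDWIDTH_GBS = 25.6  # shared by the 435M, 425M, and 420M

# (cores, shader clock in MHz) from the table above
gf108_parts = {
    "GT 435M": (96, 1300),
    "GT 425M": (96, 1120),
    "GT 420M": (96, 1000),
}

for name, (cores, clock) in gf108_parts.items():
    change = pct_change(compute_proxy(cores, clock), GT_335M_COMPUTE)
    print(f"{name}: {change:+.0f}% compute vs. GT 335M")

bandwidth_change = pct_change(GF108_BANDWIDTH_GBS, GT_335M_BANDWIDTH_GBS)
print(f"Bandwidth vs. GT 335M: {bandwidth_change:+.0f}%")
```

On paper that works out to roughly 60%, 38%, and 23% more compute for the 435M, 425M, and 420M respectively, with all three giving up about a quarter of the 335M's bandwidth.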
Finally, wrapping up the low end we have the GT 415M. Here we can actually see something to celebrate, since the previous generation parts largely consisted of 16 core models with a 64-bit bus (e.g. the G 310M). On the compute side, we're looking at nearly twice the power of the G 310M. Bandwidth also gets a kick in the pants, going from 12.8GB/s to 25.6GB/s. In short, our entry-level mobile GPUs just doubled their performance. Note also that if NVIDIA wanted to cut things down even further, they'd need to make yet another chip (e.g. GF110), since 48 cores is a single SM. Most likely, for anything below GT 415M they'll just continue to sell their older 300M parts.
Miscellaneous Benefits and Closing Thoughts
Overall, we've seen a dramatic boost in core counts across the entire mobile family. What we haven't seen is much in the way of bandwidth improvements. How this will affect actual gaming remains to be seen, but NVIDIA is claiming average performance increases of around 40% compared to the older 300M series. We heard 30% faster performance with 480M versus 285M when that part launched in May, and we didn't quite get that in all games, but newer titles did tend to benefit more than older games.
There are other benefits that NVIDIA is touting with the new GPUs, some new, but mostly stuff we've seen before. Front and center is Optimus Technology, with six of the seven major OEMs now shipping (or preparing to ship) Optimus enabled laptops. If we want to name names, Acer, ASUS, Dell, Lenovo, Samsung, and Toshiba have or will shortly have Optimus laptops; we'll let you fill in the missing blank. While there will always be a market for discrete-only laptops, the switching technology makes a lot of sense for midrange GPUs. Obviously you need a CPU with an IGP, which means no one is likely to do a high-end Optimus notebook just yet, but once Sandy Bridge launches the situation could (read: should) change. While Fermi was a power hungry beast, this is less of a concern on midrange and lower laptops. They'll need to be able to cool the GPUs when gaming, but at least on battery power you won't have to worry about the GPU sucking down watts.
NVIDIA is also touting the benefits of their CUDA GPUs again, which is hardly surprising. With more users shooting HD video clips (e.g. with the latest smartphones), a way to quickly convert those videos into online friendly formats is certainly useful. Badaboom isn't going to win an award for the highest quality encodes, but if you're uploading to YouTube (where your video gets re-encoded anyway) it gets the job done. Needless to say, all of these new 400M GPUs should tear through such encodes much faster than even desktop CPUs. Retouching photos in Photoshop CS5 also gets a speed boost; there's tons of web content moving to GPU acceleration (HTML 5 Video, Flash 10.1, WebGL, and Scalable Vector Graphics for example); and Internet Explorer 9 along with Firefox 4 and Chrome 7 will all have GPU acceleration. Intel's HD Graphics and upcoming Sandy Bridge IGP may struggle in comparison in a few of those areas, but we'll withhold judgment until we get hardware for testing.
And of course, there are the games. The slowest of the slow 400M GPUs should still pack quite a wallop when it comes to gaming. With three times as many cores as G 310M and twice the memory bandwidth, we expect at least double the performance out of the GT 415M. Wondering what that means? Well, 310M is already about three times faster than Intel's HD Graphics and typically more than twice as fast as AMD's HD 4200 IGP. In fact, it's only slightly slower than HD 5470, so if we get twice that level of performance with the bottom-of-the-barrel discrete GPU from NVIDIA all we can say is… it's about time! Of course, our Sandy Bridge preview indicates that Intel may roughly match G 310M performance with their next IGP, so, Optimus or no, users will want more from discrete GPUs.
Wrapping things up, we have the other NVIDIA features like 3D Vision (GT 425M or higher required for gaming, because of the 60FPS target to render two separate views), and the new 3D notebooks will also support 3DTV Play. And if you've wondered about the utility of 3D Vision on notebooks—after all, who wants to carry around the extra USB shutter transmitter?—the new line of 3D Vision enabled laptops will be integrating the emitter into the display bezel.
One last win for NVIDIA comes from ASUS, who will be building an all-in-one 3D Vision PC with the GTX 460M driving the graphics. As with the new 400M notebooks, the 3D emitter is integrated into the display bezel, which means less wire clutter. We don't have any other details on the ASUS ET2400XVT other than that availability is scheduled for some time in the next month or two; hopefully we can get one to test drive and let you know how it works in the near future.
All told, the 400M lineup is looking pretty good right about now. AMD got there first with top to bottom DX11 mobile parts, but performance wasn't substantially higher in many cases than their previous 4000 series. DX11 was a big selling point though, and judging by the number of HD 5000 laptop design wins consumers like the feature. Now NVIDIA can strike back with not just DX11, but very likely higher performance and features like CUDA, PhysX, and Optimus. If you've been holding off buying a new laptop, this fall may finally have the new designs to tempt you into upgrading. Unfortunately, that makes the last few pre-400M laptops we have in hand for review just a little less compelling, but hopefully those who don't need DX11 will be able to find some great deals on the current "outdated" crop as a consolation prize. Have I mentioned how much I like competition?