Original Link: https://www.anandtech.com/show/6276/nvidia-geforce-gtx-660-review-gk106-rounds-out-the-kepler-family
The NVIDIA GeForce GTX 660 Review: GK106 Fills Out The Kepler Family
by Ryan Smith on September 13, 2012 9:00 AM EST
As our regular readers are well aware, NVIDIA’s 28nm supply constraints have proven to be a constant thorn in the side of the company. Since Q2 the message in financial statements has been clear: NVIDIA could be selling more GPUs if they had access to more 28nm capacity. As a result of this capacity constraint they have had to prioritize the high-profit mainstream mobile and high-end desktop markets above other consumer markets, leaving holes in their product lineups. In the intervening time they have launched products like the GK104-based GeForce GTX 660 Ti to help bridge that gap, but even that still left a hole between $100 and $300.
Now nearly 6 months after the launch of the first Kepler GPUs – and 9 months after the launch of the first 28nm GPUs – NVIDIA’s situation has finally improved to the point where they can finish filling out the first iteration of the Kepler GPU family. With GK104 at the high-end and GK107 at the low-end, the task of filling out the middle falls to NVIDIA’s latest GPU: GK106.
As given away by the model number, GK106 is designed to fit in between GK104 and GK107. GK106 offers a more modest collection of functional blocks in exchange for a smaller die size and lower power consumption, making it a perfect fit for NVIDIA’s mainstream desktop products. Even so, we have to admit that until a month ago we weren’t quite sure whether there would even be a GK106 since NVIDIA has covered so much of their typical product lineup with GK104 and GK107, leaving open the possibility of using those GPUs to also cover the rest. So the arrival of GK106 comes as a pleasant surprise amidst what for the last 6 months has been a very small GPU family.
GK106’s launch vehicle will be the GeForce GTX 660, the central member of NVIDIA’s mainstream video card lineup. GTX 660 is designed to come in between GTX 660 Ti and GTX 650 (also launching today), bringing Kepler and its improved performance down to the same $230 price range that the GTX 460 launched at nearly two years ago. NVIDIA has had a tremendous amount of success with the GTX 560 and GTX 460 families, so they’re looking to maintain this momentum with the GTX 660.
| GTX 660 Ti | GTX 660 | GTX 650 | GT 640 |
Stream Processors | 1344 | 960 | 384 | 384 |
Texture Units | 112 | 80 | 32 | 32 |
ROPs | 24 | 24 | 16 | 16 |
Core Clock | 915MHz | 980MHz | 1058MHz | 900MHz |
Shader Clock | N/A | N/A | N/A | N/A |
Boost Clock | 980MHz | 1033MHz | N/A | N/A |
Memory Clock | 6.008GHz GDDR5 | 6.008GHz GDDR5 | 5GHz GDDR5 | 1.782GHz DDR3 |
Memory Bus Width | 192-bit | 192-bit | 128-bit | 128-bit |
VRAM | 2GB | 2GB | 1GB/2GB | 2GB |
FP64 | 1/24 FP32 | 1/24 FP32 | 1/24 FP32 | 1/24 FP32 |
TDP | 150W | 140W | 64W | 65W |
GPU | GK104 | GK106 | GK107 | GK107 |
Transistor Count | 3.5B | 2.54B | 1.3B | 1.3B |
Manufacturing Process | TSMC 28nm | TSMC 28nm | TSMC 28nm | TSMC 28nm |
Launch Price | $299 | $229 | $109 | $99 |
Diving right into the guts of things, the GeForce GTX 660 will be utilizing a fully enabled GK106 GPU. A fully enabled GK106 in turn is composed of 5 SMXes – arranged in an asymmetric 3 GPC configuration – along with 24 ROPs, three 64-bit memory controllers, and 384KB of L2 cache. Design-wise this basically splits the difference between the 8 SMX + 32 ROP GK104 and the 2 SMX + 16 ROP GK107. This also means that GTX 660 ends up looking a great deal like a GTX 660 Ti with fewer SMXes.
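As a rough cross-check, the unit counts in the specification table follow directly from the SMX and memory controller counts. The sketch below assumes Kepler's 192 stream processors and 16 texture units per SMX, and that the GTX 660 Ti runs 7 of GK104's 8 SMXes with 3 memory controllers; none of these per-SMX figures are stated in this article, and the helper name `kepler_units` is purely illustrative.

```python
# Per-SMX figures for Kepler. These are assumptions brought in from
# NVIDIA's published Kepler specifications, not from this article.
CORES_PER_SMX = 192
TEX_UNITS_PER_SMX = 16
CONTROLLER_WIDTH_BITS = 64  # the article gives 64-bit memory controllers


def kepler_units(smx_count, memory_controllers):
    """Return (stream processors, texture units, memory bus width in bits)."""
    return (
        smx_count * CORES_PER_SMX,
        smx_count * TEX_UNITS_PER_SMX,
        memory_controllers * CONTROLLER_WIDTH_BITS,
    )


# GTX 660: fully enabled GK106 (5 SMXes, 3 memory controllers)
print("GTX 660:   ", kepler_units(5, 3))
# GTX 660 Ti: assumed 7 SMXes and 3 memory controllers enabled on GK104
print("GTX 660 Ti:", kepler_units(7, 3))
# GTX 650: GK107 with 2 SMXes; 2 controllers assumed from its 128-bit bus
print("GTX 650:   ", kepler_units(2, 2))
```

Under those assumptions the results line up with the Stream Processors, Texture Units, and Memory Bus Width rows of the table.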
Meanwhile the reduction in functional units has had the expected impact on die size and transistor count, with GK106 packing 2.54B transistors into 214mm². This also means that GK106 is only 2mm² larger than AMD’s Pitcairn GPU, which sets up a very obvious product showdown.
In breaking down GK106, it’s interesting to note that this is the first time since 2008’s G9x family of GPUs that NVIDIA’s consumer GPU has had this level of consistency. The 200 series was split between 3 different architectures (G9x, GT200, and GT21x), and the 400/500 series was split between Big Fermi (GF1x0) and Little Fermi (GF1x4/1x6/1x8). The 600 series on the other hand is architecturally consistent from top to bottom in all respects, which is why NVIDIA’s split of the GTX 660 series between GK104 and GK106 makes no practical difference. As a result GK104, GK106, and GK107 all offer the same Kepler family features – such as the NVENC hardware H.264 encoder, VP5 video decoder, FastHDMI support, TXAA anti-aliasing, and PCIe 3.0 connectivity – with only the number of functional units differing.
As GK106’s launch vehicle, GTX 660 will be the highest performing implementation of GK106 that we expect to see. NVIDIA is setting the reference clocks for the GTX 660 at 980MHz for the core and 6GHz for the memory, second only to the GTX 680 in core clockspeed and still the same common 6GHz memory clockspeed we’ve seen across all of NVIDIA’s GDDR5 desktop Kepler parts thus far. Compared to GTX 660 Ti this means that on paper GTX 660 has around 76% of the shading and texturing performance of the GTX 660 Ti, 80% of the rasterization performance, 100% of the memory bandwidth, and a full 107% of the ROP performance.
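The on-paper ratios quoted here can be reproduced from the specification table by multiplying unit counts by the base core clock. This is a minimal sketch rather than NVIDIA's own method: the dictionary layout and function names are illustrative, and the GPC counts used for rasterization (3 versus 4) are an assumption, since the article only states the GTX 660's GPC configuration.

```python
# Reference specifications taken from the table in this article.
# "gpcs" for the GTX 660 Ti is an assumption (4 GPCs on GK104).
GTX_660 = {"cores": 960, "rops": 24, "gpcs": 3, "core_mhz": 980,
           "mem_gbps": 6.008, "bus_bits": 192}
GTX_660_TI = {"cores": 1344, "rops": 24, "gpcs": 4, "core_mhz": 915,
              "mem_gbps": 6.008, "bus_bits": 192}


def bandwidth_gb_s(card):
    """Peak memory bandwidth: per-pin data rate x bus width / 8 bits per byte."""
    return card["mem_gbps"] * card["bus_bits"] / 8


def ratio(a, b, unit_key):
    """Throughput of card a relative to card b for units that scale with core clock."""
    return (a[unit_key] * a["core_mhz"]) / (b[unit_key] * b["core_mhz"])


print(f"shading/texturing: {ratio(GTX_660, GTX_660_TI, 'cores'):.1%}")
print(f"rasterization:     {ratio(GTX_660, GTX_660_TI, 'gpcs'):.1%}")
print(f"ROP throughput:    {ratio(GTX_660, GTX_660_TI, 'rops'):.1%}")
print(f"memory bandwidth:  {bandwidth_gb_s(GTX_660) / bandwidth_gb_s(GTX_660_TI):.1%}")
```

Texture units scale with SMX count in the same proportion as stream processors, so a single ratio covers both shading and texturing. Boost clocks are ignored here, so real-world gaps can differ from these base-clock figures.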
These figures mean that the performance of the GTX 660 relative to the GTX 660 Ti is going to be heavily dependent on shading and rasterization. Shader-heavy games will suffer the most while memory bandwidth-bound and ROP-bound games are likely to perform very similarly between the two video cards. Interestingly enough this is effectively opposite the difference between the GTX 670 and GTX 660 Ti, where the differences between the two of those cards were all in memory bandwidth and ROPs. So in scenarios where GTX 660 Ti’s configuration exacerbated GK104’s memory bandwidth limitations GTX 660 should emerge relatively unscathed.
On the power front, GTX 660 has a power target of 115W with a TDP of 140W. Once again drawing a GTX 660 Ti comparison, this puts the TDP of the GTX 660 at only 10W lower than its larger sibling, but the power target is a full 19W lower. In practice power consumption on the GTX 600 series has been much more closely tracking the power target than it has the TDP, so as we’ll see the GTX 660 is often pulling 20W+ less than the GTX 660 Ti. This lower level of power consumption also means that the GTX 660 is the first GTX 600 product to only require 1 supplementary PCIe power connection.
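To put the power figures side by side, the sketch below compares TDP and power target against the power a card can draw from the slot plus one 6-pin connector. The 75W slot and 75W 6-pin limits are assumptions taken from the PCIe specifications rather than from this article, the GTX 660 Ti power target is derived from the "19W lower" statement above, and the `headroom` helper is hypothetical.

```python
# Power figures from the article (watts). The GTX 660 Ti power target is
# derived from the article's "19W lower" statement: 115 + 19 = 134.
GTX_660 = {"tdp": 140, "power_target": 115}
GTX_660_TI = {"tdp": 150, "power_target": 115 + 19}

# Delivery limits are assumptions from the PCIe specifications,
# not from this article: 75W from the slot, 75W per 6-pin connector.
SLOT_W = 75
SIX_PIN_W = 75


def headroom(card, six_pin_connectors):
    """Watts left between available power and the card's TDP and power target."""
    available = SLOT_W + six_pin_connectors * SIX_PIN_W
    return {
        "vs_tdp": available - card["tdp"],
        "vs_power_target": available - card["power_target"],
    }


print("TDP gap:", GTX_660_TI["tdp"] - GTX_660["tdp"], "W")
print("Power target gap:",
      GTX_660_TI["power_target"] - GTX_660["power_target"], "W")
print("GTX 660 with one 6-pin:", headroom(GTX_660, 1))
print("GTX 660 Ti with one 6-pin:", headroom(GTX_660_TI, 1))
```

With a single connector the GTX 660 keeps some margin even at its full TDP, while the same arithmetic leaves the GTX 660 Ti with none at 150W.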
Moving on, for today’s launch NVIDIA is once again going all virtual, with partners being left to their own designs. However given that this is the first GK106 part and that partners have had relatively little time with the GPU, in practice partners are using NVIDIA’s PCB designs with their own coolers – many of which have been lifted from their GTX 660 Ti designs – meaning that all of the cards being launched today are merely semi-custom as opposed to some fully custom designs like we saw with the GTX 660 Ti. This means that though there’s going to be a wide range of designs with respect to cooling, all of today’s launch cards will be extremely consistent with regard to clockspeeds and power delivery.
Like the GTX 660 Ti launch, partners have the option of going with either 2GB or 3GB of RAM, with the former once more taking advantage of NVIDIA’s asymmetrical memory controller functionality. For partners that do offer cards in both memory capacities we’re expecting most partners to charge $30-$40 more for the extra 1GB of RAM.
NVIDIA has set the MSRP on the GTX 660 at $229, which NVIDIA’s partners will be adhering to almost to a fault. Of the 3 cards we’re looking at in our upcoming companion GTX 660 launch roundup article, every last card is going for $229 despite the fact that every last card is also factory overclocked. Because NVIDIA does not provide an exhaustive list of cards and prices it’s not possible to say for sure just what the retail market will look like ahead of time, but at this point it looks like most $229 cards will be shipping with some kind of factory overclock. This is very similar to how the GTX 560 launch played out, though if it parallels the GTX 560 launch closely enough then reference-clocked cards will still be plentiful in time.
At $229 the GTX 660 is going to be coming in just under AMD’s Radeon HD 7870. AMD’s official MSRP on the 7870 is $249, but at this point in time the 7870 is commonly available for $10 cheaper at $239 after rebate. Meanwhile the 2GB 7850 will be boxing in the GTX 660 from the other side, with the 7850 regularly found at $199. Like we saw with the GTX 660 Ti launch, these prices are no mistake by AMD, with AMD once again having preemptively cut prices so that NVIDIA doesn’t undercut them at launch. It’s also worth noting that NVIDIA will not be extending their Borderlands 2 promotion to the GTX 660, so this is $229 without any bundled games, whereas AMD’s Sleeping Dogs promotion is still active for the 7870.
Finally, along with the GTX 660 the GK107-based GTX 650 is also launching today at $109. For the full details of that launch please see our GTX 650 companion article. Supplies of both cards are expected to be plentiful.
Summer 2012 GPU Pricing Comparison
AMD | Price | NVIDIA |
Radeon HD 7950 | $329 | |
| $299 | GeForce GTX 660 Ti |
Radeon HD 7870 | $239 | |
| $229 | GeForce GTX 660 |
Radeon HD 7850 | $199 | |
Radeon HD 7770 | $109 | GeForce GTX 650 |
Radeon HD 7750 | $99 | GeForce GT 640 |
Meet The GeForce GTX 660
For virtual launches it’s often difficult for us to acquire reference clocked cards since NVIDIA doesn’t directly sample the press with reference cards, and today’s launch of the GeForce GTX 660 is one of those times. The problem stems from the fact that NVIDIA’s partners are hesitant to offer reference clocked cards to the press since they don’t want to lose to factory overclocked cards in benchmarks, which is an odd (but reasonable) concern.
For today’s launch we were able to get a reference clocked card, but in order to do so we had to agree not to show the card or name the partner who supplied the card. As it turns out this isn’t a big deal since the card we received is for all practical purposes identical to NVIDIA’s reference GTX 660, which NVIDIA has supplied pictures of. So let’s take a look at the “reference” GTX 660.
The reference GTX 660 is in many ways identical to the GTX 670, which comes as no great surprise given the similar size of their PCBs, which in turn allows NVIDIA to reuse the same cooler with little modification. Like the GTX 670, the reference GTX 660 is 9.5” long, with the PCB itself making up just 6.75” of that length while the blower and its housing make up the rest. The size of retail cards will vary between these two lengths as partners like EVGA will be implementing their own blowers similar to NVIDIA’s, while other partners like Zotac will be using open air coolers not much larger than the reference PCB itself.
Breaking open one of our factory overclocked GTX 660s (specifically, our EVGA 660 SC using the NV reference PCB), we can see that while the GTX 670 and GTX 660 are superficially similar on the outside, the PCB itself is quite different. The biggest change here is that while the 670 PCB made the unusual move of putting the VRM circuitry towards the front of the card, the GTX 660 PCB once more puts it on the far side. With the GTX 670 this was a design choice to get the GTX 670 PCB down to 6.75”, whereas the GTX 660 requires so little VRM circuitry in the first place that it’s no longer necessary to put that circuitry at the front of the card to find the necessary space.
Looking at the GK106 GPU itself, we can see that not only is the GPU smaller than GK104, but the entire GPU package itself has been reduced in size. Meanwhile, not that it has any functional difference, but GK106 is a bit more rectangular than GK104.
Moving on to the GTX 660’s RAM, we find something quite interesting. Up until now NVIDIA and their partners have regularly used Hynix 6GHz GDDR5 memory modules, with that specific RAM showing up on every GTX 680, GTX 670, and GTX 660 Ti we’ve tested. The GTX 660 meanwhile is the very first card we’ve seen that’s equipped with Samsung’s 6GHz GDDR5 memory modules, marking the first time we’ve seen non-Hynix memory on a GeForce GTX 600 card. Truth be told, though it has no technical implications we’ve seen so many Hynix equipped cards from both AMD and NVIDIA that it’s refreshing to see that there is in fact more than one GDDR5 supplier in the marketplace.
For the 2GB GTX 660, NVIDIA has outfitted the card with 8 2Gb memory modules, 4 on the front and 4 on the rear. Oddly enough there aren’t any vacant RAM pads on the 2GB reference PCB, so it’s not entirely clear what partners are doing for their 3GB cards; presumably there’s a second reference PCB specifically built to house the 12 memory modules needed for 3GB cards.
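The module counts translate into capacity with simple arithmetic, and the same arithmetic hints at why the 2GB configuration is the asymmetric one. The sketch below uses the 2Gb module density and three memory controllers given in this article; the function names are illustrative, and the even-split check is a simplification that says nothing about how NVIDIA actually wires modules to controllers.

```python
MODULE_GBIT = 2   # 2Gb GDDR5 modules, per the article
CONTROLLERS = 3   # three 64-bit memory controllers on GK106


def capacity_gb(modules, gbit_per_module=MODULE_GBIT):
    """Total capacity in gigabytes (8 bits per byte)."""
    return modules * gbit_per_module / 8


def splits_evenly(modules, controllers=CONTROLLERS):
    """True if every controller can be given the same number of modules."""
    return modules % controllers == 0


for modules in (8, 12):
    print(f"{modules} modules: {capacity_gb(modules):.0f}GB, "
          f"even split across {CONTROLLERS} controllers: {splits_evenly(modules)}")
```

Eight modules cannot be divided equally among three controllers, whereas twelve can, which is consistent with the 2GB cards relying on the asymmetrical memory controller functionality mentioned earlier.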
Elsewhere we can find the GTX 660’s sole PCIe power socket on the rear of the card, responsible for supplying the other 75W the card needs. As for the front of the card, here we can find the card’s one SLI connector, which like previous generation mainstream video cards supports up to 2-way SLI.
Finally, looking at display connectivity we once more see the return of NVIDIA’s standard GTX 600 series display configuration. The reference GTX 660 is equipped with 1 DL-DVI-D port, 1 DL-DVI-I port, 1 full size HDMI 1.4 port, and 1 full size DisplayPort 1.2. Like GK104 and GK107, GK106 can drive up to 4 displays, meaning all 4 ports can be put into use simultaneously.
Just What Is NVIDIA’s Competition & The Test
Every now and then it’s productive to dissect NVIDIA’s press presentation to get an idea of what NVIDIA is thinking. NVIDIA’s marketing machine is generally laser-focused, but even so it’s not unusual for them to have their eye on more than one thing at a time.
In this case, ostensibly NVIDIA’s competition for the GTX 660 is the Radeon HD 7800 series. But if we actually dig through NVIDIA’s press deck we see that they only spend a single page comparing the GTX 660 to a 7800 series card (and it’s a 7850 at that). Meanwhile they spend 4 pages comparing the GTX 660 to prior generation NVIDIA products like the GTX 460 and/or the 9800GT.
The most immediate conclusion is that while NVIDIA is of course worried about stiff competition from AMD, they’re even more worried about competition from themselves right now. The entire computer industry has been facing declining revenues in the face of drawn out upgrade cycles due to older hardware remaining “good enough” for longer periods of time, and NVIDIA is not immune from that. To even be in competition with AMD, NVIDIA needs to convince its core gaming user base to upgrade in the first place, which it seems is no easy task.
NVIDIA has spent a lot of time in the past couple of years worrying about the 8800GT/9800GT in particular. “The only card that matters” was a massive hit for the company straight up through 2010, which has made it difficult to get users to upgrade even 4 years later. As a result what was once a 2 year upgrade cycle has slowly stretched out to become a 4 year upgrade cycle, which means NVIDIA only gets to sell half as many cards in that timeframe. Which leads us back to NVIDIA’s press presentation: even though the GTX 460/560 has long since supplanted the 9800GT’s install base, NVIDIA is still in competition with themselves 4 years later, trying to drive their single greatest DX10 card into the sunset.
The Test
The official launch drivers for the GTX 660 are 306.23, which are the latest iteration of NVIDIA’s R304 branch of drivers. Besides adding support for the GTX 660, these drivers are performance-identical to earlier R304 drivers in our tests.
Also, we'd like to give a quick thank you to Antec, who rushed out a replacement True Power Quattro 1200 PSU on very short notice after the fan went bad on our existing unit. Thanks guys!
CPU: | Intel Core i7-3960X @ 4.3GHz |
Motherboard: | EVGA X79 SLI |
Chipset Drivers: | Intel 9.2.3.1022 |
Power Supply: | Antec True Power Quattro 1200 |
Hard Disk: | Samsung 470 (256GB) |
Memory: | G.Skill Ripjaws DDR3-1867 4 x 4GB (8-10-9-26) |
Case: | Thermaltake Spedo Advance |
Monitor: | Samsung 305T |
Video Cards: | AMD Radeon HD 6870, AMD Radeon HD 7850, AMD Radeon HD 7870, AMD Radeon HD 7950, NVIDIA GeForce 8800GT, NVIDIA GeForce GTX 260, NVIDIA GeForce GTX 460 1GB, NVIDIA GeForce GTX 560, NVIDIA GeForce GTX 560 Ti, NVIDIA GeForce GTX 660 Ti |
Video Drivers: | NVIDIA ForceWare 304.79 Beta, NVIDIA ForceWare 305.37, NVIDIA ForceWare 306.23 Beta, AMD Catalyst 12.8 |
OS: | Windows 7 Ultimate 64-bit |
Crysis: Warhead
Kicking things off as always is Crysis: Warhead. It’s no longer the toughest game in our benchmark suite, but it’s still a technically complex game that has proven to be a very consistent benchmark. Thus even four years since the release of the original Crysis, “but can it run Crysis?” is still an important question, and the answer continues to be “no.” While we’re closer than ever, full Enthusiast settings at 60fps is still beyond the grasp of a single-GPU card.
Crysis has been a game that has consistently penalized Kepler for its lack of memory bandwidth. Nowhere was this more evident than the GTX 660 Ti, which thanks to its memory bus reduction took a significant hit. But as we alluded to in our introduction, there’s a corner case where the GTX 660 is going to be able to easily keep up with the GTX 660 Ti: ROP and memory bandwidth-bound situations. As a result we’re looking at the best case scenario for the GTX 660 when held up against the GTX 660 Ti, which sees the GTX 660 offer 95% of the performance of the GTX 660 Ti. Most games aren’t going to be like this, but in this one case the GTX 660 may as well be as good as the GTX 660 Ti as far as performance goes, which goes to prove just how bottlenecked Crysis is by memory bandwidth.
Looking at a more meaningful comparison, because the GTX 660 doesn’t take a memory bandwidth hit compared to the GTX 660 Ti, the resulting card is much more resource balanced which in turn impacts AMD’s ability to lead in this benchmark. AMD once again wins here with the 7870 taking the lead, but only by a relatively modest 7% margin. This is the first time we haven’t seen a comparable AMD card lead by a significant margin in this generation, which for NVIDIA is an improvement though still not a reversal of fortunes. At the same time however NVIDIA isn’t doing too much better than the 7850 here, beating AMD’s lesser 7800 by an even more modest 5%.
As for NVIDIA’s older cards, the generational performance gains are in-line with what we’ve already seen out of the other GTX 600 cards. Compared to the GTX 460 1GB for example, a card that launched over 2 years ago at the same price, performance is up by 50-60%. But unsurprisingly this is less than the performance gain going from the 8800GT to the GTX 460, a similar timeline jump that saw performance more than double. At the very least NVIDIA certainly has the 8800GT licked at this point (by nearly a factor of 4), but this means they’re also at risk of perpetuating longer upgrade cycles for current GTX 460 owners.
Moving on to minimum framerates, our results are almost the same with one interesting twist: the GTX 660 is now beating the more expensive GTX 660 Ti. Why? As we mentioned earlier, because of the higher core clock the ROPs on the GTX 660 actually have a greater theoretical throughput than the ROPs on the GTX 660 Ti. Since we’re not seeing any other factors that would explain this difference (i.e. drivers) it’s very likely that the GTX 660’s faster ROPs are giving it the advantage here.
While this is enough to push the GTX 660 ahead of the GTX 660 Ti, it’s not improving the GTX 660’s situation relative to the 7800 series at all. The GTX 660 is still closer to the 7850 than it is to the 7870 here.
Metro: 2033
Paired with Crysis as our second behemoth FPS is Metro: 2033. Metro gives up Crysis’ lush tropics and frozen wastelands for an underground experience, but even underground it can be quite brutal on GPUs, which is why it’s also our new benchmark of choice for looking at power/temperature/noise during a game. If its sequel due this year is anywhere near as GPU intensive then a single GPU may not be enough to run the game with every quality feature turned up.
The situation with Metro is fairly similar to what we’ve seen under Crysis. Once more AMD sits in the lead here with the GTX 660 effectively splitting the difference between the 7850 and 7870 at the all-important resolution of 1920. Though NVIDIA does squeak by with a win at 1680.
Meanwhile as a more shader-heavy game than Crysis we finally see a gap open up between the GTX 660 Ti and GTX 660, with the GTX 660 coming in at roughly 87% of the performance of the GTX 660 Ti. The GTX 660 is only 76% of the price of the GTX 660 Ti, so any time NVIDIA can offer similar performance with the GTX 660 they’re undercutting the GTX 660 Ti.
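The undercutting argument can be made concrete by dividing the performance ratio by the price ratio. The sketch below uses the launch prices from the table and the rounded per-game ratios quoted in this review; `relative_value` is a hypothetical helper, and rebates or factory overclocks are not accounted for.

```python
GTX_660_PRICE = 229      # launch prices from the article
GTX_660_TI_PRICE = 299


def relative_value(perf_ratio, price_a, price_b):
    """Performance per dollar of card A relative to card B.

    perf_ratio is A's performance as a fraction of B's (0.87 = 87%).
    A result above 1.0 means A delivers more performance per dollar.
    """
    return perf_ratio / (price_a / price_b)


# Rounded GTX 660 vs. GTX 660 Ti performance ratios quoted in this review.
for game, perf in (("Crysis: Warhead", 0.95),
                   ("Metro: 2033", 0.87),
                   ("Total War: Shogun 2", 0.80)):
    value = relative_value(perf, GTX_660_PRICE, GTX_660_TI_PRICE)
    print(f"{game}: {value:.2f}x the performance per dollar")
```

Since the price ratio is about 0.766, any game where the GTX 660 holds more than roughly 77% of the GTX 660 Ti's performance comes out in the cheaper card's favor on this metric.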
DiRT 3
For racing games our racer of choice continues to be DiRT, which is now in its 3rd iteration. Codemasters uses the same EGO engine between its DiRT, F1, and GRID series, so the performance of EGO has been relevant for a number of racing games over the years.
Though it may not seem like it at first glance, DiRT is actually a fairly shader/texture heavy game, which of course makes it less than ideal for the GTX 660. As a result the GTX 660 is clearly behind AMD’s closest competition, with the GTX 660 trailing the 7870 by 14% and leading the 7850 by only 7%. Compared to the GTX 460 the performance gains are much better, but 60% is still one of the smaller gaps we’ll see.
With DiRT 3’s consistent performance, the minimum framerates strongly reflect the average framerate, which means the GTX 660 is solidly between the 7850 and 7870.
Total War: Shogun 2
Total War: Shogun 2 is the latest installment of the long-running Total War series of turn based strategy games, and alongside Civilization V is notable for just how many units it can put on a screen at once. As it also turns out, it’s the single most punishing game in our benchmark suite (on higher end hardware at least).
Shogun is another shader-heavy game, so coming from the GTX 660 Ti we once again see a sizable performance drop. The performance difference here is very close to the theoretical performance difference between the two cards, with the GTX 660 delivering 80% of the performance of its older sibling.
With that said, NVIDIA had such a large lead here in the first place that even with 76% of the shading performance of the GTX 660 Ti, the GTX 660 still does well enough to tie the 7870. With 3 other losses so far NVIDIA needs to start beating the 7870, but for now tying the 7870 is a start.
Batman: Arkham City
Batman: Arkham City is loosely based on Unreal Engine 3, while the DirectX 11 functionality was apparently developed in-house. With the addition of these features Batman is a far more GPU-demanding game than its predecessor was, particularly with tessellation cranked up to high.
Arkham City is another game that has favored memory bandwidth and ROP throughput over shader performance on Kepler parts, which means the GTX 660 does relatively well here. The performance drop coming from the GTX 660 Ti at 1920 is only 4fps, which puts the GTX 660 ahead of the 7870 by the same amount and well ahead of the 7850. On our 5th game the GTX 660 finally gets a win, even if it is just a modest 5%.
Interestingly enough this is also a good showcase for the GTX 660 versus the GTX 460, with the GTX 660 stopping just shy of doubling the GTX 460’s performance. More than anything else the extra 1GB of RAM is making the biggest difference here, which going forward is going to be the GTX 460’s Achilles’ Heel. With newer games the GTX 460 and GTX 560 are likely to run out of memory before they run out of shading and rendering resources.
Portal 2
Portal 2 continues the long and proud tradition of Valve’s in-house Source engine. While Source continues to be a DX9 engine, Valve has continued to upgrade it over the years to improve its quality, and combined with their choice of style you’d have a hard time telling it’s over 7 years old at this point. Consequently Portal 2’s performance does get rather high on high-end cards, but we have ways of fixing that…
Perhaps the most surprising thing about Portal 2 is that even with a $229 card performance is still high enough that we can reasonably get away with 4x SSAA. SSAA invokes a high shader performance hit, so the cost coming from the GTX 660 Ti is by no means cheap, but it’s still fast enough to average 72fps. At this framerate the GTX 660 is fast enough to beat both 7800 series cards and even the 7950, a card that by all rights should be winning here. One of these days we’ll get to the bottom of this, but time and time again we’re seeing that NVIDIA’s performance hit here from SSAA is far less than AMD’s.
Battlefield 3
Its popularity aside, Battlefield 3 may be the most interesting game in our benchmark suite for a single reason: it’s the first AAA DX10+ game. It’s been 5 years since the launch of the first DX10 GPUs, and 3 whole process node shrinks later we’re finally to the point where games are using DX10’s functionality as a baseline rather than an addition. Not surprisingly BF3 is one of the best looking games in our suite, but as with past Battlefield games that beauty comes with a high performance cost.
BF3 has always favored NVIDIA’s architectures, so it comes as no surprise here that this is another good showing for the GTX 660. Realistically speaking MSAA is out of the question here since the minimum framerates would drop into the 20s, but performance is still high enough for 1920 on Ultra quality with FXAA. Here the GTX 660 trails the GTX 660 Ti by 12% while stopping just short of completely clobbering the 7800 series. At 71fps it can beat the 7870 by 19% and even beats the 7950 by 14%. Much like Portal 2 this is a game where the 7950 should by all rights be winning, so it’s curious just what is going on under the hood that has NVIDIA’s architectures doing so well here.
Even among NVIDIA cards however this is another strong showing for the GTX 660. Here it improves on the performance of the GTX 460 by 76%, a difference so large that it sees the GTX 660 crack 60fps at 1920 when the GTX 460 can’t crack 60fps at 1680.
Starcraft II
Our next game is Starcraft II, Blizzard’s 2010 RTS megahit. Much like Portal 2 it’s a DX9 game designed to run on a wide range of hardware so performance is quite peppy with most high-end cards, but it can still challenge a GPU when it needs to.
With the release of patch 1.5, Blizzard turned both our Starcraft II testing methodology and our Starcraft II benchmark results on their heads. After straightening things out a curious pattern emerged: NVIDIA’s cards came out relatively unscathed, while most AMD GCN cards have taken a small performance hit compared to our earlier results. As a result Starcraft II now favors NVIDIA’s cards even more so now than it did before, making this an easy win for the GTX 660. At 1920 the GTX 660 beats the 7870 by 37%, and once more even the 7950 falls behind.
The driving factor here seems to be ROP performance, as showcased by the performance of the GTX 660 relative to that of the GTX 660 Ti. This is a textbook case of the GTX 660’s slightly higher ROP performance giving it an equally slight performance advantage over the GTX 660 Ti, and also explaining why performance hasn’t dropped to near-7870 levels like we’ve seen in some other games. With the next Starcraft II chapter already in beta testing, it will be interesting to see if these kinds of performance differences will remain into the future.
The Elder Scrolls V: Skyrim
Bethesda's epic sword & magic game The Elder Scrolls V: Skyrim is our RPG of choice for benchmarking. It's altogether a good CPU benchmark thanks to its complex scripting and AI, but it also can end up pushing a large number of fairly complex models and effects at once, especially with the addition of the high resolution texture pack.
Even at 1920 Skyrim’s performance breaks cards down into one of two general categories: cards that have enough RAM, and cards that don’t. The performance gap between everything from the 7850 to the GTX 660 Ti is quite small, with only 7fps separating the two cards. For the GTX 660 this means that it does end up losing to the 7870, but only by 1fps. RPGs are commonly more CPU-intensive than they are GPU-intensive, and nowhere is this more evident than with Skyrim.
Civilization V
Our final game, Civilization V, gives us an interesting look at things that other RTSes cannot match, with a much weaker focus on shading in the game world, and a much greater focus on creating the geometry needed to bring such a world to life. In doing so it uses a slew of DirectX 11 technologies, including tessellation for said geometry, driver command lists for reducing CPU overhead, and compute shaders for on-the-fly texture decompression.
Civilization V was once a game that favored NVIDIA’s hardware, but with AMD’s GCN architecture that is no more. Coming from the GTX 660 Ti the GTX 660 takes a moderate 13% performance hit, but this only widens the gap between the GTX 660 series and the 7870, which was already the highest performing card out of this bunch. As a result AMD’s GTX 660 competitor leads by 17%, or put reciprocally the GTX 660 trails by 15%. In fact the GTX 660 doesn’t do much better than even the 7850 here, leading by just 6%.
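The "leads by 17%" and "trails by 15%" figures describe the same gap measured from opposite baselines. A minimal sketch of the conversion follows, with illustrative function names; it works from the rounded percentages in the text rather than the underlying framerates, so the result only matches to within rounding.

```python
def trail_from_lead(lead):
    """If card A leads card B by `lead` (0.17 = 17%), return how far B trails A."""
    return 1 - 1 / (1 + lead)


def lead_from_trail(trail):
    """Inverse conversion: if B trails A by `trail`, return A's lead over B."""
    return 1 / (1 - trail) - 1


print(f"7870 leads by 17% -> GTX 660 trails by {trail_from_lead(0.17):.1%}")
print(f"GTX 660 trails by 15% -> 7870 leads by {lead_from_trail(0.15):.1%}")
```

The two percentages differ because each is expressed relative to a different card's framerate, and the gap between them widens as the lead grows.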
Given Civilization V’s reliance on Compute Shader performance, it comes as no great surprise that this is also one of the weakest showings for the GTX 660 relative to the GTX 460. The GTX 660 ends up being only 45% faster than the GTX 460, the smallest improvement out of any game we’ve tested.
Compute Performance
As always our final set of real-world benchmarks is composed of a look at compute performance. As we have seen with GTX 680 and GTX 670, Kepler appears to be significantly less balanced between rendering and compute performance than GF110 or GF114 were, and as a result compute performance suffers. Further compounding this is the fact that GK106 only has 5 SMXes versus the 8 SMXes of GK104, which will likely further depress compute performance.
Our first compute benchmark comes from Civilization V, which uses DirectCompute to decompress textures on the fly. Civ V includes a sub-benchmark that exclusively tests the speed of their texture decompression algorithm by repeatedly decompressing the textures required for one of the game’s leader scenes. Note that this is a DX11 DirectCompute benchmark.
It’s interesting then that despite the obvious difference between the GTX 660 and GTX 660 Ti in theoretical compute performance, the GTX 660 actually beats the GTX 660 Ti here. Despite being a compute benchmark, Civilization V’s texture decompression benchmark is more sensitive to memory bandwidth and cache performance than it is shader performance, giving us the results we see above. Given the GTX 660 Ti’s poor showing in this benchmark this is a good thing for NVIDIA since this means they don’t fall any farther behind. Still, the GTX 660 is effectively tied with the 7850 and well behind the 7870.
Our next benchmark is SmallLuxGPU, the GPU ray tracing branch of the open source LuxRender renderer. We’re now using a development build from the version 2.0 branch, and we’ve moved on to a more complex scene that hopefully will provide a greater challenge to our GPUs.
SmallLuxGPU sees us shift towards an emphasis on pure compute performance, which of course is going to be GTX 660’s weak point here. Over 2 years after the launch of the GTX 460 and SLG performance has gone exactly nowhere, with the GTX 460 and GTX 660 turning in the same exact scores. Thank goodness the 8800GT is terrible at this benchmark, otherwise the GTX 660 would be in particularly bad shape.
It goes without saying that with the GTX 660’s poor compute performance here, the 7800 series is well in the lead. The 7870 more than trebles the GTX 660’s performance, an indisputable victory if there ever was one.
For our next benchmark we’re looking at AESEncryptDecrypt, an OpenCL AES encryption routine that AES encrypts/decrypts an 8K x 8K pixel square image file. The results of this benchmark are the average time to encrypt the image over a number of iterations of the AES cypher.
Our AES benchmark was one of the few compute benchmarks where the GTX 660 Ti had any kind of lead, but the significant loss of compute resources has erased that for the GTX 660. At 395ms it’s a hair slower than the 7850, never mind the 7870.
The fluid simulation is another benchmark that includes a stronger mix of memory bandwidth and cache rather than being purely dependent on compute resources. As a result the GTX 660 still trails the GTX 660 Ti, but not by a great amount. Even so, the GTX 660 is no match for the 7800 series.
Finally, we’re adding one last benchmark to our compute run. NVIDIA and the Folding@Home group have sent over a benchmarkable version of the client with preliminary optimizations for Kepler. Folding@Home and similar initiatives are still one of the most popular consumer compute workloads, so it’s something NVIDIA wants their GPUs to do well at.
As we’ve seen previously with GK104, this is one of the few compute benchmarks that shows any kind of significant performance advantage for Little Kepler compared to Little Fermi. GTX 660 drops by 12% compared to GTX 660 Ti, but this is still good enough for a 60% performance advantage over GTX 460.
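As a quick sanity check on how those two percentages relate, the implied GTX 660 Ti lead over the GTX 460 can be worked out from them. This is only a sketch: it assumes "drops by 12%" means the GTX 660 delivers 88% of the GTX 660 Ti's throughput, and the variable names are ours rather than anything from the Folding@Home client.

```python
# Relative Folding@Home throughput, expressed as ratios taken from the text.
# Assumption: "drops by 12%" is read as GTX 660 = 0.88 x GTX 660 Ti.
gtx660_vs_gtx460 = 1.60   # GTX 660 is 60% faster than GTX 460
gtx660_vs_gtx660ti = 0.88  # GTX 660 is 12% slower than GTX 660 Ti

# Dividing the two ratios cancels the GTX 660 and leaves GTX 660 Ti vs. GTX 460.
gtx660ti_vs_gtx460 = gtx660_vs_gtx460 / gtx660_vs_gtx660ti

print(f"Implied GTX 660 Ti advantage over GTX 460: {gtx660ti_vs_gtx460 - 1:.0%}")
```

Under that reading, the GTX 660 Ti would sit a little over 80% ahead of the GTX 460 in this benchmark, which is consistent with Folding@Home being one of Kepler's stronger compute showings.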
Synthetics
As always we’ll also take a quick look at synthetic performance to see if NVIDIA’s core configuration has had any impact on basic performance metrics. We’re expecting to see performance very close to the GTX 660 Ti, due to its nearly identical ROP/L2/memory configuration. We’ll start with 3DMark Vantage’s Pixel Fill test.
3DMark Vantage’s pixel fill test likes memory bandwidth and ROP performance in that order, which makes these results a bit odd. With identical memory bandwidth between them we’d expect the GTX 660 and GTX 660 Ti to at least be tied here, if not a slight lead for the GTX 660 thanks to its higher ROP performance. Instead the GTX 660 trails the GTX 660 Ti by a slight amount, an outcome we can’t explain at this time.
Our texture fillrate benchmark on the other hand sees a large gap between the GTX 660 and GTX 660 Ti, which is what we would expect from the loss of SMXes.
Our third theoretical test is the set of settings we use with Microsoft’s Detail Tessellation sample program out of the DX11 SDK.
Despite the loss of SMXes (and thereby Polymorph engines), our tessellation benchmarks don’t show any kind of significant difference between the GTX 660 and GTX 660 Ti. We’ve been finding this benchmark to be surprisingly sensitive to ROP performance and memory bandwidth on Kepler, and these results back that finding.
Our final theoretical test is Unigine Heaven 2.5, a benchmark that straddles the line between a synthetic benchmark and a real-world benchmark as the engine is licensed but no notable DX11 games have been produced using it yet.
Despite its advanced nature, Heaven isn’t particularly sensitive to the loss of shader and texturing performance, as signified by the performance loss of less than 10% for the GTX 660.
Power, Temperature, & Noise
As always, we’re wrapping up our look at a video card’s stock performance with a look at power, temperature, and noise. Unlike GTX 660 Ti, which was a harvested GK104 GPU, GTX 660 is based on the brand-new GK106 GPU, which will have interesting repercussions for power consumption. Scaling down a GPU by disabling functional units often has diminishing returns, so GK106 will effectively “reset” NVIDIA’s position as far as power consumption goes. As a reminder, NVIDIA’s power target here is a mere 115W, while their TDP is 140W.
GeForce GTX 660 Series Voltages
Ref GTX 660 Ti Load | Ref GTX 660 Ti Idle | Ref GTX 660 Load | Ref GTX 660 Idle
1.175v | 0.975v | 1.175v | 0.875v
Stopping to take a quick look at voltages, even with a new GPU nothing has changed. NVIDIA’s standard voltage remains at 1.175v, the same as we’ve seen with GK104. However idle voltages are much lower, with the GK106-based GTX 660 idling at 0.875v versus 0.975v for the various GK104 desktop cards. As we’ll see later, this is an important distinction for GK106.
Up next, before we jump into our graphs let’s take a look at the average core clockspeed during our benchmarks. Because of GPU boost, the boost clock alone doesn’t give us the whole picture, so we’ve recorded the clockspeed of our GTX 660 during each of our benchmarks when running it at 1920x1200 and computed the average clockspeed over the duration of the benchmark.
GeForce GTX 600 Series Average Clockspeeds
| GTX 670 | GTX 660 Ti | GTX 660
Max Boost Clock | 1084MHz | 1058MHz | 1084MHz
Crysis | 1057MHz | 1058MHz | 1047MHz
Metro | 1042MHz | 1048MHz | 1042MHz
DiRT 3 | 1037MHz | 1058MHz | 1054MHz
Shogun 2 | 1064MHz | 1035MHz | 1045MHz
Batman | 1042MHz | 1051MHz | 1029MHz
Portal 2 | 988MHz | 1041MHz | 1033MHz
Battlefield 3 | 1055MHz | 1054MHz | 1065MHz
Starcraft II | 1084MHz | N/A | 1080MHz
Skyrim | 1084MHz | 1045MHz | 1084MHz
Civilization V | 1038MHz | 1045MHz | 1067MHz
With an official boost clock of 1033MHz and a maximum boost of 1084MHz on our GTX 660, we see clockspeeds regularly vary between the two points. For the most part our average clockspeeds are slightly ahead of NVIDIA’s boost clock, while in CPU-heavy workloads (Starcraft II, Skyrim), we can almost sustain the maximum boost clock. Ultimately this means that the GTX 660 is spending most of its time near or above 1050MHz, which will have repercussions when it comes to overclocking.
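To put a number on that observation, the per-game averages from the table can be summarized directly. The following is a minimal sketch using only the GTX 660 column above; the variable names are ours, and the unweighted mean across games is our own simplification rather than a figure NVIDIA or our logging tools report.

```python
from statistics import mean

# Per-game average core clocks (MHz) for the GTX 660, copied from the table above.
gtx660_avg_mhz = {
    "Crysis": 1047,
    "Metro": 1042,
    "DiRT 3": 1054,
    "Shogun 2": 1045,
    "Batman": 1029,
    "Portal 2": 1033,
    "Battlefield 3": 1065,
    "Starcraft II": 1080,
    "Skyrim": 1084,
    "Civilization V": 1067,
}

OFFICIAL_BOOST_MHZ = 1033  # NVIDIA's rated boost clock
MAX_BOOST_MHZ = 1084       # highest boost bin observed on our sample

# Unweighted mean across games (each game counts equally, regardless of run length).
overall_avg = mean(gtx660_avg_mhz.values())

# Games whose average clock is at or above the official boost clock.
at_or_above_boost = [g for g, mhz in gtx660_avg_mhz.items() if mhz >= OFFICIAL_BOOST_MHZ]

print(f"Mean of per-game averages: {overall_avg:.1f}MHz")
print(f"Games at or above {OFFICIAL_BOOST_MHZ}MHz: {len(at_or_above_boost)} of {len(gtx660_avg_mhz)}")
print(f"Headroom left below max boost: {MAX_BOOST_MHZ - overall_avg:.1f}MHz")
```

The small remaining gap between the mean and the maximum boost bin is the point to keep in mind for overclocking: there is not much unused boost range for a higher power target to unlock on its own.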
Starting as always with idle power we immediately see an interesting outcome: GTX 660 has the lowest idle power usage. And it’s not just one or two watts either, but rather a 6W (at the wall) difference between the GTX 660 and both the Radeon HD 7800 series and the GTX 600 series. All of the current 28nm GPUs have offered refreshingly low idle power usage, but with the GTX 660 we’re seeing NVIDIA cut into what was already a relatively low idle power usage and shrink it even further.
NVIDIA’s claim is that their idle power usage is around 5W, and while our testing methodology doesn’t allow us to isolate the video card, our results corroborate a near-5W value. The biggest factors here seem to be a combination of die size and idle voltage; we naturally see a reduction in idle power usage as we move to smaller GPUs with fewer transistors to power up, but also NVIDIA’s idle voltage of 0.875v is nearly 0.1v below GK104’s idle voltage and 0.075v lower than GT 640 (GK107)’s idle voltage. The combination of these factors has pushed the GTX 660’s idle power usage to the lowest point we’ve ever seen for a GPU of this size, which is quite an accomplishment. Though I suspect the real payoff will be in the mobile space, as even with Optimus mobile GPUs have to spend some time idling, which is another opportunity to save power.
At this point the only area in which NVIDIA doesn’t outperform AMD is in the so-called “long idle” scenario, where AMD’s ZeroCore Power technology gets to kick in. 5W is nice, but next-to-0W is even better.
Moving on to load power consumption, given NVIDIA’s focus on efficiency with the Kepler family it comes as no great surprise that NVIDIA continues to hold the lead when it comes to load power consumption. The gap between GTX 660 and 7870 isn’t quite as large as the gap we saw between GTX 680 and 7970 but NVIDIA still has a convincing lead here, with the GTX 660 consuming 23W less at the wall than the 7870. This puts the GTX 660 at around the power consumption of the 7850 (a card with a similar TDP) or the GTX 460. On AMD’s part, Pitcairn is a more petite (and less compute-heavy) part than Tahiti, which means AMD doesn’t face nearly the same disparity that they do on the high-end.
OCCT on the other hand has the GTX 660 and 7870 much closer, thanks to AMD’s much more aggressive throttling through PowerTune. This is one of the only times where the GTX 660 isn’t competitive with the 7850 in some fashion, though based on our experience our Metro results are more meaningful than our OCCT results right now.
As for idle temperatures, there are no great surprises. A good blower can hit around 30C in our testbed, and that’s exactly what we see.
Temperatures under Metro look good enough, though despite their power advantage NVIDIA can’t keep up with the blower-equipped 7800 series. At the risk of spoiling our noise results, the 7800 series doesn’t do significantly worse for noise so it’s not immediately clear why the GTX 660 is 6C warmer here. Our best guess would be that the GTX 660’s cooler just isn’t quite up to the potential of the 7800 series’ reference cooler.
OCCT actually closes the gap between the 7870 and the GTX 660 rather than widening it, which is the opposite of what we would expect given our earlier temperature data. Reaching the mid-70s, neither card is particularly cool, but both are still well below their thermal limits, meaning there’s plenty of thermal headroom to play with.
Last but not least we have our noise tests, starting with idle noise. Again there are no surprises here; the GTX 660’s blower is solid, producing no more noise than any other standard blower we’ve seen.
While the GTX 660 couldn’t beat the 7870 on temperatures under Metro, it can certainly beat the 7870 when it comes to noise. The difference isn’t particularly great – just 1.4dB – but every bit adds up, and 47.4dB is historically very good for a blower. However the use of a blower on the GTX 660 means that NVIDIA still can’t match the glory of the GTX 560 Ti or GTX 460; for that we’ll have to take a look at retail cards with open air coolers.
Similar to how AMD’s temperature lead eroded with OCCT, AMD’s slight loss in load noise testing becomes a much larger gap under OCCT. A 4.5dB difference is now solidly in the realm of noticeable, and further reinforces the fact that the GTX 660 is the quieter card under both normal and extreme situations.
We’ll be taking an in-depth look at some retail cards later today with our companion retail card article, but with those results already in hand we can say that despite the use of a blower the “reference” GTX 660 holds up very well. Open air coolers can definitely beat a blower with the usual drawbacks (that heat has to go somewhere), but when a blower is only hitting 47dB, you already have a fairly quiet card. So even a reference GTX 660 (as unlikely as it is to appear in North America) looks good all things considered.
OC: Power, Temperature, & Noise
Before wrapping things up, we wanted to quickly take a look at the overclocking potential of the GTX 660. As the first GK106 product GTX 660 should give us some idea as to how capable GK106 is at overclocking, though like GK104 we’re eventually at the mercy of NVIDIA’s locked voltages and limited power target control.
In its rawest form, GTX 660 will have two things going against it for overclocking. First and foremost, as the highest clocked GK106 part it’s already starting out at a fairly high clockspeed – 980MHz for reference cards, and upwards of 1050MHz for factory overclocked cards – so there may not be a great deal of overclocking headroom left to exploit. Furthermore because NVIDIA is keeping the power consumption of the card low (it needs to stay under 150W max), the maximum power target is the lowest we’ve seen for any GTX 600 card yet: it’s a mere 110%. As a result even if we can hit a large GPU clock offset, there may not be enough power headroom available to let the GPU regularly reach those speeds.
Memory overclocking on the other hand looks much better. With the same memory controllers and the same spec’d RAM as on the other high-end GTX 600 cards, there’s no reason to believe that the GTX 660 shouldn’t be able to hit equally high memory clocks, which means 6.5GHz+ is a reasonable goal.
GeForce GTX 660 Overclocking
| Ref GTX 660 | EVGA GTX 660 SC | Zotac GTX 660 | Gigabyte GTX 660 OC
Shipping Core Clock | 980MHz | 1046MHz | 993MHz | 1033MHz
Shipping Max Boost Clock | 1084MHz | 1123MHz | 1110MHz | 1123MHz
Shipping Memory Clock | 6GHz | 6GHz | 6GHz | 6GHz
Shipping Max Boost Voltage | 1.175v | 1.175v | 1.162v | 1.175v
Overclock Core Clock | 1080MHz | 1096MHz | 1093MHz | 1083MHz
Overclock Max Boost Clock | 1185MHz | 1174MHz | 1215MHz | 1174MHz
Overclock Memory Clock | 6.7GHz | 6.9GHz | 6.7GHz | 6.5GHz
Overclock Max Boost Voltage | 1.175v | 1.175v | 1.162v | 1.175v
Throwing in our factory overclocked cards from our companion roundup, our core overclocking experience was remarkably consistent. The difference in the max boost clock between the slowest and fastest card was a mere 41MHz, with the Zotac card being a clear outlier compared to the rest of our cards. This comes as no great surprise since all of these launch cards are using the NVIDIA reference PCB, so there’s little room at this moment for overclocking innovation.
Memory overclocking is as volatile as ever, with a 400MHz spread between our best and worst cards. Again with the use of the reference PCB (and the same Samsung RAM), memory overclocking is entirely the luck of the draw.
For the moment at least GTX 660 overclocking looks to be on a level playing field due to all partners using the same PCB. For overclockers the choice of a card will come down to pricing, what cooler they prefer, and any preference in vendor.
The end result of all of this is that at best we’re seeing 100MHz overclocks (going by the max boost clock), which represents roughly a 10% overclock. Coupling this with a good memory overclock and the 10% increase in the power target will result in around a 10% increase in performance, which isn’t shabby but also is the same kind of shallow overclocking potential that we’ve seen on cards like the GTX 670 and GTX 660 Ti. All told the GTX 660 isn’t a poor overclocker – 10% more performance for free is nothing to sneeze at – but it’s also not going to endear itself to hardware overclockers who like to chase 20% or more.
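For readers who want to check these figures against the overclocking table, the arithmetic is simple enough to sketch. The numbers below are copied from the table; the dictionary layout and names are ours, and memory clocks are written in MHz (6.7GHz as 6700) purely to keep the math in integers.

```python
# Shipping and overclocked max boost clocks (MHz) plus overclocked memory
# clocks (MHz effective), copied from the overclocking table above.
cards = {
    "Ref GTX 660":         {"ship_boost": 1084, "oc_boost": 1185, "oc_mem": 6700},
    "EVGA GTX 660 SC":     {"ship_boost": 1123, "oc_boost": 1174, "oc_mem": 6900},
    "Zotac GTX 660":       {"ship_boost": 1110, "oc_boost": 1215, "oc_mem": 6700},
    "Gigabyte GTX 660 OC": {"ship_boost": 1123, "oc_boost": 1174, "oc_mem": 6500},
}

# Max boost gain per card, in MHz and as a percentage of the shipping max boost.
gains_mhz = {name: c["oc_boost"] - c["ship_boost"] for name, c in cards.items()}
gains_pct = {name: 100 * gains_mhz[name] / c["ship_boost"] for name, c in cards.items()}

# Spread between the best and worst card for core and memory overclocks.
oc_boosts = [c["oc_boost"] for c in cards.values()]
oc_mems = [c["oc_mem"] for c in cards.values()]
boost_spread = max(oc_boosts) - min(oc_boosts)
mem_spread = max(oc_mems) - min(oc_mems)

for name in cards:
    print(f"{name}: +{gains_mhz[name]}MHz ({gains_pct[name]:.1f}%)")
print(f"Max boost spread: {boost_spread}MHz, memory spread: {mem_spread}MHz")
```

Note that the factory overclocked cards show smaller gains over their shipping clocks simply because they start higher; all four cards end up within a narrow band of final max boost clocks.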
Moving on to our performance charts, we’re going to once again start with power, temperature, and noise, before moving on to gaming performance. Due to popular demand we’ll also be including overclocking results with just a 110% power target so that you can see the impact of adjusting the power target separately from the clock offsets.
With a 110% power target we should be seeing an 11W-14W increase in power consumption, which is indeed roughly what we’re seeing at the wall after accounting for PSU inefficiencies. In Metro this is just enough of a difference to erase most of the GTX 660’s power consumption advantage over the GTX 660 Ti, though the GTX 660 still draws marginally less power than the stock 7870. Meanwhile under OCCT the GTX 660 now draws more power than the 7870, but is still drawing over 20W less than the stock GTX 660 Ti.
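One plausible reading of where that range comes from is a 10% increase applied to the 115W power target at the low end and the 140W TDP at the high end. The sketch below works through that arithmetic; the PSU efficiency figure is a hypothetical value chosen only to illustrate the at-the-wall conversion and is not a measurement from our testbed.

```python
POWER_TARGET_W = 115   # NVIDIA's stated power target for the GTX 660
TDP_W = 140            # NVIDIA's stated TDP for the GTX 660
INCREASE_PCT = 10      # raising the power target from 100% to 110%

# Hypothetical PSU efficiency, used purely for illustration.
ASSUMED_PSU_EFFICIENCY = 0.88


def extra_card_watts(base_watts: float, pct: float) -> float:
    """Additional power the card may draw after raising its limit by pct percent."""
    return base_watts * pct / 100


def at_the_wall(card_watts: float, efficiency: float) -> float:
    """Wall power needed to deliver card_watts through a PSU of the given efficiency."""
    return card_watts / efficiency


low = extra_card_watts(POWER_TARGET_W, INCREASE_PCT)
high = extra_card_watts(TDP_W, INCREASE_PCT)

print(f"Extra draw at the card: {low:.1f}W to {high:.1f}W")
print(
    "Extra draw at the wall (assumed efficiency): "
    f"{at_the_wall(low, ASSUMED_PSU_EFFICIENCY):.1f}W to "
    f"{at_the_wall(high, ASSUMED_PSU_EFFICIENCY):.1f}W"
)
```

Because wall measurements include PSU losses, the increase seen at the wall will always be somewhat larger than the increase at the card itself, whatever the actual efficiency turns out to be.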
Our increased power consumption pushes temperatures up by another 2-3C. This is nothing a blower can’t handle, let alone an open-air cooler.
Interestingly enough, despite the increase in power consumption and temperatures, overclocking has almost no impact on noise. In the worst case scenario our GTX 660 increased its fan speed by all of 2%, which increases noise by less than 1dB. As a result the amount of noise generated by the overclocked GTX 660 is practically identical to that generated by the stock GTX 660, and still below the reference 7870.
OC: Gaming Performance
We’ll keep the running commentary short here, but as we’ll see in our results, power target overclocking alone doesn’t produce a particularly significant result here. Our GTX 660 already runs so close to its maximum boost clock so often that simply allowing it to draw more power doesn’t allow it to clock up all that much higher.
So for significant overclocking we need to turn to utilizing the power target along with clock offsets. With those offsets in place we’ve increased our power target limits, core clock, and memory clock all by roughly 10%. Overclocked performance as a result is between 8% and 12% better than the stock GTX 660, which for ROP or memory bandwidth-limited games is more than enough to usurp the stock GTX 660 Ti. Meanwhile for games that are heavily shader-bound the 10% overclock will still leave the GTX 660 well behind the stock GTX 660 Ti.
Final Words
Bringing our review of the first GK106-based video card to a close, it’s difficult not to sound like a broken record at times. The launch of the GeForce GTX 660 and the accompanying GK106 GPU is very much a by-the-numbers launch. This is by no means a bad thing, but it does mean that it’s a launch with very few surprises.
As far as NVIDIA’s execution goes, GK106 and the GTX 660 are exactly what they’ve needed to start filling in the gap between $100 and $300. Truth be told we would have liked to see the GTX 660 come in at $200 so that NVIDIA had a clear $200 contender – an always-popular price point – but given the performance of the GTX 660 that’s being a bit wishful on our part. Furthermore NVIDIA would still need to leave enough room for the eventual launch of the next GK106 part, which will be whatever goes between GTX 650 and GTX 660. So much like the GTX 460 1GB two years before it, the GTX 660 launches at $229.
To that end NVIDIA has done their launch planning well, and for $229 it’s hard to argue that they haven’t hit the right balance of price and performance. GeForce GTX 660 offers around 88% of the performance of the GTX 660 Ti at 1920x1200, making it a strong performer in its own right and the logical follow-up to the GTX 660 Ti. However on that note I think this is going to be one of the more unusual launches due to how inconsistent the performance gap between NVIDIA’s cards is, as the GTX 660 offers anywhere between 80% and 100% of the performance of the GTX 660 Ti, owing to the much different shader-to-ROP ratio of the GTX 660. In the right scenario the GTX 660 is every bit as fast as the GTX 660 Ti, though these scenarios are admittedly few and far between.
The real question of course isn’t how the GTX 660 compares to the GTX 660 Ti, but rather how it compares to the Radeon HD 7870 in the face of AMD’s earlier price drops. Even with a more balanced shader-to-ROP ratio for GTX 660, the question of who wins remains heavily dependent on the game being tested. AMD controls their traditional strongholds of Crysis, DiRT, and Civilization V, while NVIDIA controls Battlefield 3, Starcraft II, and Portal 2. The end result is that the GTX 660 is on average 4% ahead of the 7870, but once again this is an anything-but-equal scenario; even swapping out a single game could easily shift the balance, reiterating the importance of individual games when relative performance is so inconsistent.
Meanwhile when it comes to physical metrics like power consumption, temperature, and noise, NVIDIA does have a clear edge thanks to another efficient rendition of the Kepler architecture with GK106. GK106 doesn’t enjoy nearly the same advantage over Pitcairn that GK104 did over Tahiti, but it’s still enough to get the same job done with less power consumed and less noise generated. It’s also just enough to make GTX 660 the preferable card over 7870 (at least as far as reference cards go) though by no means is 7870 suddenly a poor choice.
The real wildcard for today’s launch is going to be the prevalence of factory overclocked cards, which are going to be showing up at the same $229 price point as reference cards. Factory overclocked cards will sacrifice GTX 660’s edge in power consumption, but of course they’ll extend the GTX 660’s performance lead. For major launch articles we’re always going to base our advice on reference clocked cards since those are by definition the bare minimum level of performance you can expect, but you’ll want to come back later today for our companion article that takes a look at some of the $229 factory overclocked cards launching today.
Ultimately how well the GTX 660 is received is up to AMD more than it is NVIDIA. The 7870 is already priced close enough to the GTX 660 that the price difference is negligible, and meanwhile AMD and their partners could easily trim another $10 or $20 off of the card’s price to match or beat NVIDIA’s pricing (all the while still offering a bundled game), at which point the sweet spot would once again shift back to AMD. Otherwise AMD is still not in a bad position, even if the GTX 660 is technically the better card.
Wrapping things up, as we briefly discussed earlier NVIDIA’s biggest hurdle isn’t AMD so much as it is themselves. The GTX 660 is a clear multi-generational upgrade over particularly old cards like the 9800GT and GTX 260, but compared to the Fermi cards of the last two years the performance jump isn’t quite as grand. Contrasting the launch of the GTX 660 to the launch of the GTX 460 1GB two years ago, NVIDIA is actually doing far better in this respect thanks to the fact that the GTX 660 offers an impressive 75% jump in performance over the GTX 460 1GB. But at the same time we’re now approaching a more frugal market segment; enthusiast gamers can justify spending $300+ every 2 years for a next-generation video card even if the gains are only 50%, but mainstream gamers need a bigger jump. GTX 660 is unquestionably a meaningful upgrade to an aging Fermi card – these days Fermi is going to have a hard time hitting playable framerates at 1920 with a high degree of quality – but given the fact that we’re still on the Direct3D 11 generation of video cards, holding on to Fermi for one more generation wouldn’t be hard to justify for the cash-strapped mainstream gamer.