Original Link: https://www.anandtech.com/show/6159/the-geforce-gtx-660-ti-review
The GeForce GTX 660 Ti Review, Feat. EVGA, Zotac, and Gigabyte
by Ryan Smith on August 16, 2012 9:00 AM EST
It’s hard not to notice that NVIDIA has a bit of a problem right now. In the months since the launch of their first Kepler product, the GeForce GTX 680, the company has introduced several other Kepler products into the desktop 600 series. With the exception of the GeForce GT 640 – their only budget part – all of those 600 series parts have been targeted at the high end, where they became popular, well received products that significantly tilted the market in NVIDIA’s favor.
The problem with this is almost paradoxical: these products are too popular. Between the GK104-heavy desktop GeForce lineup, the GK104 based Tesla K10, and the GK107-heavy mobile GeForce lineup, NVIDIA is selling every 28nm chip they can make. For a business prone to boom and bust cycles this is not a bad problem to have, but it means NVIDIA has been unable to expand their market presence as quickly as customers would like. For the desktop in particular this means NVIDIA has a very large, very noticeable hole in their product lineup between $100 and $400, which composes the mainstream and performance market segments. These market segments aren’t quite the high margin markets NVIDIA is currently servicing, but they are important to fill because they’re where product volumes increase and where most of their regular customers reside.
Long-term NVIDIA needs more production capacity and a wider selection of GPUs to fill this hole, but in the meantime they can at least begin to fill it with what they have to work with. This brings us to today’s product launch: the GeForce GTX 660 Ti. With nothing between GK104 and GK107 at the moment, NVIDIA is pushing out one more desktop product based on GK104 in order to bring Kepler to the performance market. Serving as an outlet for further binned GK104 GPUs, the GTX 660 Ti will be launching today as NVIDIA’s $300 performance part.
Specification | GTX 680 | GTX 670 | GTX 660 Ti | GTX 570 |
Stream Processors | 1536 | 1344 | 1344 | 480 |
Texture Units | 128 | 112 | 112 | 60 |
ROPs | 32 | 32 | 24 | 40 |
Core Clock | 1006MHz | 915MHz | 915MHz | 732MHz |
Shader Clock | N/A | N/A | N/A | 1464MHz |
Boost Clock | 1058MHz | 980MHz | 980MHz | N/A |
Memory Clock | 6.008GHz GDDR5 | 6.008GHz GDDR5 | 6.008GHz GDDR5 | 3.8GHz GDDR5 |
Memory Bus Width | 256-bit | 256-bit | 192-bit | 320-bit |
VRAM | 2GB | 2GB | 2GB | 1.25GB |
FP64 | 1/24 FP32 | 1/24 FP32 | 1/24 FP32 | 1/8 FP32 |
TDP | 195W | 170W | 150W | 219W |
Transistor Count | 3.5B | 3.5B | 3.5B | 3B |
Manufacturing Process | TSMC 28nm | TSMC 28nm | TSMC 28nm | TSMC 40nm |
Launch Price | $499 | $399 | $299 | $349 |
In the Fermi generation, NVIDIA filled the performance market with GF104 and GF114, the backbone of the very successful GTX 460 and GTX 560 series of video cards. Given Fermi’s 4 chip product stack – specifically the existence of the GF100/GF110 powerhouse – this is a move that made perfect sense. However it’s not a move that works quite as well for NVIDIA’s (so far) 2 chip product stack. In a move very reminiscent of the GeForce GTX 200 series, with GK104 already serving the GTX 690, GTX 680, and GTX 670, it is also being called upon to fill out the GTX 660 Ti.
All things considered the GTX 660 Ti is extremely similar to the GTX 670. The base clock is the same, the boost clock is the same, the memory clock is the same, and even the number of shaders is the same. In fact there’s only a single significant difference between the GTX 670 and GTX 660 Ti: the GTX 660 Ti surrenders one of GK104’s four ROP/L2/Memory clusters, reducing it from a 32 ROP, 512KB L2, 4 memory channel part to a 24 ROP, 384KB L2, 3 memory channel part. With NVIDIA already binning chips for assignment to GTX 680 and GTX 670, this allows NVIDIA to further bin those GTX 670 parts without much additional effort. Though given the relatively small size of a ROP/L2/Memory cluster, it’s a bit surprising they have all that many chips that don’t meet GTX 670 standards.
In any case, as a result of these design choices the GTX 660 Ti is a fairly straightforward part. The 915MHz base clock and 980MHz boost clock of the chip along with the 7 SMXes means that GTX 660 Ti has the same theoretical compute, geometry, and texturing performance as GTX 670. The real difference between the two is on the render operation and memory bandwidth side of things, where the loss of the ROP/L2/Memory cluster means that GTX 660 Ti surrenders a full 25% of its render performance and its memory bandwidth. Interestingly NVIDIA has kept their memory clocks at 6GHz – in previous generations they would lower them to enable the use of cheaper memory – which is significant for performance since it keeps the memory bandwidth loss at just 25%.
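The 25% figures above follow directly from the spec table. As a rough sketch, the snippet below uses the usual back-of-the-envelope formulas (bus width times data rate for bandwidth, ROP count times core clock for pixel throughput). The function names are illustrative, and the one-pixel-per-ROP-per-clock assumption is a simplification rather than an NVIDIA-provided figure.

```python
# Theoretical peak figures derived from the spec table above.
# Assumption: one pixel per ROP per clock; real throughput varies by workload.

def memory_bandwidth_gbs(bus_width_bits: int, data_rate_ghz: float) -> float:
    """Peak memory bandwidth in GB/s: bits per transfer * transfers per ns / 8."""
    return bus_width_bits * data_rate_ghz / 8


def pixel_fill_gpix(rops: int, core_clock_mhz: float) -> float:
    """Peak ROP throughput in Gpixels/s at the base clock."""
    return rops * core_clock_mhz / 1000


gtx670 = {"rops": 32, "bus_bits": 256, "mem_ghz": 6.008, "core_mhz": 915}
gtx660ti = {"rops": 24, "bus_bits": 192, "mem_ghz": 6.008, "core_mhz": 915}

bw_670 = memory_bandwidth_gbs(gtx670["bus_bits"], gtx670["mem_ghz"])
bw_660ti = memory_bandwidth_gbs(gtx660ti["bus_bits"], gtx660ti["mem_ghz"])
fill_670 = pixel_fill_gpix(gtx670["rops"], gtx670["core_mhz"])
fill_660ti = pixel_fill_gpix(gtx660ti["rops"], gtx660ti["core_mhz"])

bandwidth_loss = 1 - bw_660ti / bw_670
fill_loss = 1 - fill_660ti / fill_670

print(f"GTX 670:    {bw_670:.1f} GB/s, {fill_670:.2f} Gpix/s")
print(f"GTX 660 Ti: {bw_660ti:.1f} GB/s, {fill_660ti:.2f} Gpix/s")
print(f"Loss: {bandwidth_loss:.0%} bandwidth, {fill_loss:.0%} ROP throughput")
```

Because the memory clock is unchanged, the bandwidth loss tracks the bus width exactly; a lower memory clock would have compounded it.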
How this loss of render operation performance and memory bandwidth will play out is going to depend heavily on the task at hand. We’ve already seen GK104 struggle with a lack of memory bandwidth in games like Crysis, so coming from GTX 670 this is only going to exacerbate that problem; a full 25% drop in performance is not out of the question here. However in games that are shader heavy (but not necessarily memory bandwidth heavy) like Portal 2, this means that GTX 660 Ti can hang very close to its more powerful sibling. There’s also the question of how NVIDIA’s nebulous asymmetrical memory bank design will impact performance, since 2GB of RAM doesn’t fit cleanly into 3 memory banks. All of these are issues where we’ll have to turn to benchmarking to better understand.
The impact on power consumption on the other hand is relatively straightforward. With clocks identical to the GTX 670, power consumption has only been reduced marginally due to the disabling of the ROP cluster. NVIDIA’s official TDP is 150W, with a power target of 134W. This compares to a TDP of 170W and a power target of 141W for the GTX 670. Given the mechanisms at work for NVIDIA’s GPU boost technology, it’s the power target that is a far better reflection of what to expect relative to the GTX 670. On paper this means that GK104 could probably be stuffed into a sub-150W card with some further functional units being disabled, but in practice desktop GK104 GPUs are probably a bit too power hungry for that.
Moving on, this launch will be what NVIDIA calls a “virtual” launch, which is to say that there aren’t any reference cards being shipped to partners to sell or to press to sample. Instead all of NVIDIA’s partners will be launching with semi-custom and fully-custom cards right away. This means we’re going to see a wide variety of cards right off the bat, however it also means that there will be less consistency between partners since no two cards are going to be quite alike. For that reason we’ll be looking at a slightly wider selection of partner designs today, with cards from EVGA, Zotac, and Gigabyte occupying our charts.
As for the launch supply, with NVIDIA having licked their GK104 supply problems a couple of months ago the supply of GTX 660 Ti cards looks like it should be plentiful. Some cards are going to be more popular than others and for that reason we expect we’ll see some cards sell out, but at the end of the day there shouldn’t be any problem grabbing a GTX 660 Ti on today’s launch day.
Pricing for GTX 660 Ti cards will start at $299, continuing NVIDIA’s tidy hierarchy of a GeForce 600 at every $100 price point. With the launch of the GTX 660 Ti NVIDIA will finally be able to start clearing out the GTX 570, a not-unwelcome thing as the GTX 660 Ti brings with it the Kepler family features (NVENC, TXAA, GPU boost, and D3D 11.1) along with nearly twice as much RAM and much lower power consumption. However this also means that despite the name, the GTX 660 Ti is a de facto replacement for the GTX 570 rather than the GTX 560 Ti. The sub-$250 market the GTX 560 Ti launched in will continue to be served by Fermi parts for the time being. NVIDIA will no doubt see quite a bit of success even at $300, but it probably won’t be quite the hot item that the GTX 560 Ti was.
Meanwhile for a limited period of time NVIDIA will be sweetening the deal by throwing in a copy of Borderlands 2 with all GTX 600 series cards as a GTX 660 Ti launch promotion. Borderlands 2 is the sequel to Gearbox’s 2009 FPS/RPG hybrid, and is a TWIMTBP game that will have PhysX support along with planned support for TXAA. Like their prior promotions this is being done through retailers in North America, so you will need to check and ensure your retailer is throwing in Borderlands 2 vouchers with any GTX 600 card you purchase.
On the marketing front, as a performance part NVIDIA is looking to not only sell the GTX 660 Ti as an upgrade to 400/500 series owners, but to also entice existing GTX 200 series owners to upgrade. The GTX 660 Ti will be quite a bit faster than any GTX 200 series part (and cooler/quieter than all of them), with the question being whether it’s going to be enough to spur those owners to upgrade. NVIDIA did see a lot of success last year with the GTX 560 driving the retirement of the 8800GT/9800GT, so we’ll see how that goes.
Anyhow, as with the launch of the GTX 670 cards virtually every partner is also launching one or more factory overclocked models, so the entire lineup of launch cards will be between $299 and $339 or so. This price range will put NVIDIA and its partners smack-dab between AMD’s existing 7000 series cards, which have already been shuffling in price some due to the GTX 670 and the impending launch of the GTX 660 Ti. Reference-clocked cards will sit right between the $279 Radeon HD 7870 and $329 Radeon HD 7950, which means that factory overclocked cards will be going head-to-head with the 7950.
On that note, with the launch of the GTX 660 Ti we can finally shed some further light on this week’s unexpected announcement of a new Radeon HD 7950 revision from AMD. As you’ll see in our benchmarks the existing 7950 maintains an uncomfortably slight lead over the GTX 660 Ti, which has spurred on AMD to bump up the 7950’s clockspeeds at the cost of power consumption in order to avoid having it end up as a sub-$300 product. The new 7950B is still scheduled to show up at the end of this week, with AMD’s already-battered product launch credibility hanging in the balance.
For this review we’re going to include both the 7950 and 7950B in our results. We’re not at all happy with how AMD is handling this – it’s the kind of slimy thing that has already gotten NVIDIA in trouble in the past – and while we don’t want to reward such actions it would be remiss of us not to include it since it is a new reference part. And if AMD’s credibility is worth anything it will be on the shelves tomorrow anyhow.
Summer 2012 GPU Pricing Comparison
AMD | Price | NVIDIA |
Radeon HD 7970 GHz Edition | $469/$499 | GeForce GTX 680 |
Radeon HD 7970 | $419/$399 | GeForce GTX 670 |
Radeon HD 7950 | $329 | - |
- | $299 | GeForce GTX 660 Ti |
Radeon HD 7870 | $279 | - |
- | $279 | GeForce GTX 570 |
Radeon HD 7850 | $239 | - |
That Darn Memory Bus
Among the entire GTX 600 family, the GTX 660 Ti’s one unique feature is its memory controller layout. NVIDIA built GK104 with 4 memory controllers, each 64 bits wide, giving the entire GPU a combined memory bus width of 256 bits. These memory controllers are tied into the ROPs and L2 cache, with each controller forming part of a ROP partition containing 8 ROPs (or rather 1 ROP unit capable of processing 8 operations), 128KB of L2 cache, and the memory controller. To disable any of those things means taking out a whole ROP partition, which is exactly what NVIDIA has done.
The impact on the ROPs and the L2 cache is rather straightforward – render operation throughput is reduced by 25% and there’s 25% less L2 cache to store data in – but the loss of the memory controller is a much tougher concept to deal with. This goes for both NVIDIA on the design end and for consumers on the usage end.
256 is a nice power-of-two number. For video cards with power-of-two memory bus widths, it’s very easy to equip them with a similarly power-of-two memory capacity such as 1GB, 2GB, or 4GB of memory. For various minor technical reasons (mostly the sanity of the engineers), GPU manufacturers like sticking to power-of-two memory busses. And while this is by no means a true design constraint in video card manufacturing, there are ramifications for straying from it.
The biggest consequence of deviating from a power-of-two memory bus is that under normal circumstances this leads to a card’s memory capacity not lining up with the bulk of the cards on the market. To use the GTX 500 series as an example, NVIDIA had 1.5GB of memory on the GTX 580 at a time when the common Radeon HD 5870 had 1GB, giving NVIDIA a 512MB advantage. Later on however the common Radeon HD 6970 had 2GB of memory, leaving NVIDIA behind by 512MB. This also had one additional consequence for NVIDIA: they needed 12 memory chips where AMD needed 8, which generally inflates the bill of materials more than the price of higher speed memory in a narrower design does. This ended up not being a problem for the GTX 580 since 1.5GB was still plenty of memory for 2010/2011 and the high pricetag could easily absorb the BoM hit, but this is not always the case.
Because NVIDIA has disabled a ROP partition on GK104 in order to make the GTX 660 Ti, they’re dropping from a power-of-two 256bit bus to an off-size 192bit bus. Under normal circumstances this means that they’d need to either reduce the amount of memory on the card from 2GB to 1.5GB, or double it to 3GB. The former is undesirable for competitive reasons (AMD has 2GB cards below the 660 Ti and 3GB cards above) not to mention the fact that 1.5GB is too small for a $300 card in 2012. The latter on the other hand incurs the BoM hit as NVIDIA moves from 8 memory chips to 12 memory chips, a scenario that the lower margin GTX 660 Ti can’t as easily absorb, not to mention how silly it would be for a GTX 680 to have less memory than a GTX 660 Ti.
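To make the 1.5GB versus 3GB arithmetic concrete, here is a minimal sketch of the capacity options for a symmetric layout. It assumes 64-bit controllers and 2Gb chips as described in this article, with either 2 or 4 chips per controller; the function name is illustrative.

```python
CONTROLLER_WIDTH_BITS = 64  # each GK104 memory controller is 64 bits wide
CHIP_DENSITY_GBIT = 2       # 2Gb GDDR5 chips, as used across the GTX 600 series


def symmetric_capacity(bus_width_bits: int, chips_per_controller: int):
    """Return (chip count, capacity in GB) when every controller gets the same chips."""
    controllers = bus_width_bits // CONTROLLER_WIDTH_BITS
    chips = controllers * chips_per_controller
    return chips, chips * CHIP_DENSITY_GBIT / 8  # 8 Gbit per GB


for bus, per_controller in [(256, 2), (192, 2), (192, 4)]:
    chips, gigabytes = symmetric_capacity(bus, per_controller)
    print(f"{bus}-bit bus, {per_controller} chips/controller: "
          f"{chips} chips, {gigabytes}GB")
```

A 192-bit bus with identical controllers lands on 6 chips (1.5GB) or 12 chips (3GB); 2GB is not reachable without breaking the symmetry.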
Rather than take the usual route NVIDIA is going to take their own 3rd route: put 2GB of memory on the GTX 660 Ti anyhow. By putting more memory on one controller than the other two – in effect breaking the symmetry of the memory banks – NVIDIA can have 2GB of memory attached to a 192bit memory bus. This is a technique that NVIDIA has had available to them for quite some time, but it’s also something they rarely pull out, using it only when necessary.
We were first introduced to this technique with the GTX 550 Ti in 2011, which had a similarly large 192bit memory bus. By using a mix of 2Gb and 1Gb modules, NVIDIA could outfit the card with 1GB of memory rather than the 1.5GB/768MB that a 192bit memory bus would typically dictate.
For the GTX 660 Ti in 2012 NVIDIA is once again going to use their asymmetrical memory technique in order to outfit the GTX 660 Ti with 2GB of memory on a 192bit bus, but they’re going to be implementing it slightly differently. Whereas the GTX 550 Ti mixed memory chip density in order to get 1GB out of 6 chips, the GTX 660 Ti will mix up the number of chips attached to each controller in order to get 2GB out of 8 chips. Specifically, there will be 4 chips instead of 2 attached to one of the memory controllers, while the other controllers will continue to have 2 chips. By doing it in this manner, this allows NVIDIA to use the same Hynix 2Gb chips they already use in the rest of the GTX 600 series, with the only high-level difference being the width of the bus connecting them.
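The two asymmetric layouts can be written out as per-controller chip lists. The sketch below is only bookkeeping: the GTX 660 Ti layout follows the description above, while the placement of the two 2Gb chips on a single controller for the GTX 550 Ti is an assumption (the chip mix itself, two 2Gb plus four 1Gb, is the only way six such chips add up to 1GB).

```python
def controller_capacities_mb(layout):
    """layout: one list per controller holding each chip's density in Gbit.

    Returns the capacity behind each controller in MB (1 Gbit = 128 MB).
    """
    return [sum(chips) * 128 for chips in layout]


# GTX 660 Ti: eight 2Gb chips, four on one controller and two on each of the others.
gtx660ti_layout = [[2, 2, 2, 2], [2, 2], [2, 2]]

# GTX 550 Ti: six chips mixing 2Gb and 1Gb densities.
# Assumption: both 2Gb chips sit on the same controller.
gtx550ti_layout = [[2, 2], [1, 1], [1, 1]]

for name, layout in [("GTX 660 Ti", gtx660ti_layout), ("GTX 550 Ti", gtx550ti_layout)]:
    per_controller = controller_capacities_mb(layout)
    print(f"{name}: {per_controller} MB per controller, {sum(per_controller)} MB total")
```

In both cases one controller ends up with twice the memory of the other two, which is where the interleaving problem comes from.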
Of course at a low-level it’s more complex than that. In a symmetrical design with an equal amount of RAM on each controller it’s rather easy to interleave memory operations across all of the controllers, which maximizes performance of the memory subsystem as a whole. However complete interleaving requires that kind of a symmetrical design, which means it’s not quite suitable for use on NVIDIA’s asymmetrical memory designs. Instead NVIDIA must start playing tricks. And when tricks are involved, there’s always a downside.
The best case scenario is always going to be that the entire 192bit bus is in use by interleaving a memory operation across all 3 controllers, giving the card 144GB/sec of memory bandwidth (192bit * 6GHz / 8). But that can only be done at up to 1.5GB of memory; the final 512MB of memory is attached to a single memory controller. This invokes the worst case scenario, where only one 64-bit memory controller is in use, reducing memory bandwidth to a much more modest 48GB/sec.
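The two bounds, and what happens in between, can be sketched with a simple model. The best and worst case figures come straight from the formula above; the blended figure is a hypothetical time-weighted average that assumes accesses to the two regions are served one after the other with no overlap, which is our simplification and not a description of how the hardware actually schedules traffic.

```python
DATA_RATE_GHZ = 6.0  # nominal GDDR5 data rate used in the figures above


def bandwidth_gbs(active_bus_bits: int) -> float:
    """Peak bandwidth in GB/s for the given number of active bus bits."""
    return active_bus_bits * DATA_RATE_GHZ / 8


best_case = bandwidth_gbs(192)   # all three controllers interleaved
worst_case = bandwidth_gbs(64)   # only the controller holding the last 512MB


def blended_bandwidth_gbs(slow_fraction: float) -> float:
    """Effective bandwidth if slow_fraction of all bytes come from the last 512MB.

    Hypothetical model: the time per byte in each region adds up (harmonic mean).
    """
    time_per_gb = (1 - slow_fraction) / best_case + slow_fraction / worst_case
    return 1 / time_per_gb


print(f"best case:  {best_case:.0f} GB/s")
print(f"worst case: {worst_case:.0f} GB/s")
# Traffic spread evenly over all 2GB puts 512MB / 2048MB = 25% in the slow region.
print(f"uniform access over 2GB: {blended_bandwidth_gbs(0.25):.0f} GB/s")
```

Under this model even a modest share of traffic landing in the final 512MB pulls the average down quickly, which is why how NVIDIA places data matters so much.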
How NVIDIA spreads out memory accesses will have a great deal of impact on when we hit these scenarios. In the past we’ve tried to divine how NVIDIA is accomplishing this, but even with the compute capability of CUDA memory appears to be too far abstracted for us to test any specific theories. And because NVIDIA is continuing to label the internal details of their memory bus a competitive advantage, they’re unwilling to share the details of its operation with us. Thus we’re largely dealing with a black box here, one where poking and prodding doesn’t produce much in the way of meaningful results.
As with the GTX 550 Ti, all we can really say at this time is that the performance we get in our benchmarks is the performance we get. Our best guess remains that NVIDIA is interleaving the lower 1.5GB of address space while pushing the last 512MB of address space into the larger memory bank, but we don’t have any hard data to back it up. For most users this shouldn’t be a problem (especially since GK104 is so wishy-washy at compute), but it remains that there’s always a downside to an asymmetrical memory design. With any luck one day we’ll find that downside and be able to better understand the GTX 660 Ti’s performance in the process.
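For illustration only, the guess above can be written as an address-to-controller mapping. Everything specific here is hypothetical: the 256-byte interleave stride, the controller numbering, and the mapping itself, since NVIDIA has not disclosed how the real hardware does it.

```python
MB = 1024 * 1024
TOTAL_BYTES = 2048 * MB
INTERLEAVED_BYTES = 1536 * MB  # lower 1.5GB, striped across all controllers
NUM_CONTROLLERS = 3
STRIDE_BYTES = 256             # hypothetical interleave granularity
LARGE_CONTROLLER = 0           # hypothetical index of the controller with 4 chips


def controller_for(address: int) -> int:
    """Map a physical byte address to a memory controller under the guessed scheme."""
    if not 0 <= address < TOTAL_BYTES:
        raise ValueError("address outside the 2GB frame buffer")
    if address < INTERLEAVED_BYTES:
        # Round-robin stripes across the three controllers.
        return (address // STRIDE_BYTES) % NUM_CONTROLLERS
    # The final 512MB lives entirely behind the larger memory bank.
    return LARGE_CONTROLLER


for addr in (0, 256, 512, 768, INTERLEAVED_BYTES, TOTAL_BYTES - 1):
    print(f"address {addr:>10}: controller {controller_for(addr)}")
```

Under this mapping a sequential read inside the lower 1.5GB touches all three controllers, while any read above that boundary is serialized onto one.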
Meet The EVGA GeForce GTX 660 Ti Superclocked
Our first card of the day is EVGA’s entry, the EVGA GeForce GTX 660 Ti Superclocked. Among all of the GTX 670 cards we’ve looked at and all of the GTX 660 Ti cards we’re going to be looking at, this is the card that is the most like its older sibling. In fact with only a couple cosmetic differences it’s practically identical in construction.
GeForce GTX 660 Ti Partner Card Specification Comparison
Specification | GeForce GTX 660 Ti (Ref) | EVGA GTX 660 Ti Superclocked | Zotac GTX 660 Ti AMP! | Gigabyte GTX 660 Ti OC |
Base Clock | 915MHz | 980MHz | 1033MHz | 1033MHz |
Boost Clock | 980MHz | 1059MHz | 1111MHz | 1111MHz |
Memory Clock | 6008MHz | 6008MHz | 6608MHz | 6008MHz |
Frame Buffer | 2GB | 2GB | 2GB | 2GB |
TDP | 150W | 150W | 150W | ~170W |
Width | Double Slot | Double Slot | Double Slot | Double Slot |
Length | N/A | 9.5" | 7.5" | 10.5" |
Warranty | N/A | 3 Year | 3 Year + Life | 3 Year |
Price Point | $299 | $309 | $329 | $319 |
EVGA will be clocking the GTX 660 Ti SC at 980MHz for the base clock and 1059MHz for the boost clock, which represents a 65MHz (7%) and 79MHz (8%) overclock respectively. Meanwhile EVGA has left the memory clocked untouched at 6GHz, the reference memory clockspeed for all of NVIDIA’s GTX 600 parts thus far.
The GTX 660 Ti is otherwise identical to the GTX 670, for all of the benefits that entails. While NVIDIA isn’t shipping a proper reference card for the GTX 660 Ti, they did create a reference design, and this appears to be what it’s based on. Both the EVGA and Zotac cards are using identical PCBs derived from the GTX 670’s PCB, which is not unexpected given the power consumption of the GTX 660 Ti. The only difference we can find on this PCB is that instead of there being solder pads for 16 memory chips there are solder pads for 12, reflecting the fact that the GTX 660 Ti can have at most 12 memory chips attached.
With this PCB design the PCB measures only 6.75” long, with the bulk of the VRM components located at the front of the card rather than the rear. Hynix 2Gb 6GHz memory chips are placed both on the front of the PCB and the back, with 6 on the front and 2 on the rear. The rear chips are directly behind a pair of front chips, reflecting the fact that all 4 of these chips are connected to a single memory controller.
With the effective reuse of the GTX 670 PCB, EVGA is also reusing their GTX 670 cooler. This cooler is a blower, which due to the positioning of the GPU and various electronic components means that the blower fan is off of the PCB entirely by necessity. Instead the blower fan is located behind the card in a piece of enclosed housing. This housing pushes the total length of the card out to 9.5”. Housed inside of the enclosure is a block-style aluminum heatsink with a copper baseplate that is providing cooling for the GPU. Elsewhere, attached to the PCB we’ll see a moderately sized aluminum heatsink clamped down on top of the VRMs towards the front of the card. There is no cooling provided for the GDDR5 RAM.
Elsewhere, at the top of the card we’ll find the 2 PCIe power sockets and 2 SLI connectors. Meanwhile at the front of the card EVGA is using the same I/O port configuration and bracket that we saw with the GTX 670. This means they’re using the NVIDIA standard: 1 DL-DVI-D port, 1 DL-DVI-I port, 1 full size HDMI 1.4 port, and 1 full size DisplayPort 1.2. This also means that the card features EVGA’s high-flow bracket, a bracket with less shielding in order to maximize the amount of air that can be exhausted.
Rounding out the package is EVGA’s typical collection of accessories and knick-knacks. In the box you’ll find a pair of molex power adapters, a quick start guide, and some stickers. The real meat of EVGA’s offering is on their website, where EVGA card owners can download their wonderful video card overclocking utility (Precision X), and their stress test utility (OC Scanner X). The powered-by-RivaTuner Precision X and OC Scanner X still set the gold standard for video card utilities thanks to their functionality and ease of use. Personally I’m not a fan of the new UI – circular UIs and sliders aren’t particularly easy to read – but it gets the job done.
Next, as with all EVGA cards, the EVGA GeForce GTX 660 Ti Superclocked comes with EVGA’s standard 3 year transferable warranty, with individual 2 or 7 year extensions available for purchase upon registration, which will also unlock access to EVGA’s step-up upgrade program. Finally, the EVGA GeForce GTX 660 Ti Superclocked will be hitting retail with an MSRP of $309, $10 over the MSRP for reference cards.
Meet The Zotac GeForce GTX 660 Ti AMP! Edition
Our next GTX 660 Ti of the day is Zotac’s entry, the GeForce GTX 660 Ti AMP! Edition. As indicated by the AMP branding (and like the other cards in this review) it’s a factory overclocked card; in fact it has the highest factory overclock of all the cards we’re reviewing today, with both a core and memory overclock.
Zotac will be shipping the GeForce GTX 660 Ti AMP at 1033MHz for the base clock and 1111MHz for the boost clock. This represents a sizable 118MHz (13%) base overclock, and a 131MHz (13%) boost overclock. Meanwhile Zotac will be shipping their memory at 6.6GHz, a full 600MHz (10%) over the reference GTX 660 Ti. The latter overclock will stand to be very important, as we’ve already noted the GTX 660 Ti is starting off life as a memory bandwidth crippled card. Power consumption willing, the GTX 660 Ti AMP is in a good position to pick up at least 10% on performance relative to the reference GTX 660 Ti.
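The overclock figures quoted above are easy to reproduce, and the memory overclock translates directly into peak bandwidth. This is a small sketch using the clocks from the specification table; the helper names are ours.

```python
def overclock(reference_mhz: float, shipped_mhz: float):
    """Return (delta in MHz, delta as a percentage of the reference clock)."""
    delta = shipped_mhz - reference_mhz
    return delta, 100 * delta / reference_mhz


def memory_bandwidth_gbs(bus_width_bits: int, data_rate_mhz: float) -> float:
    """Peak memory bandwidth in GB/s for a given bus width and data rate."""
    return bus_width_bits * (data_rate_mhz / 1000) / 8


zotac_amp = {
    "base": (915, 1033),
    "boost": (980, 1111),
    "memory": (6008, 6608),
}

for name, (reference, shipped) in zotac_amp.items():
    delta, percent = overclock(reference, shipped)
    print(f"{name}: +{delta}MHz ({percent:.0f}%)")

reference_bw = memory_bandwidth_gbs(192, 6008)
amp_bw = memory_bandwidth_gbs(192, 6608)
print(f"memory bandwidth: {reference_bw:.1f} GB/s -> {amp_bw:.1f} GB/s")
```

The roughly 14GB/sec of extra bandwidth recovers a little under a third of the 48GB/sec the GTX 660 Ti gives up relative to the GTX 670.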
Like the EVGA card we just took a look at, Zotac’s GTX 660 Ti is based on NVIDIA’s reference board, so we’ll skip the details here. Rather than using a blower like EVGA however, Zotac is using an open air cooler – dubbed the dual silencer – that is well suited for a board of this length. The cooler uses a pair of 70mm fans, mounted over an aluminum heatsink that runs nearly the entire length of the card. Attaching the heatsink to the GPU itself is a trio of copper heatpipes, which transfer heat from the GPU to various points on the heatsink. Meanwhile the VRMs are cooled by a smaller, separate heatsink that fits under the primary heatsink; given the size and the location, it’s hard to say just how well this secondary heatsink is being cooled.
Altogether the card measures just 7.5” in length, an otherwise itty-bitty card made just a bit longer thanks to some overhang from Zotac’s cooler. Zotac advertises their dual silencer as being 10C cooler and 10dB quieter than the competition, and while this may strictly be true when compared to some blowers, it’s not appreciably different than the dual-fan open air heatsinks that are extremely common on the market today. In fact among all of the cards we’re reviewing today this is unquestionably the most standard of them, as Zotac and several other NVIDIA partners will be shipping reference clocked cards built very similar to this. For this reason we’ll be using Zotac’s card as our reference card for the purpose of our testing.
Moving on, power and display connectivity is the same as with the GTX 670 and other cards using NVIDIA’s PCBs. This means 2 PCIe power sockets and 2 SLI connectors on the top, and 1 DL-DVI-D port, 1 DL-DVI-I port, 1 full size HDMI 1.4 port, and 1 full size DisplayPort 1.2 on the front.
Rounding out the package is the usual collection of molex power adapters and quickstart guides, along with a trial version of Trackmania Canyon. However the real star of the show as far as pack-in games goes will be Borderlands 2 through NVIDIA’s launch offer.
Wrapping things up, Zotac is attaching a $329 MSRP to the GeForce GTX 660 Ti AMP, which makes it a full $30 more expensive than reference-clocked cards, reflecting the greater factory overclock. This also makes it the most expensive card in today’s review by $10. Meanwhile for the warranty Zotac is offering a base 2 year warranty, which is extended to a rather generous full limited lifetime warranty upon registration of the card.
Meet The Gigabyte GeForce GTX 660 Ti OC
Our final GTX 660 Ti of the day is Gigabyte’s entry, the Gigabyte GeForce GTX 660 Ti OC. Unlike the other cards in our review today this is not a semi-custom card but rather a fully-custom card, which brings with it some interesting performance ramifications.
The big difference between a semi-custom and fully-custom card is of course the PCB; fully-custom cards pair a custom cooler with a custom PCB instead of a reference PCB. Partners can go in a few different directions with custom PCBs, using them to reduce the BoM, reduce the size of the card, or even to increase the capabilities of a product. For their GTX 660 Ti OC, Gigabyte has gone in the last of those directions, using a custom PCB to improve the card.
On the surface the specs of the Gigabyte GeForce GTX 660 Ti OC are relatively close to our other cards, primarily the Zotac. Like Zotac, Gigabyte is pushing the base clock to 1033MHz and the boost clock to 1111MHz, representing a sizable 118MHz (13%) base overclock and a 131MHz (13%) boost overclock respectively. Unlike the Zotac however there is no memory overclocking taking place, with Gigabyte shipping the card at the standard 6GHz.
What sets Gigabyte apart here in the specs is that they’ve equipped their custom PCB with better VRM circuitry, which means NVIDIA is allowing them to increase their power target from the GTX 660 Ti standard of 134W to an estimated 141W. This may not sound like much (especially since we’re working with an estimate on the Gigabyte board), but as we’ve seen time and time again GK104 is power-limited in most scenarios. A good GPU can boost to higher bins than there is power available to allow it, which means increasing the power target in a roundabout way increases performance. We’ll see how this works in detail in our benchmarks, but for now it’s good enough to say that even with the same GPU overclock as Zotac the Gigabyte card is usually clocking higher.
Moving on, Gigabyte’s custom PCB measures 8.4” long, and in terms of design it doesn’t bear a great resemblance to either the reference GTX 680 PCB or the reference GTX 670 PCB; as near as we can tell it’s completely custom. In terms of design it’s nothing fancy – though like the reference GTX 670 the VRMs are located in the front – and as we’ve said before the real significance is the higher power target it allows. Otherwise the memory layout is the same as the reference GTX 660 Ti with 6 chips on the front and 2 on the back. Due to its length we’d normally insist on there being some kind of stiffener for an open air card, but since Gigabyte has put the GPU back far enough, the heatsink mounting alone provides enough rigidity to the card.
Sitting on top of Gigabyte’s PCB is a dual fan version of Gigabyte’s new Windforce cooler. The Windforce 2X cooler on their GTX 660 Ti is a bit of an abnormal dual fan cooler, with a relatively sparse aluminum heatsink attached to unusually large 100mm fans. This makes the card quite large and more fan than heatsink in the process, which is not something we’ve seen before.
The heatsink itself is divided up into three segments over the length of the card, with a pair of copper heatpipes connecting them. The bulk of the heatsink is over the GPU, while a smaller portion is at the rear and an even smaller portion is at the front, which is also attached to the VRMs. The frame holding the 100mm fans is then attached at the top, anchored at either end of the heatsink. Altogether this cooling contraption is both longer and taller than the PCB itself, making the final length of the card nearly 10” long.
Finishing up the card we find the usual collection of ports and connections. This means 2 PCIe power sockets and 2 SLI connectors on the top, and 1 DL-DVI-D port, 1 DL-DVI-I port, 1 full size HDMI 1.4 port, and 1 full size DisplayPort 1.2 on the front. Meanwhile toolless case users will be happy to see that the heatsink is well clear of the bracket, so toolless clips are more or less guaranteed to work here.
Rounding out the package is the usual collection of power adapters and a quick start guide. While it’s not included in the box or listed on the box, the Gigabyte GeForce GTX 660 Ti OC works with Gigabyte’s OC Guru II overclocking software, which is available on Gigabyte’s website. Gigabyte has had OC Guru for a number of years now, and with this being the first time we’ve seen OC Guru II we can say it’s greatly improved from the functional and aesthetic mess that defined the previous versions.
While it won’t be winning any gold medals, in our testing OC Guru II gets the job done. Gigabyte offers all of the usual tweaking controls (including the necessary power target control), along with card monitoring/graphing and an OSD. Its only real sin is that Gigabyte hasn’t implemented sliders on their controls, meaning that you’ll need to press and hold down buttons in order to dial in a setting. This is less than ideal, especially when you’re trying to crank up the 6000MHz memory clock by an appreciable amount.
Wrapping things up, the Gigebyte GeForce GTX 660 Ti OC comes with Gigabyte’s standard 3 year warranty. Gigabyte will be releasing it at an MSRP of $319, $20 over the price of a reference-clocked GTX 660 Ti and $10 less than the most expensive card in our roundup today.
The First TXAA Game & The Test
With the release of NVIDIA’s 304.xx drivers a couple of months ago, NVIDIA finally enabled driver support on Kepler for their new temporal anti-aliasing technology. First announced with the launch of Kepler, TXAA is another anti-aliasing technology to be developed by Timothy Lottes, an engineer in NVIDIA’s developer technology group. In a nutshell, TXAA is a wide tent (>1px) MSAA filter combined with a temporal filter (effectively a motion-vector based frame blend) that is intended to resolve that pesky temporal aliasing that can be seen in motion in many games.
Because TXAA requires MSAA support and motion vector tracking by the game itself, it can only be used in games that specifically implement it. Consequently, while NVIDIA had enabled driver support for it, they’ve been waiting on a game to be released that implements it. That release finally happened last week with a patch for the MMO The Secret World, which became the first game with TXAA support.
This isn’t meant to be an exhaustive review of TXAA (MMOs and deterministic testing are like oil and water), but seeing as how this is the first time TXAA has been enabled we did want to comment on it.
On the whole, what NVIDIA is trying to accomplish here is to implement movie-like anti-aliasing at a reasonable performance cost. Traditionally SSAA would be the solution here (just like it is to most other image aliasing problems), but of course SSAA is much too expensive most of the time. TXAA, by comparison, is just 2x MSAA plus the temporal component at its lower setting, which makes the process rather cheap.
Unfortunately by gaming standards it’s also really blurry. This is due to the combination of the wide tent MSAA samples – which if you remember your history, ATI tried at one time – and the temporal filter blending data from multiple frames. TXAA does a completely fantastic job of eliminating temporal and other forms of aliasing, but it does so at a notable cost to image clarity.
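To make the clarity trade-off a bit more concrete, here is a toy one-dimensional sketch of the two ingredients. It is not NVIDIA's implementation: real TXAA operates on MSAA samples and uses motion vectors, neither of which is modeled here. The function names, the filter radii, the previous-frame data, and the blend weight `alpha` are all assumptions picked for illustration.

```python
def tent_weight(distance, radius):
    """Tent (triangle) filter: weight falls linearly to zero at `radius` pixels."""
    return max(0.0, 1.0 - abs(distance) / radius)


def resolve_row(samples, radius):
    """Resolve a 1D row of samples (one per pixel center) with a tent filter."""
    resolved = []
    for x in range(len(samples)):
        total = 0.0
        weight_sum = 0.0
        for i, value in enumerate(samples):
            w = tent_weight(i - x, radius)
            total += w * value
            weight_sum += w
        resolved.append(total / weight_sum)
    return resolved


def temporal_blend(current, history, alpha):
    """Blend the current frame with the previous one (no motion vectors here)."""
    return [alpha * c + (1.0 - alpha) * h for c, h in zip(current, history)]


# A hard black-to-white edge, one sample per pixel.
edge = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]

narrow = resolve_row(edge, radius=1.0)  # 1px tent: each pixel sees only itself
wide = resolve_row(edge, radius=2.0)    # wider tent: neighbors bleed in

# Hypothetical previous frame in which the edge sat one pixel to the left.
previous = [0.0, 0.0, 1.0, 1.0, 1.0, 1.0]
blended = temporal_blend(narrow, previous, alpha=0.5)

print(narrow)   # edge stays hard
print(wide)     # edge is spread over two extra pixels
print(blended)  # the old edge position lingers as a half-tone
```

In this toy case the 1px tent leaves the edge untouched, while the wider tent turns the single-pixel transition into a ramp, and the frame blend adds a further half-tone wherever the image changed between frames. Both effects suppress aliasing, and both are forms of blur.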
Editorially speaking we’ll never disparage NVIDIA for trying new AA methods – it never hurts to try something new – however at the same time we do reserve the right to be picky. We completely understand the direction NVIDIA went with this and why they did it, especially since there’s a general push to make games more movie-like in the first place, but we’re not big fans of the outcome. You would be hard pressed to find someone that hates jaggies more than I (which is why we have SSAA in one of our tests), but as an interactive medium I have come to expect sharpness, sharpness that would make my eyes bleed. Especially when it comes to multiplayer games, where being able to see the slightest movement in the distance can be a distinct advantage.
To that end, TXAA is unquestionably an interesting technology and worth keeping an eye on in the future, but practically speaking AMD’s efforts to implement complex lighting cheaply on a forward renderer (and thereby making MSAA cheap and effective) are probably more relevant to improving the state of AA. But this is by no means the final word, and we’ll certainly revisit TXAA in detail in the future once it’s enabled on a game that offers a more deterministic way of testing image quality.
The Test
NVIDIA’s GTX 660 Ti launch drivers are 305.37, which are a further continuation of the 304.xx branch. Compared to the previous two 304.xx drivers there are no notable performance changes or bug fixes that we’re aware of.
Meanwhile on the AMD side we’re continuing to use the Catalyst 12.7 betas released back in late June. AMD just released Catalyst 12.8 yesterday, which appears to be a finalized version of the 12.7 driver.
On a final note, for the time being we have dropped Starcraft II from our tests. The recent 1.5 patch has had a notable negative impact on our performance (and disabled our ability to play replays without logging in every time), so we need to further investigate the issue and likely rebuild our entire collection of SC2 benchmarks.
CPU: | Intel Core i7-3960X @ 4.3GHz |
Motherboard: | EVGA X79 SLI |
Chipset Drivers: | Intel 9.2.3.1022 |
Power Supply: | Antec True Power Quattro 1200 |
Hard Disk: | Samsung 470 (256GB) |
Memory: | G.Skill Ripjaws DDR3-1867 4 x 4GB (8-10-9-26) |
Case: | Thermaltake Spedo Advance |
Monitor: | Samsung 305T, Asus PA246Q |
Video Cards: | AMD Radeon HD 6970, AMD Radeon HD 7870, AMD Radeon HD 7950, AMD Radeon HD 7950B, AMD Radeon HD 7970, NVIDIA GeForce GTX 560 Ti, NVIDIA GeForce GTX 570, Zotac GeForce GTX 660 Ti AMP! Edition, EVGA GeForce GTX 660 Ti Superclocked, Gigabyte GeForce GTX 660 Ti OC, NVIDIA GeForce GTX 670 |
Video Drivers: | NVIDIA ForceWare 304.79 Beta, NVIDIA ForceWare 305.37, AMD Catalyst 12.7 Beta |
OS: | Windows 7 Ultimate 64-bit |
Crysis: Warhead
Kicking things off as always is Crysis: Warhead. It’s no longer the toughest game in our benchmark suite, but it’s still a technically complex game that has proven to be a very consistent benchmark. Thus even four years since the release of the original Crysis, “but can it run Crysis?” is still an important question, and the answer continues to be “no.” While we’re closer than ever, full Enthusiast settings at 60fps is still beyond the grasp of a single-GPU card.
For a $300 performance card the most important resolution is typically going to be 1920x1080/1200, however in some cases these cards should be able to cover 2560x1440/1600 at a reasonable framerate. To that end, we’ll be focusing on 1920x1200 for the bulk of our review.
Crysis has been a sore spot for NVIDIA since the launch of Kepler, and GTX 660 Ti doesn’t improve this. Since it’s a memory bandwidth constrained game and GTX 660 Ti takes away 25% of GK104’s memory bandwidth, the result is a predictable drop in performance. The GTX 660 Ti only reaches 80% of the GTX 670’s performance here, which is only a bit more than our worst case scenario of 75%. At 38.8fps it’s playable, but it’s definitely not a great experience. So for anyone wanting to partake in this classic, an AMD card is the way to go and it doesn’t matter which; even the 7870 is marginally faster.
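The arithmetic behind that 25% figure can be sketched as follows. The bus widths (256-bit for the GTX 670, 192-bit for the GTX 660 Ti) and the rounded 6GHz effective memory data rate are assumptions taken from the cards' published specifications rather than from this page, and the function name is purely illustrative.

```python
def memory_bandwidth_gbs(bus_width_bits, data_rate_gbps):
    """Peak bandwidth in GB/s: bytes moved per transfer times transfers per second."""
    return bus_width_bits / 8 * data_rate_gbps


# Assumed specifications (not stated on this page): same memory speed, narrower bus.
gtx_670 = memory_bandwidth_gbs(bus_width_bits=256, data_rate_gbps=6.0)
gtx_660_ti = memory_bandwidth_gbs(bus_width_bits=192, data_rate_gbps=6.0)

# If a game were purely bandwidth bound, performance would scale with this ratio.
worst_case_ratio = gtx_660_ti / gtx_670

# Relative Crysis performance quoted in the text above.
observed_ratio = 0.80

print(f"GTX 670:    {gtx_670:.0f} GB/s")
print(f"GTX 660 Ti: {gtx_660_ti:.0f} GB/s")
print(f"Worst case: {worst_case_ratio:.0%} of GTX 670, observed: {observed_ratio:.0%}")
```

Under those assumptions the bandwidth ratio works out to exactly 75%, so an observed 80% means Crysis is landing most of the way toward the purely bandwidth-bound case.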
As for our factory overclocked cards from Zotac, EVGA, and Gigabyte, while they improve the situation they don’t do so by a great deal. Unexpectedly, despite the Zotac card’s memory bandwidth advantage, the Gigabyte card actually edges it out here thanks to a higher power target that allows it to boost to higher clockspeeds. Still that’s only a 4% improvement, far below what these kinds of overclocks are really capable of hitting.
Looking at minimum framerates is even more grim; the GTX 660 Ti is experiencing its worst case scenario. Crysis, Kepler, and low memory bandwidth are a very bad combination here. As for the factory overclocked cards, the Zotac card finally takes the lead thanks to its memory overclock, but like our average framerates in Crysis it’s not a particularly big jump.
Metro: 2033
Paired with Crysis as our second behemoth FPS is Metro: 2033. Metro gives up Crysis’ lush tropics and frozen wastelands for an underground experience, but even underground it can be quite brutal on GPUs, which is why it’s also our new benchmark of choice for looking at power/temperature/noise during a game. If its sequel due this year is anywhere near as GPU intensive then a single GPU may not be enough to run the game with every quality feature turned up.
Metro is another game that has been favoring AMD’s GCN architecture over NVIDIA’s Kepler architecture, albeit less so than Crysis. The results still have the GTX 660 Ti struggling, but now it’s at least clearly ahead of the 7870, beating it by 7%. The 7950 however leads by a similar 7%, which means the GTX 660 Ti is splitting the difference and is not competitive enough in this game.
The factory overclocked cards on the other hand show us that there is some hope for GTX 660 Ti, even with its memory bus castration. Both the Gigabyte and Zotac cards can edge out the 7950, an important distinction since they need to justify costing about as much as a 7950. At the same time this is part of the reason why AMD felt the need to do the 7950B, since only the B can outpace overclocked GTX 660 Ti cards.
DiRT 3
For racing games our racer of choice continues to be DiRT, which is now in its 3rd iteration. Codemasters uses the same EGO engine between its DiRT, F1, and GRID series, so the performance of EGO has been relevant for a number of racing games over the years.
With AMD’s recent performance boost in DiRT 3 the game has largely equalized AMD and NVIDIA, and at no point is this better demonstrated than with the launch of the GTX 660 Ti. At 1920 the GTX 660 Ti and 7950 are virtually tied, while the GTX 660 Ti falls behind at 2560 as it runs out of memory bandwidth.
Meanwhile the factory overclocked cards see some modest gains here, with the Gigabyte pulling ahead of the others with a 9% performance increase. In fact it’s only a few percent off of the GTX 670, which may be a reflection of the importance of the power target since the two also have a similar power draw.
Looking at the minimum framerates, that 660/7950 standoff finally breaks in NVIDIA’s favor, though a lack of memory bandwidth continues to pose a problem at 2560.
Total War: Shogun 2
Total War: Shogun 2 is the latest installment of the long-running Total War series of turn based strategy games, and alongside Civilization V is notable for just how many units it can put on a screen at once. As it also turns out, it’s the single most punishing game in our benchmark suite (on higher end hardware at least).
Going into this benchmark we weren’t sure just how sensitive Shogun 2 would be to the GTX 660 Ti’s lack of memory bandwidth relative to the GTX 670; the answer as it turns out is “not very much”. At 1920 the GTX 660 Ti is 4% ahead of the 7950 and only 6% behind the GTX 670. At the much more demanding 2560 the GTX 660 Ti falls behind the GTX 670 by a bit more, but 12% is still better than what we were seeing with Crysis a few minutes ago. Overall this has become a fairly NVIDIA-friendly benchmark, with the GTX 660 Ti challenging even the 7970 at 1920.
As for our factory overclocked cards, Gigabyte is once again in the lead and once again ahead of even the GTX 670. This is followed by the Zotac, and then with the weakest overclock the EVGA. The performance impact of these overclocks ranges from 9% at the high end to 3% at the low end.
Batman: Arkham City
Batman: Arkham City is loosely based on Unreal Engine 3, while the DirectX 11 functionality was apparently developed in-house. With the addition of these features Batman is a far more GPU-demanding game than its predecessor was, particularly with tessellation cranked up to high.
Both AMD and NVIDIA recently saw significant performance improvements here due to driver tweaks, which has shuffled the deck to a degree. With full memory bandwidth GK104 cards do well here, but this is not the case for the GTX 660 Ti. Next to Crysis this ends up being the second worst game for the GTX 660 Ti if you’re comparing it to the GTX 670, as the GTX 660 Ti trails by 19%. This also means that the GTX 660 Ti falls behind the 7950 by a couple percent.
The factory overclocked cards on the other hand offer more evidence that the GTX 660 Ti can recover with a bit more memory bandwidth, which is exactly what we’re seeing with the Zotac card and its 10% memory overclock. Altogether it’s a full 10% faster than the reference GTX 660 Ti, and 6% faster than the next-fastest factory overclocked card. If Zotac wants to justify their $30 premium then they’ll need more games like this, since it makes the card clearly stand apart from the others and makes it more than competitive with the 7950.
On a final note, we haven’t been paying too much attention to past-generation cards up to this point, but it’s worth taking a breather and reflecting upon the situation. So far the GTX 660 Ti is faster than the GTX 570, but it’s not amazingly so. Against the GTX 560 Ti it looks much better, but then it’s a matter of replacing a card that launched at $250 with one that launched at $300.
Portal 2
Portal 2 continues the long and proud tradition of Valve’s in-house Source engine. While Source continues to be a DX9 engine, Valve has continued to upgrade it over the years to improve its quality, and combined with their choice of style you’d have a hard time telling it’s over 7 years old at this point. Consequently Portal 2’s performance does get rather high on high-end cards, but we have ways of fixing that…
The GTX 660 Ti is still fast enough for us to use SSAA, so that’s where we’ll focus today. At 1920 it can deliver 94fps with 4x SSAA, and even at 2560 it’s just shy of 60fps. Whatever NVIDIA changed with Kepler to improve its SSAA performance made it a monster here, which is made all the more impressive by the fact that we’re dealing with a memory bandwidth and ROP reduced version. It would appear the biggest bottleneck for SSAA performance here is the shaders, which makes this a near-ideal scenario for the GTX 660 Ti.
Overall the entire Radeon 7900 series falls to the GTX 660 Ti by a greater than 20% margin here, and even the GTX 570 is more than humbled. If NVIDIA could do this on more games then they’d be in an excellent position.
Battlefield 3
Its popularity aside, Battlefield 3 may be the most interesting game in our benchmark suite for a single reason: it’s the first AAA DX10+ game. It’s been 5 years since the launch of the first DX10 GPUs, and 3 whole process node shrinks later we’re finally to the point where games are using DX10’s functionality as a baseline rather than an addition. Not surprisingly BF3 is one of the best looking games in our suite, but as with past Battlefield games that beauty comes with a high performance cost.
The reduction in memory bandwidth and ROP throughput coming from the GTX 670 comes with roughly an 11% performance cost here, just about splitting the difference between the best and worst case scenarios. This is important for the GTX 660 Ti since it means the card doesn’t surrender NVIDIA’s performance advantage in BF3. At 1920 with FXAA that means the GTX 660 Ti has a huge 30% performance lead over the 7950, and even the 7970 falls behind the GTX 660 Ti. The only real disappointment here is that 1920 with MSAA isn’t quite playable – 53fps means that framerates will bottom out in the mid-20s, which isn’t desirable.
Meanwhile the factory overclocked cards continue to up the ante, and this ends up being another game where factory overclocks offer a decent improvement. Zotac tops the factory cards at 10%, followed by Gigabyte and EVGA. We’re once again seeing the impact of Zotac’s memory overclock, and how in memory bandwidth limited situations it’s more important than Gigabyte’s higher power target, though Gigabyte does come close.
The Elder Scrolls V: Skyrim
Bethesda's epic sword & magic game The Elder Scrolls V: Skyrim is our RPG of choice for benchmarking. It's altogether a good CPU benchmark thanks to its complex scripting and AI, but it also can end up pushing a large number of fairly complex models and effects at once, especially with the addition of the high resolution texture pack.
For every Portal 2 you have a Skyrim, it seems. At 1920 the GTX 660 Ti actually does well for itself here, besting the 7900 series, but we know from experience that this is a CPU limited resolution. Cranking things up to 2560 and Ultra quality sends the GTX 660 Ti through the floor in a very bad way, pushing it well below even the 7870, never mind the 7900 series.
Altogether the GTX 660 Ti achieves about 80% of the performance of the GTX 670, making this another game that is hurt badly by the loss of a ROP block and memory bandwidth. At the same time the GTX 670 is the first NVIDIA card fast enough to compete with the 7950, so the GTX 660 Ti came into this benchmark with unfavorable odds. 68fps is more than playable, but hardcore Skyrim players are going to want to stick to cards with more memory bandwidth. At best, the bright spot for NVIDIA is that the GTX 660 Ti is nearly 100% faster than the GTX 560 Ti, a remarkable improvement, but also one being fueled by the meager 1GB of VRAM the latter has.
As for our factory overclocked cards, this is another case of Zotac leading the pack. Its memory overclock is exactly what’s needed to counter the GTX 660 Ti’s lack of memory bandwidth, which helps it easily clear EVGA and Gigabyte’s cards.
Civilization V
Our final game, Civilization V, gives us an interesting look at things that other RTSes cannot match, with a much weaker focus on shading in the game world, and a much greater focus on creating the geometry needed to bring such a world to life. In doing so it uses a slew of DirectX 11 technologies, including tessellation for said geometry, driver command lists for reducing CPU overhead, and compute shaders for on-the-fly texture decompression.
Amusingly enough our final game sees the GTX 660 Ti and 7950 tied at roughly 67fps. If you want a brief summary of where this is going, there you go. Though the fact that the GTX 660 Ti actually increases its lead at 2560 is unexpected.
Finally, our factory overclocked cards offer mixed results. The Gigabyte card is once again in the lead, indicating that Civilization V isn’t particularly memory bandwidth bound as opposed to shader/texture bound. This leaves the Zotac card and finally the EVGA card bringing up the rear. At the same time the Gigabyte card only improves by 8%, which is less than some of the improvements we’ve seen in other games.
Compute Performance
Shifting gears, as always our final set of real-world benchmarks is a look at compute performance. As we have seen with GTX 680 and GTX 670, GK104 appears to be significantly less balanced between rendering and compute performance than GF110 or GF114 were, and as a result compute performance suffers. Cache and register file pressure in particular seem to give GK104 grief, which means that GK104 can still do well in certain scenarios, but falls well short in others. For GTX 660 Ti in particular, this is going to be a battle between the importance of shader performance – something it has just as much of as the GTX 670 – and cache/memory pressure from losing that ROP cluster and cache.
Our first compute benchmark comes from Civilization V, which uses DirectCompute to decompress textures on the fly. Civ V includes a sub-benchmark that exclusively tests the speed of their texture decompression algorithm by repeatedly decompressing the textures required for one of the game’s leader scenes. Note that this is a DX11 DirectCompute benchmark.
For Civilization V memory bandwidth and cache are clearly more important than raw compute performance in this test. Although this isn’t a worst case scenario outcome for the GTX 660 Ti, it drops substantially from the GTX 670. As a result its compute performance is barely better than the GTX 560 Ti, which wasn’t a strong performer at compute in the first place.
Our next benchmark is SmallLuxGPU, the GPU ray tracing branch of the open source LuxRender renderer. We’re now using a development build from the version 2.0 branch, and we’ve moved on to a more complex scene that hopefully will provide a greater challenge to our GPUs.
Ray tracing likes memory bandwidth and cache, which means another tough run for the GTX 660 Ti. In fact it’s now slower than the GTX 560 Ti. Compared to the 7950 this isn’t even a contest. GK104 is generally bad at compute, and GTX 660 Ti is turning out to be especially bad.
For our next benchmark we’re looking at AESEncryptDecrypt, an OpenCL AES encryption routine that AES encrypts/decrypts an 8K x 8K pixel square image file. The results of this benchmark are the average time to encrypt the image over a number of iterations of the AES cypher.
The GTX 660 Ti does finally turn things around on our AES benchmark, thanks to the fact that it generally favors NVIDIA. At the same time the gap between the GTX 670 and GTX 660 Ti is virtually non-existent.
Our fourth benchmark is once again looking at compute shader performance, this time through the Fluid simulation sample in the DirectX SDK. This program simulates the motion and interactions of a 16k particle fluid using a compute shader, with a choice of several different algorithms. In this case we’re using an O(n^2) nearest neighbor method that is optimized by using shared memory to cache data.
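As a rough illustration of what a brute-force neighbor search with tiled caching looks like, the sketch below counts the neighbors within a fixed radius of every particle, walking the particle list one tile at a time. This is not the SDK sample's code: on a GPU each tile would be staged in shared memory by a thread group, whereas here the tile is just a list slice. The names `count_neighbors` and `TILE_SIZE`, the radius, and the particle positions are all assumptions.

```python
TILE_SIZE = 4  # hypothetical; a GPU kernel would match this to the thread-group size


def count_neighbors(positions, radius):
    """For each 2D particle, count the other particles within `radius`.

    Every particle is compared against every other one, so the work is O(n^2).
    """
    radius_sq = radius * radius
    counts = [0] * len(positions)
    for tile_start in range(0, len(positions), TILE_SIZE):
        # Stand-in for loading one tile of particles into fast shared memory.
        tile = positions[tile_start:tile_start + TILE_SIZE]
        for i, (xi, yi) in enumerate(positions):
            for offset, (xj, yj) in enumerate(tile):
                if i == tile_start + offset:
                    continue  # a particle is not its own neighbor
                dx, dy = xi - xj, yi - yj
                if dx * dx + dy * dy <= radius_sq:
                    counts[i] += 1
    return counts


# Made-up particle positions: a cluster of three, a pair, and a loner.
particles = [(0.0, 0.0), (0.5, 0.0), (0.0, 0.5), (3.0, 3.0), (3.2, 3.0), (9.0, 9.0)]
neighbor_counts = count_neighbors(particles, radius=1.0)
print(neighbor_counts)  # [2, 2, 2, 1, 1, 0]
```

The point of the tiling is that each tile is fetched once and then reused by every particle that tests against it, which is why a workload like this leans on shader throughput and on-chip storage more than on external memory bandwidth.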
The compute shader fluid simulation provides the GTX 660 Ti another bit of reprieve, although like other GK104 cards it’s still relatively weak. Here it’s virtually tied with the GTX 670 so it’s clear that it isn’t being impacted by cache or memory bandwidth losses, but it needs about 10% more to catch the 7950.
Finally, we’re adding one last benchmark to our compute run. NVIDIA and the Folding@Home group have sent over a benchmarkable version of the client with preliminary optimizations for GK104. Folding@Home and similar initiatives are still one of the most popular consumer compute workloads, so it’s something NVIDIA wants their GPUs to do well at.
Interestingly Folding@Home proves to be rather insensitive to the differences between the GTX 670 and GTX 660 Ti, which is not what we would have expected. The GTX 660 Ti isn’t doing all that much better than the GTX 570, once more reflecting that GK104 is generally struggling with compute performance, but it’s not a bad result.
Synthetics
We’ll also take a quick look at synthetic performance to see if the reduction in ROP performance, L2 cache, and memory bandwidth had any kind of other impact we haven’t anticipated. We’ll start with 3DMark Vantage’s Pixel Fill test.
Assuming you have enough ROP throughput, the pixel fill test is about memory bandwidth. Unsurprisingly, the GTX 660 Ti is around 20% behind the GTX 670 here.
Our second test is 3DMark’s Texel Fill test, which as expected is insensitive to anything going on outside of the SMXes. The GTX 670 and GTX 660 Ti are tied here, reflecting the fact that they have equal theoretical texture throughput.
Our third theoretical test is the set of settings we use with Microsoft’s Detail Tessellation sample program out of the DX11 SDK.
To be honest we’re not quite sure why there’s such a performance drop here relative to the GTX 670. On paper the geometry performance of the two should be identical. Either we’re ROP limited (this test does draw a lot of pixels at those framerates), or it really likes memory bandwidth. Regardless it’s an odd state of affairs to see NVIDIA losing a tessellation test to the 7870, considering how lopsided things were a year ago.
Our final theoretical test is Unigine Heaven 2.5, a benchmark that straddles the line between a synthetic benchmark and a real-world benchmark as the engine is licensed but no notable DX11 games have been produced using it yet.
Heaven proves to be rather sensitive to the ROP and memory changes. The performance hit is just shy of 20%.
Power, Temperature, & Noise
As always, we’re wrapping up our look at a video card’s stock performance with a look at power, temperature, and noise. Like we discussed in the introduction, while the official TDP of the GTX 660 Ti is 150W – 20W lower than the GTX 670 – the power target difference is only 7W. So let’s see which is more accurate, and how that compares to AMD’s cards.
GeForce GTX 660 Ti Voltages |
Zotac GTX 660 Ti Boost Load | EVGA GTX 660 Ti Boost Load | Gigabyte GTX 660 Ti Boost Load |
1.175v | 1.162v | 1.175v |
Stopping to take a quick look at voltages, there aren’t any big surprises here. NVIDIA would need to maintain the same voltages as the GTX 670 because of the identical clocks and SMX count, and that’s exactly what has happened. In fact all single-GPU GK104 cards are topping out at 1.175v, NVIDIA’s defined limit for these cards. Even custom cards like the Gigabyte still only get to push 1.175v.
Up next, before we jump into our graphs let’s take a look at the average core clockspeed during our benchmarks. Because of GPU boost the boost clock alone doesn’t give us the whole picture, particularly when also taking a look at factory overclocked cards, so we’ve recorded the clockspeed of our video cards during each of our benchmarks when running them at 2560x1600 and computed the average clockspeed over the duration of the benchmark. Unfortunately we then deleted the results for the factory overclocked cards, so we only have the “reference” card. Sorry about that guys.
GeForce GTX 600 Series Average Clockspeeds |
| GTX 670 | GTX 660 Ti | Zotac GTX 660 Ti | EVGA GTX 660 Ti | Gigabyte GTX 660 Ti |
Max Boost Clock | 1084MHz | 1058MHz | 1175MHz | 1150MHz | 1228MHz |
Crysis | 1057MHz | 1058MHz | N/A | N/A | N/A |
Metro | 1042MHz | 1048MHz | N/A | N/A | N/A |
DiRT 3 | 1037MHz | 1058MHz | N/A | N/A | N/A |
Shogun 2 | 1064MHz | 1035MHz | N/A | N/A | N/A |
Batman | 1042MHz | 1051MHz | N/A | N/A | N/A |
Portal 2 | 988MHz | 1041MHz | N/A | N/A | N/A |
Battlefield 3 | 1055MHz | 1054MHz | N/A | N/A | N/A |
Skyrim | 1084MHz | 1045MHz | N/A | N/A | N/A |
Civilization V | 1038MHz | 1045MHz | N/A | N/A | N/A |
The average clockspeeds on our “reference” GTX 660 Ti don’t end up fluctuating all that much. With a max boost of 1058MHz the card actually gets to run at its top bin in a couple of our tests, and it isn’t too far off in the rest. The lowest is 1035MHz for Shogun 2, and that’s only a difference of 23MHz from the top bin. The GTX 670 on the other hand had a wider range; a boon in some games and a bane in others. If nothing else, it means that despite the identical base and boost clocks, our cards aren’t purely identical at all times thanks to the impact of GPU boost pulling back whenever we reach our power target.
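For anyone who wants to check the spread themselves, the figures in the table can be tallied with a few lines of Python. The clockspeeds are copied from the table; the dictionary layout and the `summarize` helper are our own illustrative choices and not part of any logging tool.

```python
# Average core clocks (MHz) at 2560x1600, copied from the table above.
average_clocks = {
    "GTX 670": {
        "Crysis": 1057, "Metro": 1042, "DiRT 3": 1037, "Shogun 2": 1064,
        "Batman": 1042, "Portal 2": 988, "Battlefield 3": 1055,
        "Skyrim": 1084, "Civilization V": 1038,
    },
    "GTX 660 Ti": {
        "Crysis": 1058, "Metro": 1048, "DiRT 3": 1058, "Shogun 2": 1035,
        "Batman": 1051, "Portal 2": 1041, "Battlefield 3": 1054,
        "Skyrim": 1045, "Civilization V": 1045,
    },
}


def summarize(clocks):
    """Return the lowest, highest, spread, and rounded mean of the per-game averages."""
    values = list(clocks.values())
    return {
        "min": min(values),
        "max": max(values),
        "spread": max(values) - min(values),
        "mean": round(sum(values) / len(values)),
    }


summary = {card: summarize(clocks) for card, clocks in average_clocks.items()}
for card, stats in summary.items():
    print(card, stats)
```

By this tally the GTX 670 swings across 96MHz between its best and worst games, while the GTX 660 Ti stays within 23MHz.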
There are no great surprises with idle power consumption. Given the immense similarity between the GTX 670 and GTX 660 Ti, they end up drawing the same amount of power both during idle and long idle. This does leave AMD with an 8W-10W lead at the wall in this test though.
Moving on to our load power tests we start with Metro: 2033. As we mentioned previously the GTX 660 Ti and GTX 670 have very similar power targets, and this benchmark confirms that. Power consumption for the GTX 660 Ti is virtually identical to the Radeon HD 7870, an interesting matchup given the fact that this is the first time NVIDIA has had to compete with Pitcairn. Pitcairn’s weaker compute performance means it starts off in a better position, but it looks like even with a salvaged GK104 NVIDIA can still compete with it. NVIDIA drove efficiency hard this generation; to compete with a smaller chip like that is certainly a testament to that efficiency.
As for the inevitable 7950 comparison, it’s no contest. The GTX 670 was already doing well here and the GTX 660 Ti doesn’t change that. Tahiti just can’t match GK104’s gaming efficiency, which is why AMD has had to push performance over power with the new 7950B.
Meanwhile it’s fascinating to see that the GTX 660 Ti has lower power consumption than the GTX 560 Ti, even though the latter has the advantage of lower CPU power consumption due to its much lower performance in Metro. Or better yet, just compare the GTX 660 Ti to the outgoing GTX 570.
For AMD/NVIDIA comparisons we have a bit less faith in our OCCT results than we do our Metro results right now, as NVIDIA and AMD seem to clamp their power consumption differently. NVIDIA’s power consumption clamp through GPU Boost is far softer than AMD’s PowerTune. As a result the 7870 consumes 25W less than the GTX 660 Ti here, which even with AMD’s very conservative PowerTune rating seems like quite the gap. Metro seems to be much more applicable here, at least when you’re dealing with cards that have similar framerates.
In any case, compared to NVIDIA’s lineup this is another good showing for the GTX 660 Ti. Power consumption at the wall is 45W below the GTX 560 Ti, a large difference thanks to the latter’s lack of power throttling technology.
As for our factory overclocked cards, these results are consistent with our expectations. Among the Zotac and EVGA cards there are a few watts of flutter at best, seeing as how they have the same power target of 134W. Meanwhile the Gigabyte card with its higher power target is 20W greater at the wall, which indicates that our estimated power target of 141W for that card is a bit too low. However this also means that those times where the Gigabyte card was winning, it was also drawing around 20W more than its competition, which is a tradeoff in and of itself.
Moving on to temperatures, at 31C the GTX 660 Ti is once more where we’d expect it to be given the similarities to the GTX 670. Open air coolers tend to do a bit better here than blowers though, so the fact that it’s only 1C cooler than the blower-type GTX 670 is likely a reflection on Zotac’s cooler.
Speaking of factory overclocked video cards, one card stands out above the rest: the Gigabyte GTX 660 Ti. That oversized cooler does its job and does it well, keeping the GPU down to barely above room temperature.
Considering that most of our high-end cards are blowers while our “reference” GTX 660 Ti is an open air cooler, temperature benchmarks are the GTX 660 Ti’s to win, and that’s precisely what’s going on. 67C is nice and cool too, which means that the open air coolers should fare well even in poorly ventilated cases.
As usual we see a rise in temperatures when switching from Metro to OCCT, but at 73C the GTX 660 Ti is still the coolest reference (or semi-reference) card on the board. To be honest we had expected that it would beat the 7870, but as far as blowers go the 7870’s is quite good.
Moving on to our factory overclocked cards, we’re seeing the usual divisions between open air coolers and blowers. The blower-based EVGA card performs almost identically to the GTX 670, which makes sense given the similarities between the cards. Meanwhile the open air Zotac and Gigabyte cards are neck-and-neck here, indicating that both cards are shooting for roughly the same temperatures, keeping themselves below 70C. Though it’s somewhat weird to see the factory overclocked Zotac card end up being cooler than its reference-clocked self; this appears to be a product of where the fan curve is being hit.
Last but not least we have our look at noise, where we’ll hopefully be able to fully shake out our factory overclocked cards.
Right off the bat we see the blower-based EVGA struggle, which was unexpected. It’s basically the same cooler as the GTX 670, so it should do better. Then again the EVGA GTX 670 SC had the same exact problem.
As for Metro, the GTX 660 Ti once again looks good. 48.2dB isn’t the best for an open air cooler, but it’s a hair quieter than the 7870 and notably quieter than the GTX 670. The only unfortunate part about these results is that it just can’t beat the GTX 560 Ti; in fact nothing can. For its power consumption the GTX 560 Ti was an almost unreal card, but it’s still a shame the GTX 660 Ti can’t be equally unreal.
Moving on to our factory overclocked cards however, the Gigabyte GTX 660 Ti OC gets very close thanks to its very large cooler. 43.7dB technically isn’t silent, but it might as well be. To offer the performance of a GTX 660 Ti (and then some) in such a package is quite the accomplishment.
As for Zotac and EVGA, there’s nothing bad about either of them but there’s also nothing great. EVGA’s card is about average for a blower, while Zotac’s card seems to be suffering from its size. It’s a relatively tiny card with a relatively tiny cooler, and this has it working harder to hit its temperature targets.
Finally we have noise testing with OCCT. Our “reference” GTX 660 Ti actually fares a bit worse than the GTX 670, which is unfortunate. So much of this test comes down to the cooler though that it’s almost impossible to predict how other cards will perform. At least it’s no worse than the 7870.
Meanwhile the Gigabyte GTX 660 Ti OC continues to impress. 43.7dB not only means that it didn’t get any louder switching from Metro to OCCT, but it has now bested the GTX 560 Ti thanks to the 560’s lack of power throttling technology. Make no mistake, 43.7dB for this kind of performance is very, very impressive.
As for EVGA and Zotac, it’s also a rehash of Metro. EVGA’s blower is actually over 1dB quieter than Zotac’s cooler, which is an unfortunate outcome for an open air cooler.
Wrapping things up, even without a true reference sample from NVIDIA it’s clear that the GTX 660 Ti has a lot of potential when it comes to power/temp/noise. It’s roughly equivalent in power consumption and noise to the 7870, which is an important distinction for NVIDIA: since the GTX 660 Ti is also notably faster than the 7870, NVIDIA is in a better place on the power/performance curve. This goes not only for the 7870 but especially for the 7950, where the GTX 660 Ti continues the tradition the GTX 670 set by being cooler, quieter, and less power hungry than AMD’s entry-level Tahiti part.
But it must be pointed out that the lack of a reference design for the GTX 660 Ti means buyers are also facing a lot of variability. Power consumption should be consistent between cards (which is to say a hair less than the GTX 670), but temperature and especially noise will vary on a card by card basis. Potential buyers would be best served by checking out reviews ahead of time to separate the duds from the gems.
OC: Power, Temperature, & Noise
Our final task is our look at the overclocking capabilities of our GTX 660 Ti cards. Based on what we’ve seen thus far with GTX 660 Ti, these factory overclocked parts are undoubtedly eating into overclocking headroom, so we’ll have to see just what we can get out of them. The very similar GTX 670 topped out at around 1260MHz for the max boost clock, and between 6.6GHz and 6.9GHz for the memory clock.
GeForce 660 Ti Overclocking
Card | EVGA GTX 660 Ti SC | Zotac GTX 660 Ti AMP | Gigabyte GTX 660 Ti OC
Shipping Core Clock | 980MHz | 1033MHz | 1033MHz
Shipping Max Boost Clock | 1150MHz | 1175MHz | 1228MHz
Shipping Memory Clock | 6GHz | 6.6GHz | 6GHz
Shipping Max Boost Voltage | 1.175V | 1.175V | 1.175V
Overclock Core Clock | 1030MHz | 1033MHz | 1083MHz
Overclock Max Boost Clock | 1200MHz | 1175MHz | 1278MHz
Overclock Memory Clock | 6.5GHz | 6.8GHz | 6.6GHz
Overclock Max Boost Voltage | 1.175V | 1.175V | 1.175V
As we suspected, starting with factory overclocked cards isn’t helping here. Our Zotac card wouldn’t accept any kind of meaningful GPU core overclock, so it shipped practically as fast as it could go. We were able to squeeze out another 200MHz on the memory clock though.
Meanwhile our EVGA and Gigabyte cards fared slightly better. We could push another 50MHz out of their GPU clocks, bringing us to a max boost clock of 1200MHz on the EVGA card and 1278MHz on the Gigabyte card. Memory overclocking was similarly consistent; we were able to hit 6.5GHz on the EVGA card and 6.6GHz on the Gigabyte card.
Altogether these are sub-5% GPU overclocks, and at best 10% memory overclocks, which all things considered are fairly low overclocks. The good news is that reference-clocked cards should fare better since their headroom has not already been consumed by factory overclocking, but binning also means the best cards are going to be going out as factory overclocked models.
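For readers who want to check those percentages, the short sketch below recomputes them from the shipping and overclocked figures in the table above. The dictionary layout and the `pct_gain` helper are illustrative names of our own, and gains are measured against each card's shipping (already factory overclocked) clocks, which is the assumption behind the "sub-5%" figure.

```python
# Max boost clocks (MHz) and memory clocks (GHz) copied from the
# overclocking table: (shipping, overclocked) for each card.
cards = {
    "EVGA GTX 660 Ti SC": {"boost": (1150, 1200), "mem": (6.0, 6.5)},
    "Zotac GTX 660 Ti AMP": {"boost": (1175, 1175), "mem": (6.6, 6.8)},
    "Gigabyte GTX 660 Ti OC": {"boost": (1228, 1278), "mem": (6.0, 6.6)},
}


def pct_gain(shipping, overclocked):
    """Percentage increase of the overclocked value over the shipping value."""
    return (overclocked - shipping) / shipping * 100.0


# Per-card percentage gains for max boost clock and memory clock.
gains = {
    name: {kind: pct_gain(*pair) for kind, pair in clocks.items()}
    for name, clocks in cards.items()
}

for name, g in gains.items():
    print(f"{name}: boost +{g['boost']:.1f}%, memory +{g['mem']:.1f}%")
```

Run as written, this should show the EVGA and Gigabyte cards gaining a little over 4% on max boost, the Zotac card gaining nothing on the core, and the Gigabyte card's memory overclock as the only one to reach roughly 10%.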
Moving on to our performance charts, we’re going to once again start with power, temperature, and noise, before moving on to gaming performance.
Unsurprisingly, given the small power target difference between the GTX 670 and the GTX 660 Ti, any kind of overclocking that involves raising the power target quickly pushes power consumption past the GTX 670’s power consumption. How much depends on the test and the card, with the higher power target Gigabyte card starting with a particular disadvantage here as its power consumption ends up rivaling that of the GTX 680.
We also see the usual increase in load temperatures due to the increased power consumption. The Zotac and Gigabyte cards fare well enough due to their open air coolers, but the blower-type EVGA card is about as high as we want to go at 80C under OCCT.
Last but not least, looking at noise levels we can see an increase similar to the temperature increases we just saw. For the Zotac and EVGA cards noise levels are roughly equal with the reference GTX 680, which will be important to remember for when we’re looking at performance. Meanwhile the Gigabyte card continues to shine in these tests thanks to its oversized cooler; even OCCT can only push it to 46.8dB.
OC: Gaming Performance
We’ll keep the running commentary short here. Because of the wide range of max boost clocks and max memory clocks on our cards, our overclocking results are fairly widespread between the three cards. Memory bandwidth remains the most important thing here due to the GTX 660 Ti’s missing memory controller, and thus far memory overclocking has been a luck of the draw matter among all GTX 600 series cards.
Final Words
Bringing the review to a close, it should come as no surprise that the launch of the GTX 660 Ti has ended up being a lot like the launches before it. Yet at the same time it’s not truly identical, as there’s a lot going on that makes it nothing like the launches before it.
Distilled to its essence, the GTX 660 Ti is yet another fine addition to the GTX 600 series thanks to the GK104 GPU. Compared to the GTX 670 it’s a bit slower, a lot cheaper, and still brutally efficient. For buyers who have wanted to pick up a Kepler card but have found the high-end GTX 670 and GTX 680 out of their price range, at $300 the GTX 660 Ti is at a much more approachable point on the price-performance curve, offering about 88% of the GTX 670’s performance for 75% of the price. Given the price of Kepler cards so far this is definitely a better deal, though it’s still by no means cheap. So in that respect the launch of the GTX 660 Ti is quite a lot like the launches before it.
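To put the "88% of the performance for 75% of the price" comparison in performance-per-dollar terms, here is a minimal sketch of the arithmetic. The $400 GTX 670 price is an assumption inferred from the 75% figure and the GTX 660 Ti's $300 price, and the constant and function names are our own.

```python
GTX_660_TI_PRICE = 300.0  # launch price stated in the review
GTX_670_PRICE = 400.0     # assumption: implied by "75% of the price"
RELATIVE_PERF = 0.88      # GTX 660 Ti performance relative to the GTX 670


def perf_per_dollar(relative_perf, price):
    """Relative performance delivered per dollar spent."""
    return relative_perf / price


# Ratio of the GTX 660 Ti's performance per dollar to the GTX 670's.
value_ratio = (
    perf_per_dollar(RELATIVE_PERF, GTX_660_TI_PRICE)
    / perf_per_dollar(1.0, GTX_670_PRICE)
)

print(f"GTX 660 Ti performance per dollar vs. GTX 670: {value_ratio:.2f}x")
```

Under those assumptions the GTX 660 Ti delivers roughly 17% more performance per dollar than the GTX 670, which is the sense in which it is "a better deal" while still not being cheap.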
What’s different about this launch compared to the launches before it is that AMD was finally prepared; this isn’t going to be another NVIDIA blow-out. While the GTX 680 marginalized the Radeon HD 7970 virtually overnight, and then the GTX 670 did the same thing to the Radeon HD 7950, the same will not be happening to AMD with the GTX 660 Ti. AMD has already bracketed the GTX 660 Ti by positioning the 7870 below it and the 7950 above it, putting them in a good position to fend off NVIDIA.
As it stands, AMD’s position correctly reflects their performance; the GTX 660 Ti is a solid and relatively consistent 10-15% faster than the 7870, while the 7950 is anywhere from a bit faster to a bit slower depending on what benchmarks you favor. Of course when talking about the 7950 the “anything but equal” maxim still applies here, if not more so than with the GTX 670. Depending on the benchmark, the GTX 660 Ti lands anywhere between 50% ahead of the 7950 and 25% behind it.
Coupled with the tight pricing between all of these cards, this makes it very hard to make any kind of meaningful recommendation here for potential buyers. Compared to the 7870 the GTX 660 Ti is a solid buy if you can spare the extra $20, though it’s not going to be a massive difference. The performance difference is going to be just enough that AMD is going to need to trim prices a bit more to secure the 7870’s position.
On the other hand due to the constant flip-flopping of the GTX 660 Ti and 7950 on our benchmarks there is no sure-fire recommendation to hand down there. If we had to pick something, on a pure performance-per-dollar basis the 7950 looks good both now and in the future; in particular we suspect it’s going to weather newer games better than the GTX 660 Ti and its relatively narrow memory bus. But the moment efficiency and power consumption start being important the GTX 660 Ti is unrivaled, and this is a position that is only going to improve in the future when 7950B cards start replacing 7950 cards. For reasons like that there are a couple of niches one card or another serves particularly well, such as overclocking with the 7950, but ultimately unless you have a specific need either card will serve you well enough.
But enough about competition, let’s talk about upgrades for a moment. As we mentioned in our discussion on pricing, performance cards are where we see the market shift from rich enthusiasts who buy cards virtually every generation to more practical buyers who only buy every couple of generations. For these groups it’s a mixed bag. The GTX 660 Ti is actually a great upgrade for the GTX 560 Ti (and similar cards) from a performance standpoint, but despite the similar name it can’t match the GTX 560 Ti’s affordability. This entire generation has seen a smaller than normal performance increase at the standard price points, and the GTX 660 Ti doesn’t change this. If you’re frugal and on Fermi, you’re probably going to want to wait for whatever comes next. On the other hand performance is finally reaching a point where it’s getting very hard to hold on to GTX 200 series cards, especially as the lack of memory on those sub-1GB products becomes more and more prominent. The GTX 660 Ti can clobber any GTX 200, and it can do so with far less power and noise.
Finally, let’s discuss the factory overclocked cards we’ve seen today. Thanks to the fact that this is a virtual launch there’s an incredible variety of cards to pick from, with all of the major partners launching multiple cards with both the reference clocks and with factory overclocks. We’ve only been able to take a look at three of those cards today, but so far we like what we’re seeing.
Right now the partner card most likely to turn heads is Gigabyte’s GeForce GTX 660 Ti OC. Even if you ignore the overclock for a second it’s a GTX 660 Ti with an oversized cooler, which ends up being used to great effect. Thanks to Gigabyte’s Windforce 2X cooler it’s both cool and silent, which is always a great combination. Meanwhile the factory overclock alongside the higher power target is icing on the cake, although the lack of a memory bandwidth overclock means that the cooler is more valuable than the overclock.
But if you want something quite a bit smaller and generally a bit faster still, Zotac’s GeForce GTX 660 Ti AMP is no slouch. The memory overclock really makes up for GTX 660 Ti’s memory bandwidth shortcomings, and the size means it will fit into even small cases rather well. Its only downsides are that the $329 price tag puts it solidly in 7950 territory, and that the cooler is very average, especially when held up against what Gigabyte has done.
Finally there’s EVGA’s GeForce GTX 660 Ti Superclocked. The overclock is nothing to write home about – being just enough to justify the $10 price increase – but it’s otherwise a solid card. Even for 150W cards there’s still a need for blower type coolers, and EVGA will do a good job of filling that niche with their card.