Original Link: https://www.anandtech.com/show/4221/nvidias-gtx-550-ti-coming-up-short-at-150
NVIDIA's GeForce GTX 550 Ti: Coming Up Short At $150
by Ryan Smith on March 15, 2011 9:00 AM EST
Throughout the lifetime of the 400 series, NVIDIA launched 4 GPUs: GF100, GF104, GF106, and GF108. Launched in that respective order, they became the GTX 480, GTX 460, GTS 450, and GT 430. One of the interesting things from the resulting products was that with the exception of the GT 430, NVIDIA launched each product with a less than fully populated GPU, shipping with different configurations of disabled shaders, ROPs, and memory controllers. NVIDIA has never fully opened up on why this is – be it for technical or competitive reasons – but ultimately GF100/GF104/GF106 never had the chance to fully spread their wings as 400 series parts.
It’s the 500 series that has corrected this. Starting with the GTX 580 in November of 2010, NVIDIA has been launching GPUs built on a refined transistor design with all functional units enabled. Coupled with a hearty boost in clockspeed, the performance gains have been quite notable given that this is still on the same 40nm process with a die size effectively unchanged. Thus after GTX 560 and the GF114 GPU in January, it’s time for the 3rd and final of the originally scaled down Fermi GPUs to be set loose: GF106. Reincarnated as GF116, it’s the fully enabled GPU that powers NVIDIA’s latest card, the GeForce GTX 550 Ti.
Specification | GTX 560 Ti | GTX 460 768MB | GTX 550 Ti | GTS 450 |
Stream Processors | 384 | 336 | 192 | 192 |
Texture Address / Filtering | 64/64 | 56/56 | 32/32 | 32/32 |
ROPs | 32 | 24 | 24 | 16 |
Core Clock | 822MHz | 675MHz | 900MHz | 783MHz |
Shader Clock | 1644MHz | 1350MHz | 1800MHz | 1566MHz |
Memory Clock | 1002MHz (4.008GHz data rate) GDDR5 | 900MHz (3.6GHz data rate) GDDR5 | 1026MHz (4.104GHz data rate) GDDR5 | 902MHz (3.608GHz data rate) GDDR5 |
Memory Bus Width | 256-bit | 192-bit | 192-bit | 128-bit |
RAM | 1GB | 768MB | 1GB | 1GB |
FP64 | 1/12 FP32 | 1/12 FP32 | 1/12 FP32 | 1/12 FP32 |
Transistor Count | 1.95B | 1.95B | 1.17B | 1.17B |
Manufacturing Process | TSMC 40nm | TSMC 40nm | TSMC 40nm | TSMC 40nm |
Price Point | $249 | ~$130 | $149 | ~$90 |
Out of the 3 scaled down 400 series cards, GTS 450 was always the most unique in how NVIDIA went about it. GF100 and GF104 disabled Streaming Multiprocessors (SMs), which cut down on the number of CUDA Cores/SPs and Polymorph Engines housed within them. However for GTS 450, NVIDIA instead chose to disable a ROP/memory block, giving GTS 450 the full shader/geometry performance of GF106 (on paper at least), but reduced memory bandwidth, L2 cache, and ROP throughput. We’ve always wondered why NVIDIA built a lower-performance/high-volume GPU with an odd number of memory blocks and what the immediate implications would be of disabling one of those blocks. Now we get to find out.
Launching today is the GTX 550 Ti, which features the GF116 GPU. As with GF114 before it, GF116 is a slight process tweak over GF106, using a new selection of transistors in order to reduce leakage, increase clocks, and improve the card’s performance per watt. With these changes in hand NVIDIA has fully unlocked GF106/GF116 for the first time, giving GTX 550 Ti the responsibility of being the first fully enabled part: 192 CUDA cores are paired with 24 ROPs, 32 texture units, 384KB of L2 cache, a 192-bit memory bus, and 1GB of GDDR5.
The GTX 550 Ti will be shipping at a core clock of 900MHz and a memory clock of 1026MHz (4104MHz data rate), the odd memory speed being due to NVIDIA’s quirky PLLs. If you recall, GTS 450 was clocked at 783MHz core and 902MHz memory, giving the GTX 550 Ti an immediate 117MHz (15%) core clock and 124MHz (14%) memory clock advantage, with the latter coming on top of an additional 50% memory bandwidth advantage due to the wider memory bus (192-bit vs. 128-bit). NVIDIA puts the TDP at 116W, 10W over GTS 450. GF116 remains effectively unchanged from GF106, giving it a transistor count of 1.17B, with the power difference coming down to higher clocks and the additional functional units that have been enabled.
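The clock and bus-width advantages quoted above are simple ratios, and they compound when it comes to theoretical memory bandwidth. The sketch below is only an illustration of that arithmetic; the helper name `pct_gain` is ours, and the combined figure is an on-paper number that assumes bandwidth scales linearly with both memory clock and bus width.

```python
# Illustrative arithmetic for the GTX 550 Ti vs. GTS 450 figures quoted above.
# The helper name is hypothetical; the inputs come from the specification table.

def pct_gain(new, old):
    """Percentage increase of `new` over `old`."""
    return (new / old - 1) * 100

core_gain = pct_gain(900, 783)      # core clock in MHz
mem_gain = pct_gain(1026, 902)      # memory clock in MHz
bus_gain = pct_gain(192, 128)       # memory bus width in bits

# Assumption: peak bandwidth scales with clock * bus width, so the gains multiply.
combined_bw_gain = pct_gain(1026 * 192, 902 * 128)

print(f"core clock:        +{core_gain:.1f}%")
print(f"memory clock:      +{mem_gain:.1f}%")
print(f"bus width:         +{bus_gain:.1f}%")
print(f"on-paper bandwidth: +{combined_bw_gain:.1f}%")
```

On paper this puts the GTX 550 Ti roughly 70% ahead of the GTS 450 in memory bandwidth, though as discussed later the mixed-density memory layout means that peak is not available everywhere.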
Unlike the GTS 450 launch, GTX 550 Ti is a more laid back affair for NVIDIA – admittedly this is more often a bad sign than it is a good one when it comes to gauging their confidence in a product. As a result they are not sampling any reference cards to reviewers, instead leaving that up to their board partners. As with GF104/GF114, GF116 is pin compatible with GF106, meaning partners can mostly reuse GTS 450 designs; they need only reorganize the PCB to handle a 192-bit bus along with meeting the slightly higher power and cooling requirements. As a result a number of custom designs and overclocked cards will be launching right out of the gate, and you’re unlikely to ever see a reference card. Today we’re looking at Zotac’s GeForce GTX 550 Ti AMP, a factory overclocked card that pushes the core and memory clocks to 1000MHz and 1100MHz respectively. The MSRP on the GTX 550 Ti is $149 – $20 more than where the GTS 450 launched – while overclocked cards such as the Zotac model will go for more.
As was the case with the GTS 450, NVIDIA is primarily targeting the GTX 550 Ti towards buyers looking at driving 1680x1050 and smaller monitors, while GTX 460/560 continues to be targeted at 1920x1080/1200. Its closest competitor in the existing NVIDIA product stack is the GTX 460 768MB. The GTX 460 768MB has not officially been discontinued, but one quick look at product supplies shows that 768MB cards are fast dropping and we’d expect the 768MB cards to soon be de-facto discontinued, making the GTX 550 Ti a much cheaper to build replacement for the GTX 460 768MB. In the meantime however this means the GTX 550 Ti launches against the remaining supply of bargain priced GTX 460 cards.
AMD’s competition will be the Radeon HD 6850 and Radeon HD 5770. As is often the case NVIDIA is intending to target an AMD weak spot, in this case the fact that AMD doesn’t have anything between the 5770 and 6850 in spite of the sometimes wide performance gap. Pricing will be NVIDIA’s biggest problem here as the 5770 is available for around $110, while AMD has worked with manufacturers to get 6850 prices down to around $160 after rebate. Finally, to slightly spoil the review, as you may recall the GTS 450 had a great deal of trouble keeping up with the Radeon HD 5770 in performance – so NVIDIA has quite the performance gap to cover to keep up with AMD’s pricing.
March 2011 Video Card MSRPs
NVIDIA | Price | AMD |
 | $700 | Radeon HD 6990 |
 | $480 | |
 | $320 | Radeon HD 6970 |
 | $240 | Radeon HD 6950 1GB |
 | $190 | Radeon HD 6870 |
 | $160 | Radeon HD 6850 |
 | $150 | |
 | $130 | |
 | $110 | Radeon HD 5770 |
GTX 550 Ti’s Quirk: 1GB Of VRAM On A 192-bit Bus
One thing that has always set NVIDIA apart from AMD is their willingness to use non-power of 2 memory bus sizes. AMD always sticks to 256/128/64 bit busses, while NVIDIA has used those along with interesting combinations such as 384, 320, and 192 bit busses. This can allow NVIDIA to tap more memory bandwidth by having a wider bus, however they also usually run their memory slower than AMD’s memory on comparable products, so NVIDIA’s memory bandwidth advantage isn’t quite as pronounced. The more immediate ramifications of this however are that NVIDIA ends up with equally odd memory sizes: 1536MB, 1280MB, and 768MB.
768MB in particular can be problematic. When the GTX 460 launched, NVIDIA went with two flavors: 1GB and 768MB, the difference being how many memory controller/ROP blocks were enabled, which in turn changed how much RAM was connected. 768MB just isn’t very big these days – it’s only as big as NVIDIA’s top of the line card back at the end of 2006. At high resolutions with anti-aliasing and high quality textures it’s easy to swamp a card, making 1GB the preferred size for practically everything from $250 down. So when NVIDIA has a 768MB card and AMD has a 1GB card, NVIDIA has a definite marketing problem and a potential performance problem.
Video Card Bus Width Comparison
NVIDIA | Bus Width | AMD | Bus Width |
GTX 570 | 320-bit | Radeon HD 6970 | 256-bit |
GTX 560 Ti | 256-bit | Radeon HD 6950 | 256-bit |
GTX 460 768MB | 192-bit | Radeon HD 6850 | 256-bit |
GTX 550 Ti | 192-bit | Radeon HD 5770 | 128-bit |
GTS 450 | 128-bit | Radeon HD 5750 | 128-bit |
NVIDIA’s usual solution is to outfit cards with more RAM to make up for the wider bus, which is why we’ve seen 1536MB and 1280MB cards going against 1GB AMD cards. With cheaper cards though the extra memory (or higher density memory) is an extra cost that cuts into margins. So what do you do when you have an oddly sized 192-bit memory bus on a midrange card? For GTS 450 NVIDIA disabled a memory controller to bring it down to 128-bit, however for GTX 550 Ti they needed to do something different if they wanted to have a 192-bit bus while avoiding having only 768MB of memory or driving up costs by using 1536MB of memory. NVIDIA’s solution was to put 1GB on a 192-bit card anyhow, and this is the GTX 550 Ti’s defining feature from a technical perspective.
Under ideal circumstances when interleaving memory banks you want the banks to be of equal capacity, as this allows you to distribute most memory operations equally among all banks throughout the entire memory space. Video cards with their non-removable memory have done this for ages, however full computers with their replaceable DIMMs have had to work with other layouts. Thus computers have supported additional interleaving options beyond symmetrical interleaving, most notably “flex” interleaving where one bank is larger than the other.
It’s this technique that NVIDIA has adopted for the GTX 550 Ti. GF116 has 3 64-bit memory controllers, each of which is attached to a pair of GDDR5 chips running in 32-bit mode. All told this is a 6 chip configuration, with NVIDIA using 4 1Gb chips and 2 2Gb chips. In the case of our Zotac card – and presumably all GTX 550 Ti cards – the memory is laid out as illustrated above, with the 1Gb devices split among 2 of the memory controllers, while both 2Gb devices are on the 3rd memory controller.
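To make the chip layout concrete, the following sketch tallies capacity and bus width per memory controller. It is an illustration only: the controller labels `MC0` through `MC2` are our own, and which physical controller carries the 2Gb chips is not something we can state beyond what is described above.

```python
# Sketch of the mixed-density layout described above.
# Chip densities are in gigabits (Gb); controller labels are hypothetical.
CHIP_WIDTH_BITS = 32  # each GDDR5 chip runs in 32-bit mode

controllers = {
    "MC0": [1, 1],  # two 1Gb chips
    "MC1": [1, 1],  # two 1Gb chips
    "MC2": [2, 2],  # two 2Gb chips
}

def capacity_mb(chip_densities_gbit):
    """Convert a list of chip densities in Gb to total megabytes."""
    return sum(chip_densities_gbit) * 1024 // 8

per_controller_mb = {name: capacity_mb(chips) for name, chips in controllers.items()}
total_mb = sum(per_controller_mb.values())
bus_width_bits = CHIP_WIDTH_BITS * sum(len(chips) for chips in controllers.values())

# Capacity that can be striped evenly across all three controllers.
symmetric_mb = min(per_controller_mb.values()) * len(controllers)

print(per_controller_mb)
print(f"total: {total_mb}MB on a {bus_width_bits}-bit bus")
print(f"evenly interleavable: {symmetric_mb}MB, remainder: {total_mb - symmetric_mb}MB")
```

The 256MB remainder lives entirely on the controller with the 2Gb chips, which is why the layout cannot be perfectly symmetrical.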
This marks the first time we’ve seen such a memory configuration on a video card, and as such raises a number of questions. Our primary concern at this point in time is performance, as it’s mathematically impossible to organize the memory in such a way that the card always has access to its full theoretical memory bandwidth. The best case scenario is always going to be that the entire 192-bit bus is in use, giving the card 98.5GB/sec of memory bandwidth (192bit * 4104MHz / 8), meanwhile the worst case scenario is that only 1 64-bit memory controller is in use, reducing memory bandwidth to a much more modest 32.8GB/sec.
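The best and worst case figures follow directly from the bus width and data rate. As a quick check, here is that calculation written out; the function name is ours, and like the figures above it uses 1GB = 1000MB and ignores any real-world overhead.

```python
# Theoretical peak memory bandwidth for the two cases discussed above.
# The function name is hypothetical; the formula is bus width * data rate / 8.

def peak_bandwidth_gb_per_sec(bus_width_bits, data_rate_mhz):
    """Bits per transfer * million transfers per second / 8, expressed in GB/sec."""
    return bus_width_bits * data_rate_mhz / 8 / 1000

DATA_RATE_MHZ = 4104        # GTX 550 Ti effective GDDR5 data rate
CONTROLLER_WIDTH_BITS = 64  # each of the 3 memory controllers

for active in (3, 1):
    bw = peak_bandwidth_gb_per_sec(active * CONTROLLER_WIDTH_BITS, DATA_RATE_MHZ)
    print(f"{active} controller(s) active: {bw:.1f} GB/sec")
```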
How NVIDIA spreads out memory accesses will have a great deal of impact on when we hit these scenarios, and at this time they are labeling the internal details of their memory bus a competitive advantage, meaning they’re unwilling to share the details of its operation with us. Thus we’re largely dealing with a black box here, which we’re going to have to poke and prod at to try to determine how NVIDIA is distributing memory operations.
Our base assumption is that NVIDIA is using a memory interleaving mode similar to “flex” modes on desktop computers, which means lower memory addresses are mapped across all 3 memory controllers, while higher addresses are mapped to the remaining RAM capacity on the 3rd memory controller. As such NVIDIA would have the full 98.5GB/sec of memory bandwidth available across the first 768MB, while the last 256MB would be much more painful at 32.8GB/sec. This isn’t the only way to distribute memory operations however, and indeed NVIDIA doesn’t have to use 1 method at a time thanks to the 3 memory controllers, so the truth is likely much more complex.
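Our base assumption can be written down as a simple address map. To be clear, this is a sketch of our guess and not NVIDIA’s actual scheme, which they have not disclosed; the 256-byte stripe size, the controller numbering, and the function names are all placeholders chosen for illustration.

```python
# Hypothetical "flex" style address map for 1GB on three 64-bit controllers.
# Everything here is an assumption for illustration, not NVIDIA's real mapping.
MB = 1024 * 1024
TOTAL_BYTES = 1024 * MB
INTERLEAVED_BYTES = 768 * MB  # region assumed to be striped across all 3 controllers
STRIPE_BYTES = 256            # assumed interleave granularity
LARGE_CONTROLLER = 2          # controller assumed to carry the 2Gb chips

def controller_for(addr):
    """Return the controller index assumed to serve a byte address."""
    if not 0 <= addr < TOTAL_BYTES:
        raise ValueError("address outside the 1GB space")
    if addr < INTERLEAVED_BYTES:
        return (addr // STRIPE_BYTES) % 3
    return LARGE_CONTROLLER

def controllers_touched(start, length):
    """Set of controllers hit by a linear access of `length` bytes."""
    first_stripe = start - (start % STRIPE_BYTES)
    return {controller_for(a) for a in range(first_stripe, start + length, STRIPE_BYTES)}

low = controllers_touched(0, 4096)          # inside the first 768MB
high = controllers_touched(800 * MB, 4096)  # inside the last 256MB
print(f"low region uses {len(low)} controller(s): {sorted(low)}")
print(f"high region uses {len(high)} controller(s): {sorted(high)}")
```

Under this model a linear read in the first 768MB spreads across all 3 controllers, while the same read in the last 256MB lands on a single controller, which is where the 98.5GB/sec and 32.8GB/sec figures come from.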
Given the black box nature of GTX 550’s memory access methods, we decided to poke at things in the most practical manner available: CUDA. GPGPU operation makes it easy to write algorithms that test the memory across the entire address space, which in theory would make it easy to determine GTX 550’s actual memory bandwidth, and if it was consistent across the entire address space. Furthermore we have another very similar NVIDIA card with a 192-bit memory bus on hand – GTX 460 768MB – so it would be easy to compare the two and see how a pure 192-bit card would compare.
We ran into one roadblock however: apparently no one told the CUDA group that GTX 550 was going to use mixed density memory. As it stands CUDA (and other APIs built upon it such as OpenCL and DirectCompute) can only see 768MB minus whatever memory is already in use. While this lends support to our theory that NVIDIA is using flex mode interleaving, this makes it nearly impossible to test the theory at this time as graphics operations aren’t nearly flexible enough (and are much more prone to caching) to test this.
CUDA-Z: CUDA Available Memory. Clockwise, Top-Left: GTS 450, GTX 460 768MB, GTX 550 Ti
At this point NVIDIA tells us it’s a bug and that it should be fixed by the end of the month, however until such a time we’re left with our share of doubts. Although this doesn’t lead to any kind of faulty operation, this is a pretty big bug to slip through NVIDIA’s QA process, which makes it all the more surprising.
In the meantime we did do some testing against the more limited memory capacity of the GTX 550. Using NVIDIA’s Bandwidth Test CUDA sample program, which is a simple test to measure memcopy bandwidth of the GPU, we tested the GTS 450, GTX 460 768MB, GTX 460 1GB, and GTX 550 Ti at both stock and normalized (GTX 460) clocks. The results were inconclusive at best – the test seems to scale with core clocks far more than memory bandwidth – which may be another bug, or an artifact of the program having originally been written pre-Fermi. In any case here is the data, but we have low confidence in it.
As it stands the test shows almost no gain over the GTS 450 at normalized clocks; this doesn’t make a great deal of sense under any memory interleaving scheme, hence the low confidence. If and when all the bugs that may be causing this are fixed, we’ll definitely be revisiting the issue to try to better pin down how NVIDIA is doing memory interleaving.
Meet The Zotac GeForce GTX 550 Ti AMP Edition
As we stated at the start of this article, NVIDIA is not directly sampling GTX 550 Ti reference cards for this launch. Instead they have left it up to partners to do the sampling. Zotac in turn has provided us with their factory overclocked model, the GeForce GTX 550 Ti AMP, a custom design clocked at 1000MHz core and 1100MHz (4400MHz data rate) memory. We’ll be looking at performance at both stock GTX 550 and Zotac factory clocks.
The AMP is very similar in design to Zotac’s previous GTS 450 AMP card, which is not surprising given the near-perfect compatibility of the GTS 450 and GTX 550 designs. Zotac does not provide thermal/power data for their cards, but with the factory overclock we’d expect it to be a bit more than the 116W for the reference design.
At a hair under 7.5” and using a double-wide cooler design, the AMP is nearly ¾ of an inch shorter than the original GTS 450, reflecting the fact that NVIDIA’s partners often end up producing shorter cards. A simple shroud covers the card, which directs most of the airflow from the 74mm fan out the front and the rear of the card. If we remove the shroud we find a rectangular aluminum heatsink covering the GPU – the GDDR5 memory remains uncovered.
Power is supplied by a single rear-facing 6pin PCIe power socket, which provides more than enough power for a lower-power device such as the AMP. Zotac has outfitted the card with Hynix GDDR5: 4 1Gb chips and 2 2Gb chips. All of them are rated for 5GHz operation, so even at 4.4GHz the AMP is running its memory below what the chips are capable of, not counting what the bus and GPU itself are capable of handling. Meanwhile as was the case with the GTS 450, a single SLI connector is provided for 2-way SLI.
A common theme with customized Zotac designs is support for additional display connectivity options beyond the NVIDIA reference design, and Zotac does not disappoint here. By shifting a DVI port to the 2nd slot, Zotac has outfitted the card with a full size DisplayPort, along with upgrading the HDMI port from mini to full size; they are still limited in terms of total displays by the GPU however, and even with 4 ports can only drive 2 displays at once. Zotac continues to be one of the only NVIDIA vendors we regularly see support DisplayPort, a sharp contrast from AMD & their partners who include it on virtually everything.
Rounding out the package is the usual collection of extras from Zotac. For hardware this means a molex-to-PCIe power adapter and a DVI-to-VGA adapter. Meanwhile on the software side Zotac continues to provide the Boost Premium package, which includes a collection of OEM & trial copies of various GPU-accelerated programs, including vReveal, Nero Vision, Cooliris, XBMC, and Kylo.
Zotac is pricing the card at $155, $5 more than the NVIDIA MSRP for a basic card. With it comes a 2 year warranty, and a 3rd year is added upon registration.
The Test
As we don’t have a true reference card our testing methodology has been slightly tweaked. We’ve tested the AMP at both GTX 550 Ti reference clocks and at its factory overclock for all metrics, however power/noise/temperature data is going to significantly vary from manufacturer to manufacturer.
For drivers NVIDIA is pairing ForceWare 267.59 with the card – these drivers are just incremental bugfixes, SLI profiles, and product additions over the earlier Release 265 series drivers, and performance is unchanged for other cards from earlier results.
Meanwhile for the AMD cards we’re using the Catalyst 11.4 preview for the 5770 and 6800 series. While the bulk of the performance improvements in these drivers (in what AMD is now calling Project Mjölnir) are for the new Cayman/VLIW4 architecture, Barts/Evergreen/VLIW5 performance has ticked up a couple percent here and there, further raising the bar that NVIDIA needs to cross.
CPU: | Intel Core i7-920 @ 3.33GHz |
Motherboard: | Asus Rampage II Extreme |
Chipset Drivers: | Intel 9.1.1.1015 (Intel) |
Hard Disk: | OCZ Summit (120GB) |
Memory: | Patriot Viper DDR3-1333 3 x 2GB (7-7-7-20) |
Video Cards: | AMD Radeon HD 6990, AMD Radeon HD 6970, AMD Radeon HD 6950 2GB, AMD Radeon HD 6870, AMD Radeon HD 6850, AMD Radeon HD 5970, AMD Radeon HD 5870, AMD Radeon HD 5850, AMD Radeon HD 5770, AMD Radeon HD 4870X2, AMD Radeon HD 4870, NVIDIA GeForce GTX 580, NVIDIA GeForce GTX 570, NVIDIA GeForce GTX 560 Ti, NVIDIA GeForce GTX 550 Ti, NVIDIA GeForce GTX 480, NVIDIA GeForce GTX 470, NVIDIA GeForce GTX 460 1GB, NVIDIA GeForce GTX 460 768MB, NVIDIA GeForce GTS 450, NVIDIA GeForce GTX 295, NVIDIA GeForce GTX 285, NVIDIA GeForce GTX 260 Core 216 |
Video Drivers: | NVIDIA ForceWare 262.99, NVIDIA ForceWare 266.56 Beta, NVIDIA ForceWare 266.58, NVIDIA ForceWare 257.59 Beta, AMD Catalyst 10.10e, AMD Catalyst 11.1a Hotfix, AMD Catalyst 11.4 Preview |
OS: | Windows 7 Ultimate 64-bit |
Crysis: Warhead
Kicking things off as always is Crysis: Warhead, still one of the toughest games in our benchmark suite. Even 3 years since the release of the original Crysis, “but can it run Crysis?” is still an important question, and for 3 years the answer was “no.” Dual-GPU halo cards can now play it at Enthusiast settings at high resolutions, but for everything else max settings are still beyond the grasp of a single card.
Though NVIDIA is primarily targeting the GTX 550 Ti towards 1680x1050 users, we’re including 1920x1200 to showcase games where the card is fast enough to handle that higher resolution at a playable framerate, or to show where it’s close to crossing the mark. However this is largely to satisfy our curiosity rather than to generate data from which to draw a comparison.
Out of our normal card lineup the GTS 450 is the slowest card we keep, so NVIDIA quite literally has nowhere else to go but up here. For the GTX 550 this means vaulting well past the GTS 450, giving us a 23% increase in performance; keep in mind that the theoretical improvement based on core and memory clocks alone is only 15%, so whenever we exceed that we are clearly seeing the benefits of the additional ROPs, L2 cache, and memory bandwidth afforded by enabling the 3rd memory controller. In any case at 32.2fps it’s playable, however Crysis is a demanding enough game that it makes much more sense to turn the game’s settings down some more before taking it on.
Meanwhile compared to AMD’s offerings the GTX 550 comes out ahead of the 5770 by half a frame per second, while the 6850 completely clears the field – the GTX 550 only manages 72% of the 6850’s performance here. The situation compared to the GTX 460 768MB is much better, but still the GTX 550 is only 85% as fast.
As for the Zotac factory overclock, here we’re picking up 3%. This is curiously much lower than the theoretical advantage.
In terms of minimum framerates the GTX 550 ends up doing better. It ends up being ahead of the 5770 by nearly 10%, and against the 450 it beats it by 30%. However the GTX 550 still falls short of the 6850 by nearly 25%.
BattleForge
Up next is BattleForge, Electronic Arts’ free to play online RTS. As far as RTSes go this game can be quite demanding, and this is without the game’s DX11 features.
BattleForge is simultaneously less and more harsh on the GTX 550 compared to its predecessors and competition. Against the 5770 it now has a solid 5% lead, against the 460 and 6850 however it’s only 79% and 71% as fast respectively. Even the gains against the GTS 450 are muted at 17%. The highlight here is that BattleForge is resolution-insensitive enough that the GTX 550 can hit 36fps here, which is largely playable for this style of game.
As for the AMP overclock, our results are now much closer to the theoretical numbers. Zotac picks up 10% here, bringing the card quite close to the GTX 285.
Metro 2033
The next game on our list is 4A Games’ Metro 2033, their tunnel shooter released last year. In September the game finally received a major patch resolving some outstanding image quality issues with the game, finally making it suitable for use in our benchmark suite. At the same time a dedicated benchmark mode was added to the game, giving us the ability to reliably benchmark much more stressful situations than we could with FRAPS. If Crysis is a tropical GPU killer, then Metro would be its underground counterpart.
The GTX 550 ends up doing what the GTS 450 could not on Metro, and that’s cracking 30fps at 1680x1050. Realistically speaking however Metro is quite possibly the only thing more resource intensive than Crysis, and even though we’re down to “high” settings without anti-aliasing, this isn’t very playable. You’d have to go down in quality/resolution further still to get a fluid framerate.
Compared to other cards Metro normally gives AMD a slight edge. This results in the worst showing for the GTX 550 out of our benchmarks, with the 5770 of all things topping it by 5%. Compared to the 6850 the deficit is reduced however, with the GTX 550 coming in at 77% of the performance. Performance relative to NVIDIA cards is rather consistent with BattleForge: 17% ahead of the GTS 450, but 18% behind the GTX 460 768MB.
Zotac’s overclock does manage to turn the tables some. The AMP still trails the 6850 and GTX 460, but at least it’s finally faster than the 5770.
HAWX
Ubisoft’s 2008 aerial action game is one of the less demanding games in our benchmark suite, particularly for the latest generation of cards. However it’s fairly unique in that it’s one of the few flying games of any kind that comes with a proper benchmark.
HAWX was one of the better games for the GTS 450, so it should come as little surprise that it sees a stronger showing here, particularly against the 5770. At 1680 it has a 17% lead over AMD’s bargain offer, and in terms of raw performance even at 1920 it offers a silky smooth 72fps with 4xAA. In spite of that HAWX is still rather GPU limited, leading to the GTX 550 trailing the 6850/460 by around 18% - not that it would make a difference in this game.
Civilization V
Civilization 5 is the latest incarnation in Firaxis Games’ series of turn-based strategy games. Civ 5 gives us an interesting look at things that not even RTSes can match, with a much weaker focus on shading in the game world, and a much greater focus on creating the geometry needed to bring such a world to life. In doing so it uses a slew of DirectX 11 technologies, including tessellation for said geometry and compute shaders for on-the-fly texture decompression.
AMD recently picked up a performance boost in Civ 5, closing the gap NVIDIA opened earlier this year. Still, NVIDIA generally has quite an advantage here, which works out in the GTX 550’s favor.
Against the Radeon HD 5770 this translates to an 11% lead, while compared to the 6850 the GTX 550 comes as close as it ever will to the budget Barts, missing it by only 8%. For the rest of the NVIDIA lineup the gap is much closer to what we normally see, with the GTX 550 trailing the GTX 460 by 20%. Interestingly the GTX 550 doesn’t gain a ton over the GTS 450 here, and at only 10% we’re likely seeing what it means to be almost entirely geometry bound with no benefit to speak of from the ROPs or additional memory bandwidth.
Since being geometry bound is a simple matter of shader clocks however, the overclocked Zotac AMP gets a straightforward 10% performance increase due to its overclock. As a result for the first and only time in this article, we see a GTX 550 pull ahead of the 6850, even if it is by seven-tenths of a frame per second.
Battlefield: Bad Company 2
Now approaching a year old, Bad Company 2 remains as one of the cornerstone DX11 games in our benchmark suite. Based on the Frostbite 1.5 engine, it will be replaced in complexity by the DX10+ only Frostbite 2 engine (and Battlefield 3) later this year. As BC2 doesn’t have a built-in benchmark or recording mode, here we take a FRAPS run of the jeep chase in the first act, which as an on-rails portion of the game provides very consistent results and a spectacle of explosions, trees, and more.
Whether Bad Company 2 favors AMD or NVIDIA cards seems to have more to do with phases of the moon than logic. In this case though it works against NVIDIA, leading to another poor showing for the GTX 550. Here we’re looking at the GTX 550 falling to the 5770 by 2%, meanwhile there’s a 25% gap to close versus the 6850. Compared to NVIDIA’s other cards however we see a sizable 23% pickup versus the GTS 450, and a smaller 17% gap against the GTX 460. We don’t normally use AA on this game at 1680, so the GTX 550 should have enough power to run it with 4x AA at the cost of a bit of the silky-smooth 68fps, or alternatively going up to 1920 without any AA.
STALKER: Call of Pripyat
The third game in the STALKER series continues to build on GSC Game World’s X-Ray Engine by adding DX11 support, tessellation, and more. This also makes it another one of the highly demanding games in our benchmark suite.
STALKER is quite memory intensive, and at higher resolutions we see even 1GB cards trail off in performance. So the biggest benefit to 1GB over 768MB in today’s games can be seen here, where the GTX 550 comes the closest it ever will to the GTX 460 768MB, reducing its performance deficit to 9%. It’s also another good showing against the also 1GB GTS 450, with a 23% gain.
Against the Radeon cards however NVIDIA does poorly, as this game largely favors AMD. The 3rd and final loss to the 5770 is here, this time by 3%; the 6850 has a 25% gap to close in the meantime. The difference ultimately is going to come down to quality – at 32fps the GTX 550 is playable, but most people are going to want to turn down something like AA in order to get the framerate above 40fps for a smoother experience.
As for the AMP’s factory overclock, once more it helps to close the gap. The AMP picks up 10%, but it’s still not enough to get above 40fps.
DIRT 2
Codemasters’ 2009 off-road racing game continues its reign as the token racer in our benchmark suite. As the first DX11 racer, DIRT 2 makes pretty thorough use of DX11’s tessellation abilities, not to mention still being the best looking racer we have ever seen.
DIRT 2 meanwhile generally runs better on NVIDIA cards, giving NVIDIA the break they needed. The GTX 550 enjoys a 12% lead over the 5770 and comes within 11% of the 6850. Even the GTX 460 gap closes some to 16%, while against the GTS 450 the 550 picks up 21%.
In practice DIRT 2 is light enough to run that even 1920 with 4xAA is not out of the question here for the GTX 550, with the AMP sealing the deal by recovering most of the framerate the stock GTX 550 loses on the resolution bump.
Mass Effect 2
Electronic Arts’ space-faring RPG is our Unreal Engine 3 game. While it doesn’t have a built in benchmark, it does let us force anti-aliasing through driver control panels, giving us a better idea of UE3’s performance at higher quality settings. Since we can’t use a recording/benchmark in ME2, we use FRAPS to record a short run.
On a side note, for our next benchmark refresh we’ll likely replace it with a newer UE3 game. If any of you have any requests or suggestions, we’d like to hear them.
Mass Effect 2 is another strong showing for the GTX 550, particularly compared to the Radeon HD 5770. At this point we’re looking at a 20% performance boost, and against the GTS 450 it’s an even bigger 26% boost – clearly ME2 is memory bandwidth limited. The performance relative to the GTX 460 768MB also seems to confirm this, as this is the closest the GTX 550 gets to the GTX 460, coming within 9%. It doesn’t get close to clearing the 6850 however, which still beats the GTX 550 by 20%.
Notably, this is one of the few games where the AMP’s overclock isn’t as helpful, since the memory overclock isn’t as great as the core overclock. As a result the AMP’s advantage over a stock-clocked GTX 550 is only 6%. Nevertheless it gets within a hair of equaling the GTX 460 768MB, which is quite an accomplishment given the original difference in the number of CUDA cores.
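The gap between the AMP’s core and memory overclocks is easy to quantify. The sketch below is a rough illustration using the clocks quoted earlier; the dictionary layout is our own, and it assumes performance in a fully core-bound or fully memory-bound game would scale one-for-one with the respective clock, which real games rarely do.

```python
# Theoretical gains from Zotac's factory overclock, per clock domain.
# Clocks are taken from the article; the data layout is a hypothetical convenience.
stock_mhz = {"core": 900, "memory": 1026}
amp_mhz = {"core": 1000, "memory": 1100}

overclock_pct = {
    domain: (amp_mhz[domain] / stock_mhz[domain] - 1) * 100
    for domain in stock_mhz
}

for domain, gain in overclock_pct.items():
    print(f"{domain} overclock: +{gain:.1f}%")
```

A memory bandwidth limited game should therefore top out nearer the memory figure than the core figure, which is consistent with the 6% we measure in Mass Effect 2 versus the roughly 10% seen in more core-bound titles.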
Wolfenstein
Finally among our benchmark suite we have Wolfenstein, the most recent game to be released using the id Software Tech 4 engine. All things considered it’s not a very graphically intensive game, but at this point it’s the most recent OpenGL title available. It’s more than likely the entire OpenGL landscape will be thrown upside-down once id releases Rage later this year.
Wolfenstein is quite easy to run, and as a result even at 1920 the GTX 550 nearly breaks 60fps. It’s also memory bandwidth limited to a degree, which is why the GTX 550 gains 24% on the GTS 450 and only underperforms the GTX 460 768MB by 13%.
AMD’s edge in OpenGL performance does keep it at bay though. The 5770 does fall by 8%, but the 6850 enjoys a lead just shy of 20%. Not that it’s likely to make a huge difference in this game.
Compute & Tessellation
Moving on from our look at gaming performance, we have our customary look at compute performance, bundled with a look at theoretical tessellation performance.
Our first compute benchmark comes from Civilization V, which uses DirectCompute to decompress textures on the fly. Civ V includes a sub-benchmark that exclusively tests the speed of their texture decompression algorithm by repeatedly decompressing the textures required for one of the game’s leader scenes.
Unlike our gaming benchmarks, Civ V’s compute performance strongly favors NVIDIA’s Fermi family here, giving the GTX 550 a very strong showing. At 136fps it’s competitive with the much more expensive 6950 and even comes quite close to the GTX 460 768MB. The 6850 and 5770 meanwhile fall well behind.
What’s quite interesting is that with a bit more gas, the Zotac AMP really climbs up the charts. At 150fps it’s tailing the GTX 460 1GB and even AMD’s single-GPU king, the 6970. And ultimately while this benchmark is compute bound, it’s also memory bound at least part of the time, which is why the GTX 550 outperforms the GTS 450 by 21%.
Our second GPU compute benchmark is SmallLuxGPU, the GPU ray tracing branch of the open source LuxRender renderer. While it’s still in beta, SmallLuxGPU recently hit a milestone by implementing a complete ray tracing engine in OpenCL, allowing them to fully offload the process to the GPU. It’s this ray tracing engine we’re testing.
SmallLuxGPU also gives NVIDIA a solid standing here, however not quite as much as with Civ V. The GTX 550 beats the 5770 and even pulls ahead of the 6850 here. The benefit from the memory bandwidth increase is minimal however, leading to a lower 15% gain versus the GTS 450, and a larger 25% gap versus the GTX 460.
Our final compute benchmark is a Folding@Home benchmark. Given NVIDIA’s focus on compute for Fermi, cards such as the GTX 560 Ti can be particularly interesting for distributed computing enthusiasts, who are usually looking for a compute card first and a gaming card second.
We do not normally classify Folding@Home as very memory bandwidth sensitive on GPUs, which makes our results here more interesting than we expected. A 17% gain over the GTS 450 is in line with the core clock increase, but achieving 81% of the GTX 460’s performance is no small feat. With the AMP’s overclock, the card closes to within 10% of the GTX 460.
At the other end of the spectrum from GPU computing performance is GPU tessellation performance, used exclusively for graphical purposes. With Fermi NVIDIA bet heavily on tessellation, and as a result they do very well at very high tessellation factors. With 1 GPC the GTX 550 Ti can only retire 1 triangle/clock however, which nullifies much of the architectural advantage on paper.
Even with extreme levels of tessellation, the lower geometry throughput of the GTX 550 keeps a lid on performance. It’s consistently around 80% of the performance of the GTX 460, however the hit for using extreme tessellation ultimately is greater here than on NVIDIA cards with more geometry throughput. As NVIDIA likes to note it has quite a bit more geometry throughput than older generation cards, but tying geometry performance to the number of SMs means parity between high and low-end cards is lost.
The DX11 Detail Tessellation sample reinforces those findings. NVIDIA does keep up their solid scaling from medium to high tessellation versus AMD though, including a very surprising score versus the 5870 and 6870 at high tessellation.
Power, Temperature, & Noise
Last but not least as always is our look at the power consumption, temperatures, and acoustics of the GTX 550 Ti. As a result of NVIDIA’s tinkering with both power efficiency and clocks, performance has gone way up but so has power: the official TDP has gone up by 10W.
Please note that as we don’t have a true reference card our testing methodology has been slightly tweaked. We’ve tested the AMP at both GTX 550 Ti reference clocks and at its factory overclock for all metrics, however noise and temperature in particular are going to significantly vary from manufacturer to manufacturer.
GeForce GTS 450 & GTX 550 Voltage
GTS 450 Load | Zotac GTX 550 Ti AMP Load | Idle |
1.05v | 1.1525v | 0.95v |
The load voltage on our Zotac AMP is 1.1525v, which compared to GF106/GTS 450 ends up being quite high. No doubt Zotac has given the card some more voltage to handle the factory overclock, however this will also skew our power results for load. Idle however remains unchanged at 0.95v.
Right off the bat, idle power consumption looks good. At 155W the Zotac GTX 550 AMP beats even the reference GTS 450 by 2W, and the gap with the Radeons is much larger. Here we’re looking at a 6W advantage over the 6850, and 8W over the 5770. This is no small feat given how hard it is to reduce idle power usage, and showcases the benefits of NVIDIA’s transistor tinkering.
Load power consumption leaves much to be desired however, and it’s at this point that we can’t easily separate the GTX 550 Ti from the Zotac GTX 550 AMP. If the Zotac card did not have an overclock, perhaps it would have a lower load voltage, and as such lower power draw under load. But in this case it does not, which leads to a total system power draw difference of 19W over the 6850, never mind the 5770.
Unlike the rest of the GTX 500 series, the GTX 550 Ti does not have any kind of overcurrent protection, so under FurMark we can get straight numbers without having to forcibly disable anything. The results, like with Crysis, are understandably poor for the card: 33W over the 6850, and more when the card is running at factory clocks. If a more reference style GTX 550 Ti is anything like this, it bodes poorly for the card compared to the already faster 6850.
The top of our idle temperature charts is composed of NVIDIA reference cards, and while the Zotac GTX 550 AMP gives a decent performance, it can’t keep up. 35C at idle is still quite good, but we saw better on the GTS 450, and would likely see something similar on a reference GTX 550 Ti.
Under load, temperatures end up being middle of the road, thanks once more to the card’s higher power consumption. Funnily enough it does beat the 5770 by 1C even after all of this, but one could just as well slap the GTX 460 cooler on here and get results that would undoubtedly be below 70C.
FurMark of course adds several C more to our temperatures. In fact for a lower power card like the Zotac GTX 550 AMP we’re actually caught a bit off-guard. 86C at factory clocks means there’s some thermal headroom to play with for overclocking, but not too much – the GTX 550 Ti’s max temperature is only 95C.
For cards in this performance category, idle noise is pretty consistent thanks to the common use of open coolers. The Zotac GTX 550 AMP doesn’t disappoint here, offering a practically silent 41.5dbA as measured by our meter.
If you were hoping that higher temperatures would be a precursor to lower fan noise, you’re going to come up empty handed here. 50.7dbA is by no means loud among all the cards we’ve tested, but we’ve seen better. In fact the 5770 and 6850 both do better, as do the GTX 460 and GTX 560. At the AMP’s full factory overclock we get up to 54.4dbA, which is certainly going to be noticeable. Unfortunately the card doesn’t have the performance even with the overclock to justify the noise.
Ultimately this is less a discussion of the GTX 550 Ti reference design, and more a discussion on the Zotac GTX 550 Ti AMP’s design, which more likely than not takes a hit in cooling and temperatures thanks to the 2nd DVI port partially blocking the card’s external exhaust. But for the time being, it’s what we have to work with.
Closing Thoughts
This is ultimately an underwhelming launch for NVIDIA, but perhaps it’s best we first start with the positives.
The GTS 450 was the first Fermi launch that didn't result in some immediate fanfare for NVIDIA. With performance treading between a Radeon HD 5750 and 5770, the GTS 450 didn’t look good. So if there could be a “most improved” award for a GPU, GF116 and the GTX 550 Ti would most certainly get it. Even though all NVIDIA did was enable a 3rd memory controller and ramp up the clocks, it’s enough to increase performance by 20% - at other segments of the market we regularly settle for less. With these improvements the GTX 550 Ti is finally almost consistently ahead of the Radeon HD 5770.
So what’s the problem? The same problem NVIDIA normally runs into: pricing. The GTX 550 Ti seems destined to sell based on NVIDIA’s name and market presence more than it will sell based on performance characteristics. Not having a reference card muddles our results some, but ultimately it’s clear that AMD’s pricing has caught NVIDIA flat-footed.
Indeed the GTX 550 Ti is faster than the 5770 - by around 7% - but then the GTX 550 Ti costs 36% more. At the other end of the spectrum is the 6850, which is 7% more expensive on average for 25% better performance. Even the GTX 460 768MB is going to gnaw at NVIDIA here so long as it’s still on the market; it’s 15% faster and yet it’s $20 cheaper. It’s with a dash of Alanis Morissette irony that while having so-so graphical performance the GTX 550 is a remarkable compute card compared to similar AMD cards, but at the same time a CUDA memory bug slipped by before the product shipped.
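The price and performance figures above can be folded into a single performance-per-dollar number. The sketch below is only illustrative: the relative performance values are the rough averages quoted in this conclusion, the 6850 and GTX 460 768MB prices are inferred from the "$10 more" and "$20 cheaper" comparisons against the $149 MSRP, and the 5770 price is back-calculated from the 36% figure rather than being a quoted street price.

```python
BASELINE = "GTX 550 Ti"

# name: (price in USD, performance relative to the GTX 550 Ti)
CARDS = {
    "GTX 550 Ti": (149.0, 1.00),
    # Assumption: price derived from "the GTX 550 Ti costs 36% more".
    "Radeon HD 5770": (149.0 / 1.36, 1 / 1.07),
    # Assumption: $149 plus the extra $10 mentioned for the 6850.
    "Radeon HD 6850": (159.0, 1.25),
    # Assumption: $149 minus the $20 mentioned for the GTX 460 768MB.
    "GTX 460 768MB": (129.0, 1.15),
}


def value_vs_baseline(cards, baseline):
    """Return each card's performance per dollar as a multiple of the baseline card's."""
    base_price, base_perf = cards[baseline]
    base_value = base_perf / base_price
    return {
        name: (perf / price) / base_value
        for name, (price, perf) in cards.items()
    }


values = value_vs_baseline(CARDS, BASELINE)
for name, value in sorted(values.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:16s} {value:.2f}x performance per dollar")
```

Under these assumptions every alternative delivers more performance per dollar than the GTX 550 Ti, with the GTX 460 768MB leading at roughly 1.3x.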
In these situations NVIDIA reminds me of Intel in the sub-$200 market before Sandy Bridge was released: gross margin first, competition second. AMD is quite willing to cut prices to the bone, NVIDIA is not. As a result on these lower-end products AMD has quite the performance lead for the price. This of course is NVIDIA’s choice, but so long as they choose to go about pricing products this way they’re going to play catch-up to AMD.
In the end the GTX 550 Ti just isn’t a compelling product at $149. At that price you’re much better served by ponying up the extra $10 to pick up a 6850 for much better performance – and if the Zotac GTX 550 Ti AMP is similar to other GTX 550 Ti cards – lower power consumption and less noise. Alternatively the GTX 460 768MB is an absolute steal while it’s still available.
Meanwhile partners like Zotac are left in a rough spot. At $155 the GTX 550 Ti AMP closes the performance gap with the 6850 somewhat, and at $5 more than a stock clocked GTX 550 Ti is quite a good deal for 10% better performance. But ultimately it's only $5 less than a notably better performing card, the 6850. However the fact that so many partners are doing overclocked cards speaks well of GF116’s overclockability. More significantly it’s quite remarkable that these overclocked GTX 550 Ti cards can get so close to the GTX 460 768MB – a card with a much bigger GPU with many more functional units to work with. With these factory overclocks, the GTX 550 Ti could almost be a decent replacement for the GTX 460 768MB. Pricing is the enemy however – these guys can only lower prices if NVIDIA lets up on the $149 MSRP for the stock clocked GTX 550 Ti.
Finally, we certainly haven’t forgotten about NVIDIA’s interesting memory arrangement with the GTX 550 Ti. It’s a shame that they won’t tell us more about how they’re interleaving memory accesses on this unique design, but hopefully they’ll open up in the future. It’s something we’re definitely going to revisit once the CUDA memory bug is dealt with, and hopefully at that time we’ll be able to learn more about how NVIDIA is accomplishing this. If this is the start of a long term change to memory layout by NVIDIA, then getting to better understand how they’re interleaving memory accesses here will be all the more important to understanding future products.