Name: Budget Battle: HyperMemory vs. TurboCache
Author: Derek Wilson

Original Link: https://www.anandtech.com/show/1679

Budget Battle: HyperMemory vs. TurboCache

by Derek Wilson on May 12, 2005 9:00 AM EST


Introduction

Affordable, full-featured cards have been long in coming from ATI and NVIDIA. With the HyperMemory and TurboCache cards, we are finally able to recommend a budget card that can absolutely play the latest games with all the eye candy that developers have built in. The tradeoff that we have to make for the lower price is resolution and filtering options, but we no longer need to sacrifice effects or realism and are rewarded with the immersive experience that modern games are able to deliver at a reasonable price.

For those who have experienced huge resolutions with AA and AF enabled, it would be very hard to go back to playing games at an aliased 800x600 with no filtering. On the upside, casual computer users who may not have any real gaming experience now have a cost-effective way to add DX9 level graphics to their next computer upgrade.

Another major upside of the current landscape is that when the bare minimum in graphics cards supports DX9 level graphics, the minimum requirements of games will shift up to the DX9 level. Designing for DX9 at the outset will change the way that game developers approach their work. This is really the excuse that we need to see gaming experiences jump up to the next level.

In this look at ATI's HyperMemory and NVIDIA's TurboCache parts, we will be trying to determine which card is the best value for the money. Something that we also want to learn is whether the cheapest budget card can still hold its own, and whether the most expensive card that we test is worth the price difference.

We have already written about the technology behind TurboCache. Today, we talk about HyperMemory and concentrate on what these products are actually able to deliver.

Round 1: Architecture

The technology in these products has to do with making games think that they have more graphics memory than what the cards physically have on board. ATI and NVIDIA have taken different approaches to solving the problem.

NVIDIA has a solution that goes way down to the inner workings of the GPU. They haven't released specifics on what has been changed in their TurboCache parts, but they state that everything they've done has been to hide the latency of system memory accesses in their pixel and ROP pipelines. Likely, this includes adding larger local caches and doing other things to increase the number of pixels that can be in flight at any given time. A very important factor of NVIDIA's architecture is that it is designed to operate on system memory as if it were local - the only thing that NVIDIA doesn't allow to operate directly in system RAM is the front buffer.

The ATI approach is distinctly more software based, though they do state that the memory controller on their GPU is what makes HyperMemory possible. The extent of these changes is significantly less than in the NVIDIA solution. The ATI approach creates more of a virtualized memory system for the graphics card, allowing the driver to allocate system memory as needed and page data in and out of graphics RAM at will. The system memory is Windows-managed and so can be paged out to the hard disk if necessary (which could really kill performance). Of course, if enough RAM is in use that graphics data is being paged to disk, there are other issues at hand that are likely also causing performance problems.

We haven't talked about GART memory very much since the decline of AGP, but the brief explanation is that GART memory is linearly addressable non-paged memory allocated to the graphics subsystem for external storage. With PCI Express based systems, it seems that the graphics driver manages GART memory completely rather than allowing the system BIOS to set a default size. We haven't been able to get solid details on how this memory is managed from either ATI or NVIDIA.

To take a further step back, the organization of ATI's graphics memory is set up in stages. First, the driver determines which surfaces are the highest priorities and loads those into local memory. Whenever anything new comes along after local memory gets crowded, ATI demotes lower priority surfaces to GART memory. When GART memory gets too full, surfaces can be demoted further to pageable, Windows-managed system memory. This system memory is requested by the driver as necessary and freed when memory pressure decreases again.
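The staged organization described above can be sketched in a few lines of Python. This is only an illustration of the idea, not ATI's driver logic: the `Surface` type, the numeric priority scale, the greedy placement order, and the example sizes are all our own assumptions, and the real driver demotes surfaces dynamically rather than placing them once.

```python
from dataclasses import dataclass


@dataclass
class Surface:
    name: str
    size_mb: int
    priority: int  # hypothetical scale: higher means more important


def place_surfaces(surfaces, local_mb, gart_mb):
    """Greedy sketch: highest priority goes to local RAM, overflow to GART,
    and whatever is left lands in pageable system memory."""
    placement = {}
    local_free, gart_free = local_mb, gart_mb
    for s in sorted(surfaces, key=lambda s: s.priority, reverse=True):
        if s.size_mb <= local_free:
            placement[s.name] = "local"
            local_free -= s.size_mb
        elif s.size_mb <= gart_free:
            placement[s.name] = "gart"
            gart_free -= s.size_mb
        else:
            # Assumed fallback tier: pageable, Windows-managed system memory.
            placement[s.name] = "system"
    return placement


# Made-up surfaces on a card with 32MB local RAM and 64MB of GART memory.
example = [
    Surface("framebuffer", 8, 10),
    Surface("textures_a", 20, 8),
    Surface("textures_b", 16, 5),
    Surface("textures_c", 64, 2),
]
result = place_surfaces(example, local_mb=32, gart_mb=64)
```

With these made-up numbers, the two highest priority surfaces fit in local RAM, the third spills to GART memory, and the lowest priority surface ends up in pageable system memory.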

Microsoft's next Windows OS will require graphics drivers to support fully virtualized, Windows-managed graphics memory. Along with their VPU Recover feature (graphics hardware reset), HyperMemory may be a product of ATI's preliminary Longhorn work. To be sure, incorporating Windows-managed memory alongside driver-managed local RAM is a drop in the bucket compared to handing over all local and system graphics memory management to the OS.

The inclusion of virtualized graphics memory is actually something that workstation users have been requesting for quite some time. It's very interesting to see the technology end up in a value product first. Hopefully, ATI will follow 3Dlabs' lead and bring their virtualization technology to the workstation space as well.

The major difference between TurboCache and HyperMemory is that the latter must first load a required surface into local memory before operating on it - possibly requiring the driver to kick something else off of local memory into system RAM. The separation of up and down stream bandwidth in PCI Express makes this relatively painless. TurboCache, on the other hand, sees all graphics memory as local and does not need to load a surface or texture to local RAM before operating on it. Shaders are able to read and write directly over the PCI Express bus into system RAM. Under the NVIDIA solution, the driver carries the burden of keeping the most used and most important bits of data in local memory.
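To make the contrast concrete, here is a toy Python model of the two access paths. It is a sketch under our own assumptions, not a description of either vendor's hardware or driver: the function names are hypothetical, eviction is oldest-first rather than priority-based, every evicted surface is assumed to be written back to system RAM, and bus traffic is counted in whole surfaces.

```python
from collections import OrderedDict


def copy_in_before_use(name, size_mb, local, capacity_mb):
    """HyperMemory-style access (simplified): a surface must be resident in
    local RAM before the GPU operates on it. Returns MB moved over the bus."""
    if name in local:
        return 0  # already resident, no bus traffic
    if size_mb > capacity_mb:
        raise ValueError("surface is larger than local RAM")
    moved = 0
    while sum(local.values()) + size_mb > capacity_mb:
        # Assumed policy: kick out the oldest resident surface.
        _, victim_mb = local.popitem(last=False)
        moved += victim_mb  # downstream copy into system RAM
    local[name] = size_mb
    return moved + size_mb  # plus the upstream copy into local RAM


def operate_in_place(name, size_mb, local):
    """TurboCache-style access (simplified): nothing is copied first; a
    surface that lives in system RAM is simply read across the bus.
    Returns MB crossing the bus for one full pass over the surface."""
    return 0 if name in local else size_mb


# 32MB of local RAM currently holding two made-up surfaces.
local_ram = OrderedDict([("a", 16), ("b", 12)])
first_use = copy_in_before_use("c", 8, local_ram, capacity_mb=32)
second_use = copy_in_before_use("c", 8, local_ram, capacity_mb=32)
remote_pass = operate_in_place("d", 8, local_ram)
```

In this toy model, the copy-in approach pays once for the eviction and the upload and then works entirely out of local RAM, while the in-place approach skips the copy but crosses the bus on every pass over a surface that is not local.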

The underlying architectures of these cards dictate the comparison points that we will choose. The ATI card needs more local RAM than the NVIDIA card because it isn't rearchitected to support operating on the majority of its data across the high latency of a system bus. More fast local RAM is good, but with more RAM comes more cost. The balance will be found in who can afford to charge the least - ATI with a pretty much stock R42x and more RAM, or NVIDIA with less RAM and a rearchitected GPU. Price is a huge factor in determining the better solution here, and performance often comes as an afterthought.

Happily, we embrace the new move to eliminate graphics API features as a distinguishing factor in graphics hardware decisions.

Round 2: Performance

In an attempt to demonstrate the capabilities of these cards, we have run these game tests at their highest quality levels. This means that the cards will be pushed harder with more detail. Most importantly, the entire card will be well stressed. This will give us an idea of how these lower memory cards can deliver on the promise of an experience comparable to all other cards of the current generation.

In the end, it will be up to the end user whether to enable the highest quality settings and run at lower resolutions, or to turn off a few of the bells and whistles and pump up the number of pixels packed on to the screen a bit. The tradeoff really comes down to preference. Even for the AT staff, the resolution vs. settings argument is addressed on a game by game basis.
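For a sense of scale on the resolution side of that tradeoff, the short Python snippet below counts the pixels per frame at the three resolutions that come up in our tests. It is simple arithmetic only; how framerate actually scales with pixel count depends on the game and the card, and the helper name is our own.

```python
def pixels(width, height):
    """Number of pixels rendered per frame at a given resolution."""
    return width * height


resolutions = [(640, 480), (800, 600), (1024, 768)]
counts = {f"{w}x{h}": pixels(w, h) for w, h in resolutions}

# Relative cost of each step up, measured purely in pixels per frame.
step_640_to_800 = counts["800x600"] / counts["640x480"]
step_800_to_1024 = counts["1024x768"] / counts["800x600"]

for name, count in counts.items():
    print(f"{name}: {count} pixels")
print(f"640x480 -> 800x600: {step_640_to_800:.2f}x the pixels")
print(f"800x600 -> 1024x768: {step_800_to_1024:.2f}x the pixels")
```

Each step up in resolution means filling well over half again as many pixels per frame, which is why dropping a resolution notch frees up so much headroom for quality settings.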

We will be taking a look at 4 flavors of cards today. We have 3 different TurboCache models and 1 HyperMemory board. Our HyperMemory board is the 32 MB model, while we have 16, 32, and 64 MB TurboCache cards. With the exception of the 16MB TC board, these boards have a 64-bit memory interface. The 16MB card's memory bus is only 32 bits wide.

Aside from memory size and bus width, speed is an important factor with these boards. Vendors are aiming for low cost with these boards, so we won't be seeing <2ns GDDR3 here. Instead, memory speeds are at most 700 MHz (the speed of the 16 and 32 MB TC cards). The HyperMemory card runs with RAM clocked at about 665 MHz, while the 64MB TC card runs at a rather slow 550 MHz. This will serve to change the performance landscape of the TurboCache cards, as having faster memory will really help the boards with less memory. The advantage of the 64MB card then becomes the ability to run applications that require 256MB of graphics memory. This will not likely be as useful as having higher performance under a 128MB graphics memory setup.
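To put the bus widths and memory speeds above side by side, the following Python sketch computes a theoretical peak local memory bandwidth for each card. It assumes that the quoted memory speeds are effective data rates (one transfer per quoted clock), which is our reading rather than something confirmed by ATI or NVIDIA, and it ignores system memory traffic over PCI Express entirely. The function and card labels are our own.

```python
def peak_bandwidth_gb_s(bus_width_bits, effective_mhz):
    """Theoretical peak bandwidth in GB/s: bytes per transfer times
    transfers per second (assumes one transfer per quoted clock)."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * effective_mhz * 1e6 / 1e9


# (bus width in bits, quoted memory speed in MHz) as described in the text.
cards = {
    "16MB TurboCache": (32, 700),
    "32MB TurboCache": (64, 700),
    "64MB TurboCache": (64, 550),
    "32MB HyperMemory": (64, 665),
}

bandwidth = {name: peak_bandwidth_gb_s(*spec) for name, spec in cards.items()}

for name, gb_s in bandwidth.items():
    print(f"{name}: {gb_s:.2f} GB/s")
```

Under these assumptions, the 32MB TurboCache card has twice the local bandwidth of the 16MB card and noticeably more than the 64MB card, with the HyperMemory board sitting in between.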

It is also possible to create 32MB cards with a 32-bit memory interface, but these will experience a definite performance hit. It will be important for the consumer to pay attention to the amount of graphics memory that their solution supports, the amount of RAM that it has locally, and the bit width of the local memory interface. Needless to say, we will probably be less than satisfied with the way that these cards are marketed. Of course, giving vendors a wide range of choices based on their needs will hopefully help to keep competition up and prices down in the market.

The test setup that we used was designed to put the most emphasis on the graphics cards' capabilities. As such, we should keep in mind that we are very graphics card limited here and should see very similar performance on quite a range of CPUs (until we start becoming CPU limited in games). These cards will be most sensitive to the RAM used in a system, and it is our recommendation that if these cards are intended to be used by the casual gamer, memory choice should be given careful consideration. Here's what the cards ran in:

Microsoft Windows XP SP2
ASUS A8N-SLI Deluxe
AMD Athlon FX-53
1GB OCZ PC3200 @ 2:2:2:9
Seagate 7200.7 HD
OCZ Powerstream 600W PS

The 32MB 304/665 (core/mem) HyperMemory card that we have is direct from ATI, while the 32MB and 64MB 350/700 (core/mem) TurboCache cards are from PNY. The 16MB TurboCache part was from NVIDIA and is clocked the same as the PNY parts.

This is quite a lot to keep in mind when looking at our performance tests. Unfortunately, it's not a simple matter to understand what we are seeing at first glance. But, that being said, what follows paints quite a good picture of the budget market as it stands.

Doom 3 Performance

Not surprisingly, the NVIDIA cards maintain their place as the performance leader in Doom 3. We did run this test in high quality mode, which enables 8x AF; medium or low quality would give more playable results on the cheaper ultra low end cards. This test posts the lowest framerates of all the tests that we looked at today, and luckily, Doom 3 also looks better than most other games when compared at the lowest resolutions.

[Chart: Doom 3 Performance]

At 800x600, there is a 14% difference between the two fastest TurboCache cards that we tested. The cheapest NVIDIA and the ATI X300 HyperMemory cards perform equally well here.

X300 HyperMemory performance changes less versus resolution than the NVIDIA cards, but even at the lowest resolution, we would recommend turning off some of the extras and going with a lower quality mode.

Far Cry 1.3 Performance

Far Cry also shows a performance lead for the NVIDIA cards. Our middle grade resolution of 800x600 shows a fairly linear increase in the TurboCache cards with the 32MB card again leading the 64MB part.

[Chart: Far Cry v1.3 Performance]

It isn't until we reach 1024x768 that the 64MB part starts making a difference, but by this time, framerates have dropped to lower than acceptable levels.

Far Cry is definitely best played at 640x480 on all of these cards.

Half Life 2 Performance

This time around, the ATI card has the advantage, but not by much over the 32MB TurboCache card. Again, these numbers were run with the highest settings in HL2, and we are quite happy with the smoothness of the action here.

[Chart: Half Life 2 Performance]

As we can see, all but the 16MB TC card perform about the same.

Antialiasing and anisotropic filtering add a great deal to the image quality of this game. We ran some numbers to see if we could get away with enabling these options. We find that enabling 2xAA and 4xAF leads to about the same framerates as a bump up in resolution.

Unreal Tournament 2004 Performance

Unreal Tournament 2004 shows our 32MB HyperMemory performing on par with the 64MB TurboCache part in the middle of the pack. The 16 and 32 MB TC cards round out the bottom and top of the pack, respectively.

[Chart: Unreal Tournament 2004 Performance]

We also looked into AA/AF under UT2K4. Again, this is a tradeoff that the end user will have to make, but the game will run fine at lower resolutions with AA and AF. The ATI card pulls away from the 64MB TC part in this test as well.

Wolfenstein: Enemy Territory Performance

With this test included, the 32MB 64-bit TurboCache card (supporting 128MB of graphics memory and running at 350/700 clock speeds) has led the pack in every game but HL2.

[Chart: Wolfenstein: Enemy Territory Performance]

Interestingly, when we enable AA and AF in this game, we see a change in the performance leadership. The ATI card comes back to the top in this test.

Final Round

What everything comes down to is the price. We are seeing the TurboCache parts coming in at between $59 and $80 for the range of cards that we tested here. While they aren't as widely available yet, the ATI HyperMemory parts are coming in at between $53 and $75. It is important to realize that we tested the cheapest of the X300 HyperMemory cards here, the 32MB onboard model. With this card coming in at under $60, there is no reason to choose the 16MB TC part over the ATI solution unless vendors can get the prices down near $50.

The 128MB onboard X300 HyperMemory part should perform significantly better than what we are seeing here, and we suspect that the price point puts it in good competition with the higher end TurboCache parts. But we will have to wait until we have hardware for testing before we can confirm this suspicion. Regardless, the small price difference for the extra 96MB of onboard RAM makes the more expensive ATI part a very interesting option.

We really can't see much reason to recommend the 64MB TurboCache part. Unless one of the vendors comes out with a solution that runs the memory at 700MHz or more, there's really no advantage over the 32MB onboard part (in fact, there is a disadvantage at the speeds that we tested). We really can't see any reason for the most expensive part that we tested to cost as much as it does.

For well-rounded performance, the 32MB 64-bit TurboCache part is our pick. Of course, that may change when we look at the 128MB HyperMemory card depending on performance and price at the time.

Business customers who want some added 3D functionality and possibly the ability to play games at the absolute lowest cost will not be disappointed with the 32MB onboard ATI X300 SE HyperMemory solution. If the budget is tight, this is definitely a workable part.

For those who will be buying in volume, even a single dollar counts in the grand scheme of things. With the low prices of these cards, we can expect a lot of competition between vendors in the high volume market. It will take quite a lot for NVIDIA and ATI to build up enough steam to surpass Intel in the graphics solution volume department, but perhaps upcoming integrated graphics solutions from ATI and NVIDIA will be as compelling as these parts show value products can be.