Original Link: https://www.anandtech.com/show/635
For years ATI has been considered a leader in multimedia and video features, courtesy of products like their TV tuner and All-in-Wonder lines; however, they have rarely been considered one of the industry's high-end performance leaders.
Like Matrox and S3, ATI was caught completely off guard by the 3D revolution of a few years ago, which left the market dominated by companies like the now-defunct Rendition as well as a much more familiar name, 3dfx. ATI managed to stay alive courtesy of their extremely strong hold on many OEM markets, places where 3dfx and NVIDIA had a hard time breaking in at first.
The Rage 128 was originally intended to mark ATI’s return to the market as a serious contender in terms of performance, not only in terms of OEM acceptance. Unfortunately, a horrendously delayed Rage 128 chip gave ATI little in the way of credibility, and when it eventually did make it out, the solution wasn’t powerful enough to put ATI’s name anywhere on the performance map.
What ATI needed was a new core; as we noticed with the Rage 128 Pro, increasing the clock speed of the Rage 128 only bought ATI performance that NVIDIA’s TNT2 had been delivering for months. The introduction of the Rage Fury MAXX shortly thereafter gave ATI a strong taste of what competing in the high-end gaming market was like, and it only foreshadowed what was to come once ATI finally got a new core to work with.
Enter the Radeon. Previewed in April of this year and shipping just two months later, the Radeon turned quite a few heads. For the first time, ATI wasn’t merely playing catch-up to NVIDIA; they were outperforming the king of the market.
It didn’t take long for the market to respond. Since the Radeon’s introduction the pressure has been on NVIDIA, and with the possibility of NVIDIA not releasing a new product until next year, it is ATI’s chance to shine. ATI has definitely seen this opportunity, and just like AMD did following the introduction of the Athlon, ATI is attempting to take as much advantage of the situation as possible.
Recent reports have indicated that ATI was planning to rapidly phase out their Rage 128 line and replace it with a more cost-effective solution based on the Radeon core. Just as NVIDIA did with the GeForce2 MX, ATI is releasing a low-cost version of its flagship part.
While continuing to promote the still relatively new Radeon brand name, ATI is introducing the next incarnation of the line, the Radeon SDR.
The Chip
Whereas NVIDIA actually shipped the GeForce2 MX with a slightly altered version of the GeForce2 GTS core (two rendering pipelines on the MX versus four on the GTS, and a lower clock), ATI is using the exact same Radeon chip on their value product, the Radeon SDR.
Just as a quick refresher, the 0.18-micron Radeon core operates at up to 183MHz. Unlike the GeForce2 GTS, the Radeon features only two rendering pipelines (like the GeForce2 MX), but each pipeline carries three texture units. This doesn’t help the Radeon in current dual-textured games such as Quake III Arena, since one of the texture units is left unused; ATI is banking on developers implementing higher-order multitexturing in future games. In a game that uses three textures, for example, the Radeon would be able to render two triple-textured pixels at once while the GeForce2 GTS would only be able to do one. As we just mentioned, this doesn’t help ATI now; rather, it could help the Radeon last longer than the GeForce2 GTS and perform better once such games arrive.
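To put those pipeline configurations into numbers, here is a quick back-of-the-envelope sketch; the clocks and pipe counts come from the specification table below, while the helper function itself is ours:

```cpp
#include <cstdio>

// Peak texel throughput = core clock x rendering pipelines x texture units
// per pipeline. These are theoretical figures; real-world output is bounded
// by memory bandwidth, as discussed later in the article.
long long peakTexelsPerSec(long long clockHz, int pipes, int tmusPerPipe) {
    return clockHz * pipes * tmusPerPipe;
}

int main() {
    // Retail Radeon: 183MHz x 2 pipes x 3 TMUs ~= 1.1 Gtexels/s
    std::printf("Radeon DDR:   %lld Mtexels/s\n",
                peakTexelsPerSec(183000000LL, 2, 3) / 1000000);
    // GeForce2 GTS: 200MHz x 4 pipes x 2 TMUs = 1.6 Gtexels/s
    std::printf("GeForce2 GTS: %lld Mtexels/s\n",
                peakTexelsPerSec(200000000LL, 4, 2) / 1000000);
    // In a triple-textured game the Radeon lays down two pixels per clock,
    // while the GTS (2 TMUs < 3 textures) needs two clocks per pixel.
    return 0;
}
```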
The Radeon also features ATI’s Charisma engine, which makes ATI the second major graphics chip manufacturer to embrace an on-board hardware T&L engine, the first being NVIDIA. The Charisma engine can perform all transform, lighting and clipping calculations itself, thus offloading that work from the host CPU. This puts it, feature-wise, on par with the GeForce2 GTS’ T&L engine. Performance-wise, on paper at least, ATI’s T&L engine is capable of processing 30 million triangles per second, compared to 25 million for the GeForce2 GTS.
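For readers who want a feel for what a T&L engine actually offloads, below is a heavily simplified, purely hypothetical sketch of the per-vertex work (a 4x4 matrix transform plus one diffuse light); none of these names come from ATI’s or NVIDIA’s actual hardware or APIs:

```cpp
#include <algorithm>  // std::max

struct Vec3 { float x, y, z; };
struct Vertex { Vec3 pos, normal; float brightness; };

// One row-major 4x4 transform applied to a point (w assumed to be 1).
Vec3 transform(const float m[16], const Vec3& v) {
    return { m[0]*v.x + m[1]*v.y + m[2]*v.z  + m[3],
             m[4]*v.x + m[5]*v.y + m[6]*v.z  + m[7],
             m[8]*v.x + m[9]*v.y + m[10]*v.z + m[11] };
}

// Simple diffuse lighting term: N . L, clamped at zero.
float diffuse(const Vec3& n, const Vec3& lightDir) {
    return std::max(0.0f, n.x*lightDir.x + n.y*lightDir.y + n.z*lightDir.z);
}

// With hardware T&L, a loop like this runs on the graphics chip for every
// vertex of every frame instead of consuming host CPU cycles.
void transformAndLight(Vertex* verts, int count,
                       const float worldViewProj[16], const Vec3& lightDir) {
    for (int i = 0; i < count; ++i) {
        verts[i].pos        = transform(worldViewProj, verts[i].pos);
        verts[i].brightness = diffuse(verts[i].normal, lightDir);
    }
}
```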
The Charisma engine also includes quite a few DirectX 8 features, which we describe in full detail, along with the other parts of the Radeon core, in our full review of the Radeon here.
Video Card Specification Comparison

| | ATI Radeon 64DDR | ATI Radeon 32SDR | NVIDIA GeForce 256 | NVIDIA GeForce2 MX | NVIDIA GeForce2 GTS | 3dfx Voodoo5 5500 |
|---|---|---|---|---|---|---|
| Core | Rage6C | Rage6C | NV10 | NV11 | NV15 | Napalm (VSA-100) |
| Clock Speed | 183MHz* | 166MHz | 120MHz | 175MHz | 200MHz | 166MHz |
| Number of Chips | 1 | 1 | 1 | 1 | 1 | 2 |
| Rendering Pipelines | 2 | 2 | 4 | 2 | 4 | 2 (per chip) |
| Texels/Clock (per pipeline) | 3 | 3 | 1 | 2 | 2 | 1 |
| Texels/Second | 1100 Million | 1000 Million | 480 Million | 700 Million | 1600 Million | 667 Million |
| Memory Bus | 128-bit DDR | 128-bit SDR | 128-bit SDR/DDR | 128-bit SDR or 64-bit SDR/DDR | 128-bit DDR | 128-bit SDR |
| Memory Clock | 366MHz* DDR | 166MHz | 166MHz SDR / 300MHz DDR | 166MHz SDR | 333MHz DDR | 166MHz SDR |
| Memory Bandwidth | 5.9GB/s* | 2.7GB/s* | 2.7/4.8GB/s | 2.7GB/s | 5.3GB/s | 5.3GB/s |
| Manufacturing Process | 0.18-micron | 0.18-micron | 0.22-micron | 0.18-micron | 0.18-micron | 0.25-micron (Enhanced) |

*Note: Figures don't take into account HyperZ.
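The bandwidth figures in the table are straightforward to verify: bus width times memory clock, doubled for DDR. A minimal sketch of the arithmetic (the helper is our own, not anything from our test tools):

```cpp
#include <cstdio>

// Theoretical memory bandwidth in GB/s: bus width (bits) x clock (MHz),
// doubled when the memory transfers on both clock edges (DDR).
double bandwidthGBs(int busBits, double clockMHz, bool ddr) {
    double bytesPerSec = (busBits / 8.0) * clockMHz * 1e6 * (ddr ? 2 : 1);
    return bytesPerSec / 1e9;
}

int main() {
    // Radeon SDR: 128-bit SDR at 166MHz -> ~2.7GB/s
    std::printf("Radeon SDR: %.1f GB/s\n", bandwidthGBs(128, 166, false));
    // Retail Radeon DDR: 128-bit DDR at 183MHz -> ~5.9GB/s
    std::printf("Radeon DDR: %.1f GB/s\n", bandwidthGBs(128, 183, true));
    return 0;
}
```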
The Radeon SDR will be a 32MB-only card and, as the name implies, will feature Single Data Rate SDRAM instead of the Double Data Rate (DDR) SDRAM that has been on all other Radeon cards until now. This is obviously going to cripple the Radeon SDR in the same way the GeForce2 MX’s SDR memory did it; however, theoretically, there is another factor that could tilt things in ATI’s favor. Before we get to that, let’s look at the various flavors of the Radeon, as there has been some recent confusion about the Radeon versions available.
For starters, there must be an understanding of OEM or “white box” products versus retail products. An OEM or “white box” product is one that, as the name implies, is used by an OEM such as Dell, Micron or any other such manufacturer in their systems. A retail product is one that you’ll find on the shelves of retail stores such as Fry’s Electronics, CompUSA and Best Buy. Historically, OEM products have always required a bit more care, since OEM versions of retail products often carry slight differences. These differences can range from simple things, such as missing software bundles or TV output, to much more serious issues, such as different memory configurations or lower operating frequencies. The latter happens to be the case with ATI’s Radeon line.
We mentioned earlier that the Radeon operates at up to 183MHz. The reason for the “up to” disclaimer is simple: only the retail versions of the Radeon (cards you can find in retail stores) run at 183MHz; the rest run at 166MHz. This means that all OEM or “white box” Radeon cards run at 166MHz.
Things get even more confusing once you take into account the different memory configurations available on Radeon based cards. The retail 64MB Radeon cards, which are only available with DDR SDRAM, run at a 183MHz core clock as well as a 183MHz memory clock (the Radeon core is clocked synchronously with the memory, so the two frequencies must match). However, as we just mentioned, all OEM Radeons run at 166MHz, which means that the OEM 64MB Radeon DDR runs at 166/166MHz (core/mem). With the exact same card running at 183/183MHz in the retail box, you can start to see why there would be some confusion here.
The 32MB Radeon cards are much easier to deal with since all 32MB Radeons, OEM and retail, run at 166/166MHz (core/mem). This includes the 32MB All-in-Wonder Radeon cards as well.
Using that information, it’s pretty easy to guess that the Radeon SDR, which is a 32MB-only card, runs at 166/166MHz (core/mem). The Radeon SDR also happens to be sold as an OEM card only.
ATI Radeon Product Line Comparison

| Card | OEM/Retail | Memory Size | Clock Speed (core/mem) | TV Tuner | Video In/Video Out | Current Price |
|---|---|---|---|---|---|---|
| Radeon 64DDR | Retail | 64MB | 183/183MHz | No | Yes | $320 |
| Radeon 64DDR | OEM | 64MB | 166/166MHz | No | Yes | $300 |
| Radeon 32DDR | OEM & Retail | 32MB | 166/166MHz | No | No | $230 |
| Radeon 32SDR | OEM only | 32MB | 166/166MHz | No | No | $150 (est) |
| All-in-Wonder Radeon | OEM & Retail | 32MB | 166/166MHz | Yes | Yes | $300 |
HyperZ: Attacking the Memory Bandwidth Issue
At 166MHz, the Radeon SDR’s core offers a 1 Gigatexel/s fill rate; however, because of its limited memory bandwidth, the Radeon SDR will never see a real-world fill rate that high. This is the same situation we saw with the GeForce2 MX, which could never take advantage of its 700 Megatexel/s fill rate because of similar memory bandwidth limitations.
ATI does have a trick up their sleeve, however, that could theoretically keep the Radeon SDR from falling victim to the same fate as the memory bandwidth crippled GeForce2 MX. This trump card is ATI’s HyperZ technology. Primarily by efficiently caching and compressing data stored in the Z-buffer (data pertaining to depth), HyperZ can, according to ATI, increase effective memory bandwidth by up to 20%.
With only 2.7GB/s of available memory bandwidth, equivalent to that of a GeForce SDR or a GeForce2 MX, the Radeon SDR needs all the help it can get to manage its limited memory bandwidth more efficiently.
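To see just how tight 2.7GB/s is, consider a rough, assumption-laden estimate: at 32-bit color with a 32-bit Z-buffer, each rendered pixel costs on the order of 12 bytes of memory traffic (a Z read, a Z write and a color write, ignoring texture fetches entirely). Under those assumptions the memory starves the core well before its theoretical pixel rate:

```cpp
#include <cstdio>

int main() {
    const double bandwidth = 2.7e9;   // Radeon SDR, bytes/s
    // Assumed per-pixel traffic at 32bpp: 4-byte Z read + 4-byte Z write
    // + 4-byte color write. Texture reads would only add to this.
    const double bytesPerPixel = 12.0;

    double sustainable = bandwidth / bytesPerPixel;  // ~225 Mpixels/s
    double corePeak    = 166e6 * 2;                  // 2 pipes -> 333 Mpixels/s

    std::printf("Bandwidth-limited rate: %.0f Mpixels/s\n", sustainable / 1e6);
    std::printf("Core peak rate:         %.0f Mpixels/s\n", corePeak / 1e6);
    return 0;
}
```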
The HyperZ technology is essentially composed of three features that work in conjunction with one another to provide for this "increase" in memory bandwidth. In reality, the increase is simply a more efficient use of the memory bandwidth that is there. The three features are: Hierarchical Z, Z-Compression and Fast Z-Clear. Before we explain these features and how they impact performance, you have to first understand the basics of conventional 3D rendering.
As we briefly mentioned before, the Z-buffer is a portion of memory dedicated to holding the z-values of rendered pixels. These z-values dictate what pixels and eventually what polygons appear in front of one another when displayed on your screen, or if you're thinking about it in a mathematical sense, the z-values indicate position along the z-axis.
A traditional 3D accelerator processes each polygon as it is sent to the hardware, without any knowledge of the rest of the scene. Since there is no knowledge of the rest of the scene, every forward facing polygon must be shaded and textured. The z-buffer, as we just finished explaining, is used to store the depth of each pixel in the current back buffer. Each pixel of each polygon rendered must be checked against the z-buffer to determine if it is closer to the viewer than the pixel currently stored in the back buffer.
Checking against the z-buffer must be performed after the pixel is already shaded and textured. If a pixel turns out to be in front of the current pixel, the new pixel replaces (or, in the case of transparency, is blended with) the current pixel in the back buffer, and the z-buffer depth is updated. If the new pixel ends up behind the current pixel, the new pixel is thrown out and no changes are made to the back buffer. When pixels are drawn only to be covered up, this is known as overdraw; drawing the same pixel three times is equivalent to an overdraw of 3, which in some cases is typical.
Once the scene is complete, the back buffer is flipped to the front buffer for display on the monitor.
What we've just described is known as "immediate mode rendering" and has been used since the 1960s for still-frame CAD rendering, architectural engineering, film special effects, and now in most 3D accelerators found inside your PC; a rough sketch of the process appears below. Unfortunately, this method of rendering results in quite a bit of overdraw, where objects that aren't visible are rendered anyway.
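In pseudo-C++ terms, the conventional approach described above boils down to something like the following; this is a conceptual sketch of the algorithm, not how any particular chip is wired:

```cpp
#include <cstdint>
#include <limits>
#include <vector>

struct Fragment { int x, y; float z; uint32_t color; };

// Immediate mode rendering: every fragment of every forward-facing polygon
// is shaded and textured first, and only then tested against the Z-buffer.
void renderFragment(const Fragment& f,
                    std::vector<float>& zbuf, std::vector<uint32_t>& back,
                    int width) {
    int idx = f.y * width + f.x;
    // The expensive part (texturing/shading) already happened to produce f.color.
    if (f.z < zbuf[idx]) {          // closer to the viewer?
        back[idx] = f.color;        // replace (or blend, if transparent)
        zbuf[idx] = f.z;            // update the stored depth
    }
    // else: the fragment is discarded -- all that shading work was overdraw.
}

void clearZBuffer(std::vector<float>& zbuf) {
    // Conventional clear: write "infinitely far" depth to every entry.
    // This full-buffer write is exactly what Fast Z-Clear avoids.
    zbuf.assign(zbuf.size(), std::numeric_limits<float>::infinity());
}
```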
One method of attacking this problem is to implement a Tile Based Rendering architecture, such as what we saw with the PowerVR Series 3 based KYRO graphics accelerator from ST Micro. While that may be the ideal way of handling it, developing such an algorithm requires quite a bit of work; it took years for Imagination Technologies (the creator of the PowerVR chips) to get their Tile Based Rendering architecture to where it is today.
This time around, that wasn't a possibility for ATI, as they needed to get the Radeon on the market as soon as possible to avoid the otherwise devastating consequences of losing market share. Instead, ATI's solution was to optimize the accesses to the Z-buffer. From the above example of how conventional 3D rendering works, you can guess that quite a bit of memory bandwidth is spent on accesses to the Z-buffer in order to check whether any pixels are in front of the one currently being rendered. ATI's HyperZ increases the efficiency of these accesses, so instead of attacking the root of the problem (overdraw), ATI went after its result (frequent Z-buffer accesses).
The first part of the HyperZ technology is the Hierarchical Z feature. This feature allows the pixel being rendered to be checked against the z-buffer before it actually hits the rendering pipelines, so useless pixels can be thrown out early, before the Radeon has to render them.
Next we have Z-Compression. As the name implies, this is a lossless compression algorithm (no data is lost during the compression) that compresses the data in the Z-buffer, allowing it to take up less space, which in turn conserves memory bandwidth during accesses to the Z-buffer.
The final piece of the HyperZ puzzle is the Fast Z-Clear feature. Fast Z-Clear is nothing more than a feature that allows for the quick clearing of all data in the Z-buffer after a scene has been rendered. Apparently ATI's method of clearing the Z-buffer is dramatically faster than conventional methods of doing so.
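ATI hasn't published the internals, but the usual way such a fast clear works is to tag blocks of the Z-buffer as "cleared" rather than writing every single depth value. The sketch below is purely illustrative of that idea and is not ATI's actual implementation:

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Purely illustrative: the Z-buffer is split into blocks, each with a
// "cleared" flag. Clearing the buffer then means flipping a handful of
// flags instead of rewriting megabytes of depth values.
struct TaggedZBuffer {
    static const std::size_t kBlock = 64;  // depth values per block
    std::vector<float> z;
    std::vector<std::uint8_t> cleared;     // one flag per block

    void fastClear() {
        // O(number of blocks) instead of O(number of pixels).
        std::fill(cleared.begin(), cleared.end(), 1);
    }

    float read(std::size_t i) const {
        if (cleared[i / kBlock]) return 1.0f;  // pretend "far plane" everywhere
        return z[i];
    }

    void write(std::size_t i, float depth) {
        if (cleared[i / kBlock]) {             // lazily materialize the block
            std::size_t base = (i / kBlock) * kBlock;
            for (std::size_t j = base; j < base + kBlock && j < z.size(); ++j)
                z[j] = 1.0f;
            cleared[i / kBlock] = 0;
        }
        z[i] = depth;
    }
};
```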
But exactly how much does HyperZ do in terms of increasing available memory bandwidth? While ATI was unwilling to disclose a method for disabling HyperZ under OpenGL, we were able to do so under Direct3D using the same registry keys that Sharky Extreme posted a few months ago.
In order to turn off HyperZ one feature at a time, we had to edit the Windows 98 Registry. Start by firing up regedit.exe:

1. Navigate to the key [HKEY_LOCAL_MACHINE\Software\ATI Technologies\Driver\0001\atidxhal\] in the left-hand pane.
2. If they aren't there already, add the following strings:
   - DisableHierarchicalZ
   - DisableHyperZ
   - FastZClearEnabled
The "DisableHierarchicalZ" string, if set to a value of "1" will disable the HierarchicalZ feature of HyperZ. The "DisableHyperZ" string, if set to a value of "1", will disable the Z-Compression feature of HyperZ. And finally, setting "FastZClearEnabled" to a value of "0" will disable the Fast Z-Clear feature. Disabling all three features will completely disable HyperZ.
Now that we know how to disable HyperZ, at least under Direct3D, let’s take a look at the benchmarks and see exactly how effective HyperZ is where it truly matters: on a very memory bandwidth limited card like the Radeon SDR.
Using Reverend's Thunder Demo under UnrealTournament, we notice that HyperZ is really doing quite a bit for the Radeon SDR. Not only does it improve the minimum frame rate by around 50%, but it also increases the average frame rate by close to 40%.
The next question to ask is which of the three HyperZ features does the most? In order to answer that we disabled each one individually and ran the Thunder benchmark.
Effects of HyperZ Features on Performance
UnrealTournament - 1024 x 768 x 32

| Feature | Minimum Frame Rate (fps) | Average Frame Rate (fps) | Maximum Frame Rate (fps) | Average Change in Performance |
|---|---|---|---|---|
| HyperZ Enabled | 34 | 65 | 144 | N/A |
| Hierarchical Z Disabled | 25 | 64 | 140 | -2% |
| Z-Compression Disabled | 21 | 50 | 117 | -22% |
| Fast Z-Clear Disabled | 18 | 46 | 94 | -29% |
| HyperZ Disabled | 17 | 40 | 78 | -39% |
As the above chart shows, at least under UnrealTournament, most of the performance benefit comes from the Z-Compression and Fast Z-Clear functions of ATI's HyperZ. Assuming that our method of disabling the various HyperZ functions was correct and we did indeed control the status of those three features, it can be concluded that in UT and games like it, ATI's Hierarchical Z, while providing some memory bandwidth improvement, isn't nearly as effective as the other two features.
What seems to have made the biggest difference is the Fast Z-Clear feature; quite a bit of the improvement comes simply from clearing the Z-buffer more efficiently.
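The "Average Change in Performance" column tracks the average frame rate relative to the fully enabled run, and the arithmetic is easy to reproduce. This is our own check against the numbers in the table above; the one-point discrepancies versus the table come down to rounding:

```cpp
#include <cstdio>

// Percentage change of an average frame rate versus the HyperZ-enabled run.
double pctChange(double fps, double baselineFps) {
    return (fps - baselineFps) / baselineFps * 100.0;
}

int main() {
    const double baseline = 65.0;  // average fps with HyperZ fully enabled
    std::printf("Hierarchical Z off: %.0f%%\n", pctChange(64.0, baseline)); // ~ -2%
    std::printf("Z-Compression off:  %.0f%%\n", pctChange(50.0, baseline)); // ~ -23%
    std::printf("Fast Z-Clear off:   %.0f%%\n", pctChange(46.0, baseline)); // ~ -29%
    std::printf("HyperZ off:         %.0f%%\n", pctChange(40.0, baseline)); // ~ -38%
    return 0;
}
```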
There's no doubt that HyperZ is helping ATI out quite a bit, especially in the case of the Radeon SDR. So without further ado, let's take a look at exactly what HyperZ, when combined with the rest of the Radeon's features, can do in terms of performance against the competition.
The Test
Windows 98 SE Test System

| Hardware | |
|---|---|
| CPU(s) | AMD Athlon (Thunderbird) 1.1GHz |
| Motherboard(s) | ASUS A7V |
| Memory | 128MB PC133 Corsair SDRAM (Micron -7E chips) |
| Hard Drive | IBM Deskstar DPTA-372050 20.5GB 7200 RPM Ultra ATA 66 |
| CDROM | Philips 48X |
| Video Card(s) | 3dfx Voodoo4 4500 AGP 32MB; ATI Radeon 64MB DDR; NVIDIA GeForce2 MX 32MB SDR |
| Ethernet | Linksys LNE100TX 100Mbit PCI Ethernet Adapter |

| Software | |
|---|---|
| Operating System | Windows 98 SE |
| Video Drivers | |

| Benchmarking Applications | |
|---|---|
| Gaming | idSoftware Quake III Arena demo001.dm3 |
Starting out, we see the Radeon SDR take a back seat to the GeForce2 MX, its closest competitor. The difference in performance between the two is just under 5% and can be attributed to T&L and driver superiority on NVIDIA's part.
The picture immediately begins to change once the resolution goes up. At 1024 x 768 x 32 the GeForce2 MX loses the 5% lead we described at 640 x 480 x 32 and gives way to the Radeon SDR's more efficient usage of memory bandwidth as well as higher fill rate.
Also notice that there is a third competitor here, the Voodoo4 4500; however, it is easily outperformed by both the Radeon SDR and the GeForce2 MX.
As you push the resolution even higher, the GeForce2 MX falls a full 24% behind the Radeon SDR, as it no longer has the fill rate or the memory bandwidth to keep up. At such a high resolution the Radeon SDR performs much more like a GeForce DDR, and it even edges slightly past the Voodoo5 5500.
MDK2 starts off with a much different picture than what we saw under Quake III Arena: the three NVIDIA cards completely dominate at the lower resolution. Once again, at this resolution fill rate and memory bandwidth limitations aren't factors; instead, drivers, hardware T&L and vendor-specific optimizations play the largest roles.
As the GeForce2 MX's limited memory bandwidth becomes a factor, it drops far from the ranks of its two elder brothers; however, it still holds a 10% lead over the Radeon SDR.
The Voodoo4 4500, once again, is no competition for either of the two boards.
Once again, at 1600 x 1200 x 32 the Radeon SDR maintains a 24% performance lead over the GeForce2 MX, and again it performs more like a GeForce DDR or Voodoo5 5500 than a GeForce2 MX. In comparison to its DDR counterpart, the Radeon SDR is about 25% slower.
UnrealTournament is a very CPU limited benchmark and doesn't take full advantage of either ATI's or NVIDIA's hardware T&L engine, thus the minimum frame rates are very similar across the board here at 640 x 480 x 32.
Other than the Voodoo4 4500, most of the contenders performed similarly on average as well at this resolution. The GeForce2 MX held onto a 4% lead over the Radeon SDR.
The minimum frame rate numbers show that the Radeon SDR doesn't allow the frame rate to drop below 34 fps throughout the course of the demo, while the GeForce2 MX lets it slip 42% lower, down to 24 fps. Let's see if the average frame rate tells the same story.
Unlike the Quake III Arena and MDK2 scores, we see the Radeon SDR outpacing the GeForce2 MX, even in the average frame rate numbers at 1024 x 768 x 32.
We know that UnrealTournament makes some very heavy use of textures, and thus the more efficient Z-buffer management on the part of the Radeon SDR, combined with its higher fill rate, gives it the advantage over the GeForce2 MX. The card is still outperformed by the Voodoo5 5500 and the GeForce DDR, however.
At 1600 x 1200, you can't expect the minimum frame rate of the Radeon SDR to be too high...
...however, on average, the Radeon SDR once again looks more like the GeForce DDR, which has twice its memory bandwidth, than the GeForce2 MX, which has the same amount of memory bandwidth as the Radeon SDR.
As we showed at the beginning of the benchmarks, without HyperZ the Radeon would be struggling pretty badly here.
16-bit vs 32-bit Performance
We recently switched our testing methodology to a 32-bit-only test suite; however, it is important to investigate 16-bit performance as well, and thus we have put together a section comparing the card's 16-bit performance to its 32-bit performance.
Here we see that although the GeForce2 MX has a higher level of 16-bit performance, its 32-bit performance is approximately 4% lower than the Radeon SDR's. ATI's argument here is that you don't buy a card that can run well under 32-bit color only to play all of your games in 16-bit color, and it is a very fair and valid argument to make.
At 1600 x 1200 the GeForce2 MX still holds a lead in 16-bit color, and once again the Radeon offers superior performance under 32-bit color. The argument that 32-bit color doesn't really matter would've worked a year and a half ago; however, now that games are actually taking advantage of the extra color information, it is becoming increasingly important to concentrate on 32-bit color performance. In this case, the Radeon comes away with a 24% advantage in 32-bit color.
CPU Scaling Performance
We purposefully benchmark all of the cards on a fairly high performance platform, in this case a 1.1GHz Thunderbird, in order to gain an understanding of the uncapped performance of the solutions. However, not everyone has a 1.1GHz processor, so in order to get a better idea of what CPU speeds are necessary to get the most out of the Radeon SDR, we look to a CPU scaling graph to see how the card performs with various CPUs.
The performance improvements begin to level off towards the higher clock speed CPUs, indicating that other limitations are kicking in at that point. Clock for clock, the Athlon is only slightly slower than the Pentium III with the Radeon SDR. It seems that ATI's drivers take advantage of both CPUs fairly well; either that, or there are very few platform-specific optimizations at work.
Windows 2000 Driver Performance
ATI has always been criticized for their driver quality, and with the release of Windows 2000 now well behind us, we are still seeing some manufacturers skimping on their Windows 2000 driver support. NVIDIA has been wonderful thus far with their support under Windows 2000 as well as Linux; let's see how ATI stacks up.
Not good at all. The Windows 2000 driver runs at less than half the speed of its Windows 98 counterpart at 640 x 480. While the gap narrows a bit at 1024 x 768, the performance numbers under Windows 2000 are completely unacceptable. There is no reason the card should perform this poorly under Windows 2000.
Video Features & Playback
The Radeon SDR also features the same hardware motion compensation and iDCT engines as the other Radeon products; for more information on their impact on DVD performance, as well as the Radeon’s video playback quality, see our October 2000 Video Card Roundup entitled DVD Quality, Features & Performance.
Just like all of the other Radeon based boards except for the retail 64MB DDR and All-in-Wonder Radeon products, the Radeon SDR features no video input or output ports by default.
Final Words
At lower resolutions the Radeon SDR falls behind the GeForce2 MX, however at the higher resolutions (around 1024 x 768 x 32 and above), the Radeon SDR begins to perform more along the lines of a Voodoo5 5500 or a GeForce DDR. In terms of FSAA quality and performance, not much has changed since we did our FSAA Comparison, so for more information on how the Radeon stacks up there take a look at the comparison.
The Radeon SDR marks ATI's first entry into the low-cost market with a Radeon based product, and thanks to its powerful HyperZ technology, the Radeon SDR does not flop as a performance product either. The Radeon SDR could give the GeForce2 MX some hefty competition, but not as the solution currently stands.
First of all, the Radeon SDR's estimated $150 price tag is entirely too expensive for a card that has neither the TwinView features of the GeForce2 MX nor the DualHead features of the Matrox G450. The Radeon SDR needs to be much closer to the $100 price point in order to be a viable alternative for many. Considering that a GeForce2 GTS can be had for close to $180 now, a $150 Radeon SDR isn't the most attractive option.
Secondly, the Radeon SDR, and the Radeon line in general, is in desperate need of improved Windows 2000 drivers. The performance numbers tell the horrid story of a 50% drop in performance simply by moving to Windows 2000; that's not a very pleasing story to be hearing. While Microsoft did intend Windows 2000 to be a professional OS only, the fact of the matter is that quite a few power users run the OS at home, and for them, anything less than the performance they would get under Windows 98 is unacceptable.
On the bright side of things, the Radeon SDR will be very attractive to OEMs. Just as OEMs are doing now with the GeForce2 MX, OEMs will be able to say that their systems are powered by ATI's Radeon graphics accelerator while making use of the cheaper SDR version of the card. Unfortunately, without any TwinView or DualHead-like features, the Radeon SDR will have some trouble gaining the same acceptance, professionally, that the GeForce2 MX and Matrox G4xx line of cards have.
In summary, ATI has the potential to take the Radeon SDR quite far, provided that the price drops and drivers improve. If ATI wants to compete with NVIDIA, they need to do so not only based on performance, but based on price as well as drivers.