Original Link: https://www.anandtech.com/show/726
NVIDIA's GeForce3: "NV20" Revealed
by Anand Lal Shimpi on February 27, 2001 9:16 AM EST - Posted in
- GPUs
The tension has been mounting. The world and yours truly expected the unveiling of what would undoubtedly be the king of the 3D graphics market in August of last year. Unfortunately, much to our disappointment, all we saw was a handpicked GeForce2 GTS variant and some new drivers from NVIDIA. Discussion was rampant across the web about NVIDIA's "missed" product cycle, and although some were unwilling to accept that the elusive NV20 wasn't going to surface for another six months, there was little that could be done about it.
The opportunity was there; 3dfx, ATI or even Matrox had the chance to step forward and take the limelight. And although we had expected ATI to step forward with at least a Radeon MAXX last Comdex, all that came out of the 3D graphics market was NVIDIA's acquisition of 3dfx.
We actually owe NVIDIA an apology. When we learned that the NV20 was to be pushed back a full product cycle and to take its place was the GeForce2 Ultra, we definitely gave them a hard time. However, as you are about to see, there was no way that the NV20 could have made it out last year for the very reasons NVIDIA gave the public: mainly that DirectX 8 wasn't ready and that they were still waiting on the 0.15-micron process to come up to speed for this particular chip.
Fast forwarding to the present day, NVIDIA just unveiled the 'NV20' as the GeForce3 in Tokyo last week to an all-Mac audience. There was quite a bit of commotion surrounding this release simply because it is in fact the PC industry that has put NVIDIA where they are today, but there's no reason to get upset; the GeForce3 will be out for the PC at the same time as, if not earlier than, the Mac community gets it. The biggest question we'll have to ask is what games will be out to take advantage of it, and unfortunately that requires a much more speculative answer.
Alas, we are getting ahead of ourselves; we're talking about buying a card we know nothing about. Luckily, that is all going to change today; what you are about to see is quite possibly the most interesting product to come to the 3D graphics market since the introduction of 3Dfx's (with a capital D) Voodoo graphics accelerator. We've kept you waiting long enough; without further ado, let's have a look at NVIDIA's GeForce3: The Infinite Effects GPU.
Brute Force vs. Elegance & Finesse
Last year we saw the introduction of the GeForce2 GTS, GeForce2 Ultra and GeForce2 Pro, all from NVIDIA. Not a single one of those products approached the problems inherent in high-performance 3D graphics accelerators; they simply offered higher clock speeds and faster memory to make the frames fly.
This brute force approach was necessary mainly because NVIDIA was in quite intense competition with the now defunct 3dfx. Every press release was filled with extremely high peak fill rate and memory bandwidth figures; it was a quest to see who could produce, and somewhat justify, the most incredibly inflated peak fill rate number without lying.
The problem was that little attention was paid to the fact that if we were all capable of attaining these incredible fill rates, we wouldn't have a problem running at 1600 x 1200 x 32 with 4-sample FSAA enabled. The reality of it all was that even the GeForce2 GTS, with its 800 Mpixels/s peak fill rate, was only capable of roughly 300 Mpixels/s in a real-world situation. Bottlenecked by memory bandwidth, or a lack thereof, the GeForce2 line of graphics chips failed to offer any solution to this problem of real-world performance other than simply using faster memory.
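To see why, a rough back-of-the-envelope calculation helps. The sketch below is ours, not NVIDIA's: it assumes a GeForce2 GTS with 166MHz DDR memory on a 128-bit bus and roughly 16 bytes of memory traffic per rendered pixel (Z read, Z write, color write and a texture fetch), numbers that are only meant to be in the right ballpark.

```cpp
#include <cstdio>

// Rough back-of-the-envelope sketch (not NVIDIA's numbers): estimate how far
// memory bandwidth alone can hold back a GeForce2 GTS-class part.
int main() {
    // GeForce2 GTS memory: 166 MHz DDR on a 128-bit (16-byte) bus.
    const double mem_clock_hz = 166e6;
    const double bus_bytes    = 16.0;                              // 128 bits
    const double bandwidth    = mem_clock_hz * 2.0 * bus_bytes;    // DDR -> ~5.3 GB/s

    // Assumed per-pixel traffic at 32-bit color with a 32-bit Z-buffer:
    // Z read + Z write + color write + a cached texture fetch (illustrative).
    const double bytes_per_pixel = 4 + 4 + 4 + 4;

    const double peak_fill = 800e6;                        // quoted peak, pixels/s
    const double real_fill = bandwidth / bytes_per_pixel;  // bandwidth-limited

    printf("Memory bandwidth:        %.2f GB/s\n", bandwidth / 1e9);
    printf("Bandwidth-limited fill:  %.0f Mpixels/s (vs. %.0f Mpixels/s peak)\n",
           real_fill / 1e6, peak_fill / 1e6);
    return 0;
}
```

The result lands right around the 300 Mpixels/s figure seen in practice, which is the point: the pipelines can draw far more pixels than the memory can feed them.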
You could say that NVIDIA was so caught up in the idea of competing with 3dfx that they failed to realize that all they were doing was seeing who could build the biggest rocket, without so much as a thought for efficiency.
With NVIDIA's acquisition of 3dfx, the game has changed. NVIDIA doesn't have to worry about competing on a fill rate basis with ATI, because ATI has not been playing that game with the Radeon. If you remember, the peak fill rate of the Radeon is no more than 366 Mpixels/s, and ATI has historically preached the efficiency of the Radeon rather than taking the brute force approach of the 3dfx vs. NVIDIA era.
It is this new approach to how these chips and cards are presented that is quite refreshing. Amid an increasingly dull computer hardware industry, it is a breath of fresh air, courtesy of a market we honestly didn't expect it from.
This is why, throughout the 22-page GeForce3 press presentation, there isn't a single mention of the incredibly misleading peak fill rate numbers that have plagued this industry for so long. We've preached the idea of real-world performance countless times in our CPU reviews; now it's time to echo those sentiments in our graphics reviews.
Nothing more than a big Ultra?
The first thing to get out of the way with the GeForce3 is that the 0.15-micron core, with its 57 million transistors, has the same number of pixel pipelines and is clocked at the same core speed as NVIDIA's GeForce2 GTS.
The reason for the die shrink was to make room for the 128% increase in transistor count and to keep the GPU from eating too much power, although we already expect it to run fairly hot. For those that don't remember, the GeForce2 GTS core was produced on a 0.18-micron fabrication process and boasted an incredible 25 million transistors. At 57 million transistors, the GeForce3 GPU is more complex than Intel's Pentium 4 processor in terms of sheer transistor count. Obviously this is a flawed comparison, since the CPU guys have a much tougher job cramming all of their logic into as small and as efficient an area as possible, but it is a noteworthy parallel to make.
The GeForce3 still features the same four pixel pipelines that were present on the original GeForce 256 chip, with two texture units per pipeline, a configuration first introduced on the GeForce2 GTS.
The first advancement comes in the fact that the GeForce3 can now do single-pass quad-texturing. The GeForce 256 could use its four pixel pipelines to render four single-textured pixels or two dual-textured pixels in a single pass, or a single clock. The GeForce2 GTS improved on this by adding a second texture unit per pixel pipeline, meaning that it could render four dual-textured pixels in a single clock. This is what gave the GeForce2 its "Giga-Texel" moniker, since it was able to boast a peak fill rate of 1.6 Gtexels/s. However, the GeForce2 is only capable of applying a maximum of two textures to a pixel in a single pass. The GeForce3 takes this one step further by being able to do quad-texturing in a single pass, although it still has only two texture units per pipeline. This means that the GeForce3 can handle two quad-textured pixels per clock in a single pass, rather than having to make two rendering passes over the same pixels. NVIDIA claims that this offers a theoretical peak fill rate of 3.2 Gtexels/s, although it seems like it's definitely time to stop quoting theoretical fill rate numbers in this industry.
The potential usefulness of this is considerable. To quote one of the industry's greatest developers, John Carmack, on his experience with this feature of the GeForce3:
"An interesting technical aside: when I first changed something I was doing with five single or dual texture passes on a GF to something that only took two quad texture passes on a GF3, I got a surprisingly modest speedup. It turned out that the texture filtering and bandwidth was the dominant factor, not the frame buffer traffic that was saved with more texture units." - John Carmack, idSoftware
As for the memory clock of the GeForce3, it is identical to that of the Ultra. Making use of the same 230MHz DDR SDRAM that debuted on the Ultra, the GeForce3 doesn't offer any more raw memory bandwidth than its predecessor. This is still the greatest amount of raw memory bandwidth offered on any consumer level 3D graphics accelerator: 7.36GB/s.
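For reference, here is where the 7.36GB/s figure comes from: 230MHz DDR memory transfers data on both edges of the clock (460MHz effective) across a 128-bit, or 16-byte, bus.

```cpp
#include <cstdio>

// 230 MHz clock * 2 (DDR) * 16 bytes (128-bit bus) = 7.36 GB/s
int main() {
    const double bandwidth = 230e6 * 2.0 * 16.0;
    printf("%.2f GB/s\n", bandwidth / 1e9);
    return 0;
}
```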
This paints the picture that the theoretical fill rate of the GeForce3 is no greater than that of the previous generation of graphics chips from NVIDIA; if this were all the GeForce3 promised, we wouldn't be nearly as excited about this technology as we are. Keep on reading…
Finally a true GPU
When the GeForce 256 was announced, NVIDIA was touting its achievement of bringing the world's first Graphics Processing Unit (GPU) to the market. We hesitated in calling it a GPU simply because it drew a parallel between the GeForce 256 and a CPU that we were not willing to make. CPUs like the AMD Athlon and Intel Pentium III were much more useful than NVIDIA's first GPU simply because they were truly programmable entities. They had an instruction set (x86), and programmers could write code to manipulate the CPU's power in whatever manner they saw necessary.
The GPU, however, was not as advanced a chip. Developers were stuck with the feature set that NVIDIA's engineering team implemented and could not control the chip in the same manner in which x86 programmers can manipulate the Athlon or the Pentium III. NVIDIA introduced a whole new list of features that were supported in hardware with the GeForce2 GTS and its shading rasterizer; however, if a developer's approach to a particular function didn't match the way it was provided for in hardware, that function was useless in the eyes of the GPU. There was a severe lack of flexibility in this revolutionary GPU.
One of NVIDIA's most highly touted features was their hardware transform and lighting engine, designed to offload transform and lighting calculations from the host CPU to the GPU in an attempt to increase overall performance. Unfortunately, very few games can take advantage of this powerful T&L engine even now; Quake III Arena can make use of the GPU's hardware accelerated transformation engine, however its own lighting engine renders the GPU's other function useless. And that is the least we could ask for; games like UnrealTournament could not take advantage of even the GPU's transformation engine. With all due respect to the engineers at NVIDIA, they are not game developers, and there is no way they could know the best way to implement a lot of these features in hardware so that everyone is happy.
The solution to this ongoing problem was to take a page from the books of desktop CPU manufacturers. If developers are constantly asking for their features to be implemented in hardware, why not place the burden on them and make them write the code necessary to manipulate the hardware NVIDIA manufactures?
The GeForce3 is thus the first true GPU from NVIDIA. It still features the same T&L engine as before; however, it is now fully programmable. The GeForce3 features something NVIDIA calls the nfiniteFX engine, which is made up of the hardware T&L unit we are used to plus an nfiniteFX Vertex processor and an nfiniteFX Pixel processor. This is where the bulk of the GPU's transistor count comes from.
The instruction set the GeForce3 understands is known as the Vertex Shader Instruction Set. This is the equivalent of the x86 instruction set in the PC processor world, albeit specifically tailored to the needs of the GeForce3.
The Vertex processor handles the initial geometry and setup math necessary for the production of the 3D scene being rendered. When you're dealing with polygons (and obviously their vertices, hence the name), the Vertex processor is your one-stop shop for all your 3D calculation needs. As you can see in the above graphic, the transformation and part of the lighting stages occur at this point in the rendering process as the scene is being set up. The polygons are transformed from the mathematical space in which they are stored in memory into their 3D counterparts that will shortly be rendered to the frame buffer and dumped onto the screen.
Operations such as vertex lighting, morphing, keyframe animation and vertex skinning are all functions that developers can implement in custom programs to be run by the Vertex processor. You've already heard about many of the aforementioned operations in our ATI Radeon review, except now, instead of being limited to what the hardware engineers decided should occur when a vertex skinning operation is initiated, the developer is free to control the outcome on their own.
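As a rough illustration, here is what a simple two-bone vertex skinning operation boils down to. The plain C++ below stands in for what a developer would express as a vertex program on the GeForce3, and the matrices, weights and names are made up for the example.

```cpp
#include <cstdio>

// Two-bone vertex skinning sketch: each vertex is transformed by two "bone"
// matrices and the results are blended by a per-vertex weight.
struct Vec3 { float x, y, z; };
struct Mat4 { float m[4][4]; };   // row-major; last row assumed to be (0,0,0,1)

static Vec3 transform(const Mat4& b, const Vec3& v) {
    return {
        b.m[0][0]*v.x + b.m[0][1]*v.y + b.m[0][2]*v.z + b.m[0][3],
        b.m[1][0]*v.x + b.m[1][1]*v.y + b.m[1][2]*v.z + b.m[1][3],
        b.m[2][0]*v.x + b.m[2][1]*v.y + b.m[2][2]*v.z + b.m[2][3],
    };
}

// Weight w goes to bone0, (1 - w) to bone1.
static Vec3 skin(const Vec3& v, const Mat4& bone0, const Mat4& bone1, float w) {
    Vec3 a = transform(bone0, v);
    Vec3 b = transform(bone1, v);
    return { w*a.x + (1-w)*b.x, w*a.y + (1-w)*b.y, w*a.z + (1-w)*b.z };
}

int main() {
    Mat4 identity = {{{1,0,0,0},{0,1,0,0},{0,0,1,0},{0,0,0,1}}};
    Mat4 shifted  = identity;  shifted.m[0][3] = 2.0f;    // bone translated 2 units in x
    Vec3 v = skin({1, 0, 0}, identity, shifted, 0.5f);    // vertex halfway between bones
    printf("skinned vertex: (%.1f, %.1f, %.1f)\n", v.x, v.y, v.z);  // (2.0, 0.0, 0.0)
    return 0;
}
```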
Examples of what the Vertex processor is able to produce are things such as facial expressions. Gone are the days when a developer uses a single blocky texture to represent a hand; now things like individual fingers moving across a keyboard can be easily represented. Using a combination of polygons and programmable lighting effects, some very realistic models and actions can be produced.
An example of vertex skinning from the ATI Radeon card
The GeForce3 also supports hardware acceleration of curved and other higher-order surfaces, which can be useful since it's easier to represent an arc with a quadratic function than with a bunch of small triangles. This saves bandwidth over the AGP bus because the card only needs to be sent a handful of control points rather than every polygon.
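A minimal sketch of the idea, using a quadratic Bezier curve purely as an illustration: three control points cross the AGP bus, and the chip can then generate as many points along the arc as it needs.

```cpp
#include <cstdio>

// Three control points describe an arc; the chip can tessellate it as finely
// as it likes without any additional data from the host.
struct Pt { float x, y; };

static Pt quad_bezier(const Pt& p0, const Pt& p1, const Pt& p2, float t) {
    float u = 1.0f - t;
    return { u*u*p0.x + 2*u*t*p1.x + t*t*p2.x,
             u*u*p0.y + 2*u*t*p1.y + t*t*p2.y };
}

int main() {
    Pt p0 = {0, 0}, p1 = {1, 2}, p2 = {2, 0};   // the only data sent over AGP
    const int segments = 16;                    // generated on-chip
    for (int i = 0; i <= segments; ++i) {
        Pt p = quad_bezier(p0, p1, p2, i / (float)segments);
        printf("%.3f %.3f\n", p.x, p.y);
    }
    return 0;
}
```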
The next step in the process is the finalization of the lighting process and the actual rendering of the pixels present in the scene for final display. This is obviously handled by the Pixel processor. One of the basic concepts of 3D acceleration is the idea of using textures. In the early days of 3D gaming, a character could be represented by a few polygons covered with two-dimensional textures. Those textures stored very little data, mainly pertaining to color.
With the advent of the technology behind the GeForce3's GPU, the texture becomes far more useful. Instead of just holding color values, textures can now hold variables, direction vectors and lighting vectors; all of these values can actually be encoded into the texture.
What's the benefit of this? After the Vertex processor has set up the polygons in a scene and the Pixel processor comes in to apply textures, the textures not only tell the Pixel processor what color they will be, but also how they will react if certain events occur. Using the encoded direction and lighting vectors of a pixel, the pixels can now realistically reflect light through a dot product calculation. And the manner in which a pixel reflects light will change according to the direction vector of the light source.
Imagine a wrinkle on a character's face. That wrinkle now contains much more information than simply its color and general orientation. If a light source shines upon the wrinkle, it will cast a realistic shadow through a series of vector multiplication operations. Through a series of pixel shader programs, the developer can now make their game completely unique. Doom3 will look completely different from Duke Nukem Forever not because idSoftware uses different textures than 3DRealms, but because their vertex and pixel programs are completely different.
Remember the Environment Mapped Bump Mapping Matrox preached with the release of the G400? Instead of having another texture create the illusion of bumps on a surface, the GeForce3 can do real-time calculation of the effects of ripples in water and how they interact with surrounding objects such as rocks. If you ever noticed, in a game that supported EMBM you never had a situation where something protruding out of the bump-mapped water had the water actually interact with the object. In the real world waves change according to the rocks they are interacting with; in the EMBM world they didn't. This is because with EMBM all you had was another texture on top of the regular textures representing the water and its ripples.
The GeForce3 will take advantage of Dot3 bump mapping, which again uses dot products of direction vectors to produce the resulting ripple effects in water. This isn't limited to water alone, as walls and other surfaces will receive the same treatment. Dot3 has been around for a while but has rarely been used, because of either poor implementations in hardware or a lack of flexibility for developers. EMBM was rarely used because the penalty of rendering another texture was often too great for the cards that supported it. With the GeForce3's extremely flexible GPU, developers can truly take the power into their own hands, and you can finally expect more than just ugly-looking 2D textures on walls and completely unrealistic water.
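For the curious, here is a minimal sketch of the dot-product (Dot3) lighting described above: the texture stores a surface normal per texel, and the pixel's brightness comes from the dot product of that normal with the light direction. The plain C++ stands in for what a developer would write as a pixel shader program; the specific vectors and colors are made up for the example.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdio>

// Per-pixel Dot3 lighting sketch: brightness = max(0, N . L).
struct Vec3 { float x, y, z; };

static Vec3 normalize(Vec3 v) {
    float len = std::sqrt(v.x*v.x + v.y*v.y + v.z*v.z);
    return { v.x/len, v.y/len, v.z/len };
}

static float dot(const Vec3& a, const Vec3& b) {
    return a.x*b.x + a.y*b.y + a.z*b.z;
}

int main() {
    // Normal encoded in the texture for one texel of a "wrinkle", and the
    // direction toward the light source (both made up for illustration).
    Vec3 normal = normalize({0.2f, 0.1f, 1.0f});
    Vec3 light  = normalize({0.0f, 0.5f, 1.0f});

    float diffuse = std::max(0.0f, dot(normal, light));  // clamp texels facing away to black
    int base_red = 200;                                   // base texture color channel
    int lit_red  = (int)(base_red * diffuse);

    printf("diffuse = %.2f, lit red channel = %d\n", diffuse, lit_red);
    return 0;
}
```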
Taking advantage of it all
To summarize, realistic reflections, skin, hair, walls and overall extremely detailed surfaces are now finally going to be made possible through the use of the programmable technology behind NVIDIA's nfiniteFX engine.
As far as compatibility with competitors goes, NVIDIA licensed this technology to Microsoft for use in DirectX 8. It is an open standard that is accessible to all of NVIDIA's competitors, so you can expect to see similar functions in ATI's next product as well. The OpenGL API will be able to take advantage of these functions through NVIDIA's own extensions to the API, which are implemented through the drivers.
NVIDIA has put together a helpful set of custom-made vertex and pixel shader programs that are available on their development site. Their effects browser can be used to not only see the effects of these programs but also the code behind them and is a useful introductory tool into the world of the programmable GPU.
Unfortunately, and this is a very big caveat for early adopters of the GeForce3, it is still going to be a little while before we see DirectX 8 games hitting store shelves, and it will be even longer before the general mass of games truly takes advantage of this power. idSoftware's Doom3 is still looking at an estimated 2002 release date; between now and then there will definitely be titles that take advantage of the features, but whether they are worthy of the $500 price tag of the GeForce3 is another question. It may almost be worth waiting another six months for NVIDIA's next card release (by that time ATI will have a DX8 compatible part similar to the GeForce3 out as well) before upgrading.
In the current crop of games that don't utilize these advanced features, the GeForce3 behaves much like a GeForce2 Ultra. This is unfortunately a reality that NVIDIA has to face with the GeForce3. It is an excellent technology, and the transition to it will have to occur at some point, but the value of an upgrade at this point is questionable because of the lack of gaming titles that use it.
Luckily NVIDIA realized this and did attempt to sweeten the performance at least a little bit.
Lightspeed Memory Architecture
For any of you that have been on top of the slew of NV20 rumors that have appeared over the past year, you're probably expecting NVIDIA's next buzzword to be Hidden Surface Removal. We're sorry to disappoint, but that particular term isn't in their vocabulary this time around; however, there is something much better.
The GeForce3's memory controller is drastically changed from the GeForce2 Ultra's. Instead of a single 128-bit interface to memory, there are four fully independent memory controllers within the GPU, in what NVIDIA likes to call their crossbar-based memory controller.
These four memory controllers are each 32 bits wide and essentially interleaved, meaning that together they add up to the 128-bit memory interface we're used to, and they all support DDR SDRAM. The crossbar architecture also dictates that these four independent memory controllers load-balance the bandwidth they share with the rest of the GPU.
The point of having four independent, load-balanced memory controllers is increased parallelism in the GPU (is anyone else picking up on the fact that this is starting to sound like a real CPU?). The four narrower memory controllers come in quite handy when dealing with a lot of small datasets. If the GPU is requesting 64 bits of data, the GeForce2 Ultra uses a total of 256 bits of bandwidth (128-bit DDR) in order to fetch it from local memory. This results in quite a bit of wasted bandwidth. In the case of the GeForce3, however, that same 64-bit request can be handled in 32-bit chunks, leaving much less bandwidth unused. Didn't your mother ever tell you that it's bad to waste food? It looks like NVIDIA is finally listening to their mother.
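Here is a rough sketch of the waste being avoided. The working assumption, which is ours rather than NVIDIA's, is that each controller moves one DDR bus-width of data per access: 32 bytes at a time for a monolithic 128-bit controller, 8 bytes at a time for one of the GeForce3's 32-bit controllers.

```cpp
#include <cstdio>

// Bytes actually moved to satisfy a small request, given the minimum access
// size of the memory controller handling it.
static unsigned fetched_bytes(unsigned request_bytes, unsigned access_bytes) {
    unsigned accesses = (request_bytes + access_bytes - 1) / access_bytes;
    return accesses * access_bytes;
}

int main() {
    const unsigned request = 8;                        // a 64-bit request from the GPU
    unsigned monolithic = fetched_bytes(request, 32);  // single 128-bit DDR controller
    unsigned crossbar   = fetched_bytes(request, 8);   // one 32-bit DDR controller

    printf("64-bit request, 128-bit controller: %u bytes moved (%u wasted)\n",
           monolithic, monolithic - request);
    printf("64-bit request, 32-bit controller:  %u bytes moved (%u wasted)\n",
           crossbar, crossbar - request);
    return 0;
}
```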
The next part of this Lightspeed Memory Architecture is the Visibility Subsystem. This is the closest thing to "HSR" that exists in the GeForce3. As we are all aware, when drawing a 3D scene there is something called overdraw, where a pixel or polygon that will never be seen on the monitor is rendered anyway. ATI managed to combat this through the use of what they call Hierarchical-Z. NVIDIA's Visibility Subsystem is essentially identical to this.
What the feature does is simple: it performs an extremely fast comparison of values in the z-buffer (which stores how "deep" pixels are in the scene, and thus whether they are visible or not) in order to discard occluded values (and their associated pixels) before they are sent to the frame buffer.
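Here is a minimal per-pixel sketch of the comparison being described. Real hardware performs the test coarsely, over whole blocks of pixels at a time, but the idea is the same; everything in the snippet, including the depth convention, is illustrative.

```cpp
#include <cstdio>

// Z-buffer rejection sketch: before a pixel is shaded and written out, its
// depth is tested against what is already stored, and occluded pixels are
// thrown away, saving the shading work and frame buffer traffic.
const int W = 4, H = 4;
float zbuffer[H][W];

static bool visible(int x, int y, float depth) {
    // Smaller depth = closer to the viewer (a common convention, assumed here).
    if (depth < zbuffer[y][x]) {
        zbuffer[y][x] = depth;   // update the z-buffer...
        return true;             // ...and let the pixel be shaded and written
    }
    return false;                // occluded: skip it entirely
}

int main() {
    for (int y = 0; y < H; ++y)
        for (int x = 0; x < W; ++x)
            zbuffer[y][x] = 1.0f;                            // start "infinitely far"

    printf("near pixel drawn? %d\n", visible(1, 1, 0.3f));   // 1: in front, kept
    printf("far pixel drawn?  %d\n", visible(1, 1, 0.7f));   // 0: behind, rejected
    return 0;
}
```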
This technology isn't perfect; there are going to be a number of cases in which the Visibility Subsystem fails to reject the appropriate z-values and some overdraw remains. If you recall our Radeon SDR review, we actually measured the usefulness of Hierarchical-Z in UnrealTournament, and enabling it increased performance by a few percent.
The remaining two features that make up ATI's HyperZ are also mirrored in the GeForce3. The GeForce3 features the same style of lossless Z-buffer compression as the Radeon. The compression savings can be as great as 4:1, identical to what ATI claims for the Radeon's z-buffer compression.
Finally, NVIDIA claims to have an equivalent to ATI's Fast Z-Clear. This function does a very quick clear of all the data in the Z-buffer, which we actually found to be quite useful in our UnrealTournament performance tests not too long ago; enabling Fast Z-Clear alone resulted in a hefty performance increase. Interestingly enough, NVIDIA downplayed the usefulness of this feature. It wasn't until we asked in our meeting whether they had anything similar that they mentioned it existed in the GeForce3; they also mentioned that they did not see any major performance gains from the feature, which is contrary to what we saw with the Radeon.
Remember that the Radeon has significantly less peak memory bandwidth than the GeForce2 Ultra, and thus the GeForce3. The GeForce3 is still able to benefit from the aforementioned features, although perhaps not as much as the Radeon did, and possibly in different areas, as NVIDIA seemed to indicate by downplaying the importance of a Fast Z-Clear-esque function on the GeForce3.
The truth is heard: FSAA is important
Quite possibly the biggest struggle we've ever seen in the 3D graphics market was 3dfx's attempt to convince the industry that their FSAA was of more value than the raw speed NVIDIA could promise. Unfortunately, their message wasn't heard clearly enough, and combined with an extremely late Voodoo5 release, it wasn't enough to do battle with NVIDIA's stronghold.
Now that NVIDIA has the core assets of 3dfx, and along with them a total of 110 engineers and architects from the former 3dfx, NVIDIA is focusing a little more on bringing FSAA to the masses. However, the supersampling FSAA that both 3dfx and NVIDIA pushed so heavily last year is simply not efficient enough. Supersampling consists of the graphics accelerator rendering more pixels than are actually going to be displayed and filtering them down to the point where a lot of the aliasing artifacts are removed. Unfortunately this is very inefficient, since these supersampling FSAA methods cut peak fill rate by 2x or 4x.
With the GeForce3, NVIDIA introduces their High Resolution Antialiasing, or HRAA. NVIDIA's HRAA is a true hardware FSAA solution that relies on a multisampling algorithm to achieve the same reduction of aliasing that we're used to from conventional supersampling methods, but without the reduction in fill rate. The fill rate penalty is avoided by making sure that the multiple samples taken when AA is enabled all share the same texture data. You still take a memory bandwidth hit for the extra samples, but each pixel is textured only once and thus doesn't have to be re-rendered.
NVIDIA combines this with a new AA algorithm that uses a Quincunx pattern to choose the points that are blended together to actually get rid of the aliasing artifacts. Before you jump on NVIDIA's marketing team for the name Quincunx, you have to understand that it's actually the name of the five-point pattern, four points at the corners and one in the center, like the five on a die, that NVIDIA uses as the basis for their HRAA algorithm.
The combination of the GeForce3's hardware-based multisampling HRAA and its use of the Quincunx pattern is claimed to deliver quality comparable to a conventional 4-sample FSAA solution with the performance hit you would expect from a 2-sample FSAA solution.
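As a sketch of the final blend, each output pixel becomes a weighted average of five sample points arranged in that quincunx pattern. The weights below (1/2 for the center tap, 1/8 for each of the four corner taps) are the commonly cited ones rather than figures from NVIDIA's presentation.

```cpp
#include <cstdio>

// Quincunx blend sketch: one center sample plus four corner samples,
// weighted 1/2 and 1/8 each (commonly cited weights, assumed here).
struct Color { float r, g, b; };

static Color quincunx(const Color& center, const Color c[4]) {
    Color out = { 0.5f * center.r, 0.5f * center.g, 0.5f * center.b };
    for (int i = 0; i < 4; ++i) {
        out.r += 0.125f * c[i].r;
        out.g += 0.125f * c[i].g;
        out.b += 0.125f * c[i].b;
    }
    return out;
}

int main() {
    // A white pixel whose corner samples straddle a black edge.
    Color center = {1, 1, 1};
    Color corners[4] = {{0,0,0}, {0,0,0}, {1,1,1}, {1,1,1}};
    Color out = quincunx(center, corners);
    printf("filtered pixel: %.2f %.2f %.2f\n", out.r, out.g, out.b);  // 0.75 = softened edge
    return 0;
}
```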
The number NVIDIA is quoting here is 74 fps in Quake III Arena at 1024 x 768 x 32 with Quincunx AA enabled, roughly twice as fast as the GeForce2 Ultra with its 4X FSAA enabled under the same conditions.
The Card & Final Words
NVIDIA's GeForce3 will only be available in 64MB configurations, with no initial support for 32MB OEM models. The chances of seeing 128MB boards are very slim at present, since the extremely fast DDR SDRAM used on GeForce3 boards would cause a 128MB GeForce3 to cost roughly twice as much as a 64MB GeForce3, and there is very little demand for a $1000 graphics card with a 128MB frame buffer.
Interestingly enough, the GeForce3 will not support TwinView, making it very clear that only the MX line of cards from NVIDIA will be TwinView capable. It looks like we'll have to wait another year, until a GeForce3 MX hits the streets, before we see a GeForce3-class product with TwinView.
In spite of the incredible size of the GeForce3 GPU, the 144 mm^2 core will draw no more power than the original GeForce 256 core did. While the original GeForce 256 did run fairly hot, we ran into very few issues with heat or power consumption.
The GeForce3 will be available within a matter of weeks; however, NVIDIA is caught in a very interesting predicament. There are currently no DirectX 8 titles available that can truly show off the power of the GeForce3. In DirectX 7 titles, the GeForce3 performs very similarly to the GeForce2 Ultra because its programmable nature is not being harnessed; in those cases it is nothing more than a GeForce2 Ultra. This is why NVIDIA is refraining from pushing forward with review samples of the GeForce3, since any review published now would generally paint the picture of the GeForce3 being no faster than the GeForce2 Ultra except in regard to FSAA performance.
NVIDIA is actually in the same position Intel was in with the Pentium 4. They have a technology that is definitely paving the way for the future; however, the performance gains are simply not going to be there for the current crop of applications, or in this case, games. While we will reserve final judgment until the actual benchmarks go up, you shouldn't expect to see too much from the GeForce3 in terms of performance in current games. This is truly a next-generation part, and it needs the next generation of games to be fully utilized.
Unfortunately, the next generation of games probably won't be here until the latter half of this year, making a $500 investment in the GeForce3 at this point one that should be reserved for those who were already going to buy an Ultra.
The one thing to keep in mind is that the technology NVIDIA is introducing here will be the foundation for the Xbox, meaning that consoles will finally get antialiasing as a standard feature.
The other thing to remember is that since the programmable vertex and pixel shaders are a part of Microsoft's DirectX 8 specification, you can almost guarantee that ATI's Radeon2 (or whatever they call the successor to the Radeon) will be much like the GeForce3 except with ATI's own unique twists including a more advanced HyperZ subsystem.
In spite of the decrease in competition in the graphics market, the sector is truly starting to get interesting. Then again, the desktop CPU market really only has two major competitors, and look at how much fun we have with it. The more you think about it, the more the 3D graphics market mirrors the desktop CPU industry. Scary, isn't it…
For a performance evaluation of the GeForce3 please read our NVIDIA GeForce3 Review.