Name: Inside ATI & NVIDIA: How they make frames fly
Item: Inside ATI & NVIDIA: How they make frames fly
Author: Anand Lal Shimpi

Original Link: https://www.anandtech.com/show/998

Inside ATI & NVIDIA: How they make frames fly

VIEW ARTICLE

by Anand Lal Shimpi on September 23, 2002 2:14 AM EST

Posted in
GPUs

0 Comments

Ever since we started our Inside series the two companies we've received the most requests for have been ATI and NVIDIA. After touring Intel's research labs in Oregon and visiting VIA in Taiwan, we felt it was time to visit the two kings of graphics - ATI and NVIDIA.

We combined these two companies into one article for one reason in particular - our tours were complementary. ATI imposed very strict restrictions on photographs during our visit to their offices in Thornhill, Ontario; we saw a lot of interesting things at ATI's offices (including the foundation for their fountain of fire in the lobby of their main building) but we weren't able to take pictures of most of them. On the other hand, ATI sat us down with one of their chip architects and we were able to get a wealth of information about how their GPUs were made.

NVIDIA wasn't able to set us up with any engineers for an extended period of time (although lunch with Chief Scientist, David Kirk is always informative) but they were much more lax on the picture front so we were able to bring you more of the behind the scenes from NVIDIA.

The combination of the two visits provided us with enough material to put together this piece, so without further ado let's take you inside ATI and NVIDIA.

Inside ATI - Designing a Chip

We've often been asked how exactly these graphics giants go about doing what they do, and that's exactly what we asked an ASIC engineer at ATI to help us explain.

The first step in GPU design is of course, marketing; a spot will be defined in the market, where the product will end up being aimed at. A document describing this gets served to a lead architect where details such as costs, schedules and resources required are discussed.

The cost limitations also help determine figures like transistor counts; at this stage the target manufacturing process is also chosen depending on a number of factors. As you can expect the manufacturing process (e.g. 0.15-micron, 0.13-micron, 90nm, etc…) contributes to the cost structure of the chip and imposes die sizes/transistor count limitations as well. What's important to note is that the target manufacturing process is decided upon at the very start of the design cycle based on estimates of where the foundry (the people that actually manufacture the chip - e.g. TSMC) will be at by the completion date. If this estimate is off, which was the case with NVIDIA's NV30 design, then the GPU will be inevitably delayed. Once a manufacturing process is decided upon, it is extremely difficult to, months later, go back and attempt to revise the design for a different process.

Discussions will continue between the designers and the marketing team for a matter of weeks. The process works much like a loop, with the designers revising documents sent to them by the marketing folks and so on and so forth.

Once a product cost and schedule is decided upon, it's time to start building the architecture. A team of engineers is roundup up and they start defining the features of the GPU, who will be working on them as well as defining the design schedule (e.g. Team 1 - have antialiasing unit completed in three weeks). Before we get to the actual designing you have to understand a bit of how you actually make a chip.

These days, chip architecting has been made infinitely easier through the advent of Hardware Description Languages (HDLs). A HDL, as the name implies, is a type of programming language that effectively describes hardware. Using a HDL such as Verilog or VHDL (two common HDLs), a designer would write code that is translated by a synthesizer into a netlist or schematic that can be used to produce a chip from. Thus when designing actually begins, there's a bit of circuit diagramming but mostly a lot of code-writing. Keep in mind that programming in these HDLs is absolutely nothing like programming in C or C++. While the code may look very similar to C, the actual functionality is very different. Let's take a very basic example from Verilog:

always@(posedge clock)
Q <= D;

The above code executes whenever there's a rising edge on the clock; when it does execute, the input signal 'D' is stored in 'Q'. We've effectively designed a very basic form of memory using a storage element known as a flip-flop. A synthesizer would take this code and produce a circuit based on the hardware that this code describes; in this case it would produce a storage element that would retain the value of its input.

Using their skills, the team of engineers would then code and design the chip and all of its units in a HDL, like Verilog, for around 3 to 4 months (depending on the scale of the project). During these months of coding, all of the features decided upon earlier are now implemented into the actual chip itself.

After the design is completed the next few months are spent in verification. The process of verification is critical to meeting production schedules because it helps get rid of problems before the chip is actually sent to the foundry for production. If a chip comes back from the foundry and it turns out that the design doesn't work as planned, then you've not only wasted a good deal of time but also an incredible amount of money.

Part of the verification process entails basic functionality tests to make sure that all of the gates within the chip work properly. Workloads are also simulated to make sure that the gates not only work but they also work as expected. Some of these tests are also conducted through the HDL itself by writing programs to test the hardware (sort of simulating a tester as well as the chip itself).

At this point one team will branch off and begin some static timing analysis to make sure that the chip will be able to meet clock speed goals. Remember that even at this point there is no physical "chip" just a simulation. While all of this is happening, a team of analog engineers is working on the memories, power circuitry, etc… Analog design and engineering is a very complex and dramatically different beast from the digital logic that we've been talking about until this point, there's a strong focus on understanding the elements of complex numbers, differential equations and signal analysis. The analog portion of the equation cannot be ignored as it's a very important part of the GPU design and manufacturing process; luckily the digital designers don't have to mess with it too much.

Finally, after all of the testing the chip is "synthesized" meaning that the HDL is translated into a gate-level netlist or schematic that is used to physically manufacture the chip. The first cut placement route is done, which is the real layout or how the gates will look in silicon. At this point there's still no chip to play around with but the chip can be "produced" using what is known as a FPGA (Field Programmable Gate Array).

A FPGA is a generic logic device consisting of a large number of gates; the gates can be configured in such a way to effectively emulate the chip before production. The benefit of doing this is that the chip can be fully tested and operational for only the cost of the FPGA which is on the order of thousands of dollars, whereas it costs millions to get a chip to and back from a foundry. A particular type of FPGA is used by both ATI and NVIDIA and it's manufactured by a company called IKOS; the IKOS box, as it is often referred to, is effectively a very large FPGA used to emulate designs like the R300 or NV30.

The IKOS box can fully emulate a design however it isn't able to run anywhere near as fast as the final GPU will operate. While today's GPUs are running at speeds above 300MHz, the IKOS box can emulate the design running at a small fraction of that - in the KHz frequency range (1000 KHz = 1 MHz). With the GPU running that slowly, you can't gauge performance but you can run a system with the OS installed, test drivers, and even run games on the IKOS box (albeit at ~0.2fps).

After the verification process is complete and the layout is done then comes the elusive tape-out. This is the preparation of everything necessary to be sent out to the foundry for actual production of the chip. About 4 weeks later you get your first chip or A0 silicon as it is often referred to as. The testing doesn't stop during those four weeks however, simulations continue as well as verification of the PCBs (Printed Circuit Boards) that the GPU will eventually be soldered onto.

Once the first silicon (A0) is back verification of it starts immediately; all of the functional units are tested and any unexpected behavior is immediately noted and debugged. Focused Ion Beam (FIB) tools are sometimes employed to fix bugs in the chip; we introduced the concept of using a FIB tool in our Inside Intel article, but as a brief refresher, a FIB tool allows you to effectively perform surgery on a microprocessor without ruining the functionality of the chip. The tool allows you to cut or lay down new wires on a chip, down multiple metal layers, to fix a bug in the design without destroying the chip. This way ATI or NVIDIA can attempt to fix a bug and see whether the bug is actually fixed before sending the revised design back to the foundry; remember it takes about 4 weeks to get a chip back after tape-out and you don't want those 4 weeks to be wasted if it turns out that the fix doesn't really get rid of the bug.

Once all of the bugs are fixed the finalized design is taped-out and sent to the foundry for production. These chips are then sent through qualification where they are put through all sorts of tests for compatibility, thermal and voltage stresses, signal integrity, etc…

After qualification is complete, it's time for production and now you know how GPUs are made.

Inside NVIDIA - The Santa Clara Tour

NVIDIA's new campus in Santa Clara, CA is impressive to say the least but at the same time, exactly what you'd expect from the leader in the graphics world.

Click to Enlarge - One of NVIDIA's buildings in Santa Clara

We started off our tour revisiting NVIDIA's server farms, a site we had seen one year ago when we first walked around the new buildings. These server farms are used mostly for simulation of hardware (flip back a couple of pages to get an understanding of how) and thus the hardware demands are very high; you need a lot of memory and a lot of processing power.

NVIDIA found themselves in an interesting situation where they had no idea which platform would end up being the fastest yet most economical solution for their hardware simulation applications; NVIDIA's solution was to essentially, buy one of everything. Thus we found everything from Sun boxes to Itaniums to Pentium 4s running in their racks; the hardware that performed the best, NVIDIA bought more of and what didn't work quite as well sat there alone.

A couple of IBM boxes were found while we rummaged through NVIDIA's racks
Click to Enlarge

Inside ATI & NVIDIA: How they make frames fly

Inside ATI - Designing a Chip

Inside NVIDIA - The Santa Clara Tour

More Servers

Storage & the rest of the Network

Testing GPUs

NV30: Somewhat up and running

Probing, FIBing, & Verifying chips

Lunch's on Jen-Hsun

Final Words

Log in

Don't have an account? Sign up now