Original Link: https://www.anandtech.com/show/184
Demo1.dm2, Demo2.dm2, Crusher.dm2, Massive1.dm2, Unreal timedemo, Forsaken Nuke.dem, Forsaken Biodome.dem, Mon2.dm2, etc. What is the difference between all of these benchmarks? What limitations do these benchmarks expose? Why are Crusher results always lower than demo1.dm2 results? How come Mon2.dm2 runs faster on just about any AGP board than on a Voodoo2? All of these questions will be answered in detail in this article.
Please note: This is a somewhat rigorous analysis of benchmarks, and some elementary calculus is used. I have tried my best to explain the calculus parts in plain English, but I'm not a math teacher...
Fill Rate Limited Benchmarks
A fill-rate limited benchmark is a benchmark which is designed to expose the fill-rate limit of a certain video card. Fill-rate limited benchmarks generally employ multi-pass rendering techniques (3D accelerator cycle chewers) and run at very high resolutions. Nearly every benchmark can be made a fill-rate limited benchmark if run at a high enough resolution. However, not all 3D accelerators will reach their peak fill-rate with a fill-rate limited benchmark. The Voodoo2 SLI, for example, will not reach a fill-rate limit with demo1.dm2 (running at 800x600 or less), while an i740 might reach its maximum potential on a system as slow as a PII/266. It is important to remember that simply because a benchmark is fill-rate limited for one card, it does not mean that it will be fill-rate limited for another.
Some Fill Rate Limited Benchmarks
Some popular fill-rate limited benchmarks are the recently adopted Unreal timedemo test and Quake2 Demo1.dm2, provided it is run at a high enough resolution (800x600 should do for 'most everything but Voodoo2 SLI). Unreal, which uses three-pass rendering, is probably the biggest fill-rate hog of any game currently on the market. What does three-pass rendering mean? It means that on cards which do not support any special dual-pass / clock rendering, it will take three times as long to render a pixel in Unreal as it takes to render a pixel in, let's say, Forsaken. Obviously, Unreal is very fill-rate limited. Quake2's demo1.dm2 is also a relatively fill-rate limited benchmark. Since Quake2 uses two-pass rendering, most 3D accelerators' fill-rates are already cut in half (when compared to peak fill-rate in single-pass games). Though Quake2's demo1.dm2 does not expose the fill-rate limit of most cards as clearly as the Unreal timedemo, when running at a sufficiently high resolution, demo1.dm2 can be used to differentiate between the fill-rates of different 3D accelerators.
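The pass-count arithmetic above can be sketched in a few lines of Python. Treat this as a back-of-the-envelope model only: the function names, the 100 Mpixel/s peak figure and the overdraw factor are illustrative assumptions, not measurements from any card, and the model ignores cards that can apply two passes per clock.

```python
def effective_fill_rate(peak_mpixels_per_s, passes):
    """Finished pixels per second when every pixel needs `passes` rendering passes."""
    return peak_mpixels_per_s * 1_000_000 / passes


def fill_rate_fps_cap(peak_mpixels_per_s, width, height, passes, overdraw=1.0):
    """Upper bound on FPS imposed by fill rate alone (CPU and bus are ignored)."""
    pixels_per_frame = width * height * overdraw
    return effective_fill_rate(peak_mpixels_per_s, passes) / pixels_per_frame


if __name__ == "__main__":
    peak = 100  # hypothetical peak fill rate in Mpixels/s, not a real card
    for name, passes in [("single pass", 1), ("two pass", 2), ("three pass", 3)]:
        cap = fill_rate_fps_cap(peak, 800, 600, passes)
        print(f"{name}: at most {cap:.1f} fps at 800x600")
```

The same card gets a lower ceiling with every extra pass and with every step up in resolution, which is why a three-pass game at a high resolution exposes the fill-rate limit so readily.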
Recognizing a Fill Rate Limited Benchmark
Perhaps the most important thing you will hopefully learn in this article is HOW to analyze benchmark results and tell whether they reflect the fill-rate limits of video cards or some other bottleneck (these other bottlenecks will be discussed in the next few pages). In order to recognize fill-rate limited benchmarks, it is necessary to analyze results generated from a fill-rate limited benchmark. Our results will consist of a fill-rate limited benchmark run at various CPU speeds (i.e. pumping different amounts of data to the card). Since a fill-rate limited benchmark is (you guessed it) fill-rate limited, no matter how much data is pumped to the card, once you hit the fill-rate limit, the FPS will REMAIN THE SAME.
The ideal Case
What was described above was the ideal case. In the ideal (theoretical) case, the frame rate will scale linearly with CPU speed until the fill-rate limit is reached. At this point the graph, let's call it f(x), comparing FPS vs CPU speed, will become a horizontal line, with a slope of zero. Of course, this is never really the case. Since most cards have latencies, driver issues, etc., an effective fill-rate limit is reached before the absolute fill-rate limit.
The real world situation
In "real life", as we begin to approach the absolute fill-rate limit of the accelerator, we will notice that the improvement in performance as we increase CPU speed (i.e. increase the amount of data we feed) is decreasing. This can be summarized using a little mathematical notation by: f ' ' (x) < 0 (where f(x) is the function of frames per second vs CPU speed). For those of you who do not know simple calculus, f ' ' (x) is read 'f double prime of x'. It is the slope of the tangent line to the function that gives the slope of the tangent line to f(x); what you really need to know is that it equals the acceleration at any given point on f(x). So what we mean by saying f ' ' (x) < 0 is that the acceleration is negative (this means that speed, actually velocity, is decreasing). In our case, this means that the rate at which frames/sec is increasing is decreasing. I hope I didn't lose anyone there... Anyway, below is a graph showing results from running a fill-rate limited test (demo1.dm2) with the Riva 128ZX (@800x600, where it is fill-rate limited) with various speed Pentium IIs (233, 266, 300, 350 and 400MHz).
f ' ' (x) is obviously negative, as you can see from the graph. This verifies the information above regarding real world fill-rate limit expectations. Unreal benchmarks are coming as soon as the OpenGL and D3D drivers mature more... So, what happens when the results don't fit the curve shown above? This probably means that the test is not fill-rate limited, but limited by something else instead. The next type of benchmark I will talk about is the Geometry (CPU) limited benchmark.
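Before moving on, here is one way the f ' ' (x) < 0 test could be applied to a table of results instead of eyeballing a graph. This is a rough sketch: the function names and the tolerance parameter are my own, and the sample numbers are made up for illustration (they are NOT the Riva 128ZX results). Because the CPU speeds are not evenly spaced, it compares the slope between each pair of neighbouring points rather than plain differences.

```python
def slopes(points):
    """Slope of FPS vs CPU speed between each pair of neighbouring points."""
    return [
        (fps2 - fps1) / (mhz2 - mhz1)
        for (mhz1, fps1), (mhz2, fps2) in zip(points, points[1:])
    ]


def looks_fill_rate_limited(points, tolerance=0.0):
    """True if FPS never drops and each slope is smaller than the one before."""
    s = slopes(sorted(points))
    rising = all(slope >= 0 for slope in s)
    flattening = all(later < earlier + tolerance for earlier, later in zip(s, s[1:]))
    return rising and flattening


if __name__ == "__main__":
    # Hypothetical (MHz, FPS) pairs, invented for this example.
    made_up = [(233, 20.0), (266, 22.5), (300, 24.3), (350, 25.8), (400, 26.4)]
    print(slopes(made_up))
    print(looks_fill_rate_limited(made_up))
```

A set of results that climbs in a straight line fails this check, which is the hint that something other than fill rate is the limit.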
Geometry (CPU) Limited Benchmarks
Unlike a Fill-rate limited benchmark, a Geometry limited benchmark is limited by CPU speed, not 3D accelerator speed. (I will use Geometry as a general term encompassing not only geometric transformations, but lighting, etc. as well.) Two main conclusions can be drawn from this. The first is that Geometry limited benchmarks are poor benchmarks for 3D accelerators, since the CPU (along with latencies and drivers) is what sets the results apart, not the accelerator's power, which is generally what we want to know in a 3D accelerator round-up, for example. The second conclusion is that Geometry limited benchmarks generally yield the same FPS even when upping the resolution. The latter conclusion is good to know, as it helps recognize geometry limited benchmarks.
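The second conclusion lends itself to a quick check on a set of results. The sketch below is only an illustration: the function names and the 5% threshold are assumptions of mine, and the FPS figures are invented.

```python
def resolution_spread(fps_by_resolution):
    """Relative gap between the best and worst FPS across resolutions."""
    values = list(fps_by_resolution.values())
    return (max(values) - min(values)) / max(values)


def looks_geometry_limited(fps_by_resolution, max_spread=0.05):
    """True if FPS barely moves as the resolution goes up (assumed 5% cutoff)."""
    return resolution_spread(fps_by_resolution) <= max_spread


if __name__ == "__main__":
    # Hypothetical results for two different cards on the same CPU.
    flat = {"640x480": 30.1, "800x600": 29.8, "1024x768": 29.5}
    falling = {"640x480": 60.0, "800x600": 41.0, "1024x768": 26.0}
    print(looks_geometry_limited(flat))
    print(looks_geometry_limited(falling))
```

If the frame rate falls off sharply as the resolution rises, fill rate is the more likely culprit and the benchmark is telling you about the card after all.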
Some Geometry (CPU) Limited Benchmarks
There are not many geometry limited benchmarks. The Unreal timedemo is somewhat geometry limited, but not nearly as geometry limited as a very popular new benchmark, crusher.dm2. It is somewhat ironic that crusher is gaining popularity, as it is very CPU limited. Even though the DM level used has a relatively low polygon count, all of the people on the screen and the rocket launcher / BFG explosions account for much of the CPU time. (Explosions must be calculated, which takes a lot of extra cycles.) The problem with Crusher.dm2 as a CPU limited benchmark is that it is also very fill-rate limited: it uses the Quake2 engine, and it has a lot of overdraw due to the large number of characters on the screen at any one time. @640x480, however, most cards are CPU limited with crusher.dm2. The Unreal benchmark is somewhat CPU limited as well; however, since it is much more fill-rate limited than it is CPU limited, Unreal's geometry limit is probably only evident with Voodoo2 SLI and the like. (I don't have Voodoo2 SLI yet, so I can't be absolutely sure.)
Recognizing a Geometry Limited Benchmark
Recognizing a Geometry Limited Benchmark is relatively easy. Since the 3D accelerator has excess fill-rate, the FPS is directly proportional to the CPU speed.
The ideal Case
What was described above is the ideal case, as usual. In the ideal case, FPS = k * CPUSpeed, where k is some constant which tells us how many frames of data the CPU can pump out in a given amount of time. You can let k * CPUSpeed = FramesDataPerSecond, in which case FPS = FramesDataPerSecond. Of course, the real world situation is not the same, but it is very similar.
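To see how well a set of results matches FPS = k * CPUSpeed, one option is to fit k with a least-squares line through the origin and look at how far each point strays from it. This is a sketch under my own assumptions: the helper names are hypothetical and both data sets are invented for illustration.

```python
def fit_k(points):
    """Least-squares k for FPS = k * CPUSpeed (line forced through the origin)."""
    return sum(mhz * fps for mhz, fps in points) / sum(mhz * mhz for mhz, _ in points)


def max_relative_error(points, k):
    """Largest relative miss between the measured FPS and k * CPUSpeed."""
    return max(abs(fps - k * mhz) / fps for mhz, fps in points)


if __name__ == "__main__":
    # Hypothetical (MHz, FPS) pairs: one CPU-bound run, one that flattens out.
    cpu_bound = [(233, 23.3), (266, 26.6), (300, 30.0), (400, 40.0)]
    flattening = [(233, 20.0), (266, 22.5), (300, 24.3), (350, 25.8), (400, 26.4)]
    for name, data in [("cpu_bound", cpu_bound), ("flattening", flattening)]:
        k = fit_k(data)
        print(f"{name}: k = {k:.4f}, worst miss = {max_relative_error(data, k):.1%}")
```

A small worst-case miss suggests the benchmark is geometry limited on that card; a large one means the straight line is a poor description and another limit is in play.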
The real world situation
First of all, as mentioned above, crusher.dm2 is also very fill-rate limited, since it uses the Q2 engine, and there is a lot of overdraw (characters generally generate a lot of overdraw). Since crusher.dm2 is very fill-rate limited as well as geometry limited, the fill-rate limit may show on certain cards when running on a fast Pentium II @high resolutions.
The graph is not exactly a line, but it's close enough. Most geometry limited benchmarks generate straight line graphs. Geometry limited benchmarks which are not fill-rate limited at all (i.e. wireframe, simple flat shaded programs) will generate horizontal line graphs when changing the resolution, since changing the resolution affects the fill-rate, and that's just about it.
Bandwidth Limited Benchmarks
Some benchmarks aren't fill-rate limited OR Geometry limited on certain video cards (read 3Dfx cards). How is it, then, that these cards can only pump out under 25fps in a game where an inferior (AGP) card can do 40+ with ease? If you ever see the Voodoo2 getting absolutely throttled by "lesser" cards (G200, i740, etc.), you can almost be sure that it has something to do with bandwidth.
Some Bandwidth Limited Benchmarks
The only true bandwidth limited benchmark that I have seen is the S3 mon2.dm2 benchmark. This benchmark uses over 20MB of textures in the scenes, bringing any non-AGP card (with under 20 MB texture memory) to its knees. Even though mon2.dm2 is the only benchmark out there which really needs AGP to run, more and more game manufacturers are coming out with games which require a lot of texture memory. (Epic had to do serious texture management "things" to get Unreal to run acceptably on the Voodoo2, a PCI card.) You can expect future games to use well over 8 MB of texture in a single scene. Anything more than this, and that Voodoo2 starts to slow down big time...
Recognizing a Bandwidth Limited Benchmark
The ideal Case
Since Bandwidth is not related to either Fill-rate or CPU Speed, the FPS vs CPU Speed graph of a Bandwidth limited benchmark will yield a horizontal line.
The real world situation
The real world situation won't change much. Expect possibly a 3fps fluctuation between a PII/233 and a PII/400, due to decreased latencies with a faster CPU, higher bus speed, etc.
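The "horizontal line, give or take 3fps" rule can be turned into a quick check. This is a minimal sketch: the function names are mine, the 3fps allowance is simply the fluctuation quoted above, and the sample figures are invented. Keep in mind that a run which is already past its fill-rate limit at every CPU speed would also come out flat, so a flat result is a hint rather than proof.

```python
def cpu_swing(points):
    """Difference between the highest and lowest FPS across all CPU speeds."""
    fps_values = [fps for _, fps in points]
    return max(fps_values) - min(fps_values)


def looks_flat_across_cpus(points, max_swing_fps=3.0):
    """True if FPS stays within the allowed swing no matter the CPU speed."""
    return cpu_swing(points) <= max_swing_fps


if __name__ == "__main__":
    # Hypothetical (MHz, FPS) pairs for a card starved for texture bandwidth.
    starved = [(233, 22.0), (300, 23.5), (400, 24.8)]
    scaling = [(233, 23.3), (300, 30.0), (400, 40.0)]
    print(looks_flat_across_cpus(starved))
    print(looks_flat_across_cpus(scaling))
```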
Bandwidth limited benchmarks are scarce, and very easily recognizable. Read on to find out about Combo limited benchmarks...
Combining Limitations
As I mentioned with Crusher.dm2, some benchmarks have more than one limit, depending on the system, the resolution, and the 3D accelerator used. How do we go about recognizing benchmarks which are limited by a few factors? Read on to find out ...
Is it Math? Chemistry?!
It may seem that doing some mathematical manipulation on the graphs, such as adding or multiplying them, will give us the graph of a combination. However, this is not the case. In order to predict the graph of a benchmark with multiple limitations, we must analyze our combination benchmark like a chemical formula. Chemistry? What's that got to do with it, you may ask. The answer is the 'limiting reagent' concept.
WTF is a Limiting reagent?
In a chemical reaction, the synthesis of a compound for example, the limiting reagent is the substance which limits the amount of desired product. Even if there are 100 tons of the other substance in the reaction, the amount of desired product (i.e. not including excess) will remain the same. This is the idea behind predicting the results.
Geometry and Fill-Rate Limited
These two limitations cannot both apply at the same time, except at a single point (i.e. a certain CPU speed). The graph of the results of a Geometry and Fill-rate limited benchmark can be expressed by the following split function.
FPS(CPUSpeed) = k * CPUSpeed,   if 0 < CPUSpeed < CPUSpeed @ fill-rate limit
FPS(CPUSpeed) = MaxFillRate,    if CPUSpeed >= CPUSpeed @ fill-rate limit
As you can see, the graph of the function above looks like a line until you hit the CPUSpeed at which it is both Geometry and Fill-rate limited. From that point on, the Fill-Rate is the "LIMITING REAGENT". After we pass this Fill-rate limit, the results are always going to be the same, since we have hit the limit, and excess CPU Speed isn't going to help us.
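The split function translates almost word for word into code. In the sketch below the function names and the sample values of k and MaxFillRate are assumptions chosen for illustration; the crossover speed is simply the point where k * CPUSpeed equals MaxFillRate.

```python
def crossover_cpu_speed(k, max_fill_rate_fps):
    """CPU speed at which the geometry line meets the fill-rate ceiling."""
    return max_fill_rate_fps / k


def fps_geometry_and_fill(cpu_speed, k, max_fill_rate_fps):
    """Split function: k * CPUSpeed below the crossover, MaxFillRate from there on."""
    if cpu_speed < crossover_cpu_speed(k, max_fill_rate_fps):
        return k * cpu_speed
    return max_fill_rate_fps


if __name__ == "__main__":
    k = 0.125          # hypothetical frames per second per MHz
    max_fill = 25.0    # hypothetical FPS ceiling set by the card's fill rate
    print("crossover at", crossover_cpu_speed(k, max_fill), "MHz")
    for mhz in (160, 200, 233, 300, 400):
        print(mhz, "MHz ->", fps_geometry_and_fill(mhz, k, max_fill), "fps")
```

Every CPU speed past the crossover returns the same number, which is the flat part of the graph.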
Bandwidth and Geometry or Fill-rate limited
Again, this problem involves finding the limiting reagent. The equation is listed below:
FPS(CPUSpeed) = k * CPUSpeed,      if 0 < CPUSpeed < CPUSpeed @ Bandwidth Limit
FPS(CPUSpeed) = Bandwidth Limit,   if CPUSpeed >= CPUSpeed @ Bandwidth Limit
An easier way...
Looking at these two functions, you will notice that they are very similar; actually, almost identical. Why memorize these equations when there is an easier way to represent the frame rate of a benchmark subject to any number N of limitations? The reason I even bothered to do all that work above is to explain where I got the equations from. Now that you (hopefully) pretty much understand the idea, let's apply the limiting reagent idea to ANY number of limitations.
At any given CPUSpeed (from now on referred to as 'X'), our 'Y' is limited by the limitation which gives us the least FPS (at that X). Writing this in math terms, the equation is
F(X) = Minimum ( Lim1, Lim2, Lim3, Lim4,..., LimN);
Where Lim# = The max FPS possible with the Limitation
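In code, F(X) is just a minimum taken over however many limits you care to model. The sketch below assumes each limit is written as a function of X that returns the maximum FPS it allows; the names and the three sample limits are hypothetical.

```python
def predicted_fps(x, limits):
    """F(X) = Minimum(Lim1, Lim2, ..., LimN), evaluated at CPU speed x."""
    return min(limit(x) for limit in limits.values())


def limiting_factor(x, limits):
    """Name of the limit that gives the least FPS at CPU speed x."""
    return min(limits, key=lambda name: limits[name](x))


if __name__ == "__main__":
    # Hypothetical limits, each mapping CPU speed (MHz) to a maximum FPS.
    limits = {
        "geometry": lambda x: 0.125 * x,
        "fill-rate": lambda x: 25.0,
        "bandwidth": lambda x: 40.0,
    }
    for mhz in (160, 233, 400):
        print(mhz, predicted_fps(mhz, limits), limiting_factor(mhz, limits))
```

Adding another limitation is a matter of adding one more entry to the dictionary; the two split functions from earlier are just the N = 2 case.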
The Example:
Let's draw a sketch (no numbers) of the FPS vs CPUSpeed (F(X) vs X) graph of the results of some benchmark which is geometry limited over a certain range and fill-rate limited over another, and let us introduce a third limitation to make it more interesting. What is this third limitation? Well, we want a limitation which won't be a horizontal line (since fill-rate already is), so we will make up an arbitrary limitation (let's call it the "Windows" limitation). This "Windows" limitation will have a negative acceleration, so the graph of the FPS vs CPUSpeed of a totally "Windows" limited game will look like a parabola with the equation -x^2 + k, where k is some constant giving the maximum FPS possible with the "Windows" limitation. The graph is pictured below:
The graph which represents F(X) is the graph which consists of the Blue line until it intersects the Pink, then the pink until it intersects the yellow, and then the yellow until the "Windows" limitation becomes so great that the FPS reaches 0... (No hard feelings Microsoft :)
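The same walk along the graph can be reproduced numerically. Everything concrete in this sketch is an assumption made for illustration: X is in arbitrary units rather than MHz, the geometry line is 10 * x, the fill-rate ceiling is 40, and the "Windows" parabola is -x^2 + k with k = 100. The sweep reports the point at which each limitation takes over.

```python
# Hypothetical limits in arbitrary units (the original sketch has no numbers).
LIMITS = {
    "geometry": lambda x: 10.0 * x,        # straight line through the origin
    "fill-rate": lambda x: 40.0,           # horizontal line
    "windows": lambda x: 100.0 - x * x,    # -x^2 + k with k = 100
}


def f(x):
    """F(X): the smallest of the limits at x, never below 0 FPS."""
    return max(0.0, min(limit(x) for limit in LIMITS.values()))


def limiter(x):
    """Name of the limitation that gives the least FPS at x."""
    return min(LIMITS, key=lambda name: LIMITS[name](x))


def segments(xs):
    """Collapse a sweep over xs into (first_x, limiter_name) change points."""
    result = []
    for x in xs:
        name = limiter(x)
        if not result or result[-1][1] != name:
            result.append((x, name))
    return result


if __name__ == "__main__":
    sweep = [i * 0.5 for i in range(21)]  # x = 0.0, 0.5, ..., 10.0
    for start, name in segments(sweep):
        print(f"from x = {start}: limited by {name}, F = {f(start)}")
```

With these made-up numbers the order of limiters matches the description above: the line first, then the horizontal fill-rate ceiling, then the "Windows" parabola all the way down to zero.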
Conclusion
In conclusion, when choosing a 3D accelerator, it is helpful to know which benchmark results mean what, since benchmarks can be very misleading if not interpreted correctly. Hopefully this article has given you some insight into how to interpret future benchmark results, like the ones Anand will be posting tomorrow (10/11/98).