Original Link: https://www.anandtech.com/show/1488
Behind the Mask with Optimization and Catalyst AI
by Derek Wilson on September 26, 2004 12:05 AM EST
Posted in: GPUs
Introduction
Coinciding with the launch of the X700 line of graphics cards, ATI slipped a little something extra into its driver. The latest beta version of Catalyst that we got our hands on includes a feature called Catalyst AI. Essentially, ATI took all of their optimizations, added a few extra goodies, and rolled it all together into one package.

Optimization has been a touchy subject for quite some time in the world of consumer 3D graphics hardware. Over the past year, we have seen the industry take quite a few steps toward putting what developers and users want above pure performance numbers (which is really where its loyalty should have been all along). The backlash from the community over optimizations perceived to be questionable seems to have outweighed whatever benefit companies saw in implementing such features in their drivers. After all, everything in this industry really is driven by the bottom line, and the bottom line rests on public opinion.
There are plenty of difficulties in getting a high quality, real-time 3D representation of a scene drawn something like every 25 thousandths of a second (roughly 40 frames per second). The hardware that pushes thousands of vertices and textures into millions of pixels every frame needs to be both fast and wide. Drivers and games alike need to focus on doing the absolute minimum necessary to produce the desired image in order to keep frame rates playable. The fight for better graphics in video games isn't due to a lack of knowledge about how to render a 3D scene; faster graphics come as we learn how to approximate a desired effect more quickly. Many of the tricks, features, bells, and whistles that we see in graphics hardware have worked their way down to "close enough" approximations of the complex and accurate algorithms likely used in professional rendering packages.
Determining just what accuracy is acceptable is a very tough job. The best measure that we have right now for what is acceptable is this: the image produced by a video card/driver should look the way that the developer had intended it to look. Game developers know going in that they have to make trade-offs, and they should be the ones to make the choices.
So, what makes a proper optimization and what doesn't? What are ATI and NVIDIA actually doing with respect to optimization and user control? Is application detection here to stay? Let's find out.
Why Optimize?
All real-time 3D is based on approximation and optimization. If mathematical models that actually represented the real world were used, it would take minutes to render each frame, as we see in offline 3D rendering applications like 3DStudio, Maya, and the like. The question isn't whether or not we should "optimize", but how many corners we can reasonably cut. We are always evaluating where the line between performance and accuracy (or the "goodness" of the illusion of reality) should be drawn.

So, who makes the decision on what should and shouldn't be done in modern real-time 3D? The answer is very complex and involves many different parties, and there really is no single starting point, so we'll begin with academia and research. Papers are published all the time on bleeding edge 3D graphics concepts and mathematics. More and more, it's the game developers who pick up academic papers and look for ways to implement their ideas in fast and efficient ways. Game developers are always trying to find an edge over the competition that will draw people to their title. They have a strong interest in making a beautiful, playable experience accessible to everyone. The hardware vendors have to take every aspect of the industry into account, but each has its own specific areas of focus.
NVIDIA has looked hard at what a few prominent game developers want and tried hard to provide that functionality to them. Of course, game developers don't have the time to write code for every bit of hardware on which their game will run, so they use general APIs to help lighten the load. ATI, meanwhile, has tried very hard to make Microsoft's DirectX API code run as fast as possible. Both perspectives have had positive and negative effects. ATI doesn't perform quite as well as NVIDIA hardware under OpenGL games in most cases, but generally runs DirectX code faster. NVIDIA may lag in straight DX performance, but it offers more to developers, with broader DirectX feature support and more vendor-specific OpenGL extensions that developers can take advantage of.
Also on the API side, the OpenGL ARB and Microsoft decide what tools to provide to game developers, and ATI and NVIDIA decide how to implement those tools in their hardware. The API architects have some power over how tight the restrictions on implementation are, but game developer and consumer feedback are the deciding factors in how hardware vendors implement functionality. One of the best examples of how this ends up affecting the industry is the DX9 precision specification. The original drafts called for "at least" 24-bit precision in the pixel shader. ATI literally implemented 24-bit floating point (which heretofore had not existed), while NVIDIA decided to go with something closer to IEEE single precision floating point (though we would like to see both companies become IEEE 754 compliant eventually).
To expand on the significance of this example, both approaches are equally valid and both have advantages and disadvantages. Full precision is going to be faster on ATI hardware, as it doesn't need to move around or manipulate as many bits as NVIDIA hardware does. At the same time, the full 24 bits of precision aren't always needed for acceptable accuracy in a given algorithm, and NVIDIA is able to offer 16-bit precision where its full 32 bits are not needed. The downside of NVIDIA's implementation is that it requires more transistors, is more complex, and shows performance characteristics that have been hard to predict in the absence of a mature compiler. In other words, the cost of the extra flexibility is complexity, and the flexibility does not necessarily translate into speed.
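To make the precision trade-off a little more concrete, here is a minimal sketch (in Python, not shader code) that rounds a value to different mantissa widths. The bit widths are our own illustrative assumptions for FP16 (10 mantissa bits), ATI's FP24 (16 bits), and IEEE FP32 (23 bits); real hardware also involves rounding modes and denormal handling that we gloss over here.

```python
import math

# Illustrative only: round a value to a given number of mantissa bits to
# show how precision shrinks with narrower shader formats. Assumed widths:
# FP16 = 10 mantissa bits, ATI FP24 = 16, IEEE FP32 = 23.
def quantize(value, mantissa_bits):
    if value == 0.0:
        return 0.0
    exponent = math.floor(math.log2(abs(value)))
    step = 2.0 ** (exponent - mantissa_bits)
    return round(value / step) * step

x = 1.0 / 3.0
for name, bits in (("FP16", 10), ("FP24", 16), ("FP32", 23)):
    q = quantize(x, bits)
    print(f"{name}: {q:.9f} (relative error {abs(q - x) / x:.1e})")
```

The point is simply that each extra mantissa bit roughly halves the worst-case rounding error, which is why 16-bit partial precision can be perfectly adequate for some shader math and visibly inadequate for other calculations.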
So, those involved in determining what actually happens in real-time 3D are: hardware architects, API architects, 3D graphics academics, game developers, and consumers. It's a much more complex version of the "chicken or the egg" problem. The ultimate judge of what gets used is the game developer in their attempt to realize a vision. They are constrained by hardware capabilities and work within these limits. At the same time, all that the hardware vendor can do is strive to deliver the quality and performance that a game developer wants to see. But even when this relationship works out, the final authority is the consumer of the hardware and software. Consumers demand things from game developers and hardware vendors that push both parties. And it is somewhere in this tangled mess of expectations and interpretations that optimization can be used, abused, or misunderstood (or even all three at the same time).
Perspectives on Optimizing
For a while now, NVIDIA has been focusing very hard on compiler technology to optimize shader code for their architecture. This was necessary because NV3x had a hard time processing code hand-written by developers (it was very hard to write efficient code for it - not very intuitive). Before their compiler could do the best possible job of optimizing code, NVIDIA would take hand-tuned shaders (for common functionality or specific games), detect when a shader for that effect was used in a game, and run their own instead. This is known as shader replacement. The only case in which NVIDIA currently does shader replacement for performance reasons is Doom 3. They will also do shader replacement in certain games as bug fixes, such as in Homeworld 2 and Command and Conquer Generals. NVIDIA is relying more and more on their compiler technology to carry them, and this is a commendable goal as long as their compiler team can maintain a high level of integrity in what they do with shaders (mathematical output shouldn't change from the original).

ATI has stayed away from shader replacement and application-specific optimizations for a while. If you don't do them, you can't be tempted to take them too far, which has happened in the past with both ATI and NVIDIA (both Quake and 3DMark spring to mind). That doesn't make it wrong to use knowledge of a running application to enhance performance and/or image quality, and with Catalyst AI, ATI has adopted this stance. They now detect certain applications when they are run in order to use shader replacement, or to alter the way things are done slightly to better suit the game. In Doom 3, they replace a look-up table with a computational shader. In Counter-Strike: Source, they change the way they do caching slightly for a performance gain (they don't do it in all games because it hurts performance in other titles while helping the Source engine). ATI also uses application detection to make sure that AA is not enabled where it would break a game, and to handle other situations where games have specific quirks when it comes to graphics settings.
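As a rough illustration of how application detection and shader replacement fit together, here is a minimal Python sketch. The table, names, and hashing scheme are purely hypothetical on our part; they are not ATI's or NVIDIA's actual driver internals.

```python
import hashlib

# Hypothetical driver-side replacement table: a fingerprint of the
# developer's shader source mapped to a hand-tuned equivalent.
# The entry below is a placeholder, not a real shader.
SHADER_REPLACEMENTS = {
    "<md5-of-original-shader-source>": "hand_tuned_replacement_shader",
}

def compile_shader(source: str) -> str:
    """Return the shader source the driver will actually run."""
    fingerprint = hashlib.md5(source.encode()).hexdigest()
    # If a vetted replacement exists, substitute it; otherwise compile the
    # developer's shader unchanged. A responsible replacement must produce
    # output that is visually indistinguishable from the original.
    return SHADER_REPLACEMENTS.get(fingerprint, source)
```

The key design question is exactly the one raised above: how close the replacement's output has to be to the original before the substitution stops being an optimization and starts being a cheat.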
Catalyst AI also does texture analysis to determine how to handle bilinear, trilinear, and anisotropic filtering. ATI has done this to some extent before, but now you can turn it off. The Low and High settings of Catalyst AI also let you adjust how aggressively the texture filtering tries to improve performance. ATI has made a quality-enhancing change as well: it no longer drops down to bilinear filtering under anisotropic filtering, no matter which texture stage is used (previously, any texture stage other than the first received only bilinear filtering). And if Catalyst AI is turned off and trilinear filtering is requested, full trilinear is now always applied to everything.
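For readers who want to see where the savings come from, here is a simplified Python sketch of trilinear filtering as a blend of two bilinear fetches from adjacent mip levels. The blend-threshold idea stands in for the kind of adaptive filtering described above; the threshold parameter and the stub texture fetch are our own illustrative assumptions, not ATI's actual heuristic.

```python
import math

def bilinear_sample(mip_level: int, u: float, v: float) -> float:
    # Stand-in for a real bilinearly filtered texture fetch from one mip level.
    return float(mip_level)

def trilinear_sample(lod: float, u: float, v: float,
                     blend_threshold: float = 0.0) -> float:
    lower, upper = math.floor(lod), math.ceil(lod)
    frac = lod - lower
    # An adaptive filter can skip the second fetch (and the blend) when the
    # pixel sits close enough to one mip level for the difference to be
    # invisible; full trilinear always performs both fetches and blends.
    if frac <= blend_threshold:
        return bilinear_sample(lower, u, v)
    if frac >= 1.0 - blend_threshold:
        return bilinear_sample(upper, u, v)
    return (1.0 - frac) * bilinear_sample(lower, u, v) + frac * bilinear_sample(upper, u, v)
```

With the threshold at zero this is standard trilinear filtering; raising it narrows the region in which two mip levels are actually blended, which is exactly the kind of shortcut that colored mip level tests make visible.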
Anisotropic Filtering and The Test
To verify ATI's claim that full trilinear filtering is applied to all texture stages while anisotropic filtering is enabled, we employed the D3D AF Tester. One of the fundamentals of Catalyst AI is that no benchmarks will be detected, so we can be assured that the D3D AF Tester itself is not being detected. Of course, the texture filtering optimizations may detect colored mip levels and change their behavior, but even that is a valid optimization: there are fewer shortcuts you can take with trilinear filtering when the mip maps are not simply scaled-down versions of the original, which is certainly the situation with these colored mip levels.
D3D AF Tester with Catalyst AI set to High.
D3D AF Tester with Catalyst AI disabled.
For all of our performance tests, we used our AMD Athlon 64 FX-53 system (overclocked to 2.6 GHz). The graphics card that we employed was our ATI Radeon X800 XT Platinum Edition. We chose this card because performance enhancements such as these are generally amplified on higher performance hardware.
Aquamark 3 Analysis
We wanted to test a benchmark because ATI has stated that they would not optimize for any benchmark specifically. But that doesn't mean a benchmark couldn't gain a performance advantage from an optimization that applies to any 3D game using similar techniques. As we can see from the scores, there is no performance difference whether or not Catalyst AI is used.

Doom 3 Analysis
Here, we see a performance improvement when Catalyst AI is enabled and the shader replacement is made. The replacement ATI makes is in Id Software's specular highlight shader. The attenuation of the specular highlight was determined by a look-up table, but ATI discovered that replacing the look-up table with math runs faster on their hardware. ATI arrived at the proper mathematical function simply by comparing the final images rendered with both shaders and choosing the math that came closest. The mathematical result is definitely not exactly the same as the look-up table (because ATI is using 24-bit precision math, which no one else uses).
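To illustrate the kind of substitution being described, here is a minimal Python sketch of a specular falloff computed from a look-up table versus directly in math. The exponent-4 curve and the 256-entry table are purely our own assumptions for demonstration; neither Id's actual table nor ATI's replacement function has been published.

```python
# Illustrative only: the pow(x, 4) falloff and 256-entry table below are
# assumptions for demonstration, not Doom 3's actual specular function.
TABLE_SIZE = 256
specular_lut = [(i / (TABLE_SIZE - 1)) ** 4 for i in range(TABLE_SIZE)]

def specular_from_lut(n_dot_h: float) -> float:
    # Original style: index a precomputed table (a dependent texture read
    # on the GPU), quantized to the table's resolution.
    clamped = max(0.0, min(1.0, n_dot_h))
    return specular_lut[int(clamped * (TABLE_SIZE - 1))]

def specular_from_math(n_dot_h: float) -> float:
    # Replacement style: evaluate the falloff directly in the shader, which
    # can be cheaper than a texture fetch on some hardware, at the cost of
    # a result that is close to, but not bit-identical with, the table's.
    return max(0.0, min(1.0, n_dot_h)) ** 4
```

The two functions agree to within the table's quantization error, which is the sort of difference ATI's image comparisons were checking for.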
We feel that this is a good optimization for ATI's hardware using application detection and shader replacement. Of course, rather than relying on trial and error, it may be safer to go to the developer and ask for the mathematical function that they used. We are still unclear on Id's take on this, and John Carmack has previously voiced a distaste for shader replacement in certain situations.
Source Video Stress Test
We don't see any improvement here; frame rates generally varied by about 1 fps between test runs, so these three tests effectively produced the same score.
Unreal Tournament 2004 Analysis
This time, we see no benefit until AA and AF are turned on. This indicates that ATI is doing some sort of texture filtering optimization when Catalyst AI is enabled, which helps improve performance. In the past, Unreal Tournament performance issues have centered on trilinear filtering, which is applied whether or not anisotropic filtering is enabled. But it doesn't seem that ATI's trilinear-specific optimizations help here (unless they only help when anisotropic filtering is enabled).
Final Words
After having tested a few of the games that ATI now detects under Catalyst AI, we can evaluate what they are doing. They say that they try to keep image quality nearly the same, and in the games that we've looked at, we see that they do. Of course, the only games in which we saw any "real" improvement were Doom 3 and (only with AA/AF enabled) UT2K4. Maybe there is an issue with this version of the Catalyst AI functionality (this is a beta driver), or maybe there's really just no performance difference in the other games that we tested. Higher performance hardware usually accentuates small tweaks and performance enhancements, and these games shouldn't be limited by other aspects of the system. We are definitely pleased with ATI's move away from dropping to bilinear filtering.

NVIDIA first started applying tight restrictions on optimizations last year after its issue with Futuremark. We are glad to see that ATI is embracing a move to enhance the user experience for specific games when possible while tightening the reins on its quality control as well. We see this as a very positive step and hope to see it continue.
We also like the fact that we have the ability to turn Catalyst AI on or off. When we run our performance tests, we will be using the default setting, but for those out there who want the choice, it is theirs to make. Of course, right now, it doesn't look like it makes much sense to have Catalyst AI on except in Doom 3 and Unreal Tournament 2004. As the package matures, we will likely see more games affected and more reason to enable Catalyst AI. But ATI can be assured that we and others in the community will be doing image quality spot checks between running with and without Catalyst AI. And the same goes for any optimization over which NVIDIA allows us control.
Over the past year, we have had ample opportunity to speak with the development community about optimizations and their general take on the situation. They tend to agree that as long as the end result is very nearly the same, they appreciate any kind of performance enhancement. Since NVIDIA and ATI cannot physically produce the same mathematical output, the exact same image will never appear on systems with different vendors' cards in them. But just as those two different results are equally valid under the constraints of the API and the developer who implemented them, so too can optimized results be valid when they are faster to render but not perceivably different to the human eye.
Both ATI and NVIDIA want to maintain acceptable image quality because they know that they'll be held accountable for not doing so. If it's not the API architects (who can take away the marketing tool of feature set support), then it's the game developers. If it's not the game developers, then it's the end users. And the more we know about what we are seeing, the better we are able to help ATI and NVIDIA give us what we want. As the mantra of "optimizations are bad, mmkay" lifts, we should encourage both companies to focus on balancing mathematically accurate output against pure speed, as judged by our own perception.