Original Link: https://www.anandtech.com/show/424

AMD Athlon 800

by Anand Lal Shimpi on December 20, 1999 4:47 AM EST


We all make mistakes, and on October 4, 1999, in our review of the AMD Athlon 700, we implied that the last Athlon released this year would be the 700MHz part. Shortly thereafter, Intel released a 733MHz Pentium III which forced Compaq to pressure AMD into releasing an Athlon with a higher clock speed and, thus, the Athlon 750 was announced on the 29th of November.

As if that were not excessive enough, with rumors that Intel was going to start sampling their Pentium III based on the new Coppermine core in 750MHz and 800MHz flavors, AMD was pressured to release data on their competing product early. And thus we have our review of the AMD Athlon 800.

The Athlon 800, in brief, is just a faster version of the Athlon 750. It is based on the same 0.18-micron K75 core as the 750, and features a 2/5 L2 cache divider (or 0.4x L2 cache multiplier if you hate fractions) thus giving it a slower L2 cache than the Athlon 700 which was the last Athlon to run its L2 cache at 1/2 the core clock speed.
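To put the divider in concrete terms, here is a minimal sketch of the arithmetic. The function name `l2_cache_mhz` is our own invention for illustration; the dividers are the ones quoted above (1/2 for the Athlon 700, 2/5 for the Athlon 800).

```python
from fractions import Fraction


def l2_cache_mhz(core_mhz, divider):
    """Return the L2 cache clock in MHz for a given core clock and cache divider."""
    return float(core_mhz * divider)


# Dividers as quoted in the text: 1/2 for the Athlon 700, 2/5 for the Athlon 800.
athlon_700 = l2_cache_mhz(700, Fraction(1, 2))
athlon_800 = l2_cache_mhz(800, Fraction(2, 5))

print(f"Athlon 700 L2 cache: {athlon_700:.0f}MHz")
print(f"Athlon 800 L2 cache: {athlon_800:.0f}MHz")
```

In other words, the 800MHz part ends up with a 320MHz L2 cache, 30MHz slower than the 350MHz cache on the 700MHz part.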

AMD is currently in a very interesting situation -- they have the ability to push the clock speed of the Athlon to even higher levels; the air-cooled 1GHz mark isn't too far away. If you recall, just four months ago, the only way to get an 800MHz Athlon was through Kryotech's Cool Athlon 800 that ran the CPU at -36 C and, now, we are able to achieve the same results via air cooling alone. So what is AMD's problem?

As we mentioned in our review of the Athlon 750, the problem AMD is running into is getting fast enough L2 cache chips to keep up with the increasing clock speed of the processors because they are dependent on third party manufacturers to produce the L2 cache chips for the processors. The only true solution for this problem is to move the L2 cache off of the Slot-A processor card and onto the die of the Athlon; unfortunately, this move is not scheduled to happen until sometime in the first half of 2000. While that isn't very far away, AMD can't remain idle and let Intel win the clock speed battle; clock speed sells more processors than technology.

While it is hard for all of us to believe, someone who hasn't the slightest clue about what an L2 cache is, or why a faster one is a good thing, has only one piece of "performance" data on which to base a buying decision: clock speed, and the "higher is better" theory comes into play. This is why a big retail vendor like Compaq would push for the release of an Athlon with a higher clock speed.

If AMD continues down this road, eventually the performance difference between the Athlon, whose cache speed is now increasing at less than ½ the rate of its clock speed, and the Pentium III, whose cache speed increases directly with its clock speed, will grow to the point that the Athlon will clearly be the slower processor.

Luckily, AMD's roadmap does not place the Athlon on that path for too long as they will soon move the L2 cache onto the Athlon's die with their Thunderbird and Spitfire cores. The problem here is that, while they are competing with Intel on a clock for clock basis, they are going to start losing the performance battle in certain situations. While professional applications that can enjoy the Athlon's superior FPU will continue to perform at a superior level on the Athlon, business, productivity and even some games may begin to favor the Pentium III over the Athlon due to the Pentium III's L2 cache speed advantage.



The Athlon on the losing side?

With more applications showing SSE support, it is going to become increasingly difficult for AMD to gain support in benchmarks as well. Remember the SSE compiler we talked about in our Coppermine review? Here is a quick refresher:

Intel has been working on making the implementation of SSE even easier for developers. They currently have two compilers in beta that are SSE optimized, a Fortran and a C++ compiler, that should become available in the first half of next year. These compilers will automatically generate SSE enhanced code where appropriate. So when a programmer goes to write a function that could be aided by the use of SSE, the compiler generates a single binary with SSE and non-SSE code. The generation is done in such a way that the overall size of the binary is not increased too greatly and there should be no compatibility problems with non-SSE processors.
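The idea of a single binary carrying both code paths can be sketched roughly as follows. This is only an illustration of the dispatch concept, not a description of how Intel's compilers work internally; the `cpu_has_sse` flag and all of the function names are hypothetical, and Python is used purely for readability.

```python
def dot_scalar(a, b):
    """Plain path: one multiply-add per loop iteration."""
    total = 0.0
    for x, y in zip(a, b):
        total += x * y
    return total


def dot_four_wide(a, b):
    """Stand-in for an SSE path: processes four values per step,
    the way a 128-bit SSE register holds four single-precision floats."""
    total = 0.0
    full = len(a) - len(a) % 4
    for i in range(0, full, 4):
        total += sum(x * y for x, y in zip(a[i:i + 4], b[i:i + 4]))
    # Leftover elements that do not fill a group of four are handled one at a time.
    for i in range(full, len(a)):
        total += a[i] * b[i]
    return total


def select_dot(cpu_has_sse):
    """Pick the code path once, based on a (hypothetical) CPU feature flag."""
    return dot_four_wide if cpu_has_sse else dot_scalar


a = [1.0, 2.0, 3.0, 4.0, 5.0]
b = [2.0, 2.0, 2.0, 2.0, 2.0]
print(select_dot(cpu_has_sse=True)(a, b))
print(select_dot(cpu_has_sse=False)(a, b))
```

Both paths give the same answer; the point is that the program ships with both and a non-SSE processor simply takes the plain one.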

This method of implementing SSE is a definite step forward on Intel's part because it is the only way to get SSE into the hands of the users without asking for too much from the developers. In terms of integration, the compilers essentially plug into the Microsoft Development Environment so the developers don't even have to migrate from the environment they are used to. The compilers will be a part of Intel's VTUNE package and should be in final release form sometime in the first half of 2000. The package will retail for around $500.

The compilers should make it very easy, and thus very tempting, to include SSE support in upcoming applications, especially since the added development time is next to nothing because of the convenient compiler. Unfortunately for AMD, there is no equally simple compiler for 3DNow! or Athlon optimizations, making it more likely for a program to ship with better SSE optimizations than 3DNow! or Athlon specific optimizations.



Two new Benchmarks

The recent release of BAPCo's SYSMark 2000 and Ziff Davis' Content Creation Winstone 2000 will also cloud the issue of performance, since the Pentium III seems to naturally fare better under these newer benchmarks than it did in the past against the Athlon in SYSMark 98 and Winstone 99.

SYSMark 2000 and Content Creation Winstone 2000 are both steps forward in the benchmarking arena. SYSMark 2000 updates the extremely out of date application versions present in SYSMark 98 to their latest counterparts while continuing to provide a very comprehensive set of business, professional and content creation application tests. Content Creation Winstone 2000 received a small introduction to the AnandTech crowd in our review of the Athlon 750, and, since then, it has become a part of our standard CPU performance evaluation test suite. Content Creation Winstone 2000 is a simple one-run test that simulates real world usage through the multitasking use of six content creation applications. This test is light years ahead of the old Business Winstone 99 that we relied on to analyze performance for so long; there is only so much of a performance boost you need under Microsoft Word before it becomes excessive (running spell check can only get so fast).

Currently, Intel has a very strong influence in the benchmark industry, with close ties to BAPCo and ZDBOp; it is time that AMD tried to attain a presence in benchmarks that is just as strong. Seeing the lack of Athlon-specific optimizations in applications is disappointing to say the least. Hopefully, as time progresses, we will see more Athlon optimizations placed in applications.

3DMark 2000 - NOT a synthetic CPU benchmark

MadOnion, the company formerly known as FutureMark (don't ask questions, just accept it), has also released their updated fall benchmark, 3DMark 2000. 3DMark 2000, like its predecessor, features a 'CPU test' which is actually a very useful tool. The test runs through the frame rate tests of 3DMark but at a low resolution, thus emphasizing the performance of the CPU over that of the video card. It is the same reasoning behind our running Quake III Arena and Unreal Tournament at 640 x 480 in CPU reviews -- this way we eliminate possible video card bottlenecks.

However, one clarification must be made when calling it a 'CPU test' -- the test does not test how well one CPU stacks up against another; instead, it illustrates how well a particular CPU can drive a specific video card. A CPU scoring higher than another in 3DMark 2000's CPU test does not mean that it is the faster overall CPU; it means that, provided that both systems are using the same configuration with the exact same video card, the higher scoring CPU is better at handling the transforming and lighting calculations offloaded to it by that particular graphics card.

Why does this matter? A perfect example of being fooled by this test would be when using the GeForce, which is currently forced into AGP 1X mode by default on the Athlon using the latest Detonator drivers. Under 3DMark 2000's CPU test, a Pentium III 750 on a BX board comes dangerously close to beating out a Kryotech Athlon running at 1000MHz. Don't you just love how benchmarks can be deceiving?

In essence, 3DMark 2000 is a fine benchmark; just be aware of what the results are portraying, as it's not the same thing as a Winbench CPUMark. For this reason, we will be illustrating gaming performance using two video cards, one using NVIDIA's GeForce, and one using the TNT2 Ultra.



The Test

This review wasn't intended to be a massive CPU roundup, rather a comparison of 700MHz+ Athlon processors amongst themselves as well as to their Pentium III counterparts. For more Athlon performance scores visit our previous Athlon reviews.

Windows 98 SE Test System

Hardware

CPU(s), Motherboard and Memory (by platform)

CPU(s): Intel Pentium III 750, Intel Pentium III 700
Motherboard: AOpen AX6BC Pro-II
Memory: 128MB PC133 Corsair SDRAM

CPU(s): Intel Pentium III 800
Motherboard: AOpen AX6C
Memory: 128MB PC800 Samsung RDRAM

CPU(s): AMD Athlon 800, AMD Athlon 750, AMD Athlon 700
Motherboard: Gigabyte GA-7IX
Memory: 128MB PC133 Corsair SDRAM

Hard Drive

IBM Deskstar DPTA-372050 20.5GB 7200 RPM Ultra ATA 66

CDROM

Philips 48X

Video Card(s)

NVIDIA GeForce 256 32MB (default clock - 120/166)
NVIDIA RIVA TNT2 Ultra 32MB (default clock - 150/183)

Ethernet

Linksys LNE100TX 100Mbit PCI Ethernet Adapter

Software

Operating System

Windows 98 SE

Video Drivers

NVIDIA GeForce 256 - Detonator 3.65 @ 1024 x 768 x 16
NVIDIA Riva TNT2 - Detonator 3.65 @ 1024 x 768 x 16

Benchmarking Applications

Gaming

GT Interactive Unreal Tournament 4.04 UTbench.dem
idSoftware Quake III Arena demo001.dm3
MadOnion 3DMark 2000
Rage Software Expendable Timedemo

Synthetic

Distributed.net RC5 Client CSC Cracking Test

Productivity

BAPCo SYSMark 98
BAPCo SYSMark 2000
Ziff Davis Content Creation Winstone 2000

 

Windows NT SP6 Test System

Hardware

CPU(s), Motherboard and Memory (by platform)

CPU(s): Intel Pentium III 750, Intel Pentium III 700
Motherboard: AOpen AX6BC Pro-II
Memory: 128MB PC133 Corsair SDRAM

CPU(s): Intel Pentium III 800
Motherboard: AOpen AX6C
Memory: 128MB PC800 Samsung RDRAM

CPU(s): AMD Athlon 800, AMD Athlon 750, AMD Athlon 700
Motherboard: Gigabyte GA-7IX
Memory: 128MB PC133 Corsair SDRAM

Hard Drive

IBM Deskstar DPTA-372050 20.5GB 7200 RPM Ultra ATA 66

CDROM

Philips 48X

Video Card(s)

NVIDIA GeForce 256 32MB (default clock - 120/166)
NVIDIA RIVA TNT2 Ultra 32MB (default clock - 150/183)

Ethernet

Linksys LNE100TX 100Mbit PCI Ethernet Adapter

Software

Operating System

Windows NT SP6

Video Drivers

NVIDIA GeForce 256 - Detonator 3.65 @ 1024 x 768 x 32
NVIDIA Riva TNT2 - Detonator 3.65 @ 1024 x 768 x 32

Benchmarking Applications

Professional

3D Studio MAX R2
SPECviewperf 6.1.1

Productivity

BAPCo SYSMark 98
Ziff Davis Content Creation Winstone 2000


ZD's Content Creation Winstone 2000 taxes the system in a much more effective way than the old Business Winstone 99 used to. The result is a more accurate representation of overall system performance under content creation applications. In spite of its slower L2 cache, the Athlon 800 manages to pull ahead of the pack by a small margin. In this test, the Athlon seems to hold its ground above the Pentium III on a clock for clock basis, although in some cases not by a large performance advantage.

SYSMark 2000, unlike its predecessor SYSMark 98, features more up to date versions of the applications included in the benchmark. Many of them happen to boast SSE optimizations that help keep Intel's measurable but not noticeable advantage over AMD.

While this last benchmark isn't a real world test, we decided to include it as a sort of "drool-inducer" for the members of Team AnandTech, our very own RC5 cracking team. The Athlon 800 will be cracking for us in the lab as soon as this is published ;)



The latest GeForce 256 drivers from NVIDIA force the Athlon to operate in AGP 1X mode, thus making the Athlon suffer in the texture transfer tests. As the texture sizes increase, the Athlon's performance suffers in comparison to that of the Pentium III which runs at full AGP 2X/4X depending on the platform (BX/820).
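For a sense of scale, here is a rough sketch of the theoretical peak transfer rates involved, assuming the nominal 66MHz, 32-bit AGP bus. The constant and function names are our own, and real-world texture transfer rates will be lower than these peaks.

```python
AGP_BASE_CLOCK_MHZ = 66.66   # nominal AGP bus clock (assumed)
AGP_BUS_WIDTH_BYTES = 4      # 32-bit wide bus


def agp_peak_mb_per_s(mode):
    """Theoretical peak AGP bandwidth in MB/s for a transfer mode (1, 2 or 4)."""
    return AGP_BASE_CLOCK_MHZ * AGP_BUS_WIDTH_BYTES * mode


for mode in (1, 2, 4):
    print(f"AGP {mode}X: about {agp_peak_mb_per_s(mode):.0f} MB/s peak")
```

An Athlon stuck in AGP 1X mode therefore has, on paper, half the texture transfer bandwidth of a BX system running AGP 2X and a quarter of what the i820 offers at AGP 4X.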

One interesting thing to note is that the Athlon has no problem beating the Pentium III in the smaller texture tests where AGP texturing isn't being forced.

With full AGP 2X support with the TNT2 Ultra, the Athlon manages to beat out the Pentium III, clock for clock, in the texturing tests. The only setup that gives the Athlon competition here is the AGP 4X i820 platform running the Pentium III 800.



Testing at 640 x 480 helps to remove the graphics card as a bottleneck and focus the attention on the CPU. In this case, the Athlon takes a clear back seat to the Pentium III when armed with the GeForce.

The Athlon 800 moves up one notch with the TNT2 Ultra, but still comes in 2nd to the Pentium III 800.



Increasing the resolution and color depth to a more realistic setup unfortunately maxes out the GeForce and thus the performance difference between the six CPUs is nonexistent.

The same picture here.



With updated drivers, we are seeing the Athlon outperform the Pentium III under Unreal Tournament by a slight margin. The performance between the two platforms is very close, however.

With the TNT2 Ultra, the Athlon just barely loses the lead it had over the Pentium III 750; 0.2 fps isn't worth arguing about.



Once again, we are seeing the limits of the GeForce 256 as we hit 1024x768x32.

Truncating/rounding would produce the same picture with the TNT2 Ultra as we just saw with the GeForce 256.



Here we see a clear dividing line between the older BX setup and the newer contenders. Memory and system bus bandwidth play a very important role in the Expendable benchmark and thus we see the narrow victory of the Athlon 800 over the Pentium III 800 on an i820/RDRAM platform; and the noticeable performance lead of all three Athlons and the single i820 test system over the two BX setups.

The Athlon holds the lead here as well.



One fact about the Expendable timedemo is that the high resolution tests are wonderful for illustrating further performance differences between CPUs. While the performance of all of the CPUs is being hindered by the fill rate/memory bandwidth limitations of the GeForce 256, we can see the positive effects of the AGP 4X support and greater memory bandwidth of the i820 in that it helps give the test system the slight advantage over the AGP 2X Athlon 800 platform.

The TNT2 Ultra meets its match at the 1024x768x32 setting; there is no change between the CPUs in this fill rate/memory bandwidth limited test.



SPECviewperf

The Standard Performance Evaluation Corporation, commonly known as SPEC, managed to come up with a synthetic benchmark with real world implications. By running specific "viewsets" SPECviewperf can simulate performance under various applications. To be more accurate, according to SPEC, "A viewset is a group of individual runs of SPECviewperf that attempt to characterize the graphics rendering portion of an ISV's application." While this method is by no means capable of identifying the performance of a card in all situations, it does help to indicate the strengths and weaknesses of a particular setup.

SPECviewperf 6.1.1 currently features five viewsets: the Advanced Visualizer, the DesignReview, the Data Explorer, the Lightscape and the ProCDRS-02 viewset. Before each benchmark set we've provided SPEC's own description of that particular viewset so you can better understand what that particular viewset is measuring, performance-wise.

Each viewset is divided into a number of tests, ranging from 4 to 10 in quantity. These tests each stress a different performance element in the particular application that viewset is attempting to simulate. Since all applications focus on some features more than others, each one of these tests is weighted, meaning that each test affects the final score differently, some more than others.

All results are reported in frames per second, so the higher the value, the better the performance is. The last result given for each of the viewsets is the WGM, or Weighted Geometric Mean. This value is, as the name implies, the weighted geometric mean of all of the test scores. The formula used to calculate the WGM is as follows:

WGM = (fps_1 ^ w_1) x (fps_2 ^ w_2) x ... x (fps_n ^ w_n)

With n being the number of tests in a viewset, fps_i being the result of test i and w_i being the weight of that test expressed as a number between 0.0 and 1.0.
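As a quick sketch of how such a weighted geometric mean can be computed, consider the following. The function name is our own, the frame rates in the example are made up purely for illustration, and we assume that the weights of a viewset add up to 1.0.

```python
import math


def weighted_geometric_mean(results, weights):
    """Weighted geometric mean of per-test results (frames per second).

    Assumes every result is positive and that the weights sum to 1.0.
    """
    if len(results) != len(weights):
        raise ValueError("need exactly one weight per result")
    if not math.isclose(sum(weights), 1.0):
        raise ValueError("weights are assumed to sum to 1.0")
    # Multiplying results raised to their weights is the same as
    # exponentiating the weighted sum of their logarithms.
    return math.exp(sum(w * math.log(r) for r, w in zip(results, weights)))


# Made-up frame rates for a three-test viewset, not measured data.
example_fps = [12.0, 30.0, 45.0]
example_weights = [0.5, 0.3, 0.2]
print(f"WGM: {weighted_geometric_mean(example_fps, example_weights):.2f} fps")
```

Because the scores are multiplied rather than added, a single very high result in a lightly weighted test cannot dominate the final number the way it could with a simple average.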

If you'd like to know more about why a Weighted Geometric Mean is used, SPEC has an excellent article detailing just why, here.

We ran the SPECviewperf 6.1.1 package under NT for a high-end workstation performance comparison. In order to place the strain on the CPU, we replaced the GeForce 256 with a regular TNT2 Ultra which does not feature any on-board geometry acceleration thus offloading all transforming & lighting requests onto the CPU.



Advanced Visualizer (AWadvs-03) Viewset

Taken from http://www.spec.org/gpc/opc.static/awadvs.htm

Advanced Visualizer from Alias/Wavefront is an integrated workstation-based 3D animation system that offers a comprehensive set of tools for 3D modeling, animation, rendering, image composition, and video output. All operations within Advanced Visualizer are performed in immediate mode with double buffered windows. There are four basic modes of operation within Advanced Visualizer:

     
  • 55% material shading (textured, z-buffered, backface-culled, 2 local lights)
    • 95% perspective, 80% trilinear mipmapped, modulated (41.8%)
    • 95% perspective, 20% nearest, modulated (10.45%)
    • 5% ortho, 80% trilinear mipmapped, modulated (2.2%)
    • 5% ortho, 20% nearest, modulated (.55%)
  • 30% wireframe (no z-buffering, no lighting)
    • 95% perspective (28.5%)
    • 5% ortho (1.5%)
  • 10% smooth shading (z-buffered, backface-culled, 2 local lights)
    • 95% perspective (9.5%)
    • 5% ortho (.5%)
  • 5% flat shading (z-buffered, backface-culled, 2 local lights)
    • 95% perspective (4.75%)
    • 5% ortho (.25%)
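The percentages in parentheses above are simply the nested shares multiplied together, which can be checked with a short sketch. The test labels below are our own shorthand for the entries in SPEC's list; the shares are copied from it.

```python
import math

# (label, mode share, projection share, texture filter share)
# A filter share of 1.0 means the mode has no texture filter split.
tests = [
    ("material, perspective, trilinear", 0.55, 0.95, 0.80),
    ("material, perspective, nearest",   0.55, 0.95, 0.20),
    ("material, ortho, trilinear",       0.55, 0.05, 0.80),
    ("material, ortho, nearest",         0.55, 0.05, 0.20),
    ("wireframe, perspective",           0.30, 0.95, 1.00),
    ("wireframe, ortho",                 0.30, 0.05, 1.00),
    ("smooth, perspective",              0.10, 0.95, 1.00),
    ("smooth, ortho",                    0.10, 0.05, 1.00),
    ("flat, perspective",                0.05, 0.95, 1.00),
    ("flat, ortho",                      0.05, 0.05, 1.00),
]

# Each test's final weight is the product of its nested shares.
weights = {label: mode * proj * filt for label, mode, proj, filt in tests}

for label, weight in weights.items():
    print(f"{label}: {weight * 100:.2f}%")
print(f"total: {sum(weights.values()) * 100:.2f}%")
```

For example, 55% x 95% x 80% gives the 41.8% quoted for the first test, and the ten weights together account for the whole score.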

3D modeling and animation take advantage of the Athlon's superior FPU but at the same time seem to benefit more from the Pentium III's full speed L2 cache. The 1.6GB/s memory bandwidth made available by the i820/RDRAM platform also seems to have a positive effect on the results as the Pentium III 800 clearly distances itself from the competition.



DesignReview (DRV-06) Viewset

Taken from http://www.spec.org/gpc/opc.static/drv.htm

DesignReview is a 3D computer model review package specifically tailored for plant design models consisting of piping, equipment and structural elements such as I-beams, HVAC ducting, and electrical raceways. It allows flexible viewing and manipulation of the model for helping the design team visually track progress, identify interferences, locate components, and facilitate project approvals by presenting clear presentations that technical and non-technical audiences can understand. There are 6 tests specified by the viewset that represent the most common operations performed by DesignReview.

 

While the Athlon is still performing respectably, the Pentium III, even on the old BX platform, is outperforming it on a clock for clock basis.



Data Explorer (DX-05) Viewset

Taken from: http://www.spec.org/gpc/opc.static/dx.htm

The IBM Visualization Data Explorer (DX) is a general-purpose software package for scientific data visualization and analysis. It employs a data-flow driven client-server execution model and is currently available on Unix workstations from Silicon Graphics, IBM, Sun, Hewlett-Packard and Digital Equipment. The OpenGL port of Data Explorer was completed with the recent release of DX 2.1.

The tests visualize a set of particle traces through a vector flow field. The width of each tube represents the magnitude of the velocity vector at that location. Data such as this might result from simulations of fluid flow through a constriction. The object represented contains about 1,000 triangle meshes containing approximately 100 vertices each. This is a medium-sized data set for DX.

The Pentium III is completely dominating here; the Athlon needs a faster L2 cache in order to keep up. The added memory bandwidth of the i820 chipset (courtesy of RDRAM) gives Intel an even more noticeable lead at 800MHz. While it wasn't included in our tests, an i840 platform with double the memory bandwidth would most likely yield a very impressive performance boost here as well.



Lightscape (Light-03) Viewset

Taken from: http://www.spec.org/gpc/opc.static/light.htm

The Lightscape Visualization System from Discreet Logic represents a new generation of computer graphics technology that combines proprietary radiosity algorithms with a physically based lighting interface.

There are four tests specified by the viewset that represent the most common operations performed by the Lightscape Visualization System.

The Athlon 800 finally makes its way to the top of this test. All of the contenders perform respectably; however, you'd most likely see a greater improvement in this benchmark by simply adding a GeForce 256 to any one of the systems instead of upgrading the CPU.



ProCDRS-02 Viewset

Taken from: http://www.spec.org/gpc/opc.static/procdrs.htm

The ProCDRS-02 viewset is a complete update of the CDRS-03 viewset. It is intended to model the graphics performance of Parametric Technology Corporation's CDRS industrial design software.

For more information on CDRS, see http://www.ptc.com/icem/products/cdrs/cdrs.htm

The viewset consists of ten tests, each of which represents a different mode of operation within CDRS. Two of the tests use a wireframe model, and the other tests use a shaded model. Each test returns a result in frames per second, and a composite score is calculated as a weighted geometric mean of the individual test results. The tests are weighted to represent the typical proportion of time a user would spend in each mode.

All tests run in display list mode. The wireframe tests use anti-aliased lines, since these are the default in CDRS. The shaded tests use one infinite light and two-sided lighting. The texture is a 512 by 512 pixel 24-bit color image.

The scale tips in favor of Intel here once again.



Under 3D Studio MAX, AMD regains the performance lead we have been used to seeing from the Athlon. The raw FPU dependent nature of 3DSMAX gives the Athlon the edge here. However, AMD won't hold this lead for long if Intel keeps on beating them in L2 cache frequencies.



Conclusion

The Athlon 700 was the last Athlon to feature the 1/2 speed L2 cache, and it will most likely be a fairly expensive part to pick up even after the 750 and 800MHz parts make their way into the retail channels. The Athlon 800 is just another step in the Athlon line. For AMD, its purpose is to offer a clock speed competitor to Intel's Pentium III 800, but its main purpose for most of you will be to drive the prices of the other Athlon processors down, which is never a bad thing.

The Athlon 800 will make its official shipping introduction sometime in January; hopefully, by then, the Athlon 750 will begin to surface around the net as well. It won't be much longer until we begin to hear more about AMD's Thunderbird and Spitfire based Athlon processors featuring a likely 512KB of full speed on-die L2 cache. Until then, if you are in the market for an Athlon, the 500 - 600MHz chips are pretty affordable and very high performing solutions. If you must have the best then there will always be the 800, but be warned: the performance gain going from 700 to 800 isn't as great as we'd hope it would be, courtesy of the 2/5 L2 cache divider.

All of this makes you wonder, who will be the first to break 800MHz?

For more information on the Athlon 800 take a look at Sharky Extreme's review of the CPU.
