Original Link: https://www.anandtech.com/show/645
AMD's 760 Chipset: DDR for the Athlon is here
by Anand Lal Shimpi on October 30, 2000 5:29 AM EST - Posted in CPUs
A little more than a month has passed since we first offered our preview of DDR SDRAM on the Athlon platform. Back then we concluded that the move to DDR SDRAM would provide, on average, a 5 – 20% boost in performance. According to the data in our preview, the AMD Athlon was set to be even more competitive with the Pentium III; however, it didn’t seem as if the performance improvement would be great enough to challenge Intel’s upcoming Pentium 4.
As we mentioned continuously throughout that preview, the article was nothing more than an indication of the minimum level of performance to expect from forthcoming DDR solutions for the Athlon. Today, AMD is giving us the means to discover what the final shipping performance of the first DDR solutions for the Athlon will be, as they officially launch the AMD 760 chipset, the world’s first DDR solution for the Athlon platform.
Removing the first bottleneck: DDR SDRAM
When talking about memory bandwidth, the most commonly discussed place where there is “not enough” has been the current crop of 3D graphics cards. Ever since the introduction of NVIDIA’s GeForce we have seen an increased focus on memory bandwidth and the bottleneck it imposes on gaming performance.
Memory bandwidth matters in other areas of the PC outside of the 3D graphics card arena, however it isn’t the first thing that comes to mind when you’re talking about CPUs. Normally, after architecture, clock speed and cache size are the main factors that are taken into account when talking about CPUs; memory bandwidth, however, is critical to the performance of high speed CPUs.
Take the 1.2GHz Athlon that was released earlier this month. At 1200MHz, the CPU is running at 9 times the frequency of the memory bus (assuming PC133 SDRAM). As the Athlon gets higher and higher in clock speed, the ratio of CPU clock to memory clock will increase to the point where the CPU has to wait around on the memory before proceeding with any calculations.
The AMD 760 chipset helps to alleviate this problem by introducing the first ever support for Double Data Rate (DDR) SDRAM on an Athlon platform. By transferring twice as much data per clock (once on the rising edge and once on the falling edge), DDR SDRAM effectively offers twice as much memory bandwidth as an “equivalently clocked” Single Data Rate SDRAM solution.
As we mentioned in our initial preview of DDR technology on the Athlon, there are two flavors of DDR SDRAM that will be made available for use with the AMD 760 chipset: PC1600 and PC2100 DDR SDRAM.
The PC1600 and PC2100 names simply define the amount of available bandwidth each solution provides, so PC1600 essentially means a 1600MB/s or 1.6GB/s transfer rate and PC2100 means a 2100MB/s or 2.1GB/s transfer rate. Both figures are theoretical maximums, of course.
How are those two figures derived? Let’s look at the below table to explain that:
DDR SDRAM Comparison
| Type | Memory Bus Width | Operating Frequency | Data Transferred per Clock | Memory Bandwidth |
|---|---|---|---|---|
| PC1600 DDR SDRAM | 64-bits (8-bytes) | 100MHz | 2 | 8 x 100 x 2 = 1600MB/s |
| PC2100 DDR SDRAM | 64-bits (8-bytes) | 133MHz | 2 | 8 x 133 x 2 = 2100MB/s |
So as you can see, it’s quite simple. PC1600 DDR SDRAM operates at 100MHz and PC2100 DDR SDRAM runs at 133MHz. In terms of cost, PC1600 DDR SDRAM should be priced almost on par with PC133 SDRAM while PC2100 DDR SDRAM should carry a 15 – 25% price premium over PC133 SDRAM.
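The bandwidth arithmetic above can be reproduced with a few lines of Python. This is only an illustrative sketch; the function name `peak_bandwidth_mb_s` is made up for this example. Note that 8 x 133 x 2 works out to 2128, so the "2100" in PC2100 is a rounded marketing figure (the nominal 133.33MHz clock would give roughly 2133MB/s).

```python
def peak_bandwidth_mb_s(bus_width_bits, clock_mhz, transfers_per_clock):
    """Theoretical peak bandwidth in MB/s.

    bytes per transfer x clock (MHz) x transfers per clock
    """
    bytes_per_transfer = bus_width_bits // 8
    return bytes_per_transfer * clock_mhz * transfers_per_clock


# Values taken from the table: 64-bit bus, 100MHz or 133MHz, 2 transfers per clock.
pc1600 = peak_bandwidth_mb_s(64, 100, 2)
pc2100 = peak_bandwidth_mb_s(64, 133, 2)
# PC133 SDRAM for comparison: same bus width, one transfer per clock.
pc133 = peak_bandwidth_mb_s(64, 133, 1)

print("PC1600 DDR:", pc1600, "MB/s")
print("PC2100 DDR:", pc2100, "MB/s (rounded to 2100 in the name)")
print("PC133 SDR: ", pc133, "MB/s")
```

These are theoretical maximums only; sustained bandwidth in real applications will be lower.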
The AMD 760, much like the ALi MAGiK 1, does not allow for asynchronous operation of the FSB and the memory bus. For example, if you’re using PC1600 DDR SDRAM (100MHz operating frequency) your FSB must also be set to 100MHz. And if you’re using PC2100 DDR SDRAM (133MHz) your FSB must also be set to 133MHz. This means that both your FSB and your memory bus are running at 100 or 133MHz DDR. According to motherboard manufacturers it is very difficult to produce a design that will run the two buses asynchronously reliably, for example running PC2100 SDRAM using a 100MHz DDR FSB.
This obviously poses a problem as there have been no Athlon CPUs that use the 133MHz FSB. To get around this, the AMD 760 chipset will allow you to use PC2100 DDR SDRAM in a system that’s running at a 100MHz DDR FSB setting; in doing so, however, it will underclock your memory bus to 100MHz DDR as well, effectively turning your PC2100 memory into PC1600 SDRAM.
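The synchronous FSB/memory rule described above can be modeled as a tiny lookup. This is a sketch under stated assumptions, not a description of how the chipset is actually programmed: the names `RATED_CLOCK_MHZ`, `memory_clock_mhz` and `ddr_bandwidth_mb_s` are invented for illustration, and treating a module run above its rating as an error is a modeling choice of this example.

```python
# Rated clocks from the article: PC1600 runs at 100MHz, PC2100 at 133MHz.
RATED_CLOCK_MHZ = {"PC1600": 100, "PC2100": 133}


def memory_clock_mhz(fsb_mhz, module):
    """Memory clock on a synchronous design: the memory bus follows the FSB."""
    rated = RATED_CLOCK_MHZ[module]
    if fsb_mhz > rated:
        # Assumption for this sketch: running a module faster than its
        # rating is treated as unsupported rather than as an overclock.
        raise ValueError(f"{module} is not rated for a {fsb_mhz}MHz bus")
    return fsb_mhz


def ddr_bandwidth_mb_s(clock_mhz):
    """64-bit (8-byte) bus, two transfers per clock."""
    return 8 * clock_mhz * 2


for fsb, module in [(100, "PC1600"), (133, "PC2100"), (100, "PC2100")]:
    clock = memory_clock_mhz(fsb, module)
    print(f"{module} on a {fsb}MHz FSB: memory at {clock}MHz DDR, "
          f"{ddr_bandwidth_mb_s(clock)}MB/s peak")
```

The last case is the one the text describes: PC2100 on a 100MHz DDR FSB ends up with the same peak bandwidth as PC1600.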
So what’s the point of PC2100 SDRAM if you can never use it? Let’s find out…
New Chipset, New FSB, New CPUs
With the AMD 760 boosting memory bandwidth up to 2.1GB/s, all of a sudden the 1.6GB/s of bandwidth provided by the EV6 bus isn’t “enough.” So what’s the most logical step from here?
Well, if the memory bus is going to be running at 133MHz, why not do the same for the FSB? The AMD 760 quickly removes the potential for an FSB bottleneck by finally offering official support for the 133MHz DDR (266MHz) FSB on the Athlon platform.
If you think this is overkill, you have to take into account that the new Pentium 4 will feature a 100MHz quad-pumped (effectively 400MHz) FSB. The EV6 bus has the ability to scale up to 200MHz DDR (effectively 400MHz), so don’t be surprised if the next major chipset release from AMD brings that support as well.
But again, as we mentioned before, all of the Athlons released prior to the AMD 760 chipset use the 100MHz DDR FSB, and chances are that a 1.2GHz Athlon will not want to work at 12.0 x 133MHz.
This paves the way for AMD to introduce a new set of Athlon CPUs based on the 133MHz DDR FSB. Let’s take a look at those new CPUs:
New Athlon CPUs
| CPU | FSB Frequency | Clock Multiplier |
|---|---|---|
| AMD Athlon 1.2GHz | 133MHz | 9.0x |
| AMD Athlon 1.13GHz | 133MHz | 8.5x |
| AMD Athlon 1.0GHz | 133MHz | 7.5x |
You can immediately tell that two of these CPUs, the 1GHz and 1.2GHz parts, overlap with 100MHz FSB parts while the 1.13GHz Athlon is a 133MHz FSB-only chip. The only thing that sets the 1GHz and 1.2GHz CPUs apart from their 100MHz FSB counterparts is their clock multiplier. While the 100MHz FSB Athlon running at 1.2GHz has a 12.0x clock multiplier (12 x 100 = 1200MHz), the 133MHz FSB version only has a 9.0x multiplier (9 x 133.3 ≈ 1200MHz). Obviously if you have a motherboard that can get around the clock multiplier lock this won’t pose a problem to you, but if not, then you will want to make sure that you only use a 133MHz FSB CPU on an AMD 760 board.
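The multiplier arithmetic is easy to check. The sketch below is illustrative only; `core_clock_mhz` is a made-up helper, and it assumes the nominal "133MHz" clock is really 133.33MHz (400/3), which the article does not spell out. It also shows why a 12.0x part is a poor fit for the faster bus.

```python
FSB_100 = 100.0
FSB_133 = 400.0 / 3  # nominal "133MHz" clock, about 133.33MHz (assumption)


def core_clock_mhz(fsb_mhz, multiplier):
    """Core clock is simply the FSB clock times the CPU's multiplier."""
    return fsb_mhz * multiplier


# The three new 133MHz FSB parts and their multipliers from the table.
for name, multiplier in [("Athlon 1.2GHz", 9.0),
                         ("Athlon 1.13GHz", 8.5),
                         ("Athlon 1.0GHz", 7.5)]:
    print(name, "->", round(core_clock_mhz(FSB_133, multiplier)), "MHz")

# A 100MHz FSB 1.2GHz part is locked at 12.0x.
print("12.0x on 100MHz FSB:", round(core_clock_mhz(FSB_100, 12.0)), "MHz")
print("12.0x on 133MHz FSB:", round(core_clock_mhz(FSB_133, 12.0)), "MHz")
```

Dropping a 12.0x CPU onto a 133MHz bus would ask it to run at roughly 1600MHz, far beyond its rated 1200MHz.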
A "new" Athlon 1.2GHz using the 133MHz FSB
The way you can tell if your CPU is a 100MHz or 133MHz FSB part is by looking at what AMD calls the Ordering Part Number (OPN). If the last letter in the OPN is a ‘B’ then you have a 100MHz FSB part; if it is a ‘C’ then you have a 133MHz FSB part. However, vendors should properly advertise the CPUs as either 100MHz or 133MHz FSB models.
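The last-letter rule can be written down as a short check. This is a hypothetical helper, not an AMD tool: `fsb_from_opn` is an invented name, and the strings passed to it below are placeholders rather than real part numbers. Only the 'B' and 'C' suffixes described above are handled.

```python
# Suffix-to-FSB mapping as described in the text.
FSB_BY_SUFFIX = {"B": 100, "C": 133}


def fsb_from_opn(opn):
    """Return the FSB clock in MHz implied by the last letter of an OPN."""
    suffix = opn.strip().upper()[-1:]
    if suffix not in FSB_BY_SUFFIX:
        raise ValueError(f"unrecognized OPN suffix: {suffix!r}")
    return FSB_BY_SUFFIX[suffix]


# Placeholder strings for illustration only, not actual OPNs.
print(fsb_from_opn("XXXXXXXXXB"))  # 100
print(fsb_from_opn("XXXXXXXXXC"))  # 133
```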
Rounding out the solution
The combination of DDR SDRAM and an increased FSB should give the Athlon a hefty boost in performance. But what else does the AMD 760 chipset offer?
The AMD 760 chipset is basically made up of two parts, the AMD 761 North Bridge and the 766 South Bridge. The AMD 761 North Bridge provides the DDR SDRAM and 133MHz FSB support we talked about earlier in addition to AGP 4X support which hasn’t really been of much use as we’ve seen from our benchmarks on the KX133 and KT133 chipsets from VIA.
The AMD 766 South Bridge adds Ultra ATA-100 support to the list of features, but other than that the features of the South Bridge are pretty much what we have been used to on the VIA boards.
Most likely, motherboard manufacturers will use the AMD 761 North Bridge in combination with the VIA 686B South Bridge, since the latter is pin compatible with the AMD 766 and will most likely carry a cheaper price.
From AMD’s perspective, this isn’t a concern, as they have never been, nor will they ever be, in the chipset business. They are here to make processors, and the only reason they continue to produce chipsets is so that they can provide a platform that supplies the technology their CPUs need to run optimally. If it were left up to VIA or ALi, the Athlon would not have DDR SDRAM support this early; while we can expect their solutions to follow, AMD is definitely the first.
This is the same approach they took with the AMD 750 chipset. They released the AMD 750 so the Athlon would have a platform to run on then they let VIA take over with the KX133 chipset. The AMD 760 chipset will carry the Athlon with its support for DDR SDRAM until either ALi or VIA provides a solution that’s ready to take over.
The Board
AMD sent us their Corona EVT8 reference board for all of the tests in this review. Below you can see exactly what the Corona board looks like. Remember that this is only a reference board and it won't necessarily be what motherboard manufacturers choose to design their solutions around, although you may see it in some OEM systems.
Micron PC2100 CAS2.5 DDR SDRAM
New Test Bed & New Benchmarks
We recently conducted a Poll on the AnandTech Front Page that asked readers what they would like to see more of in our CPU reviews. There was an overwhelming demand for more analysis in the reviews, and while this AMD 760 review isn’t entirely a CPU review, we are introducing a brand new test bed and testing methodology starting with this review.
For starters, the test bed gets a couple of upgrades. We are still using the GeForce2 GTS (32MB) as our video card of choice, however we are upgrading the memory and hard drive on the test bed. It is finally time for our CPU review test bed to feature 256MB of memory to limit the amount of disk swapping that occurs during the more complex tests, especially the new ones we have added to our test suite.
With the introduction of the AMD 766 South Bridge and the VIA 686B, which will be available shortly, all of the popular chipsets will have native Ultra ATA/100 support. In order to take advantage of this we have switched our CPU test bed hard drive from the aging 20.5GB ATA/66 IBM Deskstar DPTA-372050 to a 30GB ATA/100 IBM Deskstar 75GXP. We reviewed this drive not too long ago and discovered that it has quite stellar performance, and thus we have adopted it for use in our CPU and platform tests. It also helps us to better test the IDE controllers on platforms and expose any weaknesses.
In terms of our test suite, we have seen some upgrades there as well. While we will continue to benchmark under Windows 98SE and Windows 2000, we will only be running our gaming suite under Windows 98SE. All other benchmarks will be reserved for Windows 2000. Speaking of which, we will start using Windows 2000 SP1 for all of our Win2K benchmarks as well.
Under Windows 98SE the following Gaming Benchmarks are run, all at a 32-bit color depth (demo used in parentheses):
Quake III Arena (demo001) – 640 x 480 & 1024 x 768
UnrealTournament (thunder) – 640 x 480 & 1024 x 768
MDK2 (built in timedemo) – 640 x 480 & 1024 x 768
Expendable (built in timedemo) – 640 x 480
We switched to 32-bit color testing alone because we have noticed that, with the GeForce2 GTS, the performance penalty of switching to 32-bit rendering isn’t great enough to skew CPU performance comparisons to any major degree. The Expendable demo isn’t run at 1024 x 768 since it is only used as a measure of cache/system memory performance and not as a gauge of real world performance like all of the other 1024 x 768 numbers.
Under Windows 2000 SP1 the following benchmarks are run at a resolution of 1024 x 768 x 16:
BAPCo SYSMark 2000
Ziff Davis Content Creation Winstone 2000
Ziff Davis High End Winstone 99
I-STREAM & FPU-STREAM
HD-Tach 2.61 (HDD controller max performance)
Distributed.net’s RC5 “Long” Benchmark
SPECviewperf 6.1.2 is also run under Windows 2000 SP1; however, the resolution is set to 1280 x 1024 x 32 here in order to simulate a real world usage environment for the type of applications viewperf represents.
Windows 2000 SP1 benchmarks are generally a few percent faster than their Windows 98SE counterparts, which makes running both sets of tests redundant. You won’t see the performance standings of a group of CPUs change just because you’re running an application under Windows 98SE instead of Windows 2000 SP1.
In order to make sure that the platform isn’t behaving oddly under Windows 2000 SP1 or Windows 98SE we run a set of platform comparison benchmarks to illustrate any driver/chipset issues that need to be addressed by the manufacturer.
AnandTech uses SPEC CPU2000 for the first time
Unique to this review as well as a handful of reviews to come is the presence of an extremely useful benchmark, SPEC CPU2000.
One of the biggest problems when dealing with benchmarks is the platform specific optimizations that exist in them. For example, in our Desktop CPU Comparison that was published in September 1999 we showed exactly what SSE or 3DNow! specific optimizations can do to otherwise normal performance standings. It is quite simple to make an Athlon seem like the slowest CPU in a test: just optimize heavily for SSE instructions. The reverse works as well: if you eliminate any sort of SSE optimizations and simply optimize for Athlon architectural advantages, you can just as easily make the Pentium III seem like the slowest CPU.
The only way around this is to essentially make your own benchmarks; however, that is not always realistic, as developing such benchmarks usually requires more time than can be devoted to a single review. Instead, we provide you all with a suite of benchmarks that present an overall performance picture, and based on that you can generally come to the conclusion that one processor is faster or slower than another.
The Standard Performance Evaluation Corporation (SPEC) came up with another solution to this problem: make the person running the tests handle the job of compiling them as well. By forcing the tester to compile the tests on his/her own, the tester can choose to heavily optimize for one platform or another, but even better yet, the tester can heavily optimize for more than one platform to get the best possible performance figures on any given system. And since SPEC is a non-profit organization, you don’t have to worry about one company or another contributing to paying SPEC’s bills at the end of each month.
We have already been using a SPEC supplied benchmark for quite some time now: SPECviewperf, which measures graphics performance. SPEC CPU2000 is, as you can probably guess, a benchmark that more closely focuses on CPU performance. It does so by executing a total of 26 CPU intensive applications (12 integer and 14 floating point) that can be used to stress different performance areas. The SPEC CPU2000 benchmark is split into two separate benchmarks, SPEC CINT2000 and SPEC CFP2000, the Integer and Floating Point benchmark sets.
The ‘C’ in the name CINT2000 and CFP2000 is supposed to stand for component-level benchmarks, indicating that SPEC CPU2000 is designed not to measure the overall performance of a system, as a real world application test would (i.e. SYSMark, Winstone, etc…) but measures the performance of one or more components in a system.
SPEC CPU2000 is completely disk and graphics independent; rather, its performance depends on your CPU, FSB, memory bus, and of course, your compiler, the latter being a performance constant as long as you use the same compiler in all of your tests on a given platform.
For this particular review we’re going to be looking at SPEC CFP2000 performance since it seems to stress memory performance much more than the Integer benchmarks. In order to be fair to both AMD and Intel, we used the same config files that they used to submit their benchmarks to SPEC.
We also limited the compilers we used to those currently available in final form, meaning that Intel’s 5.0 Compiler did not make it into this review. This hurts Intel’s performance a bit as the 5.0 Compiler optimizations improved performance around 10%, but we will take that into account in our analysis, and in our next major use of SPEC CPU2000 we will attempt to obtain the beta 5.0 Compiler from Intel and include benchmarks with it as well.
The only other change we made was that we did not use the SmartHeap libraries that Intel and AMD both used in their SPEC tests as we did not have a copy during the time of benchmarking. With each full run of SPEC CFP2000 taking approximately 12 – 14 hours from compile time to result production, we were limited as to the amount of time that could be devoted to the SPEC benchmarks alone. In future tests we will use the SmartHeap libraries which seem to boost performance a few percent across the board.
The last thing to take into account when dealing with SPEC CPU2000 scores is that there are two overall scores that are output for every official run: SPECfp2000 and SPECfp_base2000.
The difference between the two is simple: base only allows a certain number of platform specific optimizations to be made and only one compiler to be used, whereas SPECfp2000 (otherwise known as ‘peak’) is a bit more lenient with the compiling stipulations and allows for more aggressive optimization, including the ability to use more than one compiler (i.e. using one compiler for some tests, and another compiler for others).
So basically the base numbers show you how the platform performs without any aggressive optimizations, while the peak numbers show you what kind of performance you can get if a particular application developer heavily optimizes for a particular platform. Since we’re heavily optimizing for both Intel and AMD CPUs in our tests, we should see the highest performance available today using these CPUs.
Now that you’ve made it through all of that, let’s get to see our tests in action and see how DDR SDRAM really stacks up on the Athlon…
The Test
| Windows 98SE / 2000 Test System | | |
|---|---|---|
| Hardware | | |
| CPU(s) | Intel Pentium III 1GHz | AMD Thunderbird 1.2GHz |
| Motherboard(s) | ASUS CUSL2/Intel OR840/Intel VC820 | ASUS A7V/AMD Corona Reference Board |
| Memory | 256MB PC133 Corsair SDRAM (Micron -7E CAS2) | |
| Hard Drive | IBM Deskstar 30GB 75GXP 7200 RPM Ultra ATA/100 | |
| CDROM | Phillips 48X | |
| Video Card(s) | NVIDIA GeForce 2 GTS 32MB DDR (default clock - 200/166 DDR) | |
| Ethernet | Linksys LNE100TX 100Mbit PCI Ethernet Adapter | |
| Software | | |
| Operating System | Windows 98 SE | |
| Video Drivers | | |
| Benchmarking Applications | | |
| Gaming | Unreal Tournament 4.32 Reverend's Thunder.dem | |
| Productivity | BAPCo SYSMark 2000 | |
| Low Level | Linpack; SPEC CPU2000 (Intel C/C++ & Fortran Compilers 4.5, Compaq Visual Fortran 6.5, MS Visual C++ 6.0); SiSoft Sandra; HD-Tach 2.61 | |
Illustrating the need for DDR - Linpack
A benchmark that has always come in handy but has never really been used on AnandTech is something known as Linpack. The guys over at Ace's Hardware have been using Linpack for quite some time, and it is to them that we owe the discovery of the benchmark as we will begin using it as well to help analyze performance.
While it may be hard to get anything out of the above graph at first sight, keep in mind that all of the blue lines represent the Intel Pentium III 1GHz while the two green lines (one dark one light) represent the Athlon 1GHz.
What Linpack allows us to do is see how the performance of the various CPUs/platforms drops off as the size of the data set being manipulated increases. For example, at the very top of the chart there is a peak shared by the two AMD setups. This peak occurs at a data size of around 64KB, which is exactly the size of the Athlon’s L1 Data Cache. The performance then drops until the data size reaches around 384KB, which is the end of the Athlon’s on-die cache, and this is where we see the two Athlon lines separate, the darker line representing the performance of the AMD 760 which features a faster DDR memory bus. This faster bus translates into less of a performance hit when going from on-die cache to system memory since the system memory is performing more like the on-die cache (but still far from coming close in the big picture).
The picture is a bit different when it comes to the Pentium III as the CPU peaks much later than the Athlon; however, the big drop off again occurs after the Pentium III’s 256KB on-die cache is filled (256KB L2 + 16KB L1 - 16KB of L1 duplicated in the Pentium III's inclusive L2 cache).
It is this tail that we need to pay attention to since it represents what happens when an application doesn’t conveniently fit within the Athlon’s large L1 and/or L2 cache.
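To make the drop-off points concrete, here is a rough sketch that classifies a data set size by where it would be served from, using only the cache capacities quoted above. It is a deliberate simplification: the names `ATHLON`, `PENTIUM_III` and `serving_level` are invented for this example, and real cache behavior also depends on associativity, line size and access pattern, none of which are modeled here.

```python
KB = 1024

# Capacities as quoted in the text: 64KB L1 data and 384KB total on-die
# for the Athlon; 16KB L1 and 256KB effective on-die for the Pentium III.
ATHLON = {"l1_data": 64 * KB, "on_die_total": 384 * KB}
PENTIUM_III = {"l1_data": 16 * KB, "on_die_total": 256 * KB}


def serving_level(data_set_bytes, cpu):
    """Name the level that can hold the whole data set (capacity only)."""
    if data_set_bytes <= cpu["l1_data"]:
        return "L1 data cache"
    if data_set_bytes <= cpu["on_die_total"]:
        return "on-die L2 cache"
    return "system memory"


for size_kb in (32, 300, 512):
    size = size_kb * KB
    print(f"{size_kb}KB: Athlon -> {serving_level(size, ATHLON)}, "
          f"Pentium III -> {serving_level(size, PENTIUM_III)}")
```

Anything that lands in the "system memory" bucket is in the tail of the graph, which is where the chipset and memory type start to matter.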
Zooming in on the tail from above we see a very interesting picture. Let’s start from the bottom and go up. The lower the line is the more limiting the memory solution is to the performance of the CPU.
At the very bottom of the list we find the i820 chipset, which boasts a hefty 1.6GB/s of available memory bandwidth but, in a real world scenario, isn’t able to outperform even the i815’s PC133 SDRAM. In the end, the i820 chipset is holding the Pentium III back more than it’s helping it.
The i840 is represented by the second to last line. While it offers superior performance in comparison to the i820 it is still pretty low on the performance scale, not to mention its price is much higher than everything else on this chart.
Now here’s the interesting part: while you would expect the i815 to follow as the next slowest, you don’t see that. Instead, it’s the VIA KT133 chipset that follows. This means that the KT133 chipset provides less memory bandwidth to the CPU than the i815E does, with both memory buses clocked identically at 133MHz.
Finally we have the AMD 760 with its PC2100 DDR SDRAM coming out on top, not only offering the best performance out of the five solutions featured in the chart but also giving the Athlon the ability to scale and perform even better than it already has been doing with the VIA KT133 chipset.
SPEC CFP2000 Performance
Time constraints kept us from completing the 6 to 8 hour run of SPECfp2000 on the KT133 chipset, however we managed to get individual scores for it which you will see shortly. But let’s look at how the AMD 760 stacks up to the i840 and i815 in SPECfp2000.
Intel’s current fastest setup is the i840 with the Pentium III 1GHz, and the 1GHz Athlon on the AMD 760 has no problem trampling all over that solution. A combination of the new 266MHz FSB and its PC2100 DDR SDRAM provides for this 25% performance lead.
We expect that the new Intel Compilers (v5.0) will improve performance another 10% or so in this benchmark for the Intel CPUs, still leaving the Athlon/AMD 760 with a decent lead.
From what we’ve seen, though, this level of performance won’t be enough to compete with the Pentium 4. Luckily, AMD can also compete on the basis of price, and they also have another ace up their sleeves which they have yet to reveal.
The performance advantage is severely reduced with just bare minimum compiler optimizations, and as we’re about to find out this isn’t the only time we’ll see that happen.
Let's start out by looking at the breakdown of the SPECfp2000 scores...
Floating Point Performance - SPEC CFP2000, base number (peak number)
| Test | Athlon/AMD 760 (PC2100 DDR) | P3/i840 (PC800 Dual RDRAM) | Athlon/VIA KT133 (PC133) | P3/i815 (PC133) |
|---|---|---|---|---|
| 168.wupwise | 334 (476) | 348 (360) | 316 (437) | 346 (350) |
Let's start with the 168.wupwise test. Right off the bat, the i840 with its Dual Channel RDRAM comes out ahead. This test is more of a computational test than one that deals with a large data set which would stress memory bandwidth performance. In this case, both the Intel platforms come out ahead of the two AMD platforms. This helps to eliminate the platform as the reason for the lower scores, and pinpoints the CPU as being at fault. The only explanation that exists here is an architectural advantage of the Pentium III over the Athlon.
Turning on peak optimizations changes things a bit as AMD can now take advantage of more aggressive optimizations. Only with the upcoming 5.0 Compiler from Intel will the Pentium III on the 840 be able to catch up.
Floating Point Performance - SPEC CFP2000, base number (peak number)
| Test | Athlon/AMD 760 (PC2100 DDR) | P3/i840 (PC800 Dual RDRAM) | Athlon/VIA KT133 (PC133) | P3/i815 (PC133) |
|---|---|---|---|---|
| 171.swim | 763 (763) | 439 (438) | 496 (488) | 285 (285) |
While the wupwise test was very floating point intensive and not very stressful on the memory bus, this next test, 171.swim is the exact opposite. This test is a weather prediction program that works with a huge 1335 x 1335 array of data run over 512 distinct timesteps.
The Athlon does hold a bit of an advantage here as it has a larger total cache than the Pentium III (384KB vs 256KB) so more of the data being manipulated can be stored in its cache, but once that cache is full the performance drop is enormous since the CPU must rely on the slow path to system memory.
The Athlon on a VIA KT133 is thus a little faster than the Pentium III on the i840 because of its larger L2 cache, however with the newer Intel 5.0 Beta compiler the 840 would be virtually on par with the KT133. The i815 gets penalized the most here since it has the same amount of memory bandwidth as the KT133 but it lacks the Athlon's large on-die cache.
Now take a look at the Athlon on the AMD 760 with a full 2.1GB/s of CAS2.5 PC2100 DDR SDRAM. The performance is no less than 54% greater than that of the second fastest platform, the KT133. Even with the new Intel 5.0 Compiler Beta the Athlon on the AMD 760 would enjoy at least a 50% performance advantage. The latency of RDRAM, even with the dual channel nature of the i840's implementation of it, is too high to compete with DDR SDRAM in this case.
The peak performance does not change, further supporting the idea that this is mostly a cache/memory test and there's very little you can do to make it perform better on any given CPU other than increasing the clock speed or dramatically changing the architecture.
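A quick back-of-the-envelope calculation shows why 171.swim cannot live in cache on either CPU. The sketch below is only an estimate under a stated assumption: the article gives the 1335 x 1335 grid size but not the element size, so 8-byte (double precision) elements are assumed here, and only a single array is counted.

```python
GRID = 1335          # grid dimension quoted in the text
ELEMENT_BYTES = 8    # assumption: double precision values (not stated)

KB = 1024
ATHLON_ON_DIE = 384 * KB       # total on-die cache quoted for the Athlon
PENTIUM_III_ON_DIE = 256 * KB  # on-die cache quoted for the Pentium III


def array_bytes(n, element_bytes):
    """Footprint of one n x n array."""
    return n * n * element_bytes


one_array = array_bytes(GRID, ELEMENT_BYTES)

print("One 1335 x 1335 array:", one_array, "bytes")
print("Times larger than Athlon on-die cache:",
      round(one_array / ATHLON_ON_DIE, 1))
print("Times larger than Pentium III on-die cache:",
      round(one_array / PENTIUM_III_ON_DIE, 1))
```

Even if the elements were only 4 bytes each, a single array would still be many times larger than either cache, so the test ends up streaming from system memory.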
Floating Point Performance - SPEC CFP2000, base number (peak number)
| Test | Athlon/AMD 760 (PC2100 DDR) | P3/i840 (PC800 Dual RDRAM) | Athlon/VIA KT133 (PC133) | P3/i815 (PC133) |
|---|---|---|---|---|
| 172.mgrid | 254 (346) | 251 (251) | 238 (277) | 208 (208) |
172.mgrid is another heavy computational test that doesn't really stress memory bandwidth to the degree that the 171.swim test did. There is much less of a focus on memory latency than there is on memory bandwidth in this test. If you'll notice, the Pentium III on the i815 comes in last; however, with the 3.2GB/s of memory bandwidth on the i840 the Pentium III very quickly becomes faster than the Athlon on the KT133 platform. It takes the added bandwidth of DDR SDRAM to give AMD the slight lead here over the more expensive 840. It seems like AMD's large L1 and L2 caches come in handy yet again as latency isn't a major factor in this benchmark, rather pure memory bandwidth.
The peak SPECfp2000 numbers indicate that there is still much room for the Athlon to grow in terms of performance if optimized properly for the CPU.
Floating Point Performance - SPEC CFP2000, base number (peak number)
| Test | Athlon/AMD 760 (PC2100 DDR) | P3/i840 (PC800 Dual RDRAM) | Athlon/VIA KT133 (PC133) | P3/i815 (PC133) |
|---|---|---|---|---|
| 173.applu | 377 (394) | 248 (249) | 282 (303) | 219 (221) |
The APPLU benchmark plots the solution of five Partial Differential Equations on a 3D-grid. In this case you're dealing with quite a bit of data yet again; however, the Athlon architecture in general seems to be providing quite a bit of the performance advantage over the Pentium III here. For example, the Athlon on the KT133 is 14% faster than the Pentium III on the i840. Drop the Athlon in an AMD 760 board with PC2100 DDR SDRAM and a 266MHz FSB and all of a sudden you've got a 52% advantage over Intel's fastest solution.
The performance benefit here comes from a combination of the DDR SDRAM and the increased FSB. For the two Intel platforms, peak optimizations did not do much; on both of the Athlon platforms, however, the more aggressive optimizations improved performance by a noticeable degree.
Floating Point Performance - SPEC CFP2000, base number (peak number)
| Test | Athlon/AMD 760 (PC2100 DDR) | P3/i840 (PC800 Dual RDRAM) | Athlon/VIA KT133 (PC133) | P3/i815 (PC133) |
|---|---|---|---|---|
| 177.mesa | 256 (390) | 391 (391) | 256 (382) | 387 (386) |
The situation deviates from the "norm" (at least for the past few benchmarks) with the Mesa test. The Mesa test performs the following operation:
"The input data is a 2D scalar field. The scalar data is mapped to height, creating a 3D object with explicit vertex normals. Contour lines are mapped onto the object as a 1D texture."
The description of the test has SSE written all over it, and considering that Intel's SSE optimizations are superior to the current state of AMD's 3DNow! it doesn't seem surprising that the Pentium III completely tramples over the Athlon here. Only when using aggressive optimizations (peak) can the Athlon come close to the performance of the Pentium III, however the new 5.0 Intel Compilers should help extend that lead even further.
There is almost no stress here on memory performance as the i840 and i815E perform within 1% of one another and the AMD 760 and KT133 scores are identical. It just goes to show you that DDR won't be your savior in all situations.
Floating Point Performance - SPEC CFP2000, base number (peak number)
| Test | Athlon/AMD 760 (PC2100 DDR) | P3/i840 (PC800 Dual RDRAM) | Athlon/VIA KT133 (PC133) | P3/i815 (PC133) |
|---|---|---|---|---|
| 178.galgel | 533 (532) | 292 (295) | 451 (429) | 277 (281) |
The 178.galgel test is a test in computational fluid dynamics and inherently benefits from the Athlon's superior FPU, thus giving the Athlon on the KT133 the immediate lead over both the i840 and the i815 platforms. The combination of the 266MHz FSB and the PC2100 DDR SDRAM gives the Athlon an 18% advantage over the Athlon on the KT133 platform, not to mention an even greater advantage on the two Intel platforms.
Floating Point Performance - SPEC CFP2000, base number (peak number)
| Test | Athlon/AMD 760 (PC2100 DDR) | P3/i840 (PC800 Dual RDRAM) | Athlon/VIA KT133 (PC133) | P3/i815 (PC133) |
|---|---|---|---|---|
| 179.art | 292 (304) | 328 (331) | 208 (208) | 271 (275) |
As we noticed at the start of this review with our Linpack benchmarks, the Athlon on a KT133 offers worse memory performance than an equivalently clocked Pentium III on an i815 as the data size increases beyond the initial 384KB cache of the Athlon. Thus it isn't surprising that the Athlon on the KT133 comes out slower than the Pentium III on the i815 chipset here, but the fact that the Pentium III is 30% faster is pretty surprising.
We can't attribute this to a shortcoming of the Athlon as the same CPU with a faster FSB and DDR SDRAM on the AMD 760 produced a score greater than that of the Pentium III on the i815. And we also know that the FSB isn't the cause for the bulk of the performance improvement as the Athlon already has effectively a 200MHz FSB. So this benchmark obviously stresses memory bandwidth. With the Pentium III on the 840 offering at least a 12% advantage over the Athlon with DDR SDRAM, you can expect the data accesses to come in a very serialized fashion and thus take full advantage of the i840's 3.2GB/s of memory bandwidth.
The only thing we're not taking into account here is price, in which case the i840 loses quite a bit of its appeal.
Floating Point Performance - SPEC CFP2000: base number (peak number)
Test | Athlon/AMD 760 (PC2100 DDR) | P3/i840 (PC800 Dual RDRAM) | Athlon/VIA KT133 (PC133) | P3/i815 (PC133)
183.equake | 246 (319) | 256 (263) | 219 (266) | 243 (256)
This test basically attempts to simulate the 1994 Northridge Earthquake aftershock in the San Fernando Valley of SoCal. The benchmark does give the Intel CPUs quite a bit of an advantage, requiring the AMD 760 before the Athlon can even begin to compete. This continues to illustrate the point that VIA has been holding the Athlon's performance back by a considerable amount.
Once again we see that with aggressive compiler optimizations the Athlon can gain a noticeable lead over the Pentium III, however when looking at relatively unoptimized scores the Athlon's performance isn't too spectacular. This is one benefit Intel gets out of having their own C/C++ and Fortran compilers whereas AMD has to rely on Intel's compilers as well as Compaq's Visual Fortran.
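The gap between the base and peak numbers is a rough measure of how much each platform gains from the more aggressive compiler settings. As an illustration, the Python sketch below computes that uplift for the 183.equake scores in the table above; the helper name `peak_uplift` and the dictionary layout are our own.

```python
def peak_uplift(base: float, peak: float) -> float:
    """Percentage gain of the peak score over the base score."""
    return (peak / base - 1.0) * 100.0


# 183.equake scores from the table above, as (base, peak)
equake = {
    "Athlon/AMD 760": (246, 319),
    "P3/i840": (256, 263),
    "Athlon/VIA KT133": (219, 266),
    "P3/i815": (243, 256),
}

for platform, (base, peak) in equake.items():
    print(f"{platform}: {peak_uplift(base, peak):.1f}% gain from peak optimizations")

# Platform whose score benefits the most from the peak settings
biggest = max(equake, key=lambda p: peak_uplift(*equake[p]))
print("Largest uplift:", biggest)
```

Both Athlon platforms gain far more from the peak settings than either Pentium III platform does in this test, which is the pattern described above.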
Floating Point Performance - SPEC CFP2000: base number (peak number)
Test | Athlon/AMD 760 (PC2100 DDR) | P3/i840 (PC800 Dual RDRAM) | Athlon/VIA KT133 (PC133) | P3/i815 (PC133)
187.facerec | 419 (418) | 240 (238) | 378 (377) | 241 (239)
The Face Recognition benchmark is quite possibly one of the most interesting benchmarks in the FP suite of SPEC CPU2000. It essentially involves converting a photograph of a face into a set of graphs (with values representing mathematical equivalents of features of the face) and probing through a gallery attempting to find a match for the face. This is an application that the next generation of CPUs may very possibly find themselves running, and while it makes for an interesting benchmark, it's not too useful for proving the performance of DDR SDRAM on the Athlon.
The 11% improvement the AMD 760 does hold over the KT133 is most likely due to the increase in FSB frequency as the increased memory bandwidth of the i840 didn't offer any performance improvement over the i815 where the FSB remained the same.
In any case, the main reason for the Athlon's superior performance here is its powerful architecture.
Floating Point Performance - SPEC CFP2000: base number (peak number)
Test | Athlon/AMD 760 (PC2100 DDR) | P3/i840 (PC800 Dual RDRAM) | Athlon/VIA KT133 (PC133) | P3/i815 (PC133)
188.ammp | 219 (318) | 301 (301) | 203 (271) | 321 (320)
The 188.ammp test, according to SPEC, models "large systems of molecules usually associated with Biology." The test is far from memory bandwidth dependent; rather, the base optimizations run beautifully on the Intel platforms, with the lower latency SDRAM on the i815E giving it the advantage over the higher bandwidth i840 solution. Only when using aggressive compiler optimizations, again, can the Athlon begin to compete.
The performance improvement caused by the move to DDR SDRAM is most likely negligible; the greatest performance boost seems to come from the FSB or possibly a combination of both.
Floating Point Performance - SPEC CFP2000: base number (peak number)
Test | Athlon/AMD 760 (PC2100 DDR) | P3/i840 (PC800 Dual RDRAM) | Athlon/VIA KT133 (PC133) | P3/i815 (PC133)
189.lucas | 244 (244) | 307 (307) | 219 (219) | 298 (298)
189.lucas attempts to prove whether a very large number is prime or not. This fairly simple (simple in that it's not doing much outside of that one function) benchmark can very easily take advantage of a particular architecture over another. In this case, the Pentium III takes the gold and it doesn't seem that even more aggressive optimizations can help the Athlon here. There is almost no memory bandwidth dependency in this benchmark.
Floating Point Performance - SPEC CFP2000: base number (peak number)
Test | Athlon/AMD 760 (PC2100 DDR) | P3/i840 (PC800 Dual RDRAM) | Athlon/VIA KT133 (PC133) | P3/i815 (PC133)
191.fma3d | 399 (399) | 275 (294) | 339 (339) | 267 (285)
FMA-3D is essentially a complex 3D collision simulation benchmark. With no graphics card dependencies we can see that the performance of this benchmark is dependent on a powerful FPU and a fast bus interface. The benefit of a high bandwidth memory solution isn't as evident here as the i840 only offered a 3% improvement over the i815.
Floating Point Performance - SPEC CFP2000: base number (peak number)
Test | Athlon/AMD 760 (PC2100 DDR) | P3/i840 (PC800 Dual RDRAM) | Athlon/VIA KT133 (PC133) | P3/i815 (PC133)
200.sixtrack | 218 (234) | 165 (175) | 217 (230) | 164 (175)
Again we have a benchmark that doesn't focus on memory or FSB bandwidth, although the Athlon's FSB bandwidth advantage over the Pentium III most definitely plays a part in its performance lead here.
Floating Point Performance - SPEC CFP2000: base number (peak number)
Test | Athlon/AMD 760 (PC2100 DDR) | P3/i840 (PC800 Dual RDRAM) | Athlon/VIA KT133 (PC133) | P3/i815 (PC133)
301.apsi | 298 (298) | 333 (352) | 259 (259) | 317 (330)
Our final SPEC CFP2000 benchmark uses another weather prediction algorithm. This time around the data array is smaller than what we saw with the 171.swim test, however it is still fairly large. In spite of this, the benchmark is dominated by the Pentium III on the i840 platform. This advantage doesn't come from the memory bandwidth advantage the i840 holds over the competition, since even the i840 only scores 5% above the i815. It seems as if the Pentium III is better suited for this test, and the performance lead will only extend with the next version of Intel's compilers.
SPECfp2000 Performance Summary
Floating Point Performance - SPEC CFP2000: base number (peak number)
Test | AMD 760 (PC2100 DDR) | Intel 840 (PC800 Dual RDRAM) | VIA KT133 (PC133) | Intel 815 (PC133)
168.wupwise | 334 (476) | 348 (360) | 316 (437) | 346 (350)
171.swim | 763 (763) | 439 (438) | 496 (488) | 285 (285)
172.mgrid | 254 (346) | 251 (251) | 238 (277) | 208 (208)
173.applu | 377 (394) | 248 (249) | 282 (303) | 219 (221)
177.mesa | 256 (390) | 391 (391) | 256 (382) | 387 (386)
178.galgel | 533 (532) | 292 (295) | 451 (429) | 277 (281)
179.art | 292 (304) | 328 (331) | 208 (208) | 271 (275)
183.equake | 246 (319) | 256 (263) | 219 (266) | 243 (256)
187.facerec | 419 (418) | 240 (238) | 378 (377) | 241 (239)
188.ammp | 219 (318) | 301 (301) | 203 (271) | 321 (320)
189.lucas | 244 (244) | 307 (307) | 219 (219) | 298 (298)
191.fma3d | 399 (399) | 275 (294) | 339 (339) | 267 (285)
200.sixtrack | 218 (234) | 165 (175) | 217 (230) | 164 (175)
301.apsi | 298 (298) | 333 (352) | 259 (259) | 317 (330)
In spite of SPEC CFP2000's compiler dependencies it tells us quite a bit about the performance of DDR and higher memory bandwidth solutions in future applications.
For starters, in many situations, the Athlon requires specific attention be paid to optimizing for its unique architecture in order to gain the most performance out of it. Intel holds the advantage here in that their C/C++ and Fortran compilers are readily available to developers, which obviously makes optimizing for the Pentium III a fairly simple task. In contrast, AMD must rely not only on these Intel compilers, which obviously don't favor the Athlon platform, but also on third party compilers that don't necessarily have all of the latest Athlon optimizations in all versions. This helps to explain why the Athlon was outperformed under some SPECfp_base2000 tests while enabling more aggressive optimizations tilted the balance in the favor of AMD.
The combination of PC2100 DDR SDRAM and the 266MHz EV6 bus is killer for AMD; it often puts the i840 to shame. For the high-end workstation, the i840 is no longer the best solution as the AMD 760 chipset can easily outperform it in many situations, not to mention that it is much cheaper. There was only one situation in which the Athlon/AMD 760 setup came out as the slowest among the four (tied with the KT133 in the base scores), that being the Mesa test. However, with proper optimizations you will notice that the performance of that same platform comes within 1 point of the Pentium III/i840 test bed.
There were a number of situations (7 out of 14 to be exact) in which the Athlon on the KT133 was the slowest solution of the four. This, combined with the knowledge we extracted from the Linpack benchmarks earlier helps to illustrate the shortcomings in VIA's memory controller that is present in the KT133. Remember that this is the same memory controller that we criticized on the KX133 chipset and on the VIA Apollo Pro 133A chipset. We have said it before and we'll say it again, the KT133 chipset is continuing to hold back the performance of the Athlon. While it's better now than when the memory controller was first introduced to the Athlon on the KX133 chipset, it's still a noticeably limiting factor. Let's hope that VIA's DDR memory controller is better.
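The tally of last place finishes can be reproduced directly from the base scores in the summary table. The Python sketch below is one way to do it; the names `PLATFORMS`, `BASE` and `count_slowest` are ours, and we are assuming that a tie for last place (as in 177.mesa, where both Athlon platforms score 256) counts against every platform involved.

```python
PLATFORMS = ("AMD 760", "i840", "KT133", "i815")

# SPECfp_base2000 scores from the summary table, in PLATFORMS order
BASE = {
    "168.wupwise": (334, 348, 316, 346),
    "171.swim": (763, 439, 496, 285),
    "172.mgrid": (254, 251, 238, 208),
    "173.applu": (377, 248, 282, 219),
    "177.mesa": (256, 391, 256, 387),
    "178.galgel": (533, 292, 451, 277),
    "179.art": (292, 328, 208, 271),
    "183.equake": (246, 256, 219, 243),
    "187.facerec": (419, 240, 378, 241),
    "188.ammp": (219, 301, 203, 321),
    "189.lucas": (244, 307, 219, 298),
    "191.fma3d": (399, 275, 339, 267),
    "200.sixtrack": (218, 165, 217, 164),
    "301.apsi": (298, 333, 259, 317),
}


def count_slowest(scores):
    """Count how often each platform has the lowest score (ties count for all)."""
    counts = {platform: 0 for platform in PLATFORMS}
    for row in scores.values():
        lowest = min(row)
        for platform, value in zip(PLATFORMS, row):
            if value == lowest:
                counts[platform] += 1
    return counts


slowest = count_slowest(BASE)
for platform in PLATFORMS:
    print(f"{platform}: slowest in {slowest[platform]} of {len(BASE)} tests")
```

With ties counted this way the KT133 lands in last place in 7 of the 14 tests, and the AMD 760 only in the Mesa tie.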
Prior to the arrival of the AMD 760 chipset, the Athlon lost out to the Pentium III in a clock for clock battle under most games. As we see from the above performance chart and as we're about to see more of, the rules have changed with the AMD 760.
Once again we're seeing a 10% performance advantage held by the AMD 760 over the KT133 on equivalently clocked Athlons. Remember that this performance advantage comes both from the increased FSB (133MHz DDR vs 100MHz DDR) as well as the use of DDR SDRAM.
At 1024 x 768 x 32 the memory bandwidth of our GeForce2 GTS is the limiting factor and thus there is no performance difference between any of the platforms in the test.
MDK2's performance is similar to that of a FPS like Quake III Arena, however noticeably less complex. We see an 8% performance advantage provided by the AMD 760 in this test.
And once again, with video card limitations kicking in, all of the platforms appear to perform identically.
UnrealTournament is showing us an 8% improvement like MDK2, and we also see the dethroning of the Pentium III as the performance king on a clock for clock basis.
As video card limitations kick in, the performance improvement DDR SDRAM can offer is limited to a very small amount.
Historically a very cache/memory intensive benchmark, Expendable proves to be so once again as it shows a 14% improvement with the AMD 760 chipset.
As we noticed in our first preview of the AMD 760 chipset, the performance boost we saw in benchmarks such as SYSMark 2000 was negligible. You have to understand the way SYSMark 2000 works in order to understand why this sort of performance is expected.
SYSMark 2000 runs a total of 12 applications ranging from simple office applications like Microsoft Word 2000 to much more demanding applications such as Avid Elastic Reality.
However SYSMark 2000 runs each one of these applications individually, which places the least amount of stress on an effective high bandwidth memory solution. Thus the performance improvement here bought by DDR SDRAM and the 266MHz FSB is hardly noticeable.
Fortunately for AMD, this isn’t how most people use their systems; instead, you generally have more than one application open at once and you switch between them. Ziff Davis has the benchmark that simulates performance under that sort of an environment.
Content Creation Winstone 2000 takes applications such as Microsoft Word 2000 and runs them in the background with other applications such as Adobe Photoshop or Macromedia Dreamweaver (HTML editor). The benchmark then runs tasks in all of the applications by switching between them, much like a power user would use his/her system at home.
As you can guess, this puts a much greater stress on the memory bus of a system as the applications can no longer reside solely in the L2 cache of the CPU. Here we see a 5 – 10% boost in performance courtesy of the higher FSB as well as the DDR SDRAM.
It is important to note that, like our Linpack benchmarks showed earlier, the Athlon with the KT133 chipset at 1GHz comes out in last place among the above contenders.
The principle behind High End Winstone 99 is much like that of SYSMark 2000: a single application is run and its time to complete certain tasks is recorded.
The performance boost we see from the move to a faster FSB frequency as well as DDR SDRAM is around 5% here as well.
In this first viewset, Awadvs-04, a simulation of a workstation level 3D animation system is run that stresses both shading and wireframe rendering of polygons. With the majority of the benchmark dealing with shading it isn’t surprising to see a distinct lead held by the AMD 760 solutions over the rest of the contenders.
Notice that the i820 and i840 platforms, with their higher peak memory bandwidth also outpace the i815, with the i840 doing so by the largest amount at 7%.
The AMD 760 allows for a 7 – 10% lead in performance here which is in line with the other performance figures we’ve seen thus far.
The DesignReview viewset (DRV-07) is heavily weighted towards polygon throughput which happens to give the advantage to the dual channel RDRAM setup of the i840 chipset. This allows the i840 to hold close to a 7% performance advantage over the 1.2GHz Athlon on an AMD 760 platform.
The i815 and i820 solutions are performing about the same, while the regular KT133 based platforms are lagging noticeably behind the rest of the contenders here.
The Data Explorer viewset (DX-06) makes use of a fairly large data set that can take advantage of the increased FSB and memory bus of the AMD 760 platform, thus giving a 10% improvement in performance on a clock for clock basis over the regular Athlon.
We also see an example here of the Pentium III being unable to keep up with the Athlon regardless of platform. We noticed similar cases in the SPEC CPU2000 tests earlier, as well as situations where the exact opposite was true as well.
The Lightscape viewset (Light-04) has always been able to scale with CPU performance quite well, mainly because of its use of very complex rendering algorithms. Again we’re seeing a 10% improvement in performance on a clock for clock basis with the AMD 760.
MedMCAD-01 is a newcomer to the SPECviewperf test suite and it simulates the performance of MCAD applications such as Pro/E and SolidWorks. While a lot of the calculations here are offloaded onto the GeForce2 GTS’ GPU, a decent amount of success in this benchmark is due to more powerful CPUs and, in the case of the AMD 760, faster FSB and memory buses.
The final part of SPECviewperf is the ProCDRS-03 viewset which is a simulator of industrial design software. This time the performance advantage the AMD 760 holds over the KT133 is more in the 15% range. Because of this increased dependency on memory bandwidth the i820 and i840 chipsets move from the bottom of the charts up a notch to replace the KT133.
As you can see, the available memory bandwidth is noticeably greater on the AMD 760 than on any of the other platforms, including the i840 which theoretically offers a greater amount of memory bandwidth. As we've noticed from our earlier Linpack results however, available memory bandwidth does not always translate into higher performance as the lowest scoring i815 here managed to outperform the VIA KT133 in our Linpack as well as some of our SPEC CPU tests.
Again, stellar bandwidth figures offered by the AMD 760 chipset.
Issues
In order to make sure that the AMD 766 South Bridge did have a functional ATA-100 controller, we attempted to test its theoretical maximum transfer speed. As you can see above, under Windows 2000 the controller wouldn't burst above 60MB/s. While this doesn't matter for current hard drives, it could be a problem in the future. However, when we ran the same benchmark under Windows 98SE we noted a 78.5MB/s score, meaning that the issue is driver related. Hopefully updated Windows 2000 Bus Mastering drivers will resolve the issue.
Other than the aforementioned hard disk controller performance issue, the AMD 760 performed fine under both Windows 98SE and Windows 2000 SP1. There were no major performance anomalies that could be attributed to driver issues under either OS.
The only suggestion AMD gave us was to make sure to use NVIDIA's 6.27 Detonator3 drivers under Windows 98SE since they properly supported the AGP 4X capabilities of the chipset whereas the newest 6.31s didn't. However, it was OK to use the 6.31s under Windows 2000 SP1 with the AMD 760.
An issue we encountered with the AMD Corona reference board was that it would not allow us to set the FSB to 100MHz DDR (200MHz). This prevented us from doing any clock for clock comparisons using the 133MHz DDR vs 100MHz DDR FSBs. Even when we plugged in a 100MHz FSB CPU, the board simply attempted to use the 133MHz FSB. Either there is no pin that sets the FSB state on the Athlon, or, if there is one, the reference board doesn't detect the status of that pin.
For the most part the issues we encountered were minimal. The success of the AMD 760 will be dependent on motherboard manufacturers properly implementing the chipset and AMD releasing updated drivers under all of the major OSes.
Final Words
DDR SDRAM is finally here and with it the Athlon has also received a bump in its already speedy FSB operating frequency, but at what performance benefit to the end user?
In general you're looking at a 10% increase in performance across the board when using PC2100 DDR SDRAM; using PC1600 DDR SDRAM will obviously yield lower results. While there will be some applications that will benefit more from DDR SDRAM and the increased FSB (servers, high end workstations, etc...) there will be other situations in which DDR SDRAM doesn't do much at all (office applications, web browsing, etc...).
If you currently own an Athlon running on a KT133 board, you'll probably want to save your money and upgrade when the next generation Athlon hits the streets instead of upgrading now for an extra 10% boost. There won't be much of a reason to go with 100MHz FSB Athlons after the first AMD 760 boards start hitting the streets later this year. You can probably expect to see the first boards surface in the next 1 - 2 months, with OEMs taking orders for systems very soon. However don't expect to see AMD 760 boards in major retail systems until sometime next year.
As far as price is concerned, the AMD 760 chipset will most likely be more expensive than the KT133. Combine that with the fact that the 133MHz FSB Athlons are carrying a small price premium over their 100MHz siblings ($20 - $50 extra for the 133MHz FSB parts), and the added cost of DDR SDRAM (assuming you're going for PC2100 DDR SDRAM) you're probably going to find yourself wanting to wait before committing to such a major upgrade.
In comparison to the upcoming Pentium 4, even with DDR SDRAM and the 133MHz FSB it may be difficult for a 1.2GHz Athlon to compete with a 1.5GHz Pentium 4 in many benchmarks. AMD may be forced to release a higher clocked Athlon in order to remain competitive performance-wise, however they will almost definitely hold the price to performance ratio crown throughout the rest of the year.
In the end, the move to the AMD 760 chipset is very well received by the Athlon as it is no longer held back by the memory performance limitations of the KT133. And while we hope that VIA's DDR chipset will prove to be even better than the AMD 760, our true hope for the Athlon platform lies in the successor to the Thunderbird core. Let's hope that the next incarnation of the Athlon core is a much cooler running one. It should be interesting to see what AMD positions against the Pentium 4 after the Thunderbird core is replaced...