Original Link: https://www.anandtech.com/show/207
It all started with the introduction of the AMD K5, a processor that was supposed to be superior to the Intel Pentium at a much lower cost and a lower clock speed as well. Unfortunately, that introduction was plagued by a nine-month delay that allowed Intel to push forth with the announcement of their Pentium MMX, which AMD simply couldn't compete with. Then came the AMD K6, which was supposed to offer performance greater than any Intel processor, once again, at a lower cost. When the K6 was finally introduced, it struggled to compete with the Pentium Pro, and left a minor gap between itself and Intel's desktop class of processors, the Pentium MMX.
It wasn't until the release of the K6-2 that there was reason to have faith in AMD again. The K6-2 pulled through as a highly competitive product to the Pentium II, at a lower cost. In response to this threat, Intel released their own low-cost alternative, the Celeron A, which once again put AMD to shame. Throughout their history, AMD has always seemed to fall just a hair short of winning the gold, and in a race where only the winner survives, second place just doesn't cut it.
That brief synopsis sums up the general state of things from 1997, with the release of the K6, to 1998, the year dominated by the Super7/K6-2 platform. While it has been said that history repeats itself, for AMD to repeat the course of events of the past 3 years wouldn't be the most desirable outcome. It is true that AMD has been successful in their ventures; however, they've never really captured the limelight as well as they could have. So what better way is there to start off a brand new year than with the introduction of a brand new processor that is finally worthy of the AMD name? As we welcome the New Year this holiday season, it's also time to introduce AMD's latest concoction, the K6-3.
What you need
Officially planned for launch sometime in early 1999, the K6-3 will be the last processor from AMD to be used in a Socket-7 motherboard before they make the transition to their new slot based architecture for the K7. The roots of the K6-3 are securely fastened in the same ground that sprouted the K6-2, in that the K6-3 is based on the same core as its predecessor. The 0.25 micron chip boasts the same 64KB of L1 cache (32KB data & 32KB instruction), the same 3DNow! instructions, and the same motherboard requirements as the old K6-2.
AMD's goal throughout the process of revitalizing the Socket-7 platform has been to offer a clear upgrade path to all Socket-7 users without requiring them to purchase new motherboards, as AMD assumes that if you're going to buy a new motherboard you may be lured away from Socket-7 by a tempting Slot-1 board. In the past, this goal has been met only to a certain degree: with the K6, you needed a board that supported the unique core voltages of the processor, and with the K6-2 you needed a Super7 compliant motherboard in order to get the full benefit of your processor. This time around, AMD simply requires that your motherboard's BIOS be up to date with full support for the new AMD K6-2 400 (based on the CXT core) in order to take advantage of the K6-3.
What you get
If a higher clock speed were all AMD provided as an improvement over the K6-2 with the introduction of the K6-3, this review would have come to an abrupt end a few paragraphs ago; obviously, it's not. Quietly learning from Intel's experimentation with the effects of L2 cache on overall system performance, AMD decided to take a stab at including a set amount of L2 cache on the K6-3 chip itself, and from AnandTech's experience with the unreleased K6-3, it seems as if they put their money on the right bet. Looking at the K6-3 itself, there is one thing you'll notice right off the bat: the K6-3 is around 1mm thicker than the K6-2. What's the reason for that?
When Intel released the Celeron, the lack of any L2 cache dropped the processor's business application performance (e.g. Microsoft Office, Lotus SmartSuite, etc.) to below Pentium MMX levels. That mistake was critical to the overall failure of the original Celeron processors; although they were generally accepted by the overclocking market, the rest of the world wouldn't accept a processor with no L2 cache. To turn the Celeron name into a success, Intel decided to include a full 128KB of L2 cache on the processor die of the Celeron, which dramatically increased its business application performance and brought rave reviews from all that touched the new processor, dubbed the Celeron A.
AMD's decision was to include 256KB of L2 cache on the die of their K6-3, while leaving the rest of the design of the K6-2 (with the CXT core modifications) intact, making the K6-3 AMD's Celeron A, with the K6-2 being AMD's Celeron. That's what makes up the extra 1mm in thickness on the K6-3 chip.
The Importance of Cache
Cache is one of those topics most people just assume is important and move on with their lives, an approach you can't really condemn since, for most of you, there is no pressing reason to understand the immediate functionality of cache in a system. However, if you're making any purchase, you should always be aware of the factors that would make one purchase better than another.
Cache is nothing more than high speed memory that is located closer to your CPU for faster access to frequently used data. The first place your CPU looks for data is in the cache, and more specifically, the cache located on the CPU itself, referred to as Level 1 or L1 cache. If the data the CPU is looking for isn't present in the L1 cache, or it fails to retrieve it in the current clock cycle, it then looks for it in the secondary cache, if present; otherwise it retrieves it from your system memory. Assuming that there is a secondary cache present (L2 cache), the processor can then retrieve the data from a source slower than the L1 cache, yet still faster than if it had gone all the way to the system memory. This process continues with however many levels of cache your system has before the processor has no other option than to retrieve the data from system memory, the slowest option of them all.
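The lookup order described above can be sketched in a few lines of Python. This is only an illustration: the `lookup` function, the addresses, and the cycle costs are hypothetical and are not taken from any real processor.

```python
# Illustrative sketch of a cache hierarchy lookup.
# All names, addresses, and cycle costs are hypothetical, not measured values.

def lookup(address, levels, memory_cost):
    """Probe each cache level in order; fall back to system memory.

    levels is a list of (name, contents, cost_in_cycles) tuples,
    ordered from the level closest to the CPU outward.
    Returns (source_name, total_cycles_spent).
    """
    cycles = 0
    for name, contents, cost in levels:
        cycles += cost  # every level that is probed adds its access cost
        if address in contents:
            return name, cycles
    # Missed in every cache level: go all the way to system memory.
    return "memory", cycles + memory_cost


levels = [
    ("L1", {0x10, 0x20}, 1),        # small and fast
    ("L2", {0x10, 0x20, 0x30}, 4),  # larger and slower
]

hit_l1 = lookup(0x10, levels, memory_cost=40)
hit_l2 = lookup(0x30, levels, memory_cost=40)
miss_all = lookup(0x99, levels, memory_cost=40)
```

Adding a third tuple to `levels` models a system with an L3 cache; the function itself does not change.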
In the ideal situation, all one would need to have an efficient system would be a large amount of cache, where most data would be retrieved from; unfortunately, this isn't the case. In the event that the data isn't retrieved from the cache, the data is obtained from the system memory. Let's take two identical computers, both with 64KB of L1 cache, one with 512KB of L2 cache running at 150MHz and the other with 256KB of L2 cache running at 300MHz, twice the speed. Now let's say that we have a number of applications running at the same time, nothing too incredibly strenuous on the processor, just a bunch of your normal office applications. Every time we open up a file, send something to the printer, or modify a document, we're executing a number of instructions over and over again. This is where cache shows its true benefits: in accessing frequently used data. If all the data could be retrieved by the processor from the cache in both cases, the system with 512KB of L2 cache running at 150MHz would probably end up being faster simply due to the fact that it has more cache and could probably store more of the repeated instructions over time. However, if only a small percentage of the data was actually retrieved by the processor from the L2 cache, the second system would probably be faster, as the data which could be retrieved would be accessed at a much higher rate, since its L2 cache operates at twice the speed of the first system's.
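One rough way to put numbers on this comparison is an average access time: weight the L2 access time by how often the L2 cache supplies the data, and the system memory access time by how often it doesn't. The sketch below uses the two L2 clock speeds from the example, but the hit rates, the 100ns memory latency, and the one-cycle-per-access assumption are made up purely for illustration.

```python
# Rough average-access-time model for the two hypothetical systems above.
# Hit rates, memory latency, and cycles per access are illustrative assumptions.

def average_access_ns(l2_hit_rate, l2_mhz, memory_ns, cycles_per_access=1):
    """Average time to fetch data that missed the L1 cache, in nanoseconds."""
    l2_ns = cycles_per_access * 1000.0 / l2_mhz  # one cycle at 1MHz is 1000ns
    return l2_hit_rate * l2_ns + (1.0 - l2_hit_rate) * memory_ns


MEMORY_NS = 100.0  # assumed system memory latency

# Case 1: the larger cache holds noticeably more of the working set.
bigger_slower = average_access_ns(0.95, 150, MEMORY_NS)   # 512KB at 150MHz
smaller_faster = average_access_ns(0.90, 300, MEMORY_NS)  # 256KB at 300MHz

# Case 2: the extra capacity buys no additional hits.
same_hits_faster = average_access_ns(0.95, 300, MEMORY_NS)

print(f"512KB @ 150MHz, 95% hits: {bigger_slower:.2f} ns")
print(f"256KB @ 300MHz, 90% hits: {smaller_faster:.2f} ns")
print(f"256KB @ 300MHz, 95% hits: {same_hits_faster:.2f} ns")
```

Under these assumed numbers the larger, slower cache wins only when its extra capacity actually raises the hit rate; when both caches hit equally often, the faster one comes out ahead. Real workloads will land somewhere in between.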
There is a tradeoff between more cache running at a lower speed and less cache running at a higher speed, and AMD decided to position themselves at the most strategic point, an almost perfect balance between quantity and performance. While the Pentium II has a full 512KB of L2 cache, it is only running at 50% of the clock speed; the Celeron A has its L2 cache running at clock speed, but it is only outfitted with 128KB of L2 cache. AMD chose to include a full 256KB of L2 cache at clock speed on the K6-3, something Intel will be doing in January with the release of their Dixon processor.
The problem with the original K6-2 was that the L2 cache was always locked down to the speed of your system's Front Side Bus (FSB) frequency: in most cases 100MHz and, realistically, at most 125MHz. With the L2 cache on all K6-2 systems never rising above 125MHz (anything above 125MHz put too much of a strain on peripherals, and the system would usually crash randomly), AMD was at a disadvantage in that with every clock speed increase, the Pentium II would widen the performance gap between itself and the K6-2, since the Pentium II derives its L2 cache speed from the CPU's speed. This issue has been thoroughly averted with the inclusion of the L2 cache on the die of the K6-3, so for once, AMD has a performance advantage over the Pentium II. When the K6-3 makes its debut, even the Pentium II 450's 225MHz L2 cache won't be able to keep up with the 350MHz - 450MHz L2 cache speeds of the first K6-3s.
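The L2 clock arithmetic above is simple enough to write down directly. In this small sketch the function name and the placement labels are hypothetical; the clock figures are the ones quoted in the text.

```python
# L2 cache clock as a function of where the cache lives.
# Function name and placement labels are hypothetical; figures come from the text.

def l2_clock_mhz(core_mhz, fsb_mhz, placement):
    if placement == "motherboard":  # K6-2: L2 is tied to the front side bus
        return fsb_mhz
    if placement == "half_speed":   # Pentium II: L2 runs at 50% of the core clock
        return core_mhz / 2
    if placement == "on_die":       # K6-3 and Celeron A: L2 runs at the core clock
        return core_mhz
    raise ValueError(f"unknown placement: {placement}")


k6_2_350 = l2_clock_mhz(350, 100, "motherboard")
pentium_ii_450 = l2_clock_mhz(450, 100, "half_speed")
k6_3_450 = l2_clock_mhz(450, 100, "on_die")

print(k6_2_350, pentium_ii_450, k6_3_450)
```

Raising `core_mhz` leaves the motherboard-cache result unchanged, which is exactly the K6-2's handicap described above.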
Backwards Compatibility
We previously discussed that AMD's goal was to allow for a direct upgrade path for AMD users, so this is the question on every K6-2 owner's mind: Will the K6-3 work on my motherboard?
First of all, upon installing the K6-3, the L2 cache on your motherboard no longer functions as L2 cache; it is bumped down a notch to Level 3 cache, without any modifications to your motherboard itself. From a performance perspective, the presence of the L3 cache improves performance by around 5% in comparison to a K6-3 system without any L3 cache.
At this year's Fall Comdex, AMD told AnandTech that in order for a motherboard to work with the K6-3, the only requirement would be that it supports the K6-2 400 and has the latest BIOS updates installed. After skeptically accepting that statement, AnandTech finally got the opportunity to try the K6-3 out on a Super7 motherboard with support for the K6-2 400. Who would've guessed? AMD was right: the K6-3 was detected as a K6-3 running at 450MHz on the FIC PA-2013 AnandTech used in the tests, without a single problem.
As long as you have a Super7 compliant motherboard with BIOS support for the K6-2 400, you now have a guaranteed upgrade path to the K6-3, without spending a penny outside of the cost of the new processor. AMD has stretched out the life of the Socket-7 standard to a level once thought unattainable; it really makes you question whether we needed to make the transition to a slot based architecture back in 1997.
3DNow! 6 Months Later
Around 6 months ago, AMD first introduced the K6-2 with their new 3DNow! instructions designed to improve 3D gaming performance. At the introduction, the only question that remained after seeing the 60+ fps in Quake 2 on a K6-2 333 was whether or not support for 3DNow! would really begin to appear in games. Since then, there have been numerous title releases with 3DNow! support built into the engine, and you can expect virtually every game based on the Quake 2 or Unreal engines to ship with some support for 3DNow!, regardless of how minute. Unfortunately, what's becoming apparent is that most game developers don't seem to be taking 3DNow! seriously enough, which is why the K6-2 still trails the Pentium II & Celeron A in performance in some games such as Half-Life. You can expect support for 3DNow! to grow even more; however, it is doubtful that 3DNow! will gain the support needed for all games to perform like the 3DNow! version of Quake 2 does on a K6-2 system.
Luckily, with the increased clock speeds of the K6-3, the gaming performance gap between AMD and Intel is closing, as you'll be able to tell from the gaming performance benchmarks AnandTech ran in Quake 2, which offers an excellent example of what proper 3DNow! implementation can really do on a 3DNow! capable processor, and Unreal, which demonstrates a more realistic implementation of 3DNow! from a performance perspective.
It is more than obvious that Intel's MMX instructions have done very little for the hardware world in terms of performance, and as a redemption tactic Intel will be introducing the follow-up to MMX with their next Pentium II processor. The debate surrounding 3DNow! vs Intel's new MMX instructions (often referred to unofficially as Katmai New Instructions - KNI) will continue to develop as the release of Intel's Katmai grows closer. For now, there is really nothing to be said on the 3DNow! vs Katmai issue other than wait and see; there is no real way of accurately predicting the effectiveness of KNI or how well 3DNow! will compare. Making a decision now would be pure speculation.
Performance
The Socket-7/Super7 Test System Configuration was as follows:
- AMD K6 233, AMD K6-2 350, AMD K6-3 450 (engineering sample)
- FIC PA-2013 w/ 2MB L2 Cache
- 64MB PC100 SDRAM
- Western Digital Caviar AC35100 - UltraATA
- Matrox Millennium G200 AGP Video Card (8MB)
- Canopus Pure3D-2 Voodoo2 (12MB)
The Pentium II comparison system differed only in terms of the processor and motherboard in which case the following components were used:
- Intel Celeron 300, Intel Celeron 300A, Intel Pentium II 400, Intel Pentium II 450
- ABIT BH6 Pentium II BX Motherboard
The following drivers were common to both test systems:
- MGA G200 Drivers v1677_426
- DirectX 6
The benchmark suite consisted of the following applications:
- Ziff Davis Winstone 98 under Windows 98 & Windows NT4 SP3
- Ziff Davis Winstone 99 under Windows 98 & Windows NT4 SP3
- Ziff Davis Winbench 99 under Windows 98
- Quake 2 v3.17 using demo1.dm2 and Brett "3 Fingers" Jacobs' Crusher.dm2 demo
- Unreal using Lothar's FPSTimeDemo test (run 10 times for each test)
All Winstone tests were run at 1024 x 768 x 16 bit color, all gaming performance tests were run at 800 x 600 x 16 bit color. 3DNow! support was enabled when applicable.
For the in-depth gaming performance tests, Brett "3 Fingers" Jacobs' Crusher.dm2 demo was used to simulate the worst case scenario in terms of Quake 2 performance, the point below which your frame rate will rarely drop. In contrast, the demo1.dm2 demo was used to simulate the ideal situation in terms of Quake 2 performance, the average high point for your frame rate in normal play. The range covered by the two benchmarks can be interpreted as the range in which you can expect average frame rates during gameplay.
Windows 98 Performance
Windows 98 has always been the Pentium II's domain in terms of overall performance simply because its L2 cache performance would increase with every clock increase, unlike the K6-2. With the L2 cache of the K6-3 being on-chip, Intel has been booted from the top of the charts, and replaced by the third generation K6 processor. At 350MHz, the K6-3 gives the Pentium II 450 and Celeron 450A a run for their money, at 400MHz AMD already has the fastest business processor, and at 450MHz, the K6-3 sees no competition at all. Even under Winstone 99, a benchmark that seems to perform better on Intel processors/chipsets, the K6-3 still comes out on top by a fairly large margin. An 8% performance differential exists between the K6-3 450 and a Pentium II 450 under Winstone 99, a difference that is expressed as a 12% gap under Winstone 98. The bottom line? The K6-3, clock for clock, is, without a doubt, faster than the Pentium II in business applications.
Disk Performance
The larger cache of the Pentium II (512KB) does give it the edge over the K6-3 in terms of disk throughput and overall disk performance; however, the gap isn't large enough to be deemed unacceptable from AMD's standpoint. This is nothing more than an illustration, from a disk perspective, of the comparison made earlier in the article between 512KB of L2 running at 50% clock speed and 256KB of L2 running at clock speed.
Gaming Performance
Quake 2 is truly the best case scenario when it comes to 3DNow! implementation in a game. AMD spent months working on the driver for Quake 2, and the results of their efforts are outstanding. The K6-3 is a solid gaming performer, even outrunning the Intel processors in the CPU intensive crusher.dm2 test, and leading the pack in both benchmark runs. Turning off the precious 3DNow! compatibility causes those frame rates to drop around 20 fps illustrating the dire need for 3DNow! support in games in order for the K6-2 & K6-3 to survive. Let’s see how the picture changes if we remove the crutches the Voodoo2 card provides for the benchmarks…
Without the 3Dfx card to rely on, the picture doesn’t change at all. The K6-3 is still strong in spite of the lack of any hardware accelerators to take the load off of the processor itself.
But all of this Quake 2 talk is quite idealistic, since the implementation of 3DNow! in Quake 2 is very unrealistic compared to most other games. Let’s see how things look through the fog in Unreal, one of the most taxing games for an entire system (one of the reasons why it wasn’t a big seller).
The benefits of the K6-3’s L2 cache are seen once again with the performance of the K6-3 under the L2 cache-happy Unreal benchmark.
Windows NT Performance
Under Windows NT, the K6-3 still remains dominant, even in comparison to Intel's Pentium II, which is definitely a great accomplishment. Even at 350MHz, the K6-3's L2 & L3 cache performance outshines the Pentium II 400, and at 400MHz, the Pentium II 450 is closely trailing AMD, and the K6-3 leads the pack with no competition.
The Final Blow or the Calm before the Storm?
When the K6-3 is officially released, you can expect to see a processor that is highly competitive in price, more overclockable than the engineering sample AnandTech tested, and a real blow to Intel's market share.
If you remember back to the release of the original K6, AMD stole the limelight from Intel for a full month before Intel released the Pentium II, which restored the balance of power in favor of Intel. Intel has already announced that they will be releasing a few new processors in January of 1999; will one of those processors be a K6-3 killer? There's no way to say for sure; however, Intel's upcoming Dixon sounds very similar to the K6-3 (256KB L2 cache on-chip running at clock speed). Chances are, Intel won't sit around and watch themselves be trampled by AMD; however, from a price to performance ratio standpoint, especially for a K6-2 owner or a Super7 advocate, the K6-3 can't be beat.
Current Slot-1 owners will want to hang on to their investments, since the performance difference between a K6-3 and a Pentium II 233 isn't worth the hassle of migrating to the Super7 platform, which still isn't completely solid when it comes to compatibility with next generation graphics accelerators.
Those of you looking to buy a new system may opt for a lower cost K6-3, then upgrade to a slot based Katmai or K7 in the future depending on how that comparison turns out.
There is one thing for sure: the K6-3 is an outstanding chip from AMD. AnandTech commended Intel on a job well done with their Celeron A, and likewise, congratulations to AMD on a job well done. The K6-3 probably won't last you through the year 2000 as the fastest overall processor, but it'll definitely put Intel in check and force prices down even further, which is what competition is always good for; in the end, it benefits us all, the consumers.
With that said, there remains only one question: Is the K6-3 AMD's final blow at Intel, or is it the calm before the storm the K7 will bring upon the microprocessor giants? Keep your eyes open, 1999 is bound to be an interesting year: the release of the Voodoo3, a new chipset from nVidia, Intel's Katmai... and maybe even the year AMD pulls ahead of Intel and sets the pace for the microprocessor industry... then again, maybe not. Don't you just love suspense? ;)