Original Link: https://www.anandtech.com/show/211
The Shot Heard Around the World
Approximately two months before AMD's official announcement of the K6-3 processor, the cat was let out of the bag and the upcoming processor's performance was revealed here on AnandTech. The benchmarks the K6-3 produced at so early a stage in its production managed to topple any lead Intel had over the rest of the market. But what good is an AMD performance lead now when the processor is months away from being released? Will Intel have something prepared in response to the K6-3? How well does the K6-3 perform without the benefit of a large L3 cache? What motherboards can be recommended for use with the K6-3 next year? One thing is for sure: although the K6-3 may not hold the glorious lead over the competition that it showed in AnandTech's initial coverage of the chip, it is still the only true performance upgrade path for a Super7 user not looking to purchase a new motherboard.
100MHz FSB: A Temporary Solution
One of the biggest advantages the Pentium II offered over the competition was its high-speed L2 cache. By deriving the speed of the L2 cache on the Pentium II's processor card from the clock speed of the processor itself, Intel made sure that the Pentium II's performance kept scaling with clock speed instead of running into diminishing returns. On the other side of things, the Socket-7 market was quickly dying because the 66MHz Front Side Bus (FSB), the speed at which the L2 cache also ran, could not compete in the much more aggressive world introduced by the Pentium II.
The temporary solution was the advent of the Super7 extension of the Socket-7 platform. Pushed solely by AMD, the Super7 design brought two critical wins to the Socket-7 community: 1) AGP support, which became more of an accepted standard with the Super7 platform although it was available prior to the release; and 2) the 100MHz FSB, a 50% increase in the frequency of the L2 cache on Socket-7 systems, and a minimum of a 10% boost in overall system performance due to the increased L2 cache performance. The latter was an incredible performance win for Socket-7 advocates at the time, since even the Pentium II 333's 166MHz L2 cache was threatened by the presence of a killer AMD + 100MHz FSB combo. Unfortunately for those users who were promised a true upgrade path for their older Socket-7 motherboards, the first chip to officially support the 100MHz FSB, the K6-2, required that setting in order to achieve the level of performance AMD had promised. Those with older Socket-7 motherboards were left either to run their K6-2s at the standard 66MHz FSB or to purchase new Super7 motherboards to receive the benefits they were promised.
As both the K6-2 and the Pentium II rose in clock speed, the performance gap between the two processors grew much more noticeable, simply because the K6-2's L2 cache (located off chip, on the motherboard) was locked at the 100MHz FSB frequency (or 112MHz in some cases, if you overclocked the bus), while the Pentium II's L2 cache had already broken the 200MHz barrier with the Pentium II 450. Although the 100MHz FSB solved the competitive performance problem AMD faced in the early part of 1998, in 1999 that solution simply won't cut it.
As mentioned in the original K6-3 Review, the key to the K6-3's performance is its on-chip L2 cache running at clock speed, à la the Intel Celeron A. This does away with the temporary solution the 100MHz FSB provided by allowing the speed of the L2 cache to rise directly with the speed of the processor, removing the bottleneck that L2 cache performance would otherwise impose on a K6-3 system. At the same time, by removing the performance dependency of the L2 cache on the system's FSB frequency, AMD also managed to remove another problem older Socket-7 users faced: the need for the 100MHz FSB.
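The relationship between cache placement and L2 clock described above can be put into a few lines of arithmetic. This is only an illustrative sketch: the function name and the placement labels are made up for this example, and the half-speed ratio for the Pentium II is an assumption that matches the 166MHz figure quoted above for the Pentium II 333.

```python
# Illustrative sketch: where each platform's L2 cache clock comes from.
# The function name and placement labels are hypothetical, chosen for this example.

def l2_clock_mhz(core_mhz, fsb_mhz, l2_location):
    """Return the L2 cache clock in MHz for a given cache placement."""
    if l2_location == "motherboard":
        # Socket-7 / Super7 (e.g. K6-2): L2 sits on the board and runs at the FSB.
        return fsb_mhz
    if l2_location == "half_speed_card":
        # Pentium II: L2 on the processor card, assumed to run at half the core clock.
        return core_mhz / 2
    if l2_location == "on_chip":
        # Celeron A and K6-3: L2 on the die, running at the full core clock.
        return core_mhz
    raise ValueError(f"unknown L2 location: {l2_location}")

systems = [
    # (name, core MHz, FSB MHz, L2 placement)
    ("K6-2 400", 400, 100, "motherboard"),
    ("Pentium II 333", 333, 66, "half_speed_card"),
    ("Pentium II 450", 450, 100, "half_speed_card"),
    ("K6-3 400 @ 100MHz FSB", 400, 100, "on_chip"),
    ("K6-3 400 @ 66MHz FSB", 400, 66, "on_chip"),
]

for name, core, fsb, location in systems:
    print(f"{name:24s} L2 clock: {l2_clock_mhz(core, fsb, location):.1f} MHz")
```

Note that the two K6-3 rows come out identical: once the L2 cache is on the die, the FSB setting no longer changes its clock.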
100MHz FSB: No Longer Necessary?
We've had proof for months now that the business application performance of the current generation of processors is greatly dependent on the performance of the cache subsystem. From the example the Pentium II set back in 1997 to the performance of the K6-2 in 1998, the lesson is the same: without a high-speed L2 cache, a processor cannot be competitive in the business world. The temporary solution to that was, of course, the 100MHz FSB, a requirement which forced older Socket-7 owners to upgrade to new Super7 boards in order to gain the performance benefits they desired. Now, with the K6-3's on-chip L2 cache, the only beneficiary of the 100MHz FSB is the L3 cache located on the motherboard, an issue which will be discussed a little later in this article.
The K6-3 will ship with support for clock multipliers ranging from 2.5x to 6.0x, in 0.5x increments, meaning that an owner of a motherboard with support for the 2.2v core voltage required by the K6-3 would be able to run a K6-3 400 at 66MHz x 6.0 without losing too much performance in comparison to an identical system running at 100MHz x 4.0. The Pentium II is a living, breathing example of this theory, in that the performance difference between a system running at a 100MHz FSB and one running at a 66MHz FSB is next to nothing, with the greater performer being the chip with the faster L2 cache. Although the ideal high-performance solution for anyone interested in upgrading to a K6-3 would be to couple the processor with a Super7 motherboard, those who invested in Socket-7 motherboards with support for the 2.2v core voltage but without full Super7 compliance (i.e. no 100MHz FSB support) won't be out of luck with the K6-3. This opens up a whole new world for the low-cost Socket-7 upgrade. Unfortunately, we are still bound by the 2.2v core voltage specification of the K6-3, which considerably limits how many older motherboards are eligible for the K6-3 upgrade.
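The multiplier arithmetic above can be sketched as follows. The helper names are hypothetical, and the sketch assumes the nominal "66MHz" bus is really 66.67MHz (200/3), which is why 66MHz x 6.0 is sold as a 400MHz setting.

```python
# Illustrative sketch of FSB x multiplier arithmetic; helper names are hypothetical.

def k6_3_multipliers(lowest=2.5, highest=6.0, step=0.5):
    """List the multipliers from `lowest` to `highest` in `step` increments."""
    count = int(round((highest - lowest) / step)) + 1
    return [lowest + i * step for i in range(count)]

def core_clock_mhz(fsb_mhz, multiplier):
    """Core clock is simply the bus clock times the multiplier."""
    return fsb_mhz * multiplier

FSB_66 = 200 / 3   # assumption: the nominal 66MHz bus is 66.67MHz
FSB_100 = 100.0

# Every setting on either bus that lands on (approximately) 400MHz.
for fsb in (FSB_66, FSB_100):
    for mult in k6_3_multipliers():
        if round(core_clock_mhz(fsb, mult)) == 400:
            print(f"{fsb:6.2f}MHz x {mult:.1f} = {core_clock_mhz(fsb, mult):.0f}MHz")
```

Both buses reach the same core clock, and with the L2 cache on the die, only the motherboard cache and main memory see the slower bus.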
The Socket-7/Super7 Test System Configuration was as follows:
- AMD K6 233, AMD K6-2 300, AMD K6-2 400, AMD K6-3 450 (engineering sample)
- FIC PA-2013 w/ 2MB L2 Cache
The Pentium II comparison system differed only in its processor and motherboard, for which the following components were used:
- Intel Celeron 300, Intel Celeron 300A, Intel Pentium II 400, Intel Pentium II 450
- ABIT BH6 Pentium II BX Motherboard
Running Winstone 99 under Windows 98, the business application performance difference between a K6-3 running at 100MHz x 4.0 and one running at 66MHz x 6.0 is less than 2%, which is insignificant. If you switch benchmark suites and use Winstone 98, which is centered much less upon multitasking performance, there is no distinguishable performance difference between the two setups.
What about in games? Using the nVidia Riva TNT chipset, a very L2-cache-dependent video chipset, the performance difference between the K6-3 400/100 and the 400/66 systems is negligible, as shown by the following fps performance comparison charts.
With the performance difference never peaking above 8% in favor of the K6-3 running at 100MHz x 4.0, even in games, the 100MHz FSB is no longer a necessity for Socket-7 users to enjoy the performance they have been denied for quite some time.
Windows NT is barely any different from its younger brother in how its performance reacts to the 66MHz FSB instead of the 100MHz FSB with the K6-3. Windows NT's heavy reliance on a high-speed L2 cache masks the decrease in FSB frequency, leaving the performance difference between a K6-3 running at 100MHz x 4.0 and a K6-3 running at 66MHz x 6.0 barely above 2%.
With that settled, let's take a look at the role L3 cache plays in the overall performance of the K6-3.
The Benefit of L3 Cache
The K6-3's low-latency L1 and L2 caches give it an immediate advantage over its predecessors. However, in the event that the data being retrieved cannot be found in either on-chip cache, the presence of the motherboard's L3 cache becomes critical.
As originally discussed in the K6-3 review, the cache on a motherboard equipped with a K6-3 processor immediately becomes the system's L3 cache. Those of you with motherboards with 512KB, 1MB or 2MB of L2 cache on-board will soon be able to take advantage of 512KB, 1MB or 2MB of L3 cache if you make the upgrade to the K6-3.
From the initial discussion surrounding the importance of cache, we concluded that the less often a CPU has to return to the system memory for data retrieval, the faster the system's overall performance will be. For example, let's assume that the K6-3 is attempting to retrieve a segment of data that it could not retrieve from its on-chip L1 or L2 cache. On a system with no L3 cache (no cache on the mainboard), the processor would have to go directly to the slow system memory to retrieve the data. On a system with on-board L3 cache, the processor could try the much faster L3 cache first before having to resort to retrieving the data directly from the system memory, improving performance considerably for that single data retrieval operation.
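The lookup order described above lends itself to a simple average access time model. The sketch below is a textbook-style approximation, not a measurement: every hit rate and latency in it is a made-up placeholder chosen only to show the shape of the calculation, and the function name is hypothetical.

```python
# Illustrative model of a multi-level cache lookup.
# All hit rates and latencies below are hypothetical placeholders, not measured values.

def average_access_time(levels, memory_ns):
    """Average time per request for caches tried in order, then main memory.

    levels: list of (hit_rate, latency_ns), fastest cache first. Each hit rate
    is the fraction of requests reaching that level that are satisfied there.
    """
    total = 0.0
    reach = 1.0  # fraction of all requests that get as far as this level
    for hit_rate, latency_ns in levels:
        total += reach * latency_ns   # everything reaching this level pays its latency
        reach *= 1.0 - hit_rate       # only the misses continue outward
    total += reach * memory_ns        # whatever missed every cache goes to memory
    return total

on_chip = [(0.90, 2.0), (0.50, 5.0)]   # hypothetical L1 and L2
l3_cache = (0.60, 30.0)                # hypothetical motherboard L3
memory_ns = 100.0                      # hypothetical main memory latency

without_l3 = average_access_time(on_chip, memory_ns)
with_l3 = average_access_time(on_chip + [l3_cache], memory_ns)
print(f"no L3:   {without_l3:.2f} ns average")
print(f"with L3: {with_l3:.2f} ns average")
```

With these placeholder numbers, the L3 cache catches part of the traffic that would otherwise go all the way to main memory, which is the effect the paragraph above describes. How large the gain is in practice depends entirely on the real hit rates and latencies.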
The above Winstone 98 performance comparison illustrates the benefit a larger L3 cache has on overall system performance when running business applications. In comparison to a K6-3 system with no L3 cache, a K6-3 system outfitted with 2MB of L3 cache offers a 9.5% performance improvement. The benefits of a larger L3 cache can be seen when multitasking as well, as illustrated by the Winstone 99 comparison below. A 2MB L3 cache provides an 11% increase in performance over no L3 cache at all, a definitely noticeable difference.
3D gaming performance remains virtually unaffected by the presence (or lack thereof) of any L3 cache in a system with the K6-3, as both the Quake 2 and Half-Life frame rate numbers failed to change according to how much L3 cache was present in the test system. This can be expected, as most 3D games rely almost entirely on raw FPU calculations and very rarely need to access frequently used data. In a game, there is very little frequently used data, just a bunch of mathematical calculations being processed over and over again.
Performance under Windows NT is affected by the presence of L3 cache much more than under Windows 98. The Winstone 98 scores show a 12% improvement in performance when going from no L3 cache to 2MB, with intermediate steps in between depending on the size of your motherboard's cache. The same 12% improvement is present under the multitasking tests of Winstone 99, so if you're a die-hard NT user, then the more L3 cache you have, the happier you'll be.
3DNow! vs Raw FPU Power
The key weakness of all non-Intel processors has seemed to be their FPU performance. From the days of the horrendously slow FPU of the Cyrix 6x86 to the present day, with the K6-3 still lagging behind the Pentium II in FPU performance, the question is how much 3DNow! affects gaming performance. AnandTech's tests showed a 4% increase in performance when using 3DNow! accelerated Quake 2 OpenGL drivers as opposed to the standard OpenGL drivers under Quake 2 with the K6-3 test system's Riva TNT video card.
The K6-3's 3DNow! instruction set provides a 30% increase in performance in software rendering under Quake 2, comparatively speaking, making the K6-3 a slower gaming competitor to Intel's Pentium II without 3DNow! support.
Gaming Performance Comparison
Using the Riva TNT as a benchmark video card this time around, let's see how the K6-3 compares to the competition under Quake 2 and Half-Life:
The K6-3 still sits far enough behind the Celeron A and the Pentium II to be considered a weak gaming performer with the Riva TNT. TNT performance has improved greatly over the K6-2 400, simply due to the increased L2 cache speed. The TNT benefits greatly from a fast L2 cache, as you can see from the performance difference between the cacheless Celeron 300 and the Celeron 300A equipped with 128KB of L2 cache running at clock speed. The K6-3 400 scores a smooth 13 fps more than the K6-2 400 for no reason other than its L2 cache, as the K6-3 core is identical to the K6-2 400 CXT's core in every way, including FPU performance.
Half-Life shows a fairly similar picture. However, the game's dependency on L2 cache performance closes the gap between the K6-3 and the Pentium II/Celeron A, making the benchmarks much more competitive than in the Quake 2 comparison. You can expect most games to follow the trend Half-Life has set with the K6-3 & TNT combo benchmarked here.
Windows 98 Performance Comparison
Conclusion
That's how things stack up in the end. Without its precious L2 cache, the K6-3 is nothing more than a K6-2, and without its newly found L3 cache, the K6-3 can quickly lose the grip it had on the top of the performance charts. The best bet for any user looking for a K6-3 upgrade would be a Super7 motherboard with 1MB or 2MB of cache. However, as the test results show, older motherboards with 512KB of cache and those that operate only at the 66MHz FSB will be fine with the K6-3 as an upgrade path for the future.
Chances are that the K6-3 won't be released as the world's fastest x86 processor, as Intel does have quite a rollout of new processors due early in January of 1999. However, as a true upgrade path for Super7, and now older Socket-7, motherboard owners, the K6-3 still can't be beat.