Original Link: https://www.anandtech.com/show/168
Introduction
Intel's roadmap has been carefully laid out for us at Microprocessor Forum, detailing every major product to come out of the Intel fab. plants until the year 2003. What about Intel's closest competitor in the PC market, AMD? Among AMD's plans is one very interesting processor, due out in the first half of 1999, the AMD K7. What is the AMD K7 all about? What's special about it? Find out...
AMD K7 Features
0.25 micron process
128K L1 cache
Alpha EV6 compatible, 200MHz front side bus
500+MHz initial speed
3DNow
Fully pipelined FPU
9 issue superscalar
Multiprocessor capable
H1 '99
AMD K7 Features Summarized
Looking at the specifications of the AMD K7, you can probably tell that it is going to pack quite a punch when it makes its way out of the fab. plants and into our systems. The AMD K7 will initially be made on a 0.25 micron process (0.25 micron is roughly the size of the smallest features etched onto the silicon wafer), the current state of the art. This process should allow the K7 to run cool enough to reach speeds of 500MHz, but once the K7 reaches higher speeds, such as 700MHz, the 0.25 micron process might not be efficient enough to guarantee stability.
The 128K L1 cache is one of the more intriguing features of the AMD K7. Current state of the art processors have 64K of L1 (the K6, K6-2 and 6x86MX are examples). L1 cache is where the most frequently used data is stored on the chip. It has extremely fast access times and is the most efficient place to retrieve data from (more on cache below). Basically, more L1 cache speeds up software significantly, especially highly repetitive software such as Microsoft Word and spreadsheet software, which do the same thing over, and over, and over.
AMD, however, doesn't think 128K L1 cache is enough. They plan on using Digital's Alpha EV6 bus to transfer data between RAM and the CPU. The EV6 front side bus allows RAM to CPU transfers at 200MHz and beyond, yielding a bandwidth of up to 2.6GB (gigabytes) per second.
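To put the bus numbers in perspective, here is a rough back-of-the-envelope sketch of peak bus bandwidth (transfers per second times bytes per transfer). The 64-bit (8-byte) data path is my assumption, since the width isn't given above, and the function names are made up for illustration. Under that assumption the 2.6GB/sec figure would correspond to the upper end of the "200+" range rather than to 200MHz itself.

```python
# Peak bus bandwidth = transfers per second x bytes moved per transfer.
# ASSUMPTION: a 64-bit (8-byte) data path; the article does not state the width.
BUS_WIDTH_BYTES = 8


def peak_bandwidth_gb_per_s(transfers_mhz, width_bytes=BUS_WIDTH_BYTES):
    """Theoretical peak in GB/s, taking 1 GB as 10**9 bytes."""
    return transfers_mhz * 1_000_000 * width_bytes / 1_000_000_000


def rate_needed_mhz(target_gb_per_s, width_bytes=BUS_WIDTH_BYTES):
    """Transfer rate in MHz needed to reach a given peak bandwidth."""
    return target_gb_per_s * 1_000_000_000 / width_bytes / 1_000_000


for mhz in (66, 100, 200):
    print(f"{mhz:>3} MHz x {BUS_WIDTH_BYTES} bytes = "
          f"{peak_bandwidth_gb_per_s(mhz):.2f} GB/s")

print(f"2.6 GB/s needs about {rate_needed_mhz(2.6):.0f} MHz at this width")
```

These are theoretical peaks only; sustained transfer rates depend on the memory and chipset sitting behind the bus.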
Another cool feature of the K7 is the fully pipelined FPU, which, according to AMD, is supposed to peak at two instructions per clock. If this is true, the K7 will significantly outperform Intel equivalents in FPU intensive applications. 3DNow! will of course be implemented, to add to the gaming horsepower of the AMD K7.
The K7 will fit in Slot A, a slot similar to Slot 1 but using a different protocol (EV6) to transfer data, and it will house the L2 cache in the cartridge like the Pentium II.
The core is a seventh generation core with advanced scheduling, branch prediction, pipelining and more...
3DNow!
Don't forget the '!': 3DNow! is AMD's 3D enhancing instruction set. What exactly does 3DNow! do? Basically, 3DNow! is MMX for the FPU. 3DNow! provides SIMD (Single Instruction, Multiple Data) replacements for common FPU instructions, such as adding, multiplying, subtracting, loading, etc. 3DNow! also provides various other functions used heavily in 3D applications, such as a 3 cycle division (albeit only 14 bits accurate), a very fast inverse square root function, and more.
What exactly is SIMD, and what is it helpful for? SIMD stands for Single Instruction, Multiple Data, as mentioned above. This means that a single SIMD instruction can operate on multiple data items at the same time. For example, a SIMD addition can take several pairs of values (two pairs of 32-bit single precision floating point values per instruction, in the case of 3DNow!) and add up each of the pairs AT THE SAME TIME. What does this have to do with 3D, you may ask? Well, it so happens that the matrix multiplication used in 3D games to perform transformations involves multiplying a 4x4 transformation matrix by a 4x1 vertex matrix, or vector, for the math people out there ;).
The following example will show how powerful SIMD is.
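Here is a minimal sketch of that 4x4 by 4x1 transformation, written as a toy instruction counter in plain Python rather than real 3DNow! code. The function names, the translation matrix, and the vertex are all made up for illustration. The packed width of 2 is an assumption based on 3DNow! holding two 32-bit floats in each 64-bit register; a wider unit would simply use a larger WIDTH.

```python
# Toy model: every multiply or add below counts as ONE instruction.
# ASSUMPTION: a 2-wide packed unit (two 32-bit floats per instruction).
# This is plain Python for illustration, not real 3DNow! code.
WIDTH = 2


def transform_scalar(m, v):
    """4x4 matrix times 4x1 vertex, one value per instruction."""
    ops = 0
    out = []
    for row in m:
        acc = row[0] * v[0]
        ops += 1                      # first multiply
        for k in range(1, 4):
            acc += row[k] * v[k]
            ops += 2                  # one multiply plus one add
        out.append(acc)
    return out, ops


def pmul(a, b):
    """Packed multiply: WIDTH products from a single instruction."""
    return [x * y for x, y in zip(a, b)]


def padd(a, b):
    """Packed add: WIDTH sums from a single instruction."""
    return [x + y for x, y in zip(a, b)]


def transform_packed(m, v):
    """Same product, computing WIDTH result rows per instruction."""
    ops = 0
    out = []
    for r in range(0, 4, WIDTH):
        acc = None
        for k in range(4):
            column = [m[r + i][k] for i in range(WIDTH)]
            prod = pmul(column, [v[k]] * WIDTH)
            ops += 1
            if acc is None:
                acc = prod
            else:
                acc = padd(acc, prod)
                ops += 1
        out.extend(acc)
    return out, ops


# Hypothetical data: translate the vertex (1, 2, 3) by (10, 20, 30).
matrix = [[1.0, 0.0, 0.0, 10.0],
          [0.0, 1.0, 0.0, 20.0],
          [0.0, 0.0, 1.0, 30.0],
          [0.0, 0.0, 0.0, 1.0]]
vertex = [1.0, 2.0, 3.0, 1.0]

scalar_result, scalar_ops = transform_scalar(matrix, vertex)
packed_result, packed_ops = transform_packed(matrix, vertex)
print(scalar_result, scalar_ops)
print(packed_result, packed_ops)
```

In this model the scalar version needs 28 arithmetic instructions per vertex and the packed version needs 14, with identical results. The model ignores the cost of loading and arranging the data, so the real-world gain will be smaller than a clean 2x.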
K7's FPU
When it comes to FPU power, it looks like Intel won't be alone in the x86 market. The K7 features 3 floating point units (Load, Add, Multiply), each fully pipelined and superscalar. The latencies of the Add and Multiply units are both 4 cycles: a higher add latency than the PII (3 cycles), but a lower multiply latency (the PII takes 5 cycles). The pipelined FPU should help give AMD a high throughput rate in intense FPU applications, especially when combined with the already advanced K7 core. Out-of-order execution in particular helps maintain high throughput, since it tries to keep the pipelines full by executing instructions "out of order". Will the K7's FPU be faster than Intel's? It's hard to tell right now, but find out what I think later in the article.
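To see why pipelining matters more than a cycle of latency here or there, consider this rough sketch. It assumes every operation is independent of the others and that a pipelined unit can start one new operation per cycle; both are simplifications of mine, and the function names are hypothetical.

```python
# Latency vs throughput for a single floating point unit.
# ASSUMPTIONS: all operations are independent, and a pipelined unit
# accepts one new operation every cycle.


def cycles_unpipelined(n_ops, latency):
    """Each operation must finish before the next one can start."""
    return n_ops * latency


def cycles_pipelined(n_ops, latency):
    """A new operation starts every cycle; the last result appears
    'latency' cycles after the last operation starts."""
    if n_ops == 0:
        return 0
    return latency + (n_ops - 1)


N = 100
for name, latency in (("3-cycle add", 3), ("4-cycle add or multiply", 4),
                      ("5-cycle multiply", 5)):
    print(f"{name}: {cycles_unpipelined(N, latency)} cycles unpipelined, "
          f"{cycles_pipelined(N, latency)} cycles pipelined")
```

For 100 independent operations, a 4-cycle unit takes 400 cycles without pipelining and 103 with it, while the difference between 3, 4 and 5 cycles of latency shrinks to a couple of cycles in total. Latency hurts far more when each operation depends on the result of the previous one, which is where out-of-order execution earns its keep by finding other work to fill the gaps.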
L1 cache
One of the K7's major features is the 128K L1 cache, four times as much as the Pentium II has, and twice as much L1 as AMD's K6(-2). What's the big deal about having a lot of L1 cache anyway? The L1 cache holds the most commonly used data in the running application. L1 cache is the first place the CPU looks for information. More L1 cache means that more information can be stored "closer" to the CPU, which in turn translates to lower data retrieval latency, which means faster performance. Furthermore, many business applications constantly manipulate the same data. These applications can benefit greatly from an increased L1 cache size because there is a better chance that the CPU will have the application data in the L1, increasing performance. The operations performed by most business applications are so trivial that I/O, both internal and external (the hard drive, for example), is a significant limiting factor in performance.
What kind of performance increases should we see with 128K L1 cache? It's difficult to tell exactly, and even after the K7 is released it will be difficult to isolate the L1 cache and analyze the performance increase. Later in this article I discuss the probable performance increases due to doubling the L1 cache size.
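A toy simulation shows the best case for a bigger L1. Everything in it is an assumption for illustration: 64-byte cache lines, a single fully associative cache with least-recently-used replacement, and a program that loops over the same 96K of data ten times. It does not model the K7's actual cache organization.

```python
from collections import OrderedDict

# ASSUMPTIONS: 64-byte lines, one fully associative LRU cache, and a
# program that sweeps the same block of data over and over.
LINE_BYTES = 64


def hit_rate(cache_lines, working_set_lines, passes):
    """Fraction of accesses served from the cache."""
    cache = OrderedDict()             # oldest entry first
    hits = 0
    total = 0
    for _ in range(passes):
        for line in range(working_set_lines):
            total += 1
            if line in cache:
                hits += 1
                cache.move_to_end(line)        # mark as most recently used
            else:
                cache[line] = True
                if len(cache) > cache_lines:
                    cache.popitem(last=False)  # evict least recently used
    return hits / total


working_set = 96 * 1024 // LINE_BYTES          # hypothetical 96K of hot data
for size_kb in (64, 128):
    lines = size_kb * 1024 // LINE_BYTES
    print(f"{size_kb}K cache: hit rate {hit_rate(lines, working_set, 10):.2f}")
```

With these made-up numbers the 64K cache never hits at all, because each line is evicted just before it is needed again, while the 128K cache misses only on the first pass. This is the extreme case; real programs fall somewhere in between, and a working set that already fits in 64K gains nothing from the extra space.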
L2 cache
The AMD K7 will feature an "on card" L2 cache as found in the Pentium II. Yes, that means the K7 will be inside a cartridge, like the PII. The speed of the L2 cache is configurable-- it can range anywhere between 1/3 clock speed all the way to full speed. The size of the L2 cache is currently limited to 512KB; however, AMD will release revisions of the K7 capable of addressing up to 8 MEGABYTES of L2 cache.
We all know from the Pentium II that on-card cache running at even half clock speed is much faster than slow cache on motherboards running at bus speed. Increasing cache speed is an easy way to get a significant performance increase. For AMD, it is an easy upgrade and well worth making.
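For a rough sense of scale, this sketch works out the L2 clock for the divider range mentioned above. The 500MHz core speed is the initial K7 speed quoted earlier, the 1/2 divider is an intermediate step I am assuming exists, and the 100MHz figure for motherboard cache is simply today's fastest standard bus speed.

```python
from fractions import Fraction

CORE_MHZ = 500                   # initial K7 speed quoted in the article
MOTHERBOARD_CACHE_MHZ = 100      # ASSUMPTION: cache on a 100MHz motherboard bus
# 1/3 and full speed come from the article; 1/2 is an assumed middle step.
DIVIDERS = [Fraction(1, 3), Fraction(1, 2), Fraction(1, 1)]


def l2_clock_mhz(core_mhz, divider):
    """L2 cache clock for a given core clock and divider."""
    return float(core_mhz * divider)


for d in DIVIDERS:
    mhz = l2_clock_mhz(CORE_MHZ, d)
    print(f"{d} x {CORE_MHZ}MHz = {mhz:.0f}MHz "
          f"({mhz / MOTHERBOARD_CACHE_MHZ:.1f}x motherboard cache)")
```

Even at the slowest setting, the on-card cache would be clocked well above cache sitting on a 100MHz motherboard bus.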
The K7, Intel killer, or does AMD have a little more work to do?
The K7 looks like a damn fast CPU on paper. Since I did not get to personally see the K7 in action, nor do I have any benchmarks to base my opinions on, it is going to be hard for me to tell you how well I think the K7 will perform.
128K L1 cache...so what?!
The 128K L1 cache looks good on paper, but will we really see the benefits of it in the real world? When it comes to business applications like Microsoft Word, the answer is a definite yes; however, Quake 3 and the next 3D Studio may not really see much of a benefit from the L1 cache. Granted, there will be some performance increase; however, don't count on the doubled amount of L1 cache to increase your Q3 framerates by 20%, or even 10%, for that matter.
On card L2 cache
Since the K7 is also geared towards servers, a large, fast L2 cache is necessary. Large, fast L2 caches benefit multiple processor systems greatly because they keep the CPUs from accessing the main system RAM as often. If you have 2, or better yet 4, CPUs accessing the system RAM at the same time, there are going to be some serious slowdowns, which are unacceptable in a server environment. For this reason, large, fast L2s are necessary to reduce the strain on RAM. The faster L2 cache won't benefit games and number crunching applications as much as it helps multiprocessor servers, but don't expect it to sit there idly. The faster L2 cache will help increase overall system performance.
Superscalar, 9 issue, out-of-order [insert technical jargon here] ...
While the architecture improvements over previous generations are plenty, the K7's core is evolutionary, not revolutionary. Don't expect 2x PII performance because of the improved core; it simply won't happen. (If it does, I'll be the first one with a K7.) I think the K7 will run neck and neck with Intel's latest chip out at the time. MHz for MHz, I'm betting on the K7 to outperform the Intel counterpart.
200MHz Bus
I neglected to talk much about this because I couldn't find enough information on EV6. The 200MHz bus will obviously help performance. Whether or not by a significant amount is a different issue. Remember the 66MHz vs 100MHz bus. In the case of the Pentium II, the performance increase was negligible, perhaps nonexistent. What's to stop the K7 from exhibiting the same behavior? The L2 cache runs independently of the bus speed, like the Pentium II's, so all that really benefits from the 200MHz bus is the system RAM. Maybe AGP 4x will put the 200MHz bus to good use. But other than that, I don't see the 200MHz bus providing any really huge performance increase.
Slot A
I think that putting the K7 on Slot A was a bad idea. Even though Slot A is faster than Slot 1 (well, not the actual slot, but the protocol, etc.), Slot A implies competition with a CPU you DON'T want to compete with, Digital's 21264. The K7 is not the 21264's friend just because they run on the same motherboard. (The Pentium MMX and the K6 were never friends.) The K7 may end up becoming the low-end Alpha alternative rather than the high-end Pentium II alternative. If this happens, AMD won't be too happy. I think that AMD should have chosen the safe route and released the K7 for Slot 1. This way the K7 would imply more direct competition with the Pentium II, plus offer the end-user a brighter upgrade path. (Slot 1 isn't dying anytime soon, while Slot A, on the other hand, gives you two choices: a "cheap" K7, or a $2500+ 21264.)
Gaming
Will it outperform Katmai (or the latest Intel chip out at the time)? I doubt it. KNI is superior to 3DNow!, and I don't think AMD's newly designed pipelined FPU will make up for Katmai's advantage in the KNI area, since the regular FPU isn't really used in SIMD optimized applications. Further, Intel hasn't really said much about Katmai, so either they have nothing to say, or they have a lot of goodies they are keeping secret, the latter being the more probable, in my opinion. Only time will tell...
Business
K7 all the way. 128K L1 will play a big role in the K7's surpassing of the Pentium II in business applications.