Original Link: https://www.anandtech.com/show/171
Introduction
Digital brought us one of the first 64-bit processors, the Alpha 21064, around 1992. Sun Microsystems followed, and IBM came soon thereafter. The largest CPU manufacturer in the world, Intel, however, is waiting until 2000 to introduce its first 64-bit processor, Merced. This 64-bit processor is not only new to Intel, but new to the world. Based on Intel's EPIC (Explicitly Parallel Instruction Computing) architecture, which is very similar to VLIW (Very Long Instruction Word, a technique that originated in the early 1980s), Merced is going to be the first true processor of its kind. What exactly is EPIC? How does it help? Find out...
EPIC Features
Predication
Speculation
Explicit Parallelism
128 integer registers
128 floating point registers
64 predicate registers
Processors
Merced: first EPIC processor; 600+ MHz on a 0.18 micron process; 20x the FPU performance of the Pentium Pro; release date mid-2000
McKinley: 2x as fast as Merced; 1 GHz+; 0.13 micron, copper technology; 2001+
Madison: next-generation McKinley core; 0.13 micron, copper; 1 GHz+; 2002
Deerfield: desktop IA64; 2003
EPIC Features Summarized
Before I go in-depth about specific topics (predication, speculation, registers, etc.), I am going to give you a basic idea of what EPIC is. The basic idea behind EPIC is parallelism. Current processors must analyze code on the fly to determine the best execution path. An EPIC processor leaves the compiler to do the dirty work of arranging the code to benefit from parallelism. This is known as explicit parallelism: the code is explicitly arranged to take advantage of parallelism. Since the EPIC processor is based on the idea of explicit parallelism, it must be capable of processing lots of data in parallel. EPIC processors have multiple instruction pipelines, many registers, wide data paths, and other special features such as predication and speculation that keep the code highly "pipelineable" and avoid stalls at all costs.
One of the main advantages of EPIC is its efficiency. Although efficiency depends on the compiler, overall an EPIC processor can spend more of its processing power on meaningful operations, rather than on waiting for instruction fetches, flushing pipelines, etc. Another major advantage of EPIC is that predication, speculation, and explicit parallelism reduce branch mispredictions significantly, because most of the code is organized before execution to eliminate them. The EPIC processor does not have to guess what to execute simultaneously; the compiled code TELLS it what to do. (Predication and speculation also soften the penalty of mispredicting a branch; more on this later.)
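To make "the compiler arranges the code" concrete, here is a minimal Python sketch of the kind of static scheduling an EPIC compiler performs: it packs instructions whose inputs are already available into groups that the hardware could issue together. This is a toy model, not Intel's compiler or instruction format. The instruction representation, the `schedule` function, and the issue width of 3 are assumptions for illustration, and the sketch assumes every register is written only once.

```python
from typing import Dict, List, Tuple

# Toy instruction: (destination register, source registers).
# Assumption: each register is written exactly once, so only
# read-after-write dependencies matter.
Instr = Tuple[str, Tuple[str, ...]]


def schedule(program: List[Instr], width: int = 3) -> List[List[Instr]]:
    """Pack instructions into groups that can issue in the same cycle.

    `width` is a hypothetical issue width, not a real Merced figure.
    """
    groups: List[List[Instr]] = []
    produced_in: Dict[str, int] = {}  # register -> group that writes it

    for dest, srcs in program:
        # Place the instruction no earlier than the group after the
        # last producer of any of its inputs.
        g = max((produced_in[s] + 1 for s in srcs if s in produced_in), default=0)
        # Skip groups that are already full.
        while g < len(groups) and len(groups[g]) >= width:
            g += 1
        if g == len(groups):
            groups.append([])
        groups[g].append((dest, srcs))
        produced_in[dest] = g
    return groups


program = [
    ("r1", ("a",)),        # r1 = load a
    ("r2", ("b",)),        # r2 = load b
    ("r3", ("r1", "r2")),  # r3 = r1 + r2
    ("r4", ("c",)),        # r4 = load c (independent of r3)
    ("r5", ("r3", "r4")),  # r5 = r3 * r4
]

for i, group in enumerate(schedule(program)):
    print(i, [dest for dest, _ in group])
# 0 ['r1', 'r2', 'r4']
# 1 ['r3']
# 2 ['r5']
```

Note how `r4` moves up into the first group: the compiler can see ahead of time that it does not depend on `r3`, so the processor never has to discover that at run time.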
Predication improves on branch prediction by reducing waiting. For example, let's say that at a diner you can have either steak or fish, and each takes 5 minutes to prepare (raw and full of germs, but that's beside the point...). It also takes 5 minutes to tell the waiter, ring up the order, pay, and so on. The waiter may try to save time by predicting that you usually order the steak rather than the fish, so he rings up the steak and starts cooking it. The problem is: what happens when the waiter is wrong? He has to stop cooking the steak, ring up a different price, and start over, which is very costly in terms of time. Branch prediction works the same way. Sometimes it's good, other times it's not so good (it's right most of the time, around 90-95%).
A predicating waiter, instead of guessing whether you want steak or fish, prepares both simultaneously and then gives you the one you ordered. Which one is more efficient? Predication, because no time is ever lost backing out of a wrong guess; the only cost is some extra work whose result is thrown away. Not every instruction can be predicated, and the EPIC processor does not decide on its own which instructions to execute along a branch; the compiler TELLS it. Instructions are tagged with one of the 64 one-bit predicate registers, and only the results of instructions whose predicate is set to true are kept. Notice that in the waiter example, predication eliminated a choice (i.e., removed a branch from the code) by executing both paths simultaneously.
[Figure: Example of predication in action]
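The waiter example maps onto what compilers call if-conversion. Below is a minimal Python sketch under a simplified model: a compare writes a predicate and its complement into two of the 64 predicate registers, both arms of the `if` are executed, and a result is written back only when its predicate is true. The names `run_predicated` and `abs_diff` and the register layout are hypothetical, chosen for illustration; this is not real IA64 assembly.

```python
from typing import Callable, Dict, List, Tuple

Regs = Dict[str, int]
# A predicated instruction: (predicate register number, destination, operation).
PredInstr = Tuple[int, str, Callable[[Regs], int]]


def run_predicated(instrs: List[PredInstr], preds: List[bool], regs: Regs) -> Regs:
    for p, dest, op in instrs:
        value = op(regs)   # every instruction is executed...
        if preds[p]:       # ...but its result is kept only if its predicate is true
            regs[dest] = value
    return regs


def abs_diff(a: int, b: int) -> int:
    # Branchy original:  if a > b: d = a - b  else: d = b - a
    preds = [False] * 64          # 64 one-bit predicate registers
    preds[1] = a > b              # the compare sets p1 ...
    preds[2] = not preds[1]       # ... and its complement p2
    regs = {"a": a, "b": b, "d": 0}
    code: List[PredInstr] = [
        (1, "d", lambda r: r["a"] - r["b"]),  # (p1) d = a - b
        (2, "d", lambda r: r["b"] - r["a"]),  # (p2) d = b - a
    ]
    return run_predicated(code, preds, regs)["d"]


print(abs_diff(7, 3), abs_diff(3, 7))  # → 4 4
```

Both subtractions always run, just as the waiter cooks both dishes, so there is no branch left to mispredict. The trade-off is the wasted subtraction, which is why this approach tends to pay off for short branches.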
A big problem with processors nowadays is memory latency. Increasing cache sizes can only go so far, and CPU developers must constantly worry about whether RAM designers are going to come up with faster RAM, and about how to deal with increasing latencies (RAM "runs" much slower than a processor). The EPIC architecture has a good solution for dealing with the latency problem: EPIC processors are capable of scheduling a load instruction even BEFORE a branch is entered. This is called speculation. The problem is what happens if the load is invalid (e.g., it reads a value that is not yet defined). Normally this would generate an exception, and the program would crash. In EPIC systems, however, the processor remembers that it did a speculative load whose result may be invalid, and keeps processing. When it actually needs the data, the processor checks back to verify it. If the data is valid, the CPU can use it right away, saving a lot of time, since the load got a head start by being issued before the branch. If it is not valid, the load is redone normally at that point, and only then is any exception raised.
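Here is a minimal Python sketch of that speculate-then-check sequence, under a simplified model: memory is a dictionary, a bad address raises `KeyError`, and a speculative load returns a special "deferred fault" token instead of raising. The token class, `load_speculative`, `check`, and `lookup` are hypothetical names for illustration. The check either hands back the value loaded early or redoes the load normally, which is the point where a genuine fault finally surfaces.

```python
from typing import Dict, Union

Memory = Dict[int, int]


class DeferredFault:
    """Token recording that a speculative load failed (hypothetical model)."""


DEFERRED = DeferredFault()


def load(memory: Memory, addr: int) -> int:
    return memory[addr]  # normal load: a bad address raises KeyError


def load_speculative(memory: Memory, addr: int) -> Union[int, DeferredFault]:
    try:
        return memory[addr]
    except KeyError:
        return DEFERRED  # remember the problem instead of crashing


def check(value: Union[int, DeferredFault], memory: Memory, addr: int) -> int:
    if value is DEFERRED:
        return load(memory, addr)  # recovery: redo the load non-speculatively
    return value                    # valid: use the early result as-is


def lookup(memory: Memory, addr: int, taken: bool) -> int:
    early = load_speculative(memory, addr)  # hoisted above the branch
    # ... unrelated work would overlap with the load here ...
    if taken:
        return check(early, memory, addr)
    return -1  # branch not taken: the speculative result is simply ignored


memory = {100: 42}
print(lookup(memory, 100, True))   # → 42
print(lookup(memory, 999, False))  # → -1: the bad early load is never reported
```

Calling `lookup(memory, 999, True)` still raises `KeyError`, but only from the check on the path that really uses the data, which mirrors the behavior described above.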
Speculation and Predication in Action
In their "Next Generation Instruction Set Architecture" presentation, Intel and HP use a portion of the 8 Queens problem (placing 8 queens on a chessboard so that none of them attacks another) to demonstrate the power of speculation and predication. The entire slide show of this portion is quite informative.
Unlike CISC or RISC processors, which generally have 32 or fewer registers, Merced and other EPIC processors will have 128 general-purpose integer registers, 128 floating point registers, and 64 one-bit predicate registers. This large number of registers will give programmers and compilers the flexibility they need with the EPIC architecture: since EPIC performs many operations in parallel, it needs many registers to hold all the data being manipulated simultaneously. Counting all three sets, that is 320 registers (128 + 128 + 64). This should be more than enough for programs to make full use of the EPIC processor and to expose parallelism efficiently.
Hardware translation
Merced, Intel's first IA64 processor, will provide full IA32 and PA-RISC (the instruction set of HP's PA-RISC series of RISC processors) compatibility by means of hardware translation. Intel debated whether to add an entire separate IA32 core, along with a PA-RISC core, but decided against it because it would be an inefficient use of die size, since the IA64 core already includes many of the needed features. Hardware translation takes IA32 code and runs it on the IA64 processor "on the fly". The advantage of hardware translation over a separate IA32 core is that the processor can take advantage of its 64-bit data paths, among other improvements already built into the core. According to Intel, this is the most elegant approach, and it is the route they plan to take. Merced is expected to perform on par with, or faster than, the fastest IA32 processor available at the time.
Problems
The first major concern regarding EPIC is that it relies heavily on compilers to perform optimally. This may not seem like a problem, and from a certain angle it really isn't, but not all compilers are created equal. A poorly optimizing compiler will cause a significant performance loss, larger than the loss it would cause on a RISC or CISC chip. Compiler developers will therefore play an important role in the performance of Merced and other EPIC processors: good compilers should yield very good performance, and bad compilers will yield very poor performance.
Debugging
Debugging an EPIC application may be somewhat tricky because of all the compile-time rearrangement and parallelism. Speculation in particular will cause some interesting complications during debugging. Good debugging software will need to be written to help developers debug and optimize their code.