Original Link: https://www.anandtech.com/show/1113
There are many recent technologies that have signalled a shift in the way
data is sent within a desktop computer in order to increase speed and
efficiency. Universal Serial Bus (USB), Serial ATA and RDRAM are all examples
of moving away from a parallel architecture to a high-speed serial format,
designed to ensure maximum bandwidth and provide future scalability.
The PCI (Peripheral Component Interconnect) Bus has been widely used as a
general purpose I/O interconnect standard over the last ten years, but is really
beginning to hit the limits of its capabilities. Extensions to the PCI
standard, such as 64-bit slots and clock speeds of 66MHz or 100MHz, are too costly and
simply cannot meet the rapidly increasing bandwidth demands of PCs over the next
few years.
3rd Generation IO, or 3GIO, has recently been renamed PCI Express, and looks to be the replacement for the ubiquitous PCI bus, the most successful peripheral interconnect used in PCs. With support coming in Intel's Grantsdale chipset, along with Microsoft's next version of Windows, codenamed Longhorn, let's take a look at the technology that is designed to serve the computer industry for the next ten years.
Intel proposed the original PCI 1.0 specification back in 1991. The PCI Special
Interest Group (which took over development of PCI), produced revision 2.0 in
May 1993.
Its rival at the time was the VESA Local Bus (VL-bus or VLB). Introduced by the
Video Electronics Standards Association, VL-bus was a 32-bit bus that involved a
third connector appended to the end of a regular ISA slot. It ran at a
nominal speed of 33MHz and offered a significant performance improvement over ISA.
One of the main features that provided such great performance was, ironically,
one of the main factors in VLB's downfall. It was essentially a direct extension
of the 486 processor/memory bus, running at the same speed as the processor,
hence the name "local bus". This direct extension meant that connecting too many
devices risked interfering with the processor itself, particularly if the
signals went through a slot. VESA recommended that only two slots be used at
clock frequencies up to 33MHz, or three if they were electrically buffered from
the bus. At higher frequencies no more than two devices should be connected, and
at 50MHz or above they should both be built into the motherboard.
Because the VL-bus ran synchronously with the processor, increasing processor
frequencies caused real problems for VL-bus peripherals. The faster the
peripherals were required to run, the more expensive they became, due to the
difficulties associated with manufacturing high-speed components. Very few VL-bus
components were built to handle speeds in excess of 40MHz.
PCI had some compelling advantages over VL-bus. It was designed as a mezzanine bus: PCI
was a separate bus isolated from the CPU, but still had access to main memory.
It had the ability to run asynchronously from the processor, at nominal
speeds of 25MHz, 30MHz and 33MHz. As processor speeds increased, the PCI bus
speed could remain constant, as it ran at an adjustable fraction of the front
side bus. PCI also allowed five or more slots or peripherals, double what
VL-bus could handle, without any restrictions imposed by bus speed, buffering
or other electrical considerations.
Other "smart" features promoted ease of use. Plug and Play allowed automatic
configuration of peripherals without the need to set jumpers for IRQs, DMA
channels and I/O addresses. It allowed IRQs to be shared, as well as providing
its own interrupt system (the INTA#, INTB#, INTC# and INTD# lines).
Finally, PCI bus mastering allowed devices on the PCI bus to take control of the
bus and perform transfers directly, without involving the CPU. This lowered
latency and processor usage.
Its introduction alongside the Pentium processor, along with its clear benefits
over rival buses at the time, helped PCI emerge from the bus wars as the
dominant standard in 1994. Since then, just about all peripheral devices, from
hard disk controllers and sound cards to NICs and video cards, have been PCI
based.
With the advent of RAID arrays, Gigabit Ethernet and other high bandwidth
devices on consumer class systems, PCI's 133MB/s available bandwidth is clearly insufficient to handle these demands.
Chipset makers have foreseen this limitation and have made various changes to
motherboard chipsets in order to alleviate some of the load from the PCI bus.
Up until 1997, graphics data was probably the single largest cause of traffic on
the PCI bus. The Accelerated Graphics Port (AGP), introduced by Intel's 440LX
chipset, had two main purposes: to increase graphics performance and to pull the
graphics data off the PCI bus. With graphics data transfers taking place on
another "bus" (technically, AGP is not a bus, since it only supports one
device), the previously saturated PCI bus was freed up for use with other
devices.
Yet AGP was just one step in reducing the load on the PCI bus. The next was to
redesign the link between the North Bridge and South Bridge of motherboard
chipsets. Older chipsets, such as the Intel 440 series used a single PCI bus to
connect the North Bridge to the South Bridge. The PCI bus not only had to cope
with inter-bridge traffic, but it also had to carry regular PCI traffic, IDE,
Super I/O (Serial, Parallel, PS/2), and USB. To alleviate the situation, Intel,
VIA and SiS replaced the PCI bus between the North and South Bridges with a High
Speed interconnect, and then shifted IDE, Super I/O and USB to their own
dedicated links to the South Bridge.
Now with Intel's Communications Streaming Architecture bus built into the Memory
Controller Hub of the i875/i865 chipsets, even Gigabit Ethernet is off the PCI bus.
Numerous dedicated interconnects for various devices in the i875 chipset: not really a cost effective solution
While AGP, CSA, Intel's Accelerated Hub Architecture Hub Link, VIA's V-Link and SiS' MuTIOL have been relatively successful in reducing the PCI bus load, those are just stop-gap solutions.
PCI Express, previously known as 3rd Generation I/O (3GIO), is all set to
replace PCI and take general IO connectivity into the next decade.
PCI Express seeks to fulfil a number of requirements.
It is designed to support multiple market segments and emerging applications, as
a unifying I/O architecture for Desktop, Mobile, Server, Communications,
Workstations and Embedded Devices. It is not just for the desktop, like the
original PCI specification was designed to be.
With regards to cost in both high and low volumes, the target is to come in at
or below PCI cost structure at the system level. A serial bus requires fewer traces on PCBs,
easing board design and increasing efficiency by allowing more space for other components.
It has a PCI Compatible software model, where existing Operating Systems should
be able to boot without any changes. In addition, configuration and device
drivers for PCI Express are to be compatible with existing PCI.
Performance scalability is achieved through increasing frequency and adding
"lanes" to the bus. It is designed for high bandwidth per pin with low overheads
and low latency. Multiple virtual channels per physical link are supported.
As a point-to-point connection, it allows each device to have a dedicated
connection without bus sharing.
Other advanced features include:
- the ability to comprehend different data structures
- low power consumption and power management features
- quality of service policies
- hot swappability and hot pluggability for devices
- data integrity and error handling, end-to-end and at the link level
- isochronous data transfer support
- host-based transfers through host bridge chips, and peer-to-peer transfers
through switches
- a packetized and layered protocol architecture
At the high level, the PCI Express system comprises a root complex, which
would be placed either in the chipset's North Bridge or South Bridge, switches,
and finally end-point devices. The new item in the PCI Express topology is the
switch. It replaces the multi-drop bus and is used to provide fan-out for the
I/O bus. The switch provides peer-to-peer communication between different
end-point devices and does not require traffic to be forwarded to the host
bridge if it does not involve cache-coherent memory transfers.
System Topology using the new Switch
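The role of the switch can be illustrated with a toy model in Python. The class and method names below are hypothetical, purely for illustration; a real switch routes packets per the specification's rules, but the key idea is the same: traffic between two devices on the same switch never has to travel up to the root complex.

```python
# Toy model of a PCI Express topology: endpoints hang off a switch, and the
# switch routes peer-to-peer traffic locally instead of forwarding it upstream.
class Endpoint:
    def __init__(self, name):
        self.name = name

class Switch:
    def __init__(self):
        self.ports = []  # downstream devices attached to this switch

    def attach(self, device):
        self.ports.append(device)

    def route(self, src, dst):
        # If both devices hang off this switch, traffic stays local.
        if src in self.ports and dst in self.ports:
            return "peer-to-peer"
        # Otherwise it must be forwarded toward the root complex.
        return "forward upstream"

switch = Switch()
gpu, nic = Endpoint("gpu"), Endpoint("nic")
switch.attach(gpu)
switch.attach(nic)
print(switch.route(gpu, nic))  # peer-to-peer, no root complex involved
```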
The following diagrams show possible PCI Express implementations across an entire range of platforms: Desktops and Mobiles, Servers and Workstations, and Networking Communications Systems.
PCI Express based desktop and mobile system
Server and Workstation system
Networking Communications system
The PCI Express Architecture is specified in layers, which helps ease cross-platform design.
At the very bottom is the physical layer. The most basic PCI Express link
consists of two low voltage differential signals: transmit and receive. A data
clock is embedded using the 8b/10b encoding scheme to achieve very high data
rates. The initial signaling rate is 2.5Gb/s in each direction, with speeds expected
to increase with advances in silicon technology up to possibly around 10Gb/s in
each direction.
Transmit and receive signal pairs
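As a sanity check on those numbers, the 8b/10b overhead can be worked out in a few lines of Python; this is a quick illustrative calculation, not anything from the specification itself:

```python
# Effective PCI Express data rate per lane, per direction.
# 8b/10b encoding sends 10 line bits for every 8 data bits, so 20% of the
# raw 2.5Gb/s signaling rate is encoding overhead.
line_rate_gbps = 2.5            # raw signaling rate (Gb/s, each direction)
encoding_efficiency = 8 / 10    # 8b/10b: 8 data bits per 10 line bits

data_rate_gbps = line_rate_gbps * encoding_efficiency   # 2.0 Gb/s of data
data_rate_mb_s = data_rate_gbps * 1000 / 8              # convert bits to bytes

print(f"{data_rate_mb_s:.0f} MB/s per lane, per direction")
```

That 250MB/s figure is where the "over 200 Megabytes per second" per lane quoted later in this article comes from.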
One of the most exciting features for all the speed freaks out there is PCI Express's ability to scale speeds by aggregating links to form multiple lanes. The physical layer supports X1, X2, X4, X8, X12, X16 and X32 lane widths. Transmission over multiple lanes is transparent to other layers.
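To see how those lane widths scale, here is a small sketch assuming the initial 2.5Gb/s signaling rate and 8b/10b encoding; the figures are nominal per-direction data rates, not measured throughput:

```python
# Per-direction bandwidth for each lane width defined by the physical layer.
LINE_RATE_BPS = 2.5e9                                 # signaling rate per lane
PER_LANE_MB_S = LINE_RATE_BPS * (8 / 10) / 8 / 1e6    # 250 MB/s after 8b/10b

for width in (1, 2, 4, 8, 12, 16, 32):
    print(f"x{width:<2}: {width * PER_LANE_MB_S:,.0f} MB/s per direction")
```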
The data link layer ensures reliability and data integrity for every packet sent across a PCI Express link. Along with a sequence number and CRC, a credit-based flow control protocol guarantees that a packet is transmitted only when a buffer is available to receive it at the other end. This eliminates retries caused by receiver buffer overflow, resulting in more efficient use of bus bandwidth, while any packets corrupted in transit are detected via the CRC and automatically retransmitted.
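The credit-based scheme can be illustrated with a toy model. The class names below are hypothetical and heavily simplified; real PCI Express tracks credits per virtual channel and per packet type, but the principle of "send only when the receiver has advertised buffer space" is the same:

```python
# Toy sketch of credit-based flow control.
class Receiver:
    def __init__(self, buffer_slots):
        self.credits = buffer_slots  # free buffer slots advertised to sender

    def consume(self):
        """Process one buffered packet, returning a credit to the sender."""
        self.credits += 1

class Transmitter:
    def send(self, receiver, packet):
        # Only transmit when the receiver has a free buffer slot, so packets
        # are never dropped (and never retried) for lack of space.
        if receiver.credits == 0:
            return False  # stall until credits are returned
        receiver.credits -= 1
        return True

rx = Receiver(buffer_slots=2)
tx = Transmitter()
assert tx.send(rx, "pkt1") and tx.send(rx, "pkt2")
assert not tx.send(rx, "pkt3")   # no credits left: transmitter stalls
rx.consume()                     # receiver frees a buffer, returns a credit
assert tx.send(rx, "pkt3")       # transmission resumes
```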
The transaction layer accepts requests from the software layer and creates request packets for the link layer, implemented as split transactions. Each packet is uniquely identified and supports 32-bit memory addressing as well as extended 64-bit addressing. Additional attributes, including "no-snoop", "relaxed ordering" and priority, are used for routing and quality of service.
Furthermore, the transaction layer supports four address spaces: memory, I/O, configuration (these three already exist in the PCI specification) and the new message space. This fourth address space replaces the side-band signals of the PCI 2.2 specification and does away with all the "special cycles" of the old format, including interrupts, power management requests and resets.
Finally, the software layer is touted as the key to maintaining software compatibility. Initialisation and runtime behaviour are unchanged from PCI, with the aim of allowing operating systems to boot on PCI Express systems without modification. Devices are enumerated so that the operating system can discover them and allocate resources as necessary, while the runtime again reuses the PCI load-store, shared-memory model. Whether or not modification is really required remains to be seen: "PCI Express support" is counted as one of the features of Microsoft's next operating system, codenamed Longhorn, a tacit implication that previous operating systems may not support PCI Express.
Initial implementations are designed to co-exist with legacy PCI connectors. As
you can see from the diagram below, a 1X connector sits neatly behind the PCI
slot at the back of the motherboard, allowing either a regular PCI card or a
PCI-Express card to be used.
Other innovations include separating the main "box" from the human interface, and "device-bay" units which allow hot-swapping of cards and other PCI-Express Peripherals.
PCI Express slot on left, hot swappable PCI Express device bays on the right
Even mobile users won't be left out, with the new PCMCIA standard codenamed NEWCARD. NEWCARD features a form factor that neatly fits two NEWCARDs side by side in the space of a single CardBus card. Unfortunately, it is not designed to handle graphics, so the possibilities of video upgrades on a laptop are still virtually non-existent. On the bright side, future expansion capabilities range from wireless communications, ultra wideband and TV tuners to security card readers, optical compression/encryption and smart clocking.
Single-wide and Double-wide NEWCARDs: a Double-wide is the same width as the old PCMCIA standard
With over 200 Megabytes per second in each direction for an X1 lane, PCI Express
claims to be a very cost effective solution in terms of bandwidth per pin.
Intel's Grantsdale chipset provides an X16 link for graphics, some 4 Gigabytes per
second in each direction (8GB/s concurrent bandwidth) dedicated to graphics, nearly
double the bandwidth offered by AGP 8X in one direction alone. Hopefully, this
additional capacity will be able to accommodate graphics demands for the next couple of years.
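Those figures can be checked with some quick arithmetic. The calculation below uses the commonly quoted 266MB/s AGP 1X base rate; exact published figures vary slightly depending on how the 66MHz AGP clock is rounded:

```python
# Comparing a PCI Express x16 graphics link with AGP 8X (nominal figures).
pcie_x1_mb_s = 250                  # per direction, after 8b/10b overhead
pcie_x16_mb_s = 16 * pcie_x1_mb_s   # 4,000 MB/s in each direction

agp_1x_mb_s = 266                   # commonly quoted AGP base transfer rate
agp_8x_mb_s = 8 * agp_1x_mb_s       # ~2,100 MB/s, one direction at a time

print(f"x16 PCI Express: {pcie_x16_mb_s} MB/s per direction")
print(f"AGP 8X:          {agp_8x_mb_s} MB/s total")
print(f"Ratio (one direction alone): {pcie_x16_mb_s / agp_8x_mb_s:.1f}x")
```

Counting both directions, the x16 link's 8GB/s concurrent bandwidth is closer to four times what AGP 8X offers.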
X16 and X1 PCI Express Slots
PCI Express Slots on the BigWater reference form factor shown at IDF 2002
Will PCI Express start a new bus war with other solutions such as PCI-X and
HyperTransport?
The PCI Express Working Group (codenamed Arapahoe) claims that these buses
are targeted at different solutions: RapidIO and HyperTransport were developed
for specific applications, while PCI Express is designed for general usage.
The possibility that PCI Express could replace HyperTransport as a processor-to-
processor interconnect is also unlikely. PCI Express lacks cache coherency
protocols, and its higher latency compared to parallel interconnects with
source-synchronous clocks makes it inappropriate for that type of usage.
Certainly, AMD and nVidia have nothing to fear. Intel probably would not use it
to replace the P4 bus either, since an open PCI Express standard means that Intel
would not be able to charge third party chipset vendors for P4 bus licensing.
PCI Express has a great deal of potential. Its positioning as a general purpose
interconnect gives it clear advantages in terms of flexibility and ensures that
it is capable enough to be used in a wide variety of solutions.
As with many major changes, the transition from PCI to PCI Express won't happen
overnight. ISA slots stuck around for nearly ten years after PCI arrived before
they finally disappeared, so don't assume that your PCI peripherals are obsolete just yet.
The PCI Express Base 1.0a Specification and Card Electromechanical 1.0a
Specification have already been released, although we won't see any PCI Express products until 2004, probably the first being video
cards from nVidia and ATi, along with motherboards based on the Grantsdale
chipset from Intel. At the server end of the market, Intel is looking to
introduce PCI Express with the Lindenhurst and Twin Castle chipsets. With new form factors and promising great performance, the
future looks good.