Original Link: https://www.anandtech.com/show/715
ServerWorks HEsl: DDR bandwidth without DDR SDRAM
by Anand Lal Shimpi on February 7, 2001 8:16 AM EST - Posted in CPUs
Historically, Intel has not been the number one provider of hardware for servers and workstations. Before the release of the Pentium Pro, companies like Sun, HP, IBM and DEC had their way with the high-end server and workstation markets. After the release of the Pentium Pro, however, things began to change; for the first time, Intel, and the x86 world in general, could be taken seriously as a provider of server and workstation class hardware. Although quite expensive compared to Intel's other offerings, and in spite of the fact that it was based on a platform that had quick obsolescence written all over it (Socket-8 never reared its head again), the Pentium Pro was a promising server solution. What allowed for its success was that Intel also provided reliable and downright desirable platforms for the Pentium Pro to run on. A processor is only as good as the platform it must run on; case in point, the first 133MHz FSB Pentium IIIs didn't sell well initially because the i820 chipset wasn't an attractive platform.
The Pentium II expanded on the strengths and nullified the weaknesses of the Pentium Pro, continuing to offer an "Intel inside" approach to servers and workstations. The release of the Xeon line of Pentium II and later Pentium III processors extended Intel's presence in this market as well. Unfortunately, by the time the first 133MHz FSB Xeons were released, Intel was faced with a major problem.
We just finished mentioning the importance of a good platform in making a good processor; unfortunately, as we are all very aware, Intel didn't have the best chipsets available at the end of 1999. If you'll recall our first article on upgrading the AnandTech server backbone, we felt the results of this deficiency in Intel's line when upgrading our servers last year. This forced us to seek alternatives, which eventually brought us to AMD's Athlon.
If the solution were as simple as recommending that everyone use the AMD Athlon, there would be no point to this review and no point to the company we are about to introduce. The fact of the matter is that the Athlon is still not ready for the big screen when it comes to high-end servers and workstations. While the processor has the benefit of being an extremely well performing solution, one major caveat continues to hold it back: multiprocessor support. The AMD 760MP chipset, the first SMP chipset for an AMD processor, is still a few months away from being available in retail channels and, unfortunately, not everyone can just wait.
As we have seen so many times in history, with a hole like this in the product line of a major player, there is more than enough room for a smaller company to fill the void with a niche product. That smaller company is ServerWorks, just recently acquired by Broadcom, and the niche product is their ServerSet III line of chipsets.
First Introductions: Comdex '99 with Tyan
Walking around Tyan's suite at Fall Comdex in 1999, a unique board sitting there alone caught our eye. It was an odd-looking dual Slot-1 motherboard design that used a chipset with NEC's logo prominently displayed on it. An NEC Pentium III chipset? Even more interesting was that the board featured a total of eight DIMM slots, this during a time when even finding four DIMM slots on a board was rare. This was obviously a high-end board, but based on what chipset?
It turns out that the chipset we saw was the RCC Champion III HEsl; the NEC logo was present because NEC actually manufactured the chip that was designed by Reliance Computer Corporation, or RCC. We never saw much under the RCC name because in January of 2000, Reliance Computer Corp. decided to change their name to ServerWorks. Are you beginning to see the connection?
ServerWorks has actually been around for a reasonable amount of time, although they have generally remained quiet because they haven't shipped much more than 2 million parts since their creation in 1994. As we mentioned before, this is a relatively small company (compared to the Intels and AMDs of the world) that exists to cater to the needs of a niche market, in this case the server and workstation market. ServerWorks' ServerSet line of chipsets has actually been around since 1997, beginning with the Champion 1.0 (later renamed the ServerSet I) for the Pentium Pro. Back then, the specifications consisted of a 66MHz FSB, a 66MHz EDO memory bus and support for up to six Pentium Pro processors along with two 32-bit PCI buses. The ServerSet then evolved into a second-generation offering that boasted a 100MHz FSB, PC100 SDRAM and quad Xeon support, as well as the introduction of a 64-bit PCI bus.
The third generation of ServerSet technology brings us to the present day and the specifications are definitely impressive. If you thought DDR SDRAM offered bandwidth, what do you say to 4.1GB/s of memory bandwidth via a 256-bit memory bus using regular PC133 SDRAM? Things are about to get interesting and yes there is a catch.
The different flavors of ServerSet III
ServerWorks is now up to their third generation of ServerSet technology. Making its debut in 1999, the ServerSet III line of chipsets from ServerWorks makes some evolutionary improvements over the ServerSet II chipsets that are now three years old.
In order to cater to the specific needs of the market, ServerWorks produced three variations of their third generation ServerSet chipset.
The entry-level ServerSet III product is the LE chipset. This is by far the most common ServerSet III model simply because it's the most affordable and it does offer all of the basic features that high-end users are demanding. The ServerSet III LE does not support AGP meaning that the LE is only really useful for servers and workstations where 3D graphics performance isn't important. We will reserve an in-depth look at the LE for a later date but as it stands the solution is quite attractive if you need dual processor support for Socket-370 CPUs. The LE chipset is present in our recently upgraded AnandTech Forums Database server. Click here to read about the exact configuration of that server as well as the rest of the systems that run the site.
While there are a total of three ServerSet III chipsets listed on ServerWorks' website (including the LE), there are really two major members of the III series: the LE that we just finished introducing and the HEsl.
The HEsl chipset improves upon the LE by not only adding support for an AGP 2X bus but support for a few more technologies that truly make it a high-end solution. From this point on we will be discussing the HEsl chipset and the features that it boasts starting with the basics, the FSB and the PCI bus.
More bandwidth
For starters, the ServerSet III line finally supports PC133 SDRAM and with that, the 133MHz FSB. While that may seem like no big accomplishment, remember that Intel still does not offer a 133MHz FSB workstation/server class chipset that doesn't use RDRAM.
Because of the increased FSB and memory bandwidth, the ServerSet III chipsets are able to offer two 66MHz, 64-bit PCI buses and one "legacy" 33MHz, 32-bit PCI bus. Compare this to the ServerSet II line, which offered two 32-bit/33MHz PCI buses and a single 64-bit/33MHz PCI bus for the most specialized needs of that time.
We hinted at the need for a PCI bus with more bandwidth in our latest server upgrade article. As it stands now, the 32-bit PCI bus present in all of our desktop systems operates at 33MHz. Do the math and you'll realize that this allows for a total of 133MB/s to be transferred over the PCI bus, meaning that your hard drives, Ethernet cards and RAID adapters together are limited to a total transfer rate of 133MB/s. This isn't much of a problem for even a power user with a single hard drive and a 100Mbps Ethernet card since, at most, he/she is using 50 - 60MB/s of that 133MB/s of available bandwidth.
Where the 133MB/s PCI bus becomes a constraint is in situations where you have a 2 or 4-drive RAID array sustaining transfers of 40 - 80MB/s and bursting at much more than that. For example, the AnandTech Forums Database Server makes use of a RAID 10 array with a total of four Quantum Atlas 10K II hard drives. The average sustained read rate on those drives is approximately 25MB/s; multiply that by four and you get a 100MB/s bandwidth requirement for sustained reads alone, and it is very likely that burst reads peak well above 100MB/s, and in fact above 133MB/s. And that's not even taking into account the two 100Mbps Ethernet adapters present in the system on the same 32-bit PCI bus.
While it's easy to see that your personal system doesn't have the same types of disk bandwidth requirements, don't forget that the server needs of today will become the home desktop needs of tomorrow.
The obvious solution to this problem is to give the PCI bus more bandwidth, and there are two ways of doing so: you can widen the PCI bus from 32 bits to 64 or 128 bits, which increases overall board cost since there are more traces to deal with, or you can simply increase the clock on the PCI bus from 33MHz to 66MHz or beyond.
PCI Bus Bandwidth Comparison

| Bus Width | Frequency | Max. Theoretical Bandwidth |
|-----------|-----------|----------------------------|
| 32-bit    | 33MHz     | 133MB/s                    |
| 64-bit    | 33MHz     | 266MB/s                    |
| 64-bit    | 66MHz     | 533MB/s                    |
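These peak figures are simple arithmetic: bus width in bytes multiplied by clock rate, assuming one transfer per clock. The short C snippet below is our own illustration of the math (the function name is ours); note that PCI actually runs at 33.3/66.6MHz, which is where the familiar 133MB/s figure comes from.

```c
#include <stdio.h>

/* Peak theoretical bus bandwidth in MB/s: width in bits divided by 8
 * (bytes per transfer), multiplied by the clock in MHz, assuming one
 * transfer per clock as conventional PCI does. */
static double bus_bandwidth_mb(double width_bits, double clock_mhz)
{
    return width_bits / 8.0 * clock_mhz;
}

int main(void)
{
    printf("32-bit/33MHz: %.0fMB/s\n", bus_bandwidth_mb(32, 33.3)); /* 133 */
    printf("64-bit/33MHz: %.0fMB/s\n", bus_bandwidth_mb(64, 33.3)); /* 266 */
    printf("64-bit/66MHz: %.0fMB/s\n", bus_bandwidth_mb(64, 66.6)); /* 533 */
    return 0;
}
```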
The industry-accepted solution was to offer a wider, 64-bit PCI bus that could run at either 33MHz or 66MHz. The 64-bit PCI bus is simply an extension of the 32-bit PCI bus, meaning that there is potential for backwards compatibility, allowing 32-bit PCI devices to work in 64-bit PCI slots. Unfortunately, it's not quite that simple.
First you have to understand something about the way PCI cards are "keyed." There are two operating voltages for PCI cards: 3.3V and 5V. PCI cards have run at 5V for the longest time; only in the past couple of years has there been a move to 3.3V PCI devices. Telling the difference between 3.3V and older 5V PCI cards is simple: just look at the way the PCI connector is divided, or "keyed." This picture should help clear things up:
Here we have one notch near the back of the PCI connector; this is keyed for 5V operation.
Here we have the notch from above on the right for backwards compatibility, plus another one on the left for 3.3V operation.
A 64-bit PCI slot running at 33MHz looks just like a 32-bit slot with an extension placed on the end of the slot. This type of slot can be keyed for 3.3V or 5V operation; if it is keyed for 5V then it can be populated by any PCI card (3.3V or 5V, 64-bit or 32-bit). However if a 66MHz, 64-bit PCI card is installed it will operate at 33MHz.
A 66MHz, 64-bit PCI slot can only be keyed for 3.3V operation according to the specification. This means that only 3.3V PCI cards (32-bit or 64-bit) can be used in the slot.
The second and third PCI slots from the left are 64-bit/66MHz slots, the rest are 33MHz slots.
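These keying and clocking rules boil down to two simple checks, sketched below in C purely for illustration (the types and names are ours and correspond to no real API):

```c
#include <stdbool.h>

enum slot_key { SLOT_KEYED_5V, SLOT_KEYED_3_3V };

struct pci_slot {
    enum slot_key key;   /* 66MHz slots must be 3.3V-keyed per the spec */
    int clock_mhz;       /* 33 or 66 */
};

struct pci_card {
    bool notch_5v;       /* rear notch: card can run in a 5V-keyed slot  */
    bool notch_3_3v;     /* second notch: card can run in a 3.3V slot    */
    int max_clock_mhz;   /* 33 or 66 */
};

/* A card physically fits only if it carries the notch matching the slot's
 * key; dual-notched 3.3V cards therefore fit everywhere, while old 5V-only
 * cards are locked out of 3.3V-keyed (and thus all 66MHz) slots. */
static bool card_fits(const struct pci_slot *s, const struct pci_card *c)
{
    return s->key == SLOT_KEYED_5V ? c->notch_5v : c->notch_3_3v;
}

/* A fitting card runs at the lower of the two clocks, e.g. a 66MHz card
 * dropped into a 33MHz slot falls back to 33MHz. */
static int effective_clock(const struct pci_slot *s, const struct pci_card *c)
{
    return s->clock_mhz < c->max_clock_mhz ? s->clock_mhz : c->max_clock_mhz;
}
```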
In terms of support, all of the ServerSet III chipsets support up to two 64-bit/66MHz slots and up to five 64-bit/33MHz slots, not to mention a legacy 32-bit PCI bus, although with the flexibility of the aforementioned 64-bit slots it doesn't really make sense for a manufacturer to offer 32-bit PCI slots.
More PCI bandwidth requires more memory bandwidth
The LE chipset can get away with "only" 1.06GB/s of memory bandwidth since it does not support AGP, leaving your PCI devices and your CPUs as the only users of that bandwidth. But with the HEsl chipset, adding in AGP 2X support stretches the limits of that available memory bandwidth. Since ServerWorks intended the HEsl chipset to be for even more high-end applications, they did take this into account.
The same methods used to increase PCI bus bandwidth apply to memory bandwidth as well: you can either increase the operating frequency or widen the bus. Increasing the operating frequency of PC133 SDRAM is out of the question, since you would have to define an entirely new standard, get support from the memory manufacturers and do all the necessary testing, which isn't the easiest thing to do. Widening the bus is a very realistic option and it is the path that ServerWorks chose with the HEsl chipset, though not in the sense you would think.
As we know from our chipset reviews in the past, the North Bridge of a chipset houses the memory controller among other things. Normally, this memory controller provides a 64-bit path to the system memory. In the case of the HEsl's North Bridge, there are simply two of these 64-bit paths to the system memory.
These two "ports" are made possible by ServerWorks' own Memory Address Data Path controller that is integrated into the North Bridge. The two 64-bit ports are interleaved to provide bandwidth similar to what a 128-bit memory bus would while still conforming to the PC133 SDRAM specification. The only stipulation is that you have to install the 64-bit SDRAM modules in pairs (to total the effective 128-bit bus width) and, as is the case with most server solutions, the modules must be Registered and support ECC.
For those of you who aren't familiar with Registered DIMMs, they are generally not the same type of modules you use in your personal systems, except in some unusual cases (you will most likely know if you have registered memory or not). In contrast to unbuffered (conventional) SDRAM, Registered SDRAM features small registers between the module's interface and the actual SDRAM chips on the PCB. These registers decrease electrical loading and allow more physical SDRAM devices to be used on a single DIMM.
If you do the math, you'll realize that this interleaved memory bus of the HEsl chipset is exactly equal to the amount of memory bandwidth provided by PC2100 DDR SDRAM while the HEsl only requires PC133 SDRAM (1.06GB/s x 2 = 2.1GB/s).
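ServerWorks doesn't disclose exactly how the MADP maps addresses across the two ports, but the principle of two-way interleaving is easy to sketch. Assuming, purely for illustration, that consecutive 64-byte chunks alternate between the two ports:

```c
#include <stdint.h>

/* Hypothetical two-way interleave in the spirit of the HEsl's MADP; the
 * real mapping is undisclosed. Consecutive 64-byte chunks alternate
 * between the two 64-bit SDRAM ports, so a long sequential access keeps
 * both ports busy and the effective bus width approaches 128 bits:
 * 2 ports x 8 bytes x 133MHz = ~2.1GB/s peak, the figure quoted here. */
#define CHUNK_SHIFT 6   /* 64-byte granularity: our assumption */

static unsigned port_for_address(uint64_t phys_addr)
{
    return (unsigned)((phys_addr >> CHUNK_SHIFT) & 1);   /* port 0 or 1 */
}

static uint64_t offset_within_port(uint64_t phys_addr)
{
    uint64_t chunk = phys_addr >> CHUNK_SHIFT;           /* global chunk # */
    uint64_t local = chunk >> 1;                         /* chunk # on port */
    return (local << CHUNK_SHIFT) | (phys_addr & ((1u << CHUNK_SHIFT) - 1));
}
```

Because consecutive chunks land on alternating ports, both ports can be kept busy in parallel, which is where the DDR-class bandwidth comes from without any change to the SDRAM itself.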
Memory Bandwidth Comparison

| Chipset            | Memory Type        | Max. Theoretical Bandwidth |
|--------------------|--------------------|----------------------------|
| i815               | SDRAM              | 1.06GB/s                   |
| i820               | RDRAM              | 1.6GB/s                    |
| i840               | dual channel RDRAM | 3.2GB/s (2 channels)       |
| Apollo Pro 133A    | SDRAM              | 1.06GB/s                   |
| Apollo Pro 266     | DDR SDRAM          | 2.1GB/s                    |
| ServerSet III HEsl | SDRAM              | 2.1GB/s                    |
When you take into account that DDR isn't exactly 100% efficient and you don't often get a doubling of actual memory bandwidth, as well as the fact that regular PC133 SDRAM is still accessed at a lower latency than PC2100 DDR SDRAM, you'll realize that ServerWorks is onto something very special with the HEsl chipset.
To take things even further, on the higher-end quad processor Xeon platforms, ServerWorks has a special version of the HEsl chipset simply called the ServerSet III HE. This chipset uses an external Memory Address Data Path (MADP) controller that provides a total of four 64-bit interleaved memory ports for an effective 256-bit memory bus and up to 4.1GB/s of available memory bandwidth.
Connecting it all together
With two 64-bit PCI buses running at 33 and 66MHz, there is a clear need for a high-bandwidth connection from the 64-bit PCI bridge to the North Bridge. As you can probably guess, ServerWorks was already on the ball when they developed what they like to call the "Inter Module Bus," or IMB for short.
The IMB is a path from the North Bridge to the NB6555IO Bridge 2.0, which is essentially the 64-bit PCI bridge controlling all of the 64-bit PCI slots on an HEsl board. With one 64-bit PCI bus running at 66MHz offering 533MB/s of bandwidth and another running at 33MHz offering a theoretical maximum of 266MB/s, the IMB has to be pretty quick in order to avoid becoming a performance bottleneck. However, since 64-bit PCI devices still aren't at the point where they require the full 533MB/s of bandwidth, the bus doesn't have to be quite that fast.
The trade-off ServerWorks made was to make the IMB a 16-bit wide bus capable of transferring up to 1GB/s, meaning that it is clocked at an incredible 500MHz. This narrow but very fast bus is in line with the industry's recent trend toward more serialized operation; like Intel's Hub Architecture (an 8-bit, 266MB/s interlink bus), RDRAM and the forthcoming Serial ATA specification, ServerWorks' IMB is a very serialized protocol.
Bus Link Comparison

| Architecture      | Max. Transfer Rate |
|-------------------|--------------------|
| Intel Hub         | 266MB/s            |
| VIA V-Link        | 266MB/s            |
| ServerWorks 'IMB' | 1GB/s              |
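A quick sanity check on these figures (our arithmetic, not ServerWorks'): a 16-bit link moves two bytes per transfer, and 2 bytes x 500 million transfers per second = 1,000MB/s, the IMB's quoted 1GB/s. By the same math, the byte-wide Intel Hub and V-Link interfaces need 266 million transfers per second to reach their 266MB/s.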
Although ServerWorks wouldn't divulge too much information on it, they claim that there is some sort of intelligent caching mechanism present to help manage the available bandwidth. There is a strong sense of secrecy about much of the technology behind the ServerWorks chipsets, mainly out of fear that, in this highly competitive industry, the technology they have worked so hard to build could be copied all too easily if too much were disclosed.
With all of this, the HEsl's South Bridge isn't even necessary other than to provide support for the 32-bit PCI bus and any IDE hard drives that may be used. The South Bridge does claim support for up to 4 USB ports.
Don’t count on IDE
Unfortunately, things weren't perfect with our HEsl experience; in fact, there were a handful of issues that we would not have expected from a server class solution like this.
The first time we ran through the HEsl's performance tests we used our usual chipset test bed, which made use of an Ultra ATA/100 IBM 75GXP drive. Realistically, HEsl purchasers will most likely be pairing their boards with some sort of high-speed SCSI solution; for our purposes, however, we just wanted to compare it to other platforms, not necessarily construct a server out of it.
With the 75GXP we noticed that the HEsl platform was performing quite poorly, even to the point where it was being outperformed by a dual Apollo Pro 133A based system. After running some IDE performance tests on the platform we noticed a very unusual anomaly: the 75GXP was bursting at 16.5MB/s. This is even slower than the PIO Mode 4 specification allowed for (a mode that predates ATA/33, if you remember).
We asked ServerWorks about it and learned some very interesting information. It turns out that the HEsl's South Bridge isn't even ATA/66 compliant; it only offers ATA/33 support, and even then its performance is quite poor indeed, offering burst speeds barely above half of the ATA/33 specification.
This wasn't a problem for us; we just switched to the Seagate Cheetah X15 as our test drive. For those of you who aren't familiar with it, the X15 is a 15,000 RPM Ultra160 SCSI drive, and calling it fast is definitely an understatement. But you should be very wary of using an IDE drive on any ServerSet III chipset (they all use the same South Bridge) if you're going to be using it for anything more than just a boot drive. This shouldn't be too difficult a problem to fix with some driver workarounds, but it is a disappointment that it exists at all.
Poor AGP performance
The second issue we ran into was the poor graphics performance of the HEsl chipset. Using our test bed's GeForce2 GTS, the HEsl platform was over 40% slower than the rest of the test setups. We also found it odd that there were no AGP drivers present anywhere, either from ServerWorks or from the motherboard manufacturers that used their chipsets.
It turns out that in order for the OS to properly recognize the HEsl's AGP bus you've got to fool it into thinking the North Bridge is actually an LE chip. ServerWorks was kind enough to send us instructions on how to do this, have a look at them first:
1. Program the following registers in Function 0 of the chipset.
(a) Set the bit “7” of Index “5B” to One.
(b) Set bits 3:0 of Index “5B” to 0111h.
(c) Read the Index “02” to know bits are set (it must read 0x07).

2. Program the following registers in Function 1 of the chipset.
(a) Set the bit “3” of Index “48” to One.
(b) Set bits 3:0 of Index “C7” to 0101h.
(c) Read the Index “02” to know bits are set (it must read 0x05).

3. Implement Vendor ID and Sub System ID in Index 2C, 2D, 2E, 2F (Function 1).
(a) Set the bit “3” of Index “48” to One.
(b) Set the Index “C8 – C9” to reflect your Vendor ID.
(c) Set the Index “CA – CB” to reflect your Sub System Vendor ID.
(d) Index “2C – 2F” reads the value written in Index “C8 – CB”.
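For the curious, here is roughly what those steps amount to in C, written against the legacy PCI configuration mechanism #1 (I/O ports 0xCF8/0xCFC). To be clear, this is our own sketch: the bus/device location of the North Bridge is an assumption, the register indices come verbatim from ServerWorks' steps above, and ServerWorks' actual fix was a piece of assembly run before driver installation. Steps 1 and 2 are shown; step 3 follows the same pattern.

```c
#include <stdint.h>
#include <sys/io.h>   /* Linux port I/O; requires iopl(3) privileges */

#define PCI_CONFIG_ADDR 0xCF8
#define PCI_CONFIG_DATA 0xCFC

/* Read/write one byte of PCI configuration space via mechanism #1. */
static uint8_t pci_read8(uint8_t bus, uint8_t dev, uint8_t fn, uint8_t reg)
{
    outl(0x80000000u | (bus << 16) | (dev << 11) | (fn << 8) | (reg & 0xFC),
         PCI_CONFIG_ADDR);
    return inb(PCI_CONFIG_DATA + (reg & 3));
}

static void pci_write8(uint8_t bus, uint8_t dev, uint8_t fn, uint8_t reg,
                       uint8_t val)
{
    outl(0x80000000u | (bus << 16) | (dev << 11) | (fn << 8) | (reg & 0xFC),
         PCI_CONFIG_ADDR);
    outb(val, PCI_CONFIG_DATA + (reg & 3));
}

int main(void)
{
    const uint8_t bus = 0, dev = 0;   /* assumed North Bridge location */
    if (iopl(3) != 0)
        return 1;                     /* need root for direct port I/O */

    /* Step 1: function 0 -- set bit 7 of index 0x5B, bits 3:0 to 0111b */
    uint8_t r = pci_read8(bus, dev, 0, 0x5B);
    pci_write8(bus, dev, 0, 0x5B, (uint8_t)(((r | 0x80) & ~0x0F) | 0x07));
    /* index 0x02 (part of the PCI device ID) should now read back 0x07 */

    /* Step 2: function 1 -- set bit 3 of 0x48, bits 3:0 of 0xC7 to 0101b */
    r = pci_read8(bus, dev, 1, 0x48);
    pci_write8(bus, dev, 1, 0x48, (uint8_t)(r | 0x08));
    r = pci_read8(bus, dev, 1, 0xC7);
    pci_write8(bus, dev, 1, 0xC7, (uint8_t)((r & ~0x0F) | 0x05));
    /* index 0x02 on function 1 should now read back 0x05 */

    return 0;
}
```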
Now aren't you glad that VIA puts regularly updated INF patches and AGP GART drivers on their website? In order to give the HEsl a fair chance we contacted ServerWorks and worked with them to find the best method to go about fixing this issue.
They managed to produce some assembly code we could run to change the chip ID of the North Bridge, which should have enabled proper AGP support under Windows 2000. Unfortunately, after multiple reinstalls, as well as mucking around with the test board's BIOS, our efforts were fruitless.
The assembly code managed to get the chip ID changed and the AGP bridge recognized; however, installing NVIDIA's Detonator drivers resulted in a black screen upon booting into Windows 2000 every time.
This is another major disappointment since there should be no reason that ServerWorks can't do what ALi, Intel, SiS and VIA all do in order to provide proper support for their chipsets under Windows 2000: release a driver/INF patch.
As it stands now, you pretty much have to deal with the poor graphics performance of the solution. How bad is it? Take a look at the SPECviewperf performance numbers we produced:
ServerWorks believes that there is a solution to this; why it hasn't been implemented in the current motherboard designs or in the form of a driver patch is beyond our understanding, however.
Tyan: the King of Server Boards
Tyan has a long history of making impressive workstation/server class motherboards, and when armed with the HEsl chipset they produced yet another goliath of a motherboard.
The Tyan Thunder HEsl was the motherboard we used to represent the ServerSet III HEsl in this comparison and, as you can tell by its picture, it's quite feature-filled. Like previous members of Tyan's Thunder line, the Thunder HEsl comes with on-board SCSI, LAN and sound.
The board did have a few quirks, one being that Tyan didn't seem to follow ServerWorks' DIMM layout specifications properly; there are issues regarding which DIMM slots you install memory in if you're only using two (remember, you must install DIMMs in pairs). When we installed DIMMs in two of the memory slots in a supported configuration indicated by the manual, the system would no longer boot into Windows 2000; simply choosing the remaining set of DIMM slots remedied the situation. There were no problems with all four DIMM slots populated, which is how we tested.
The Competition
In order to provide for a good comparison we picked a total of three competing dual processor platforms to pit the Thunder HEsl against.
The OR840 from Intel, which makes use of the Intel 840 chipset.
The MSI 694D Pro using the Apollo Pro 133A chipset. This is the same board we used in our original VIA Apollo Pro 133A with Dual Processors Review.
And finally we have an engineering sample of the first dual processor board based on the VIA Apollo Pro 266 chipset: Iwill's DVD266-R.
Since the DVD266-R was in early beta, there were issues with the fourth DIMM slot working reliably. So in order to obtain a 512MB memory configuration to match the rest of the contenders we used Mushkin's latest 256MB PC2100 DDR modules which worked perfectly at PC2100 CAS2.
Since the two processors in a SMP system will be competing for the same FSB and memory bandwidth, the more memory bandwidth a platform has, the less of a bottleneck that will be for the CPUs and the more efficient (in theory) the SMP setup will be. It's time to find out if that holds true.
The Test
Windows 98SE / 2000 Test System

| Hardware | |
|---|---|
| CPU(s) | 2 x Intel Pentium III 733 |
| Motherboard(s) | Intel OR840 (i840), Iwill DVD266-R (Apollo Pro 266), MSI 694D Pro (Apollo Pro 133A), Tyan Thunder HEsl (ServerSet III HEsl) |
| Memory | 512MB PC133 Corsair SDRAM (Micron -7E CAS2) |
| Hard Drive | Seagate Cheetah X15 (18.4GB) 15,000 RPM Ultra160 SCSI |
| CDROM | Phillips 48X |
| Video Card(s) | NVIDIA GeForce2 GTS 32MB DDR (default clock: 200/166 DDR) |
| Ethernet | Linksys LNE100TX 100Mbit PCI Ethernet Adapter |

| Software | |
|---|---|
| Operating System | Windows 2000 Professional SP1 |
| Video Drivers | |

| Benchmarking Applications | |
|---|---|
| Gaming | Unreal Tournament 4.32 (Reverend's Thunder.dem) |
| Productivity | BAPCo SYSMark 2000 |
Home/SOHO Performance
Click Here for a description of Business Winstone 2001
You wouldn't expect us to start off the performance analysis of a high-end multiprocessor chipset with scores from Business Winstone, but with the new 2001 version of the benchmark the tests are actually quite useful for analyzing overall system performance.
As with most Winstone benchmarks, Business Winstone 2001 is still mainly disk limited, resulting in the relatively small performance differences that you see here. The fact that we used such a high performance SCSI drive in the test systems however does help to minimize some of the effects of this limitation.
Without too much surprise, the i840 chipset comes out holding up the rear in this test. The reasons are fairly simple; remember that out of the three memory types compared here (PC133, PC2100 DDR and PC800 RDRAM) the i840's dual channel PC800 RDRAM has the highest overall latency.
With the types of applications that Business Winstone 2001 switches through, the 1.06GB/s of memory bandwidth allowed for by the PC133 specification is generally enough and won't limit the performance of the system. In fact, business applications will prefer low latency operation to high bandwidth solutions provided that they are given enough memory bandwidth to start with. For this reason we see that the i840, with its 3.2GB/s of memory bandwidth is approximately 3% slower than the two VIA solutions: the DDR based Apollo Pro 266 and the PC133 SDRAM based Apollo Pro 133A.
It's interesting to see that the ServerSet III HEsl takes an early lead in a benchmark that we just finished saying didn't really take advantage of much memory bandwidth. Luckily, the HEsl has the best of both worlds: the latency of PC133 SDRAM combined with the bandwidth of PC2100 DDR SDRAM, giving it a 6.5% advantage over the i840 and a small 3% performance advantage over the two VIA solutions.
While Business Winstone 2001 isn't a benchmark that is representative of what the HEsl chipset is best at, it's a good starting point because there will be some situations where the HEsl is doubling as a high-performance workstation as well as a personal system and there's no reason that the chipset should not be able to handle such "simple" tasks.
Click Here for a description of Content Creation Winstone 2001
Content Creation Winstone 2001 paints a slightly different picture. The Apollo Pro 133A falls below the rest of the contenders here; the only difference between this benchmark and the previous one is a stronger focus on memory bandwidth, owing to the nature of these "content creation" applications in comparison to business applications.
Because of this the lower latency PC133 SDRAM isn't enough to keep the Apollo Pro 133A up towards the top of the charts mainly because the i840's dual channel RDRAM and the Pro 266's DDR SDRAM offer a considerable improvement in terms of bandwidth.
The ServerSet III HEsl once again proves that the best of both worlds (low latency + high bandwidth) is enough to propel it to the top of the charts. Notice, however, that the difference between the top-performing HEsl and the i840/Pro 266 is negligible; the only tangible performance increase is seen in comparison to the Apollo Pro 133A. It seems as if 1.06GB/s of memory bandwidth simply isn't enough for dual processors.
We should all be quite familiar with SYSMark 2000 by now. The benchmark script runs no more than a single application at once, and it is clearly not a benchmark that can be used to measure multiprocessor performance or memory bandwidth utilization, as a good deal of its component tests aren't incredibly bandwidth intensive.
The only real performance difference we see here, again, lies with the 133A, but even it is only 2% slower than the fastest performers in this test.
Now that we've gotten those initial tests out of the way, let's look at some more multiprocessor oriented benchmarks and some more memory bandwidth stressors.
Dual Processor Inspection Tests
The Dual Processor Inspection Tests are actually a part of eTesting Labs' (formerly ZDBOP) High-End Winstone 99. The benchmark runs through multiprocessor versions of Bentley's Microstation SE (CAD/modeling software), Adobe's Photoshop 4.0 and Microsoft's Visual C++ 5.0. The overall score is produced which you can see compared below as well as individual performance scores in each of the three benchmark categories.
It's not too surprising that the numbers all pretty much fall in line with one another, since we are using the same amount of memory and equally clocked CPUs on all four systems. The biggest performance difference here is no more than 3%, between the highest performing solution, the HEsl, and the lowest performing contender, the 133A.
The HEsl continues to hold the lead over the competition; however, the lead isn't spectacular at all. If you'll remember from our review of the Apollo Pro 266 chipset, the Pentium III is not a very memory bandwidth hungry processor, and although the added memory bandwidth is definitely more useful with two processors, the fact of the matter is that doubling memory bandwidth isn't going to do much for the Pentium III.
Looking at the individual components of the Dual Processor Inspection Tests, the Microstation SE benchmark shows very little variation among the platforms here. Microstation and many CAD/modeling packages don't really benefit all that much from having dual processors, especially when talking about the Pentium III. In fact, a single 1GHz Athlon would be faster than any of these dual 733MHz Pentium III systems.
Photoshop 4 MP & Visual C++ MP
Photoshop is a much different situation than Microstation in that dual processors not only help performance, they are genuinely put to use. Again we see that the performance range is only about 3% in this benchmark as well, mainly because we're dealing with fairly similarly configured systems.
The only platform with less than 2GB/s of peak available memory bandwidth is the Apollo Pro 133A, and it performs slightly below the rest of the systems, though not by much at all, as we just mentioned.
This particular test sequence definitely favors dual processor systems, as a 1.2GHz AMD Athlon with DDR SDRAM is about 30% slower than these lowly dual 733s. In terms of favoring one platform over another, however, this test shows that there's very little difference between the four.
The final member of the Dual Processor Inspection Test Suite is Visual C++. The coders out there know exactly how time consuming compiling large projects can be, especially when you're doing multiple things at once. This is another situation in which having dual processors really helps, simply because of the additional computational power. However, it is clear that having some additional memory bandwidth does not hurt either.
For starters let's take note of the fact that the performance range under the Visual C++ 5 test is just over 9%. Taking into account that all of these systems have the same CPUs, with the same hard drive and the same amount of memory, this illustrates that close to a 10% performance gain can be had simply by increasing the amount of available memory bandwidth.
A very likely explanation is that in a situation like this, where multiple processors provide a considerable performance improvement (dual 733s in this test are faster than a single 1.2GHz Athlon), the processors are competing heavily for what limited memory bandwidth they have to share; the more bandwidth they have, the more productive each individual CPU can be.
This would explain why the chipsets perform the way they do, except for the HEsl, which rises to the top presumably because of a combination of its high bandwidth memory subsystem and other, more efficient parts of the chipset's design. Possible reasons include the intelligent caching the North Bridge does, leaving more useable bandwidth for the CPUs, and simple performance tweaks that ServerWorks has picked up over the years while optimizing for workstation/server class systems.
Photoshop 6 Performance
One of the biggest questions we get is how to build a powerful machine for heavy-duty image editing. While the Photoshop 4 MP benchmark from earlier in this review provided some indication of the performance differences between the platforms, we wanted to take a look at a more thorough comparison.
We turned to a little known benchmark called PSBench, combined with Photoshop 6.0, to perform a time trial test of 21 filters and modifications to a 50MB TIFF file measuring 4830 x 3624. While this is a tad on the extreme side, we wanted to see what differences, if any, truly existed between these platforms and, based on that, conclude whether professional image editing programs such as Photoshop are more likely to respond to more memory bandwidth or to faster processors.
The table below details the results; each number is the average of three time trials for each filter/modification performed on the image. If you'd like to try your hand at the PSBench test script yourself, you can find it here. Remember that there are three PSBench scripts, a 10MB, a 20MB and a 50MB test; we used the latter.
Time to complete in seconds (lower is better).

| Filter/Action | ServerSet III HEsl | i840 | Apollo Pro 133A | Apollo Pro 266 |
|---|---|---|---|---|
| Rotate 90 | 5.43 | 9.13 | 8.2 | 9.17 |
| Rotate 9 | 11.33 | 15.6 | 14.77 | 16.4 |
| Rotate .9 | 10.2 | 14.47 | 13.6 | 15.7 |
| Gaussian Blur 1 pixel | 5.63 | 8.8 | 8.2 | 8.3 |
| Gaussian Blur 3.7 pixels | 6.43 | 9.27 | 9.27 | 9.9 |
| Gaussian Blur 85 pixels | 15.43 | 19.9 | 19.97 | 20.13 |
| 50%, 1 pixel, 0 level Unsharp Mask | 4.5 | 7.53 | 7.33 | 7.8 |
| 50%, 3.7 pixel, 0 level Unsharp Mask | 6.6 | 9.47 | 9.47 | 9.8 |
| 50%, 10 pixel, 5 level Unsharp Mask | 6.67 | 9.57 | 9.73 | 10.13 |
| Despeckle | 7.5 | 10.5 | 10.3 | 11.1 |
| RGB-CMYK | 31.63 | 34.33 | 34.3 | 35.2 |
| Reduce Size 60% | 2.93 | 4.4 | 4.3 | 4.1 |
| Lens Flare | 19.03 | 23.83 | 23.57 | 24 |
| Color Halftone | 37.07 | 44.27 | 44.13 | 44.1 |
| NTSC Colors | 11.7 | 12.8 | 12.57 | 13.13 |
| Accented Edges Brush Strokes | 34.6 | 36.3 | 35.83 | 36.57 |
| Pointillize | 60.3 | 64.83 | 64.07 | 65.47 |
| Water Color | 73.63 | 75 | 75.5 | 75.83 |
| Polar Coordinates | 37.5 | 44.83 | 44.8 | 45.07 |
| Radial Blur | 588.97 | 604.2 | 597.73 | 598.53 |
| Lighting Effects | 8 | 10.97 | 12.97 | 11.43 |
Interestingly enough, the two VIA platforms and the i840 perform within 1% of one another. The 133A manages to be slightly faster than the other two, quite possibly because of latency advantages (remember that PC133 SDRAM still has lower latency than DDR SDRAM, which in turn has lower latency than RDRAM) as well as pure platform maturity. But when you look at the big picture, the 133A is still no more than 1% faster than the Pro 266 or the i840.
In spite of how the other three platforms performed, the HEsl managed to distinguish itself from the competition by around 8%. Since there's very little tangible information that we know about the inner workings of the HEsl chipset we can only attribute this performance advantage to what we do know.
We do know that Photoshop was quite appreciative of the 133A's PC133 SDRAM (CL2) which was used on the HEsl as well. The fact that the HEsl has an effective 128-bit (64-bit interleaved) memory interface using this low-latency PC133 SDRAM could explain part of the performance advantage but there is still a portion that is up for grabs. Again, this could be attributed to efficient management of the memory/FSB bandwidth that is available through the use of caches and other performance tweaks.
Remember that the HEsl North Bridge is a 644-pin chip making it larger than both the i850's MCH and the AMD 761 North Bridge. They have to be using that added die space for something…
Introducing Constant Computing
Since we've already introduced a few new benchmarks in this review, why stop now? If you're anything like the majority of the AnandTech team, you downright abuse your computer. You run multiple applications at once while checking your mail, sometimes streaming audio/video, communicating using NetMeeting or ICQ, constantly torturing your system with a barrage of applications. Unfortunately, it has always been very difficult to represent this type of usage behavior in a benchmark; the closest we've ever gotten was with Winstone, and even that wasn't close enough, although it was a good try.
Luckily, a company called CSA Research has allowed us to change all of that. While we won't get into the politics of how CSA Research came to be in their current situation, the group basically used to work for Intel doing, you guessed it, performance simulations. One thing led to another and the two parted ways, leaving CSA Research with this benchmarking technology and nothing to do with it. So they did what will go down in history as one of the mottos of the 1990 - 2000 era: they put it on a website.
The package is called Benchmark Studio and its key component is called Office Bench. The beauty of Office Bench is that it not only performs the normal tasks any Business Winstone-like benchmark would (working with MS Word, Excel and PowerPoint), but it can also work with Benchmark Studio to simulate other types of load. The types of load Benchmark Studio can simulate range from accessing databases to checking email and streaming video. Using the Benchmark Studio interface you can completely customize how many instances of each type of load you'd like to create and when they loop.
In order to paint a complete performance picture we picked three customized settings. The first was a plain run of Office Bench without any additional forms of dynamic load; we repeated the tests up to 15 times in order to get rid of any variation in the results (the tests are fairly short, as you'll be able to see). The second test featured a total of 13 instances of load using the stress modules in Benchmark Studio, and the final torture test featured a total of 30 concurrent tasks running while the Office Bench script executed. You can experiment with Benchmark Studio yourself by downloading it from www.csaresearch.com.
Keep in mind that these scores are in seconds, meaning the lower the score the better the performance.
The first test setup featured no additional tasks running while the Office Bench script executed. The first thing you notice is that the 133A platform was noticeably slower than all of the other platforms. One of the beauties of Benchmark Studio is that it's not nearly as disk limited as Winstone (only approximately 5% of the time was spent waiting on the disk), meaning that the performance differences between platforms are more pronounced.
More pronounced doesn't necessarily mean more realistic although it can in some cases, but this definitely helps us perform our comparison.
The fastest platform in this relatively light test is the Apollo Pro 266, completing the script approximately 1% faster than the next runner up, the i840. Those two platforms can be considered essentially equal in this test. The reason the i840 is able to pull its weight here instead of falling behind is its dual channel RDRAM memory subsystem, which helps offset the effects of RDRAM's high latency on the Pentium III platform.
The HEsl falls a few percent behind the 266/840 and slips into third place. However, third place means very little here: the HEsl is just 0.4s slower than the i840 and definitely faster than the 133A.
Let's throw some more tasks at these platforms, how about a total of 13 concurrent ones?
Heavy Multitasking
The times to completion obviously increase as we increase the load these systems are put under; however, the standings remain relatively similar. The Apollo Pro 133A is still put to shame by the fact that the three higher bandwidth chipsets can complete the same tasks in around 85% of the time.
Once again, the HEsl, 840 and Pro 266 chipsets are all in the same performance ballpark. It seems as if with these three chipsets we are hitting the limits of what two Pentium IIIs can effectively use in terms of memory bandwidth.
While there's a need for more than the 1.06GB/s offered by the 133A when dealing with two CPUs, there doesn't seem to be as dramatic a difference between the two 2.1GB/s solutions and the sole 3.2GB/s solution. Granted, they are all using different types of memory, but their similarities are great enough, especially with the low-latency/high-bandwidth HEsl platform in the mix, that we should be able to make some generalizations based on their behavior.
Throwing even more concurrent tasks at the platforms doesn't change things any further, indicating that our original assumptions were true: the HEsl, 840 and Pro 266 perform relatively close to one another. While the 133A doesn't have quite enough memory bandwidth, the remaining three platforms have a little more than what is useable by the processors.
Useable Memory Bandwidth
We've been talking about useable memory bandwidth throughout this review and to conclude our performance comparison we'll have a look at some STREAM results indicating exactly how much memory bandwidth the CPU is actually making use of. Let's start out with some Integer-STREAM results:
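For context, STREAM (John McCalpin's memory bandwidth benchmark) measures sustainable bandwidth by timing simple long-vector kernels and dividing bytes moved by elapsed time. The following is a minimal copy-kernel sketch in the same spirit, our own simplification rather than the actual STREAM source:

```c
#include <stdio.h>
#include <time.h>

#define N (8 * 1024 * 1024)   /* arrays large enough to defeat the caches */

static long a[N], b[N];

int main(void)
{
    for (long i = 0; i < N; i++)
        a[i] = i;

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (long i = 0; i < N; i++)   /* the STREAM "Copy" kernel */
        b[i] = a[i];
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    double mb = 2.0 * N * sizeof(long) / 1e6;   /* one read plus one write */

    /* print b[N-1] so the compiler can't optimize the copy away */
    printf("Copy: %.0fMB/s (check: %ld)\n", mb / secs, b[N - 1]);
    return 0;
}
```

The Integer- and FPU-STREAM numbers that follow come from the same idea applied to integer and floating point arrays; what they expose is how much of each platform's theoretical peak actually survives the trip through the FSB and chipset.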
We would expect the i840 to come out on top because of its dual channel RDRAM memory bus offering a total of 3.2GB/s of memory bandwidth. Of that 3.2GB/s, the system managed to pull through a total of 582MB/s during this Integer-STREAM test. While that may seem disappointing, don't forget that the i840 also took first place, meaning that all of the other platforms offered even less.
The HEsl came in second place with 535MB/s, which was also expected. Remember that DDR SDRAM isn't 100% efficient, meaning you're always going to get more bandwidth out of an SDR solution capable of pushing the same theoretical bandwidth figures.
The Apollo Pro 266 obviously comes in third followed by the 133A with 402MB/s.
Next we have an even more memory-bandwidth intensive measurement using FPU calculations.
Here the standings remain the same although the top three positions get a small boost in terms of bandwidth because of the bandwidth hungry nature of most FPU calculations.
Final Words
If you've made it this far, give yourself a round of applause. We understand that there was quite a bit of information to digest in this review but now we can finally put it all together and make some conclusions.
As the basis for a high-performance graphics workstation, the ServerSet III series of chipsets should definitely be avoided until the issues we ran into can be corrected. For occasional graphics work, or when using PCI adapters, you should be fine; anything that requires good AGP performance, however, should steer clear of the ServerSet III for now.
In terms of the competition, it is clear that the VIA Apollo Pro 133A was a good entry into the low-cost multiprocessor market when it was introduced as an SMP solution last year, but it has since been surpassed. The most promising of all the chipsets compared here was actually the VIA Apollo Pro 266, because it offered performance truly on par with the higher bandwidth solutions and will retain its sibling's promise of being a cost effective offering.
The Intel 840 chipset is still quite powerful and is a very attractive option. The only things really holding it back are that very few 840 based motherboards exist and that it is still relatively expensive to outfit an 840 motherboard with over 1GB of RDRAM, although we will admit that it is a much more reasonable proposition now than it was 12 months ago.
This brings us back to the original topic of this review, the ServerWorks ServerSet III HEsl. Without a doubt this is one of the most interesting chipsets we have seen in a while, and it gets the job done. The ServerSet III HEsl won the majority of the performance tests with less theoretical bandwidth than the i840 chipset, and while using regular PC133 SDRAM against the Pro 266's DDR SDRAM. That is really the beauty of the chipset: it offers such high performance while still using conventional PC133 SDRAM.
The design is quite elegant, especially with the North Bridge's integrated MADP controller that provides the dual 64-bit 133MHz SDRAM ports. It is almost frightening to think about what a full blown HE based solution would be able to provide in terms of raw memory bandwidth.
The downside to all of this is that the HEsl does not come cheap. We expect the Tyan Thunder HEsl to be priced much higher than $600, since that is the maximum price for most LE based motherboards. Realistically, the Thunder HEsl is an $800 board, which puts it way out of the price range of many power users, making the Apollo Pro 266 an even more attractive and cost effective option.
Even if the price were right, there is an unfortunate problem that ServerWorks can't do much about: the Pentium III simply isn't a memory bandwidth hungry processor and can't begin to use the potential of what ServerWorks can offer it. There is another Intel processor that can, however, and that is the Pentium 4. You can expect the next generation of ServerWorks technology to be quite impressive, and it will hopefully be done justice by being paired with a processor that can take advantage of it.
Looking at the even bigger picture however, there is something that is due out shortly which will be able to make a very big impact on the market. As we mentioned throughout this review, there are situations in which our dual Pentium III 733 test bed offered performance greater than a single 1.2GHz Athlon. But imagine the performance advantage a pair of Athlons would be able to offer. The AMD 760MP chipset is still a few months away from mass production but don't be too surprised if you start hearing about it sooner than that. If dual Pentium IIIs can provide this kind of a performance increase, it's even scarier to think of what dual Athlons can do.
The market may be in a slump right now, but with NV20, Palomino and AMD 760MP on the way, it's about to be upgrade season again.