Original Link: https://www.anandtech.com/show/435



One of the most common assumptions made here at AnandTech, as well as elsewhere, concerns the performance of the latest and greatest processors in “professional applications.”  You see it everywhere - just look at the countless Athlon and CPU reviews that go up on a daily basis.  The link between floating point performance and superior performance in what are stereotyped as “professional applications” is drawn constantly, and often without much data to back it up.

In actuality, quite a few factors outside of floating point performance affect the group of applications commonly known as “professional applications,” and each particular type of “professional application” responds to those factors in its own way.  As you can probably guess, it is very difficult to generalize about which CPU is best for professional applications.  In order to answer that question, you would have to benchmark just about every available program with that classification and compare the results.  In the end you’d have a table of benchmarks pointing out one obvious fact - no single CPU wins in every application used in the professional world.

In some cases, you have applications that are truly video card dependent, and, therefore, CPU performance isn’t as big of a factor.  In other situations you have applications that are more dependent on a fast L2 cache and memory bus rather than an extremely strong floating point unit, but then again there are some applications that are the exact opposite.

Unlike testing business applications or games, working with professional applications is much more specific, and it is very difficult to make a generalization about performance in the area as a whole.  At the same time, like business applications, in the professional world you often have specific applications catered to specific tasks, just as a word processor or spreadsheet program is in the business world.  A user will find an application that suits his particular needs and stick to it.  This is exactly why you see some users who prefer MS Office while others would rather use Lotus Smart Suite or Corel Office.  The same situation exists in the professional world: while some users prefer applications such as 3D Studio MAX, others prefer Maya.

The difference between the professional world and the business world is that your job doesn’t normally depend on how fast you can run Office or Smart Suite.  We consistently point out that business application performance is not a major point to stress when looking at the performance of today’s CPUs, since there is only so much you need from a processor when dealing with business applications.  The exact opposite is true for professional users.  Their jobs generally depend on being able to run specific applications, and oftentimes they are left with a very difficult question to answer: “Which CPU helps me do my job faster?”

The application that we have decided to focus on in this comparison is Parametric Technology Corporation’s (PTC) Pro/ENGINEER.  According to PTC, who develops, markets and supports the Pro/ENGINEER software, “Pro/ENGINEER is the de facto standard for mechanical design automation.” 

Unlike AutoCAD and other 2D CAD packages, Pro/E is a 3D solid modeler with a parametric, feature-based, fully associative architecture.  It has the capability to provide a complete product development solution, from conceptual design and simulation through manufacturing.

In function, Pro/ENGINEER is similar to UG (Unigraphics), SDRC I-DEAS, SolidWorks, Mechanical Desktop, and IronCAD.  If you’re familiar with any one of those programs then you already have some insight as to what Pro/ENGINEER does and is used for. 

Originally, Pro/ENGINEER was written for the UNIX operating system and was primarily used with SGI workstations.  In fact, in 1988, about ninety percent of Pro/ENGINEER users had a Silicon Graphics workstation on their desk.  Eventually, other UNIX vendors such as Sun, IBM, DEC and HP developed products for use with Pro/ENGINEER. 

In 1996, Windows NT workstations started surfacing with relatively fast OpenGL graphics hardware.  Since then, the NT workstation has provided the price and performance combination that has made it the system of choice for new workstation purchases.  Currently, PTC supports Alpha, Intel, Sun, HP and PowerPC based systems.  However, as of now, no AMD based systems are officially supported.

Pro/ENGINEER works with Windows 95 and Windows 98 as well.  Support for Linux has been considered by PTC, but no release date has been set.

In terms of an installed user base, according to PTC, their product development software is currently being used at over 27,000 companies worldwide with a total number of seats installed at over 230,000.  These numbers have doubled within the past two years.  With a strong following, it isn’t surprising to see that there is a demand for performance benchmarks on the Pro/ENGINEER platform. 



The Need

In the past, obtaining a high-performing NT Pro/E workstation required a journey into the realm of Digital’s Alpha processor, more specifically the 21164.  The 21164 running at 600MHz was, at the time, something that no x86 CPU could touch in Pro/E.

If you’re familiar with the architecture of the Alpha, you’ll know that although it has a native port of Windows NT, most NT applications do not have native ports for the Alpha platform.  In order to make the Alpha a viable alternative, Digital developed a binary translator for the Alpha called FX!32.  The purpose of FX!32 was to recompile x86 binaries, on the fly, into native Alpha binaries, as well as optimize them for performance with further use.

Naturally, this binary translation process was not as efficient as running native Alpha code, but it was still faster than anything in the native x86 market, thus making it the best option for Pro/E users running Windows NT.  With the announcement that FX!32 would not be supported under Windows 2000, the extremely large Pro/E user base running NT was suddenly forced to find a true x86 alternative to the Alpha.

Luckily, times have changed, and the Intel Pentium III as well as AMD’s recently released Athlon both boast relatively high-performing FPUs, much more powerful than those present in their respective CPUs when the 21164 was first introduced.  At the same time, setting up a Pentium III or an AMD Athlon system is noticeably cheaper than going with an Alpha workstation under Windows NT; in fact, quite a few of AnandTech’s readers use their tweaked P3 and Athlon systems under NT as powerful workstations without having to spend thousands of dollars on them.

The clock speed battle Intel and AMD are currently engaged in is pushing the levels of performance even higher, and with the x86 platform becoming an even greater contender in the high end workstation market it isn’t surprising to see Pro/E users turning to Intel and AMD to solve their performance problems at a reasonable cost. 

While all of this is going on, the Alpha 21264, the successor to the 21164, has been dominating the charts for quite some time.  At 667MHz, the 21264 has made it extremely difficult for any x86 CPU to rise to the top under Pro/E.  Only with a huge increase in clock speed could the current contenders in the x86 market begin to defeat the 21264 at 667MHz under Pro/E. 

With both companies on the verge of announcing 800MHz+ parts, that huge increase in clock speed could arrive very quickly.



Pro/E’s Demands

As we alluded to earlier, Pro/E is a very demanding application.  The models created within Pro/E are generally quite large and manipulating them is what puts the greatest strain on the systems that run Pro/E. 

For starters, the application and the specific tasks that its users put it through can result in very demanding memory requirements.  The typical Pro/E workstation uses at least 256MB of RAM, but seeing a workstation with over 1GB of RAM is not uncommon. 

Common Pro/E assemblies can contain up to 5000 components, which is a challenge for any workstation to deal with.  Just reorienting an assembly to get a good look at the area in which you are going to reference your next component can take over a minute.  An assembly cross-section can take 20 minutes; a global interference check, 30 minutes.  Add up all of those minutes and you get hours and hours of thumb twiddling.  Not to mention sore thumbs.

So in the end, we have an application that is very demanding on system requirements both from the memory perspective and from the CPU perspective, and, at the same time, we are looking to make use of the latest x86 processors as the basis for an affordable workstation capable of driving this very demanding application. 

In order to find out which systems perform the best in any application under any situation we always turn to application specific benchmarks.  If a user wants to know what video card runs a game like Quake III Arena or Unreal Tournament better than the rest, they look at timedemo scores, and if a user wants to know what CPU is best suited for a Pro/E workstation they turn to BENCH99, SPECapc, and the OCUS scores.  Just as demo001 and UTbench are not necessarily familiar benchmarks to all Pro/E users, BENCH99, SPECapc and the OCUS are not necessarily familiar benchmarks to all Quake III and Unreal Tournament fanatics.  In order to establish what these industry standard benchmarks are, let’s take a look at where their results are published and what those results represent.

Pro/E Benchmarks

Pro/E benchmarks are not too different from the Winstone and SYSMark benchmarks that AnandTech readers are used to.  Both Winstone and SYSMark test the performance of various applications by running a variety of different “real world” tasks on a set of sample data.  Whether that sample data is a word processing document, a spreadsheet or even an image file is dependent upon the particular application that is being benchmarked.  Pro/E is no different.  In the case of Pro/E, the sample data comes in the form of a pre-designed “part” that is being manipulated during the course of the test. 

With Winstone and SYSMark, results are usually reported as a number that illustrates how well the system being benchmarked compares to a baseline system.  For example, a Content Creation Winstone 2000 score of 10.0 indicates performance equal to that of the Content Creation Winstone 2000 base test machine, and a score of 20.0 means performance double that of the base machine.  Performance is calculated according to how long it takes for the system being tested to run through the various parts of the benchmark.

Similarly, Pro/E benchmarks generally report performance in terms of the time required to complete various calculations and manipulations dealing with the test part used in the benchmark.  Understanding what these benchmarks represent isn’t too difficult, at least for someone who has already been exposed to performance benchmarks of this nature.
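To make the two reporting styles concrete, here is a minimal sketch in Python - this is not code from any of these benchmark suites, and the baseline numbers are hypothetical - contrasting a Winstone-style normalized score with the raw elapsed-time totals that Pro/E benchmarks report:

```python
# A sketch contrasting the two reporting styles described above; the baseline
# numbers here are hypothetical, not from any actual benchmark suite.

BASELINE_SECONDS = 600.0  # hypothetical elapsed time of the base test machine
BASELINE_SCORE = 10.0     # the score assigned to the base machine

def winstone_style_score(elapsed_seconds):
    """Higher is better: finishing in half the baseline time doubles the score."""
    return BASELINE_SCORE * (BASELINE_SECONDS / elapsed_seconds)

def proe_style_score(test_times_seconds):
    """Lower is better: Pro/E benchmarks simply report total elapsed seconds."""
    return sum(test_times_seconds)

print(winstone_style_score(300.0))       # 20.0 -> double the base machine
print(proe_style_score([148, 112, 41]))  # 301 seconds total
```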



The first and foremost authority on Pro/E performance is Pro/E: The Magazine.  The monthly publication reaches over 35,000 qualified users and, on an annual basis, Pro/E: The Benchmark is published.  According to them, “Pro/E: The Benchmark edition is the most frequently referenced and valued workstation Benchmark in the Pro/ENGINEER community.” 

Pro/E: The Benchmark compares Pro/E performance based on the results of a benchmark known as BENCH, with BENCH99 being the current edition of the benchmark.  BENCH99 consists of two parts: CPU/IO99 and GBENCH99.

CPU/IO99 has 9 tests which measure CPU and I/O performance.  GBENCH99 consists of 11 tests which measure graphics performance.  Both perform operations on a single part.  The part itself is not as complicated as the models that most Pro/E users work with on a daily basis, but the graphics operations performed in GBENCH include, among other things, clipping and texturing.  According to Daniel Kroushl, one of the many Pro/E users out there, “it would be rare for the average user to use this functionality.”

With respect to the average user, the overall results are overly weighted toward graphics performance.  Anyone with a Pro/E license can run this benchmark, but first you have to come up with $50 to get the test files.

For the purpose of comparing x86 CPU performance, specifically comparing the performance of the Athlon to the Pentium III under Pro/E, BENCH99 did not seem to offer the best method of comparison, although CPU/IO99 alone would admittedly have served our purpose of comparing the CPUs.

The next benchmark is provided by the Standard Performance Evaluation Corporation (SPEC) and is known as SPECapc for Pro/E Revision 20.

The SPECapc for Pro/E Revision 20 consists of 17 tests. The model used in the benchmark is a realistic rendering of a complete photocopy machine consisting of approximately 370,000 triangles.  This is a very complex model on which very complicated graphics tests are performed. 

In response to the complexity of the SPECapc benchmark, Kroushl, a Pro/E professional, states, “…but for 95% of Pro/E users, the SPECapc is overkill.  Most of us will never approach the combination of model size and graphics complexity that is demonstrated with this benchmark.”

This benchmark tends to reward the workstations with the most expensive graphics cards, and it is thus a poor candidate for a strictly CPU comparison.

Anyone with a Pro/E license can run this benchmark.  The download is free, but, for optimal results, approximately 512MB of RAM is recommended. 



The final benchmark that we investigated in our research was the OCUS R20 benchmark.  Designed by Olaf Corten, the OCUS benchmark consists of 17 tests which are broken into 4 sub-categories: CPU, Graphics, GUI and Disk. 

Operations are performed on a single part, which is similar in complexity to the BENCH99 part.  The tests are also similar to BENCH99.  The main difference between these benchmarks is that the graphics operations that the OCUS R20 completes are more in line with what the average user would perform on a daily basis.  This makes the OCUS an excellent benchmark because it depends on both a strong graphics subsystem and a strong CPU, without biasing results toward strengths in either category.  If the graphics card is kept constant, the OCUS makes for an excellent benchmark of CPUs, or, if the CPU is kept constant, a wonderful real world test of graphics cards under Pro/E.

Choosing a new workstation configuration based on the results of this benchmark should automatically point you toward a system with good performance.  Anyone with a Pro/E license can run this benchmark.  The test itself requires approximately 150MB of RAM and would run perfectly on a system with 384MB – 512MB of RAM, which is far from being considered extreme in the Pro/E workstation world where systems commonly approach and surpass 1GB memory configurations.

What sets the OCUS benchmark apart from the others is that those who run it are encouraged to send the results in so that they can be posted on the OCUS web page.  For this reason, the OCUS results are also the most current among the benchmarks.  Usually, new results are posted on a weekly basis; BENCH99 is updated yearly, and SPECapc is updated several times a year.  The most refreshing thing that you will see is test results for the systems you all build - not just the pre-configured Alpha and Xeon workstations, but things like home built AMD K6 and Athlon systems as well as tweaked out Pentium III systems.

The other benchmarks post only results performed by manufacturers whose systems are supported by PTC.  They are mainly showcases for the new product offerings by the major workstation manufacturers.  How often is it that you have the opportunity to compare your home built system to something manufactured by the NTSIs and SGIs of the industry?  The webpage of Olaf Corten, the creator of the OCUS benchmark, provides you with this very opportunity thus making OCUS R20 truly “the [Pro/E] benchmark you can do yourself!” as quoted from Corten’s site.   

However, the OCUS does have its drawbacks, the most glaring of which is the lack of any common testing configurations and procedures.  Visiting the OCUS benchmark results page illustrates one major flaw with the way the benchmark results are displayed: little more than the CPU and memory size is ever disclosed.

Any number of things can affect the results of a benchmark of this type, among them differing software builds, screen resolutions, operating systems and video cards.  The result is a lot of information that is not in a particularly useful format.  Even with these drawbacks, the OCUS R20 benchmark has become a very valuable decision making tool for both the Pro/E user and the system administrator.

The OCUS R20 benchmark gave us the opportunity to rectify some of the downsides to the reporting of scores, since the benchmark is open for public use.  We immediately went to work, setting up a few mid-range NT workstations and outfitting them with everything from Intel’s Celeron and Pentium III up to AMD’s Athlon and Kryotech’s SuperG system with an Athlon running at 1000MHz.

We ran all of our tests at the exact same settings, under the same configuration for each setup, and thus set the AnandTech standard on how we were going to run the OCUS R20 benchmark for Pro/E.  The results are easily comparable to systems from other Pro/E users simply by following the configuration and settings referenced in our table documenting The Test. 

If any Pro/E users would like to compare their systems to the systems that we benchmarked in this comparison, be sure to use the same amount of memory, run at the same resolution, and use the same video cards that we used in the tests.  If you take care to make sure that these variables remain as close as possible to ours, then you should have no problem making a direct comparison between your benchmarks and what we’re running in house. 

By documenting all of our test settings we hope to further promote the use of the OCUS R20 benchmark as a standard method of comparing performance in Pro/E as well as promote standard configurations under which to run the tests in order to make comparing scores across multiple systems and platforms much easier. 

Before we get to the test description and performance benchmarks themselves let’s take a look at the specifics of the OCUS R20 benchmark. 



The OCUS R20 Test Descriptions

As we mentioned above, the OCUS R20 benchmark consists of 17 tests which are broken into 4 sub-categories: CPU, Graphics, GUI and Disk.  Below are descriptions of the 17 tests, including what sub-category they belong to and how many times each particular test is repeated (number in parentheses).  A short sketch of how the individual test times roll up into the sub-category subtotals follows the list.

1.      Wireframe Redraws (160 FRONT-DEF)

This test causes the display of the part to switch between the 2D front view and a 3D isometric view.  The task is performed 160 times in wireframe mode.  Wireframe mode is like looking at a part that is completely transparent, while seeing all of the edges.  This mode is difficult to work in because you can’t really tell if you are looking at the front or back of the part.  But it is the fastest to work in because there is no need to remove or gray out hidden lines as in the No Hidden and Hidden line modes.  Although this test is included in the Graphics subtotals, it can be argued that this is mainly a CPU dependent task.

 

2.      Shaded Mouse Spins (90)

This test causes the part to be shaded and spun into 90 different positions.  This test is included within the Graphics subtotals.

3.      Shaded Redraws (400 FRONT-DEF) 

This test is similar to test #1.  The display of the part is switched between the 2D front view and a 3D isometric view.  The task is performed 400 times in Shaded mode. This test is included within the Graphics subtotals.


4.      Shade Calculations (80) 

Although displaying a shaded image is mainly a graphics dependent function, creating the initial shaded image requires CPU dependent calculations and therefore is included within the CPU subtotals.  This test performs the shade calculations 80 times.

5.      Regenerations (8)

This test causes the part to regenerate all 148 of its features and is repeated 8 times.  This test is included within the CPU subtotals.

6.      Menu Redraws (200)

This test is included in the GUI subtotal and goes through 200 menu picks to test how fast the menus can be retrieved and drawn.

7.      Saves (60)

This test is the only test in the Disk subtotal section.  It causes the part to be saved 60 times to the hard drive.  But before the part is saved, the screen is redrawn in wireframe.  Since wireframe redraws are mainly CPU dependent, this isn’t really a good test to measure disk performance.  This will be fixed in future benchmark scripts.

8.      Dialogue Box Redraws (300)

This test is similar to test #6 and is included within the GUI subtotals.  A dialogue box is displayed and then redrawn 300 times to test how fast it can be redrawn.



9.      Wireframe Mouse Zooms (180)

While it can be argued that wireframe redraws (test #1) are mainly a CPU dependent task, wireframe mouse zooms are most certainly dependent on graphics.  In this test, the part is displayed in a 3D isometric view and zoomed in and out 360 times.  This test is included within the Graphics subtotals.

10.  Patterned X-Section Creation

This test causes a cross section of the part to be created and then patterned.  This test is included within the CPU subtotals.

11.  Family Table Verify

This test causes the part to regenerate each of its ten family table instances.  Each instance of the generic test part varies in its overall depth so that ten variations of the test part are created and placed within memory.  This test is included within the CPU subtotals.

12.  Assembly Creation 

This test causes an assembly to be created by assembling the part to the assembly coordinate system several times. This test is included within the CPU subtotals.

13.  Explode Translations (15)

This test causes the assembly to be exploded as it moves the assembly components about the screen 15 times. It is included within the Graphics subtotals.

14.  Screen Updates (20)

This test repaints the screen 20 times and is included within the Graphics subtotals.

15.  Model Tree Expansions (50) 

This test causes the model tree to expand to show all of the features of each assembly component.  Then the model tree reverts to its normal state.  This test is performed 50 times and is included within the GUI subtotals.

16.  Automatic Regenerations (20)

This test regenerates the assembly 20 times and is included within the CPU subtotals.

17.  Perspective Views (20)

This test creates a perspective view 20 times and is included within the CPU subtotals.
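As promised above, here is a minimal sketch in Python of how the 17 individual test times roll up into the four sub-category subtotals and an overall score.  This is purely illustrative - it is not the actual OCUS script; only the sub-category mapping comes from the test descriptions above, and the timings in the example are made up:

```python
# Purely illustrative - not the actual OCUS script. The sub-category mapping
# below follows the test descriptions above; timings are supplied by the user.

SUBCATEGORY = {
    "Wireframe Redraws": "Graphics",
    "Shaded Mouse Spins": "Graphics",
    "Shaded Redraws": "Graphics",
    "Shade Calculations": "CPU",
    "Regenerations": "CPU",
    "Menu Redraws": "GUI",
    "Saves": "Disk",
    "Dialogue Box Redraws": "GUI",
    "Wireframe Mouse Zooms": "Graphics",
    "Patterned X-Section Creation": "CPU",
    "Family Table Verify": "CPU",
    "Assembly Creation": "CPU",
    "Explode Translations": "Graphics",
    "Screen Updates": "Graphics",
    "Model Tree Expansions": "GUI",
    "Automatic Regenerations": "CPU",
    "Perspective Views": "CPU",
}

def summarize(times):
    """Sum per-test seconds into the four subtotals plus the Total Score."""
    totals = {"CPU": 0.0, "Graphics": 0.0, "GUI": 0.0, "Disk": 0.0}
    for test, seconds in times.items():
        totals[SUBCATEGORY[test]] += seconds
    totals["Total"] = sum(times.values())
    return totals

# Example with made-up timings for three of the tests:
print(summarize({"Regenerations": 26.0, "Saves": 12.0, "Menu Redraws": 13.0}))
```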


Screen shot of the benchmark in action



The Test

Windows NT SP6 Test System

Hardware

CPU(s):
Intel Pentium III 800
Intel Pentium III 700
Intel Pentium III 600E
Intel Pentium III 500E
Intel Pentium III 500
Intel Celeron 500
AMD Athlon 800
AMD Athlon 700
AMD Athlon 600
AMD Athlon 500
Kryotech SuperG (AMD Athlon 1000MHz)

Motherboard(s):
AOpen AX6BC Pro-II (Intel CPUs)
Gigabyte GA-7IX (AMD CPUs)

Memory:
128MB PC133 Corsair SDRAM x 3 (384MB)

Hard Drive:
IBM Deskstar DPTA-372050 20.5GB 7200 RPM Ultra ATA 66

CDROM:
Philips 48X

Video Card(s):
NVIDIA Quadro DDR Reference Board
3DLabs Oxygen GVX1 AGP

Ethernet:
Linksys LNE100TX 100Mbit PCI Ethernet Adapter

Software

Operating System:
Windows NT4 SP6

Video Drivers:
NVIDIA Quadro DDR - Detonator 3.65

Video Resolution:
1280 x 1024 x 32 @ 75Hz

Benchmarking Applications

Professional:
OCUS R20 for Pro/E Revision 20

Quick CPU Comparison Chart

Intel Pentium III "Coppermine"
Core Frequency: 500 - 800MHz
L1 Cache: 32KB @ core frequency
L2 Cache: 256KB @ core frequency
Physical Interface: Slot-1, Socket-370
Bus Protocol: GTL+
System Bus: 100/133MHz (800MB/s - 1.06GB/s)
Memory Bus: 100/133MHz (800MB/s - 1.06GB/s)

Intel Pentium III "Katmai"
Core Frequency: 450 - 600MHz
L1 Cache: 32KB @ core frequency
L2 Cache: 512KB @ 1/2 core frequency
Physical Interface: Slot-1
Bus Protocol: GTL+
System Bus: 100MHz (800MB/s)
Memory Bus: 100MHz (800MB/s)

Intel Celeron "Mendocino"
Core Frequency: 300 - 500MHz
L1 Cache: 32KB @ core frequency
L2 Cache: 128KB @ core frequency
Physical Interface: Slot-1, Socket-370
Bus Protocol: GTL+
System Bus: 66MHz (528MB/s)
Memory Bus: 66MHz (528MB/s)

AMD Athlon "K75"
Core Frequency: 750 - 800MHz
L1 Cache: 128KB @ core frequency
L2 Cache: 512KB @ 2/5 core frequency
Physical Interface: Slot-A
Bus Protocol: EV6
System Bus: 100MHz x 2 (1.6GB/s)
Memory Bus: 100MHz (800MB/s)

AMD Athlon "K7"
Core Frequency: 500 - 700MHz
L1 Cache: 128KB @ core frequency
L2 Cache: 512KB @ 1/2 core frequency
Physical Interface: Slot-A
Bus Protocol: EV6
System Bus: 100MHz x 2 (1.6GB/s)
Memory Bus: 100MHz (800MB/s)

Kryotech SuperG (AMD Athlon)
Core Frequency: 1000MHz
L1 Cache: 128KB @ core frequency
L2 Cache: 512KB @ 2/5 core frequency
Physical Interface: Slot-A
Bus Protocol: EV6
System Bus: 100MHz x 2 (1.6GB/s)
Memory Bus: 100MHz (800MB/s)
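The system and memory bus bandwidth figures in the chart all fall out of the same simple formula: the 64-bit data bus width in bytes multiplied by the effective clock.  A quick sanity check in Python (our own arithmetic, not anything from the CPU vendors):

```python
# Our own arithmetic, not vendor code: peak bus bandwidth is simply the
# 64-bit (8-byte) bus width times the effective clock rate.

BUS_BYTES = 8  # all of these CPUs use 64-bit data buses

def bus_bandwidth_mb(mhz, transfers_per_clock=1):
    """Peak bandwidth in MB/s for a 64-bit bus at the given clock."""
    return mhz * transfers_per_clock * BUS_BYTES

print(bus_bandwidth_mb(66))      # 528   -> Celeron's 66MHz bus
print(bus_bandwidth_mb(100))     # 800   -> 100MHz GTL+ / SDRAM bus
print(bus_bandwidth_mb(133))     # 1064  -> ~1.06GB/s at 133MHz
print(bus_bandwidth_mb(100, 2))  # 1600  -> EV6 at 100MHz x 2 = 1.6GB/s
```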


 

 
Total Score
Sorted by Total Score - seconds (lower is better)

Kryotech SuperG Athlon 1GHz    341
Intel Pentium III 800E         362
AMD Athlon 800                 374
Intel Pentium III 700E         394
AMD Athlon 700                 405
Intel Pentium III 600E         443
AMD Athlon 600                 452
Intel Pentium III 500E         511
AMD Athlon 500                 523
Intel Pentium III 500          571
Intel Celeron 500              650

The Total Score from the OCUS tests illustrates how each test system fared overall, and to no surprise, the 1GHz Athlon came in at the top of the list. The surprising element here is that it is only about 6% quicker than the Pentium III 800, which is a much cheaper solution. Keep in mind that this isn't a Pentium III 800 on an i820 motherboard with thousands of dollars of RDRAM; it's a regular 800 running at a 100MHz FSB on a BX board with SDRAM.

Clock for clock, the Pentium III manages to beat out the Athlon by about 3%. According to most Pro/E users, a performance difference of 3% isn't huge, but we've come to expect the Athlon to come out on top in professional level tests such as the OCUS. The old Pentium III 500 (512KB L2) and the Celeron 500 come in as the slowest two of the group and fall noticeably behind the Athlon 500. It is the newer Pentium III E CPUs that the Athlon falls short of.
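For the curious, that clock-for-clock figure falls straight out of the Total Scores above.  A minimal sketch of the arithmetic (the times are from our results table; remember that with elapsed time, lower is better):

```python
# The Total Scores from the table above, in seconds (lower is better).
total_score = {
    ("Pentium III E", 800): 362, ("Athlon", 800): 374,
    ("Pentium III E", 700): 394, ("Athlon", 700): 405,
    ("Pentium III E", 600): 443, ("Athlon", 600): 452,
    ("Pentium III E", 500): 511, ("Athlon", 500): 523,
}

for mhz in (800, 700, 600, 500):
    p3 = total_score[("Pentium III E", mhz)]
    k7 = total_score[("Athlon", mhz)]
    # With elapsed times, the deficit is how much longer the Athlon takes:
    print(f"{mhz}MHz: Athlon trails by {(k7 - p3) / p3:.1%}")
# Output runs from about 2.0% to 3.3% - roughly 3% clock for clock.
```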

What factor is causing the Pentium III to come out ahead on a clock for clock basis? Let's take a look at the breakdown of the scores to see where that 3% performance advantage is coming from.



 

 

CPU TOTAL
Sorted by CPU Score - all times in seconds (lower is better)
Columns: CPU Total / Shade Calculations (40) / Regenerations (8) / Patterned X-Section Creation / Family Table Verify / Assembly Creation / Automatic Regenerations / Perspective Views

Kryotech SuperG Athlon 1GHz    148   22   26   22   15   27   15   21
AMD Athlon 800                 165   24   29   24   17   31   17   23
Intel Pentium III 800E         166   26   28   23   18   30   18   23
AMD Athlon 700                 181   27   31   26   19   33   19   26
Intel Pentium III 700E         184   28   31   26   20   33   21   25
AMD Athlon 600                 206   30   36   30   22   37   22   29
Intel Pentium III 600E         208   32   35   28   24   36   23   30
AMD Athlon 500                 241   35   42   34   27   42   27   34
Intel Pentium III 500E         245   36   42   34   28   42   29   34
Intel Pentium III 500          266   38   46   35   30   46   30   41
Intel Celeron 500              299   45   52   46   31   52   30   43

Here we have a different picture. On a clock for clock basis, the Athlon outperforms the Pentium III by about 1 - 2%. While this isn't a huge advantage, it is definitely a change from the 3% Intel advantage we saw in the overall scores.

The old Pentium III and Celeron 500 still lag far behind the rest of the contestants, making it clear that the faster L2 cache of the newer Pentium III Es and the wider L2 cache bus (256-bit vs 64-bit) come in very handy in Pro/E.



 

 

GRAPHICS TOTAL
Sorted by Graphics Score - all times in seconds (lower is better)
Columns: Graphics Total / Wireframe Redraws (160 FRONT-DEF) / Shaded Mouse Spins (2100 positions) / Shaded Redraws (400 FRONT-DEF) / Mouse Zooms (360) / Explode Translations / Screen Updates

Intel Pentium III 800E         112   25   18   13   15   20   21
Kryotech SuperG Athlon 1GHz    113   25   18   16   15   20   19
Intel Pentium III 700E         121   28   17   15   15   22   24
AMD Athlon 800                 123   28   19   17   16   22   21
AMD Athlon 700                 132   31   19   17   17   24   24
Intel Pentium III 600E         133   32   17   15   16   26   27
AMD Athlon 600                 142   35   19   18   16   27   27
Intel Pentium III 500E         150   37   19   17   15   30   32
AMD Athlon 500                 161   40   21   19   17   32   32
Intel Pentium III 500          174   47   19   18   15   36   39
Intel Celeron 500              201   53   24   24   20   39   41

Here is the shocker. At similar clock speeds, the Pentium III E outperformed the Athlon by about 10% in the graphics tests. Because of this, the 800MHz Pentium III managed to take the lead over the 1GHz Athlon. While Intel's SSE instructions/SIMD FP optimizations could be at work here, the fact that the regular Pentium III 500 is beaten by AMD's Athlon 500 indicates that the faster L2 cache and wider L2 cache bus of the Pentium III E provide a large amount of the performance advantage held by the CPU.



GUI TOTAL
Sorted by GUI Score - all times in seconds (lower is better)
Columns: GUI Total / Menu Redraws (200) / Dialogue Box Redraws (300) / Model Tree Expansions

Intel Pentium III 800E         41   12   16   13
Kryotech SuperG Athlon 1GHz    42   13   18   11
Intel Pentium III 700E         43   12   17   14
AMD Athlon 800                 43   12   18   13
AMD Athlon 700                 47   14   19   14
Intel Pentium III 600E         49   14   18   17
AMD Athlon 600                 55   16   22   17
Intel Pentium III 500E         55   15   21   19
AMD Athlon 500                 61   18   24   19
Intel Pentium III 500          69   19   26   24
Intel Celeron 500              80   23   31   26

Once again, the advantage goes to the Pentium III E. Combined with an overall advantage in the Graphics tests, these two categories account for the Pentium III E's 3% advantage in the total score. The advantage here could be a combination of a fast L2 cache and SSE/SIMD FP optimizations, but the most obvious explanation is the faster L2 cache and wider L2 cache bus.



For comparison purposes, we included two more video cards in our tests, the Diamond FireGL1 and 3DLabs' Oxygen GVX1, which are fairly popular cards in the high end market. The DDR Quadro we used in the tests and the FireGL1 performed quite similarly, while 3DLabs' GVX1 fell behind by a considerable degree.

In an effort to answer the question of how much L2 cache is enough, we disabled the L2 cache on both the Celeron and the Pentium III E to see the effects.

Taking away the Celeron 500's 128KB of full speed on-die L2 cache resulted in a 71% drop in overall performance as measured by the Total Score of the OCUS benchmark. This is the first very obvious indication that having a high-speed L2 cache makes a very noticeable difference in performance.

Doing the same to the Pentium III E, which is outfitted with twice as much full speed on-die L2 cache as the Celeron, resulted in a similar drop of 77%. This suggests that the Celeron's 128KB L2 cache may be all that Pro/E needs, at least if your usage patterns can be accurately categorized by the OCUS benchmark. In both cases, the Graphics and GUI tests suffered significantly more than the CPU tests, which leads to the conclusion that the CPU tests are less sensitive to L2 cache size than the Graphics and GUI tests.
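As a side note on how we quote these numbers: since the OCUS reports elapsed time, a "drop in performance" is computed by treating performance as the inverse of the Total Score.  A minimal sketch with made-up before/after times (the actual cache-disabled timings are not reproduced here):

```python
# Made-up before/after Total Scores; the real cache-disabled timings are not
# reproduced here. Performance is treated as the inverse of elapsed time.

def performance_drop(before_seconds, after_seconds):
    """Fractional drop in performance when a run slows from before to after."""
    return 1.0 - before_seconds / after_seconds

# Hypothetical example: a system that slows from 650s to 1100s
print(f"{performance_drop(650, 1100):.0%}")  # -> 41%
```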



With the L2 cache disabled on both the Pentium III 500E and the Celeron, the results end up displaying the advantage offered by the 100MHz FSB over the 66MHz FSB. An overall improvement of 18% is quite impressive, with the graphics scores receiving the biggest improvement of 21% as a result of a move to the 100MHz FSB.

Moving to the 133MHz FSB should yield an additional improvement of 12% or so. While we didn't test the Pentium III on the i820/i840 platforms, the combination of the faster memory bus and the higher bandwidth of RDRAM should give the i820 + Pentium III 800 setup a very noticeable advantage under Pro/E. We will be taking a look at the i820/i840 under Pro/E in the future.
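That 12% figure is a rough extrapolation rather than a measurement.  One way to arrive at a number in that neighborhood, assuming the FSB-related gain scales with the extra bus bandwidth at the same rate we just measured going from 66MHz to 100MHz:

```python
# A rough extrapolation, not a measurement. Assumption: the overall gain from
# a faster FSB scales linearly with the extra bus bandwidth, at the rate we
# measured for the 66MHz -> 100MHz step with the L2 caches disabled.

measured_gain = 0.18               # 18% overall gain going from 66 to 100MHz
extra_bw_66_to_100 = 100 / 66 - 1  # ~52% more bandwidth
gain_per_unit_bandwidth = measured_gain / extra_bw_66_to_100

extra_bw_100_to_133 = 133 / 100 - 1  # 33% more bandwidth
print(f"{gain_per_unit_bandwidth * extra_bw_100_to_133:.0%}")  # -> ~12%
```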

How much of a performance improvement does the Pentium III E offer over the original Pentium III? According to the total score comparison, an 11% improvement is what can be expected from simply moving to the Pentium III E. This indicates that Pro/E prefers the faster/wider L2 cache of the Pentium III E to the larger, albeit slower and narrower, L2 cache of the original Pentium III.



Conclusion

If you must have the fastest Pro/E system out today, then the Kryotech SuperG does come out to be the fastest overall Pro/E performer, at least out of our roundup. Even in comparison to OCUS benchmarks published at Corten's site, it seems as if the SuperG is capable of toppling even the Alpha 21264 running at 667MHz. Is it worth the added cost, however? That's up to the individual user to decide, but from our experience the answer would have to be a plain no.

The Pentium III at 800MHz comes very close to the 1000MHz SuperG in the tests and makes it very difficult to justify the added cost of the Kryotech system just to achieve the fastest performance under Pro/E. In overall performance the Athlon is on the heels of the Pentium III E, but falls short as the faster L2 cache and the wider L2 cache bus of the Pentium III E give Intel the advantage.

The OCUS benchmark is yet another wake-up call bringing to attention the fact that although the Athlon could easily compete against and topple the old Pentium III, the newer Pentium III is giving AMD some serious competition in certain situations. If AMD is falling behind by only 3% in overall Pro/E performance with an L2 cache running at 2/5 of the core clock, then when the Thunderbird hits with its full speed L2 cache, AMD should be able to pull very far ahead in the Pro/E world.

From a current standpoint, the Athlon makes for an excellent Pro/E workstation. An overclocked Athlon 500 would be both a cost effective and a high performing solution for Pro/E users, one that would also be able to run the entire library of x86 software, unlike the 21264. If you're more of an Intel fan, then Intel's FC-PGA Pentium III (currently available in 500 and 550MHz parts) would make for the perfect solution. With the 500E and 550E being very strong overclockers, and neither one retailing for more than $400, these chips would make for a pretty fast Pro/E workstation without the incredible cost of going with a true 600MHz+ Pentium III E.

The difference between the Athlon and the Pentium III E under Pro/E is negligible, a conclusion we wouldn't have expected to come to when we first saw the Athlon last August. Intel's Pentium III E will be a thorn in AMD's side until AMD can free the Athlon of the barrier that its external L2 cache forms around it. For a Pro/E user looking for a good x86 solution, both the Athlon and the Pentium III E are good choices; just remember to opt for one of these two instead of the older Pentium IIIs or the Celerons - they are worth the added cost.

Special thanks goes to Daniel Kroushl for co-writing this article and Olaf Corten for creating the OCUS R20 benchmark.
