Original Link: https://www.anandtech.com/show/911



It happens in all walks of life, but especially in our own beloved industry: people love to root for the underdog. Whether it's AMD vs. Intel or Intel vs. VIA, there is almost a certain level of compassion felt for the company without the upper hand. There isn't a sector where this is more prevalent than the graphics business, where at one point in recent history there were as many as six major competitors but now only two remain.

When 3dfx was king, it seemed like everyone was on a hunt for the Voodoo2-killer that would offer greater performance at a comparable price. And when NVIDIA finally delivered just that, it didn't take long for the mob to turn on today's giant. It was around the release of the GeForce2 that we started to see the first signs that NVIDIA was getting a bit too big to be considered the underdog anymore, and we all know what happens then.

Since that point many companies have tried to dethrone the undisputed king of 3D graphics, but none has yet succeeded; and it's understandable why. With three design teams working in parallel, employees from some of the most talented graphics firms in the industry (3dfx, Appian, Matrox, PixelFusion, etc…), an extremely high employee retention rate (over 95% of employees have been with the company for the past 5 years), a gifted set of software engineers (there are more software than hardware engineers at NVIDIA) and an incredible amount of capital, it is very clear how NVIDIA is able to stay on top.

Is NVIDIA worried about the advances from companies like 3DLabs with their P10 VPU? Of course; there's always reason for concern, and any successful company is always mindful of the competition. But there's very little chance that any of the competing parts being released over the next couple of months will take significant market share away from NVIDIA. What can happen, however, is that the months between NVIDIA's product releases offer a window for a competitor to get its foot in the door with a well-timed product launch.

ATI exploited such an opportunity with the Radeon 8500 and much more successfully later on with the Radeon 8500LE 128MB. 3DLabs is taking advantage of the gap as well with their P10 VPU and as you can all probably guess by now, the other company with plans to do the same is Matrox.

Matrox has been very quiet on the 3D front for two years now, but they've always maintained that high-performance 3D graphics is too lucrative a business to simply ignore. A few weeks ago Matrox invited me to fly out to San Francisco for a meeting that would last no more than 90 minutes (a flight that spans the length of the country for this North Carolina based editor). I agreed to the meeting on the condition that what I'd be seeing would be worth the trip, and Matrox assured me that it would be. AnandTech has had a working relationship with Matrox for more than four years, and never during those years has Matrox exuded such confidence in a product.

What is this product? It's Matrox's comeback kid, the Parhelia-512.



What is a Parhelia?

The first question everyone asks when they hear the word is "What is a parhelia?" A parhelia is a natural phenomenon that occurs when light is reflected and refracted by ice crystals in the air. The resulting effect, generally referred to as a halo, can take the form of a parhelia in which one or two false suns appear in the sky. The parhelia effect also goes by the name sundog; however, the Matrox Sundog is definitely not a very marketable name for a GPU, and thus Parhelia was chosen. The fact that one of Matrox's senior employees happens to be a bit of a space-buff helped secure the Parhelia name for their return to the performance 3D market.

Luckily (for both Matrox and the end users), the Parhelia-512 is in no way a derivative of the G400 core. The Parhelia-512 is the result of over two years of work by Matrox engineers; after sitting out many of their competitors' product cycles, the chip is finally ready for its public debut.

Before we analyze the architecture of Matrox's first true GPU let's have a look at its specs:

- 0.15-micron GPU manufactured at UMC
- 80 Million transistors
- 4 pixel rendering pipelines, can process four textures per pipeline per clock
- 4 programmable vect4 vertex shaders
- 256-bit DDR memory bus (up to 20GB/s of memory bandwidth w/ 312.5MHz DDR)
- up to 256MB of memory on board
- AGP 4/8X support
- Full DX8 pixel and vertex shader support

As usual, we're light on the specs here so we can dive into them in greater detail when appropriate throughout the article.

Just as we did in our 3DLabs P10 VPU overview we will take you through the 3D rendering pipeline of the Parhelia-512 and explain its intricacies. In order to get the most out of this review we'd strongly suggest you read this quick one-page explanation of what's going on in the 3D pipeline.



The Parhelia Pipeline



We already explained where the Parhelia part of the name comes from but what about the '512' suffix? Luckily this is a bit easier to explain; as you'll soon see, there are a number of situations where the number 512 appears when looking at the architecture of the Parhelia. Matrox's decision to name the GPU the Parhelia-512 is akin to NVIDIA calling the NV10 the GeForce256.

At the very start of the Parhelia-512's pipeline (after the AGP interface) you have the vertex processors. Matrox has outfitted the Parhelia-512 with four vertex shader units that they refer to as "128-bit Vertex Shader Engines." If you multiply 128 by the 4 units you'll get that magical 512 number. The 128 bits come from the fact that each one of these units can work on four 32-bit floating point numbers at the same time, provided that they are packaged as a 4-operand vector. Each one of these vertex shader engines is comparable to one of the two vertex shader units in the GeForce4, or to four of the vertex processors in the 3DLabs P10. This means that right off the bat, the Parhelia-512 has twice the vertex throughput of the GeForce4 at equivalent clock speeds.
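To make the "vect4" idea concrete, here is a minimal sketch (our own illustration in Python/NumPy, not Matrox code) of the kind of operation one of these engines performs: a 4x4 matrix transform of a 4-component, 32-bit floating point vertex, i.e. 128 bits of data per operand.

```python
import numpy as np

# One "vect4" operand: four 32-bit floats = 128 bits of data.
vertex = np.array([1.0, 2.0, 3.0, 1.0], dtype=np.float32)  # (x, y, z, w)

# A typical vertex shader workload: transform the position by a 4x4
# matrix. Each row is a 4-wide multiply-accumulate -- exactly the kind
# of operation a 128-bit vect4 engine performs on one operand.
translate = np.array([[1, 0, 0, 5],    # move the vertex +5 along x
                      [0, 1, 0, 0],
                      [0, 0, 1, 0],
                      [0, 0, 0, 1]], dtype=np.float32)
print(translate @ vertex)          # -> [6. 2. 3. 1.]
print(vertex.nbytes * 8, "bits")   # -> 128 bits; 4 engines x 128 = 512
```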

The Parhelia's vertex shader units are fully DX8.1 compliant and offer a bit more flexibility than even DX8.1 requires, which is why you will see them referred to as Vertex Shader 2.0 compliant (DX9). This flexibility is useful; however, from a developer's perspective, unless the entire pipeline (vertex and pixel portions) is DX9 compatible, there is not much added value. Remember that the entire point of the vertex shaders is to prepare vertices for the operations performed on them by the pixel shaders; if the pixel shaders aren't as flexible/programmable, or if they are not also floating-point units, then you can improve the vertex shaders all you'd like and you'd still be bottlenecked by the pixel shaders.

After the vertices come out of the vertex shader engines they are fed to Matrox's primitive engine, which begins assembling the triangles and removing vertices that don't fit within the boundaries of the screen. This is where you'd normally find fairly powerful occlusion culling logic to remove data that won't be seen by the user before actually rendering the pixels; the Radeon 8500's HyperZ and the GeForce4's Visibility Subsystem, for example, come into play here. However, in a tradeoff that Matrox made to offer some of the other features of the Parhelia-512, the GPU does not have a system nearly as elaborate as any of its competitors'.

The Parhelia-512 does have "Fast Z-Clear" logic that is used to quickly set the Z-buffer to an array of all zeros, much like competing ATI and NVIDIA GPUs. Unfortunately the GPU has no occlusion culling technology comparable to ATI's Hierarchical Z or Z-Compression. With the amount of memory bandwidth that the Parhelia-512 can offer, the lack of any elegant memory management technology isn't too big of a problem until you start getting into more complex games. If you are applying an extensive pixel shader program (50 - 100 instructions) to a pixel, spending valuable clock cycles in the process, and that pixel ends up never being displayed, then a significant portion of your execution resources has been wasted. This will become much more common as games that take advantage of DX8 pixel shader functions become available, and it can become an Achilles' heel of the Parhelia architecture.
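A toy model (our own, with assumed numbers) makes the cost clear. Assuming a 100-instruction shader and a scene where each screen pixel is covered by three overlapping surfaces on average, two thirds of the shading work is thrown away when nothing culls occluded pixels:

```python
# Toy model of the cost of shading without occlusion culling.
# Assumptions (ours, for illustration): a 100-instruction pixel shader
# and a scene with depth complexity 3 (each screen pixel is covered by
# 3 overlapping surfaces on average).
SHADER_INSTRUCTIONS = 100
DEPTH_COMPLEXITY = 3
PIXELS = 1024 * 768

# Without early culling: every covering fragment is fully shaded,
# then the depth test throws away all but the nearest one.
work_no_culling = PIXELS * DEPTH_COMPLEXITY * SHADER_INSTRUCTIONS

# With ideal occlusion culling: only the visible fragment is shaded.
work_ideal_culling = PIXELS * SHADER_INSTRUCTIONS

wasted = 1 - work_ideal_culling / work_no_culling
print(f"wasted shader work: {wasted:.0%}")  # -> 67% at depth complexity 3
```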



The Parhelia Pipeline (continued)

Moving on down the pipeline we have the four pixel rendering pipelines of the Parhelia-512. This quad-pipeline approach is similar to what NVIDIA introduced with the GeForce256, ATI with the Radeon 8500, and even 3DLabs with the P10; it's a choice that makes sense and thus Matrox has stuck with it. Where Matrox does differ from the competition is in the Parhelia's ability to process four textures per pipeline per clock, as opposed to two in all competing products. By being able to process four textures per pipeline per clock, the Parhelia-512 can offer significantly higher performance in next-generation games that make heavy use of multiple textures. This is a safe bet by Matrox, since it's much easier for a developer to use more texture layers than it is to use pixel shader programs, given the complexity of writing pixel shader programs and the very few cards in the hands of users with full DX8 pixel shader compliance. This will change in the future, but for now it makes a lot of sense.
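The back-of-the-envelope math is simple enough; note that the clock speed below is a placeholder assumption on our part, since Matrox hasn't finalized the Parhelia-512's clock:

```python
# Texel throughput: pipelines x textures-per-pipeline x clock.
# NOTE: the 250MHz core clock is a placeholder assumption, not a
# confirmed Parhelia-512 specification.
CLOCK_MHZ = 250

designs = {
    "Parhelia-512 (4x4)":   4 * 4,  # 4 pipelines, 4 texture units each
    "GeForce4-style (4x2)": 4 * 2,  # 4 pipelines, 2 texture units each
}
for name, texels_per_clock in designs.items():
    mtexels = texels_per_clock * CLOCK_MHZ
    print(f"{name}: {mtexels} Mtexels/s at {CLOCK_MHZ}MHz")
```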

Each one of these "quad texturing units" is flexible enough to allocate processing resources depending on the application at hand. For example, in a predominantly dual-textured game such as Quake III Arena, the Parhelia-512 can use the idle texturing resources to perform 8-tap anisotropic and trilinear filtering at virtually no performance hit; granted, this is more of a feature for today's games than tomorrow's.

In each one of the quad texturing units the texture coordinates are calculated, the textures are loaded and filtered and finally the pixels are sent on to the pixel shaders of the Parhelia-512.

These pixel shaders are no more programmable than what's in the GeForce4, meaning that they are still effectively register combiners and not fully programmable. They also work on integer data rather than the 32-bit floating point values required for DX9 compliance. The reason the Parhelia cannot claim these two key features is, once again, a lack of die space. As the chip is built on a 0.15-micron process with 80 million transistors, Matrox had to make a number of tradeoffs in order to deliver excellent performance in current and future DX8 applications; one of those tradeoffs happens to be pixel shader programmability. Just as 3DLabs mentioned to us during our P10 briefing, in order to make the 3D pipeline entirely floating-point you need to be on at least a 0.13-micron process, which won't be mature enough (at TSMC at least) until this fall to use for a mass production GPU.

Compared to a GeForce4, the Parhelia-512's pixel shading stage is superior in that it has five pixel shader stages in each rendering pipeline (compared to the GeForce4's two). This gives the Parhelia-512 the ability to multipass much less frequently than the competition as it is not only able to process 5 pixel shader operations in a single pass per pipeline but it can also process 10 pixel shader operations across two pixel pipelines in a single pass if necessary. And as you know, the fewer passes made the more bandwidth and resources are conserved.
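The multipass savings are easy to quantify; here's a quick sketch (ours, using a hypothetical 12-operation shader program):

```python
from math import ceil

def passes_needed(shader_ops: int, stages_per_pass: int) -> int:
    """Number of rendering passes to apply a shader program when the
    hardware can execute only `stages_per_pass` operations per pass."""
    return ceil(shader_ops / stages_per_pass)

# Hypothetical 12-operation pixel shader program:
ops = 12
print("GeForce4 (2 stages/pipe):  ", passes_needed(ops, 2))   # 6 passes
print("Parhelia (5 stages/pipe):  ", passes_needed(ops, 5))   # 3 passes
print("Parhelia (10, two pipes):  ", passes_needed(ops, 10))  # 2 passes
```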

That brings the basic 3D pipeline of the Parhelia to an end before the data is finally sent out to the 256-bit DDR memory bus (256 x 2 equals that magic 512 number again). But there are two very important parts of the extended pipeline that we haven't mentioned yet so let's tackle those next.



Hardware Displacement Mapping

Ever since the G200, it seems that Matrox always has one very compelling, very promising feature supported by their hardware, yet by the time developers have caught on to the technology, the card's moment has long since passed. Everyone will remember that one screenshot of the water in Expendable that put Environment Mapped Bump Mapping (EMBM) and the Matrox G400 on everyone's radar. The truly neat technology of the Parhelia-512 is what Matrox calls Hardware Displacement Mapping (HDM).


A detailed terrain is made using more polygons.

The goal behind HDM is to generate more realistic 3D environments and characters by using more geometry (more vertices), but doing so in the simplest, most compact manner possible. Let's say you were trying to provide a detailed 3D map of the surface of Mars; in order to make the map more detailed and more realistic, you'd simply increase the polygon count of the scene to help model every little detail of the planet's surface. Now if you're just staring at this scene, you can dramatically increase the polygon count without worrying too much; but if this environment were a level in Unreal Tournament 2003, you'd run into a serious performance issue if you cranked the number of polygons up too high.

HDM tackles the first problem when attempting to generate more detailed 3D environments and characters - generating more geometric detail. It works by taking a source mesh made up of a number of triangles and applying a displacement map to it. A displacement map is just like a 2D texture, except that instead of storing color values at each point on the map it stores displacement values (hence the name). When you apply a texture map to a polygon, each pixel within the polygon assumes the color of the corresponding pixel on the texture map. Similarly, when you apply a displacement map to a polygon, each point within the polygon is displaced by the value of the corresponding point on the displacement map. For example, if coordinate (1, 1) on a displacement map indicates a displacement of +10, then the corresponding coordinate on the mesh will be 10 units higher.
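In lieu of the original diagram, here's a minimal NumPy sketch (our own illustration, not Matrox's algorithm) of applying a displacement map to a flat grid of vertices:

```python
import numpy as np

# A flat 4x4 grid of vertices: the low-polygon "base mesh".
n = 4
heights = np.zeros((n, n), dtype=np.float32)  # every vertex starts at y = 0

# A displacement map: like a texture, but each texel stores a height
# offset instead of a color (values here are made up for illustration).
displacement_map = np.array([
    [0,  0, 0, 0],
    [0, 10, 5, 0],   # e.g. +10 at map coordinate (1, 1)
    [0,  5, 2, 0],
    [0,  0, 0, 0],
], dtype=np.float32)

# Applying the map: each vertex is pushed along its normal (+Y here)
# by the corresponding displacement value.
heights += displacement_map

print(heights[1, 1])  # the vertex mapped to (1, 1) is now 10 units higher
```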

Displacement mapping can also be used on characters; if you have a single character model, applying different displacement maps to the model can change the way the character looks without having to generate multiple models. A displacement map with larger values around the center of the map could give a character a large belly while other manipulations could give a character more rotund thighs.



HDM Continued: Depth-Adaptive Tessellation

The Parhelia-512's HDM engine doesn't stop there, however; one of the biggest benefits of this technology is seen with Depth-Adaptive Tessellation. As you'll remember from our article on ATI's Truform technology, tessellation takes a low-triangle-count mesh and divides its larger triangles into smaller ones to create a much higher triangle count. These smaller triangles can then be used to form better looking curved surfaces, since a ball made out of four triangles looks more like a pyramid; make it out of 30 triangles and now you're talking.


The mesh on the right is a tessellated version of the mesh on the left.

Depth-Adaptive Tessellation is a fairly self-explanatory term, but we'll explain it in detail to help you understand its benefits. Since the example of a 3D terrain is the easiest to convey, we'll use it again. Once the vertices making up a mesh of triangles are sent to the GPU, the mesh is tessellated to increase its detail. During the tessellation process, however, depth-adaptive tessellation allows various levels of detail (LODs) to be defined across the surface of the mesh; the farther a portion of the mesh is from the user's viewpoint, the lower its level of detail. Next, a displacement map is mapped onto the tessellated mesh and each vertex is displaced by the appropriate value dictated by the map.

Now let's start walking through the scene. As we're standing on one corner of the terrain, only a certain portion of the terrain is clearly visible to us, so that portion is assigned the highest LOD possible (the most tessellated mesh, and thus the highest polygon count that still performs well). The parts of the terrain that aren't clearly visible to us are assigned decreasing LODs, reducing the amount of tessellation and thus the polygon count in those areas. As you walk through the scene, the LODs change depending on your position, so the base polygon meshes are adaptively tessellated according to where you are in the environment. This way you always maximize polygonal detail while not wasting performance on unnecessarily tessellating parts of the environment that don't require it.
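A sketch of the core idea (ours; Matrox hasn't published the actual LOD metric or subdivision scheme): pick a tessellation level per terrain patch from its distance to the viewer, with triangle counts growing geometrically per level.

```python
def tessellation_level(distance: float,
                       max_level: int = 5,
                       falloff: float = 20.0) -> int:
    """Pick a level of detail for a terrain patch: the farther it is
    from the viewpoint, the fewer times its triangles are subdivided.
    `falloff` (in world units) is an arbitrary tuning constant."""
    level = max_level - int(distance / falloff)
    return max(level, 0)

# Each subdivision splits one triangle into four, so triangle count
# grows as 4**level from the base mesh.
for d in (5.0, 30.0, 60.0, 120.0):
    lod = tessellation_level(d)
    print(f"distance {d:5.1f}: LOD {lod}, {4 ** lod} triangles per base tri")
```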

This concept is very similar to mip-mapping when it comes to textures but simply applied to displacement maps instead. Matrox has licensed this technology to Microsoft for use in DX9 and you will definitely see other vendors implement similar functions into future GPUs.

A major benefit of HDM is that using technologies such as Depth-Adaptive Tessellation you can produce a very detailed terrain using a low polygon count base mesh and a very small displacement map (multiple KBs in size). This saves traffic across the memory and AGP buses while allowing for extremely detailed scenes to be produced.

The HDM engine in the Parhelia-512 falls in at the very beginning of the pipeline during the vertex setup and shading stages.



Elegant Anti-Aliasing

To put it plainly, anti-aliasing (AA) as it is done today is not very elegant at all. The vast majority of pixels that are fed through the anti-aliasing engines in GPUs don't even fall along the edge of a polygon, and thus they aren't creating any visible jagged lines. Unfortunately, detecting the edges of polygons and only anti-aliasing those pixels is a very complex thing to do, and it requires a significant transistor and die space investment. Matrox felt that anti-aliasing was important enough to spend the die space and transistor count on, and thus they came up with an intelligent edge-AA scheme they call Fragment Antialiasing (FAA).

The Parhelia's FAA engine determines whether a pixel is an edge pixel by dividing it up with 16x sub-pixel accuracy. Each pixel is divided into 16 samples, and the coverage of those samples is compared to determine whether the pixel is not covered, fully covered or partially covered by a triangle. All pixels determined to be partially covered (and thus on the edge of a triangle) are sent to the FAA unit and are then anti-aliased. The 16x sub-pixel accuracy is where Matrox gets the 16x FAA name for the technology.
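A simplified model of the fragment detection (our own sketch; the exact hardware logic hasn't been disclosed) looks something like this:

```python
def classify_pixel(covered_samples: list) -> str:
    """Classify a pixel from its 16 sub-pixel coverage samples.
    Only 'partial' pixels (polygon edges) need anti-aliasing."""
    assert len(covered_samples) == 16
    hits = sum(covered_samples)
    if hits == 0:
        return "uncovered"       # outside the triangle entirely
    if hits == 16:
        return "fully covered"   # interior pixel: written directly
    return "partial"             # edge fragment: sent to the FAA unit

# An interior pixel, an edge pixel, and an empty pixel:
print(classify_pixel([True] * 16))                # fully covered
print(classify_pixel([True] * 6 + [False] * 10))  # partial
print(classify_pixel([False] * 16))               # uncovered
```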

The beauty of this method is that the pixels that aren't on the edge of a polygon are written to the frame buffer without being sent through the FAA engine, thus saving precious bandwidth. According to Matrox, what they call "fragment pixels" account for just 5 - 10% of the pixels in a typical scene; you can imagine the kind of performance savings you get by employing this fragment AA scheme.

Because of the way FAA works, the higher you crank up the resolution, the smaller the proportion of pixels that pass through the FAA engine, resulting in a lower performance hit at higher resolutions. This is clearly the way AA should be done from a pure performance and efficiency standpoint; we'll reserve judgment on the quality until we spend more time with a card, but from what we saw during our briefing with Matrox, it is definitely comparable to offerings from ATI and NVIDIA.

Unfortunately all isn't perfect with the Parhelia's FAA engine; there are situations where the fragment detection doesn't work perfectly, and the result is annoying artifacts in the game. In those situations your only option is to either turn off FAA or resort to Matrox's supersampling AA algorithm, which takes a performance hit similar to that of the Radeon 8500. There is no way to predict in which games FAA will produce artifacts, but it does happen. Matrox mentioned to us that out of approximately 40 games they tested, around 5 - 7 exhibited artifacts with FAA enabled. As you'd expect, we'll attempt to compile a list as soon as possible to help point out which games do and don't work properly with FAA enabled.



Still the King of Image Quality

Matrox has always been held in high regard for the excellent analog output of their graphics cards. Although many G400 users have since moved on to faster-performing graphics cards, most of them miss the crisp display output of their beloved Matrox boards. In fact, the only users that don't are those lucky enough to have DVI displays, which receive a digital signal from the GPU so no quality is lost. But for the vast majority of users, analog displays are still an unfortunate reality.

We've explained in previous articles why the analog signal output is generally poor at higher resolutions but Matrox has done some of their own testing to help explain exactly what the limitations of competing graphics cards are.

When the video output signal leaves the RAMDAC it is sent through a series of low-pass filters. As their name implies, a low-pass filter allows low frequency voltages to pass through the filter while preventing higher frequencies from getting through. These filters accomplish two things: 1) they help meet FCC regulations by making sure that only the necessary frequencies get through the VGA output, and 2) they make certain that higher frequency signals do not adversely affect the lower frequency signals that actually matter.

A low-pass filter is generally made up of passive components such as resistors, capacitors and inductors. Because of this a low-pass filter cannot amplify a signal, it can only act as a gatekeeper - allowing certain frequencies to pass and restricting others.


An ideal frequency response curve

The highest frequency that can pass through a low-pass filter is known as the cutoff frequency. Unfortunately a simple low-pass filter isn't perfect; if you set the cutoff frequency at 400MHz, you will still get frequencies higher than 400MHz passing through. There are two ways of combating this: you can either set the cutoff frequency slightly under the actual frequency you want to cut off at, or you can use a higher order filter.

The simplest (and cheapest) is the first approach: setting the cutoff frequency a bit lower than where you actually want to cut off. The tradeoff, compared to the ideal frequency response shown above, is that the passband begins rolling off earlier, attenuating some of the high-frequency signal you actually want to keep.

The more expensive approach is to use a higher order filter. The order of a filter is directly determined by the components that make it up. Without getting into the actual requirements for filters of various orders (the order corresponds to the highest power of the (j*w) terms in the transfer function, Vout/Vin), just think of it this way: the more reactive components (inductors and capacitors) you have in a specific configuration, the higher the order of the circuit.

The benefit of a higher order filter is that the dropoff after the cutoff frequency is much steeper. This means that fewer of the frequencies you don't want actually make it through the filter. It's a much more elegant approach, yet it is more expensive since you have to use more PCB space and more inductors and capacitors.



Parhelia's Secret - 5th Order Filters

Most GeForce3/4 cards use a 3rd order filter on the analog outputs. If you're performing the well-known image quality modding trick of removing filter components, you're actually reducing the filter from 3rd order down to 2nd order, causing the following phenomenon:


The 3rd order filter has a steeper slope, but a similarly designed 2nd order filter can actually let more frequencies in

In this case, the move down to a 2nd order filter actually helps since you're letting more of the higher frequencies (at higher resolutions) through without changing the cutoff frequency.


The plethora of inductors and capacitors shown above make up the Parhelia's 5th order filters

The Parhelia-512 cards themselves will be outfitted with carefully designed 5th order low-pass filters on the outputs. This will dramatically improve image quality at higher resolutions, especially those closer to the cutoff frequency.
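Since we can't reproduce Matrox's plots here, the sketch below approximates the comparison using idealized Butterworth responses (an assumption on our part; the actual filter topologies haven't been disclosed). The point to observe is how the rolloff steepens with order:

```python
import numpy as np
from scipy import signal

CUTOFF_MHZ = 400.0  # assumed cutoff, per the example earlier

for order in (2, 3, 5):
    # Analog Butterworth low-pass prototype of the given order.
    b, a = signal.butter(order, 2 * np.pi * CUTOFF_MHZ * 1e6, analog=True)
    # Evaluate attenuation one octave above the cutoff (800MHz).
    w, h = signal.freqs(b, a, worN=[2 * np.pi * 800e6])
    print(f"order {order}: {20 * np.log10(abs(h[0])):6.1f} dB at 800MHz")
# Higher order -> steeper rolloff: roughly -6 dB/octave per order,
# so the 5th order filter suppresses unwanted frequencies far harder.
```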



Parhelia vs. GeForce4 vs. Radeon 8500 - Image Quality

Matrox has performed some of their own tests on the Parhelia's analog signal output, comparing it to a PNY GeForce4 and an ATI Radeon 8500 card. The results are very interesting.

You can use this table to help you understand what frequencies various resolutions correspond to:

Signal Frequencies for Resolutions @ 85Hz

Resolution    Frequency
640x480       37MHz
800x600       58MHz
1024x768      95MHz
1280x1024     159MHz
1600x1200     233MHz
1920x1440     336MHz
2048x1536     382MHz
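The frequencies in the table follow from straightforward arithmetic: pixels per frame times refresh rate, plus the horizontal and vertical blanking intervals. The ~42% blanking overhead below is our own estimate, fitted to Matrox's figures rather than taken from a timing standard:

```python
def pixel_clock_mhz(width: int, height: int, refresh_hz: float = 85.0,
                    blanking_overhead: float = 1.42) -> float:
    """Approximate RAMDAC pixel clock for a given display mode. The
    ~42% blanking overhead is our estimate fitted to the table above,
    not an exact timing-standard calculation."""
    return width * height * refresh_hz * blanking_overhead / 1e6

for w, h in [(640, 480), (1024, 768), (1600, 1200), (2048, 1536)]:
    print(f"{w}x{h}@85Hz: ~{pixel_clock_mhz(w, h):.0f}MHz")
# -> roughly 37, 95, 232, and 380MHz, close to the figures above.
```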

In Matrox's measurements, both the GeForce4 and the Radeon 8500 fail to maintain a uniform output voltage across all frequencies. It is important to note that since there appears to be amplification of the output voltage, this behavior isn't entirely the result of a poorly designed low-pass filter circuit on the Radeon 8500 and GeForce4 cards; it could very well be that the RAMDACs output higher voltages at higher frequencies on purpose, to compensate for cheaper filter circuitry. The point to take away is that the Parhelia's output voltage remains relatively constant regardless of frequency. But this doesn't tell you much about image quality; luckily, Matrox provided some other interesting measurements.

Matrox also measured the transient rise time of the three cards; what you should look for here is how long it takes the voltage to settle once it has reached the appropriate level. The Parhelia-512 card settles almost immediately, while the Radeon 8500 never really recovers after its initial overshoot. An unstable signal here will result in blurry text at high resolutions.

We're going to try to reproduce Matrox's results on our own for the entire line of ATI and NVIDIA cards we have in house, to do a true image quality comparison among video cards for a future article.



Triple Head & Surround Gaming

Since Matrox pioneered mass-market desktop multimonitor solutions, it should come as no surprise that the Parhelia-512 shines (no pun intended) when it comes to multimonitor functionality.

The chip features two 400MHz RAMDACs that have been positioned on the GPU's die to allow for proper shielding from other high speed components. All cards will also feature two external TMDS transmitters for dual DVI outputs (and will ship with two DVI-to-VGA adapters). A third RAMDAC, clocked at around 230MHz, drives a third output (up to 1600x1200) to enable Triplehead output and something Matrox likes to call Surround Gaming.

Surround Gaming is very simple for a game developer to support (it currently works on all Quake III engine games and Epic is making it work with Unreal Tournament 2003) and it enables you to increase your field of view angle and effectively gain peripheral vision through the use of two flanking monitors alongside your main display.

The effect is fairly realistic; say you're walking through a door in a first-person shooter. Out of the corner of your eye you'll be able to see impending doom as an opponent waits beside the door with a shotgun. This is made possible by the increased FOV and the fact that the monitors on either side of your main display act as your eyes' peripheral vision in the game.
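The peripheral vision effect boils down to simple field-of-view geometry. A toy calculation (ours, assuming three identical side-by-side monitors and made-up dimensions):

```python
import math

def horizontal_fov_deg(screen_width: float, viewing_distance: float,
                       monitors: int = 1) -> float:
    """Horizontal field of view subtended by `monitors` identical
    screens placed side by side in a flat plane (a simplification;
    angled side monitors would cover an even wider arc)."""
    half_width = monitors * screen_width / 2.0
    return math.degrees(2.0 * math.atan(half_width / viewing_distance))

# Hypothetical setup: 40cm-wide monitors viewed from 60cm away.
print(f"one screen:    {horizontal_fov_deg(40, 60):.0f} degrees")     # ~37
print(f"three screens: {horizontal_fov_deg(40, 60, 3):.0f} degrees")  # ~90
```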

The feature is mostly a novelty considering that relatively few people have three monitors, or the desk space for three of any significant size. The triplehead output can, however, be useful for professionals who still don't have enough desktop space with two monitors.

10-bit Color Components

3DLabs announced support not only for 10-bit color components (10:10:10:2 RGBA) but also for 16-bit color components (as well as a programmable depth) to enable 64-bit color. The benefit of 10-bit color components over the present-day 8-bit components can be seen in certain situations, and Matrox's decision to couple them with 10-bit DACs for higher fidelity output will definitely make things look better on analog displays. But it's clear that the industry is demanding much more than 10-bit components:

"I have been pushing for a couple more bits of range for several years now, but I now extend that to wanting full 16 bit floating point colors throughout the graphics pipeline…so intermediate solutions like 10/12/10 RGB formats aren't a good idea." - John Carmack, April 2000 .plan update

Again, the limiting factor for Matrox here is die space; otherwise we'd surely see support for full 16-bit floating point colors throughout the pipeline.

The Parhelia's 10-bit color mode can be enabled through the Matrox control panel and requires a quick reboot to take effect. Unfortunately the mode currently causes MS Word (as well as other applications that don't properly support 2-bit alpha channels) to crash, and thus it can't be left enabled all of the time.
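For reference, a 10:10:10:2 pixel still fits in the same 32 bits as today's 8:8:8:8 formats; here's a quick sketch of packing and unpacking such a value (our own illustration, using one plausible channel layout):

```python
def pack_10_10_10_2(r: int, g: int, b: int, a: int) -> int:
    """Pack 10-bit R/G/B (0-1023) and 2-bit alpha (0-3) into 32 bits."""
    assert 0 <= r < 1024 and 0 <= g < 1024 and 0 <= b < 1024 and 0 <= a < 4
    return (a << 30) | (b << 20) | (g << 10) | r

def unpack_10_10_10_2(pixel: int):
    return (pixel & 0x3FF,           # red
            (pixel >> 10) & 0x3FF,   # green
            (pixel >> 20) & 0x3FF,   # blue
            (pixel >> 30) & 0x3)     # alpha: only 4 levels, which is
                                     # why alpha-blending apps struggle

print(unpack_10_10_10_2(pack_10_10_10_2(1023, 512, 0, 3)))
# -> (1023, 512, 0, 3): 1024 shades per channel vs. 256 at 8 bits
```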

The card also supports hardware accelerated text anti-aliasing for modes such as ClearType under Windows XP; Matrox is claiming an impressive boost in 2D performance.



Final Words

There is indeed quite a bit of technology behind the Parhelia-512 so let's start summing up the pros and cons:

Matrox's design goal with the Parhelia-512 was to produce the fastest part possible for the coming generation of DX8 games. Their "bet" was that DX9 games would take just as long to surface as DX8 games did, and thus the Parhelia-512 could be brought to market and would perform quite well on present and near-future titles in comparison to NV25/NV30 and R200/R300. We think that Matrox made the best bet they could possibly make in their situation, and it's probably the right one. This won't give Matrox the market share that NVIDIA enjoys, but it will bring them back into the market with a high-performing part. How will the Parhelia-512 perform?

- In "simple" games like Quake III Arena, the Parhelia-512 will definitely lose out to the GeForce4 Ti 4600. By simple we mean games that generally use no more than two textures and are currently bound by fill rate. NVIDIA's drivers are highly optimized (much more so than Matrox's) and in situations where the majority of the Parhelia's execution power is going unused, it will lose out to the Ti 4600. This can change by turning on anisotropic filtering and antialiasing however, where the balance will begin to tilt in favor of the Parhelia.

- In stressful DX8 games, Matrox expects the Parhelia-512 to take the gold - either performing on par with or outperforming the GeForce4 Ti 4600. Once again, as soon as you enable better texture filtering algorithms and antialiasing, the Parhelia-512 should begin to seriously separate itself from the Ti 4600. The quad-texturing capabilities of the core as well as the 5-stage pixel shaders will be very handy in games coming out over the next several months.

- The Parhelia-512 has the ability to take the short-term performance crown away from NVIDIA.

The main strengths of the Parhelia-512 are its quad-texturing units, its impressive memory bandwidth and its 5-stage pixel shader pipelines. Features such as Hardware Displacement Mapping and extreme attention to image output quality complete the package. Matrox's Fragment Anti-Aliasing algorithm seems quite promising; however, its value is entirely dependent on which situations produce noticeable artifacts.

There are some limitations to the Parhelia-512 architecture however that cannot go unmentioned:

- On a 0.15-micron process, the Parhelia-512 is a very large chip much like 3DLabs' P10. With a chip this large it will be difficult to attain high clock speeds. The first versions of the P10 are expected to run at ~250MHz but it will take a higher clock to make the Parhelia-512 competitive.

- The lack of any serious Z-occlusion culling technology is a major disappointment. As you may have noticed, occlusion culling is something that ATI and NVIDIA are continuing to improve on; the next-generation Radeon and NVIDIA's NV30 will both have extremely sophisticated forms of occlusion culling built into the hardware. This tradeoff can become a killer for Matrox in situations where complex pixel shader programs are applied to pixels that should have been occluded.

- The lack of a fully programmable floating-point pixel pipeline will hurt the Parhelia-512 in the eyes of developers as they start writing for DX9 hardware later this year. This isn't as big of a hit for end users as long as you upgrade your graphics card more frequently than once every two years.

And then there's the issue of price; when it finally ships in June, the fastest Parhelia-512 cards will carry a price tag of ~$450. There will be cheaper cards that offer lower performance, but you shouldn't expect the Parhelia-512 to compete with the GeForce4 Ti 4200 anytime soon.

In the end, the Parhelia-512 has the potential to be king of the hill between now and the release of NV30, and it is by far the best effort Matrox has ever put forth in the graphics industry. Those wanting extremely high image quality and triple-head output will have nowhere else to turn, and this time around they will be able to enjoy high-performance 3D acceleration as well.

However the success of Matrox isn't dependent on the Parhelia-512; as we've seen in the past it is dependent on how well they follow up the technology. The Parhelia-512 cannot turn into another G400 where the market is left for two years without a serious update. Matrox does have a solid product and a winner on their hands but they have to do their best to not only execute it well but execute its successor even better.

Matrox assures us that they have a solid roadmap going forward but we will not see them move to NVIDIA's aggressive 6-month product cycles. In the end, Matrox is a worthy competitor to have back in the game and they couldn't have done a better job making an entrance than with the Parhelia.
