Original Link: https://www.anandtech.com/show/735
There is no question that memory bandwidth limitations are on the minds of every major graphics chip producer. Current generation products reflect the fear that the door is quickly closing on the amount of information that can be passed over a conventional video card's memory bus. ATI was first to tackle the memory bandwidth problem with the introduction of their HyperZ technology, which provided up to a 40% performance increase in our tests. Although HyperZ was initially found only in the high end Radeon DDR card, it was not long before subsequent Radeon products carried HyperZ technology into nearly every price range. Just recently we saw NVIDIA take similar bandwidth saving steps with their Lightspeed Memory Architecture in the new GeForce3 card, a technology which will surely find its way into lower cost NVIDIA parts in time. Also in the spotlight in past months was 3dfx's Gigapixel technology, which promised to bring memory bandwidth requirements down to a fraction of what they are currently. Naturally, NVIDIA's recent purchase of 3dfx assets leaves Gigapixel technology in the hands of NVIDIA.
With all the bandwidth saving technologies currently out there, the truth of the matter is that the optimizations now in place are not even close to preventing the memory bus from acting as the largest bottleneck. Even the 40% speed boost that HyperZ provided was not enough to get the effective fill rates of video cards (the fill rate of a whole product) anywhere near the promised theoretical fill rate (the fill rate of just the graphics processor alone). It is clear that in future products either these optimizations will have to get drastically better or memory technology will have to get exponentially faster in order for video card performance to keep increasing at the rate it has in the past.
From a manufacturing perspective, it is impossible to count on new memory technology to save the day. Currently there is nothing out there that can inexpensively improve on the performance of today's fastest DDR memory chips. Although a new, cheap, fast memory technology could theoretically be discovered tomorrow, one cannot base a product on technology that does not yet exist.
At the same time, there is a limit to how much bandwidth can be saved within the traditional method of rendering. Oftentimes, even slight decreases in memory bandwidth consumption are very difficult to engineer and put into practice. It is for this reason that many in the video card industry are predicting a change, moving us away from the traditional mode of rendering into one where memory bandwidth savings are paramount.
Spearheading the move away from conventional rendering has been Imagination Technologies. For years now Imagination has predicted the memory bandwidth crunch that appears to have recently hit home with every major video card manufacturer. Continuing their trend away from conventional rendering paradigms, today STMicroelectronics releases Imagination's latest stab at tile based rendering, this time initially sold under the Hercules name of video cards. The chip is called the Kyro II and promises high-end performance at a fraction of the price. Let's see if the latest generation of tile based rendering can outperform an extremely aged rendering platform.
Immediate Mode Rendering: the status quo
Imagination Technologies has been in the tile based rendering market for quite some time. Sold under the PowerVR chip name, the first true tile based rendering system that Imagination created was the PowerVR Series 2 chip. The PowerVR Series 2 marked a significant deviation from the traditional immediate mode rendering which has been the mainstay of 3D graphics for over 40 years.
In immediate mode rendering, a video card is programmed to render each polygon in a scene without any knowledge of what lies in the scene as a whole. Taking 4 sets of coordinates (x, y, z, and a color value), every forward facing polygon in a scene must be rendered, shaded, and textured. A z-buffer then checks the forward (z) location of the current polygon against every other polygon in the scene. If the z-value of the current polygon is greater than the z-value of the polygons stored in the z-buffer, the current polygon is stored in the frame buffer. If the z-buffer determines that the polygon is actually behind another polygon, the fully shaded and textured polygon that is not visible is thrown out.
Since the check against the z-buffer occurs after the pixel is already shaded and textured, many pixels that will never be visible in the final scene are fully rendered, shaded and textured only to be thrown out. This process, which results in many pixels being processed that will never be seen or used, is called overdraw, and occurs in nearly every scene rendered on an immediate mode renderer. Typical estimates place the current overdraw in many 3D games today at anywhere from 2 to 4, with an average around 3. This means that on average 2 pixels are actually rendered and then thrown out, with only the topmost pixel being displayed on screen.
As you can imagine, this form of rendering is extremely inefficient. Since fully shaded and textured pixels are thrown out, a good portion of the work an immediate mode rendering video card performs is actually in vain, as the pixels will never see the light of day. It is a direct result of this that current video cards are suffering memory bandwidth limitations, as the data for pixels that will never be used have to travel over the already busy memory bus.
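To make the cost of overdraw concrete, here is a minimal sketch of the immediate mode process described above. It is only an illustration, not how any particular chip is built: the function names are ours, fragments are reduced to a pixel index, a depth, and a color, and we assume a smaller depth value means closer to the viewer (flip the comparison for the opposite convention).

```python
# Minimal sketch of immediate mode rendering with a z-buffer.
# Assumption: a smaller depth value means "closer to the viewer".

def render_immediate(fragments, num_pixels):
    """Shade every fragment, then depth-test it, as an immediate mode card does.

    fragments: iterable of (pixel_index, depth, color) in submission order.
    Returns the final frame buffer and the number of fragments shaded.
    """
    z_buffer = [float("inf")] * num_pixels
    frame_buffer = [None] * num_pixels
    shaded = 0
    for pixel, depth, color in fragments:
        shaded += 1  # shading and texturing cost is paid before the z-test
        if depth < z_buffer[pixel]:
            z_buffer[pixel] = depth
            frame_buffer[pixel] = color  # visible so far, keep it
        # otherwise the fully shaded fragment is simply thrown away
    return frame_buffer, shaded


def overdraw_factor(shaded, num_pixels):
    """Average number of fragments shaded per displayed pixel."""
    return shaded / num_pixels


# Four pixels, each covered by three layers submitted in arbitrary depth order.
fragments = [(p, d, f"layer{d}") for d in (3, 1, 2) for p in range(4)]
frame, shaded = render_immediate(fragments, num_pixels=4)
print(overdraw_factor(shaded, 4))  # 12 shaded fragments for 4 visible pixels
```

With three layers over every pixel, two out of every three shaded fragments never reach the screen, which matches the "average overdraw of 3" case discussed above.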
Tile Based Rendering: PowerVR's Solution
The PowerVR approach, as mentioned above, is known as tile based rendering and differs significantly from immediate mode rendering when it comes to constructing a 3D scene. The tile based rendering approach attempts to eliminate any redundant processing in the 3D pipeline, thereby significantly reducing the memory bandwidth limitations that currently plague the video card industry and their immediate mode rendering approach.
Rather than process one polygon at a time without knowledge of other polygons in the scene, a tile based renderer first groups polygons together in groups called display lists. These display lists allow a scene to be broken into smaller blocks, known as tiles, which are then rendered independently.
The first advantage to rendering smaller portions of a scene at once is that it allows operations to be performed on-chip without having to access external memory. This allows all z-calculations to be performed without having to access an external z-buffer via the memory bus. Naturally, this eliminates the expensive z-buffer reads and writes that occur constantly on immediate mode renderers.
Rendering small tiles instead of a complete scene also means that pixels that are not visible can be thrown out before the rendering process begins. Since each tile consists of a display list that includes each polygon in that tile, hidden surface removal can occur before any textures are applied. Once again this significantly reduces the amount of information that must travel over the memory bus, as textures for non-visible surfaces do not need to be processed. Also located on chip is a tile buffer which acts as a frame buffer for an individual tile. This allows blending to be performed without costly memory reads and writes.
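The idea can be sketched in a few lines. This is a simplified illustration under our own assumptions, not the PowerVR implementation: real hardware bins polygons rather than individual fragments, the 2x2 tile size is arbitrary, the names are hypothetical, and a smaller depth is again assumed to mean closer to the viewer.

```python
# Minimal sketch of tile based rendering: sort work into tiles, resolve
# visibility per tile in a small local buffer, and only then shade.
# Assumption: a smaller depth value means "closer to the viewer".

def render_tiled(fragments, tile=2):
    """fragments: iterable of (x, y, depth, color).

    Returns the frame buffer (dict keyed by (x, y)) and the number of
    pixels that were actually shaded.
    """
    # Binning: group fragments by tile, a stand-in for per-tile display lists.
    bins = {}
    for x, y, depth, color in fragments:
        bins.setdefault((x // tile, y // tile), []).append((x, y, depth, color))

    frame_buffer = {}
    shaded = 0
    for tile_fragments in bins.values():
        # Hidden surface removal in a per-tile buffer (stand-in for on-chip memory).
        nearest = {}
        for x, y, depth, color in tile_fragments:
            if (x, y) not in nearest or depth < nearest[(x, y)][0]:
                nearest[(x, y)] = (depth, color)
        # Only the surviving pixels are shaded and written out, once each.
        for position, (_, color) in nearest.items():
            frame_buffer[position] = color
            shaded += 1
    return frame_buffer, shaded


# A 4x4 screen covered by three full-screen layers.
fragments = [(x, y, d, f"layer{d}")
             for d in (3, 1, 2) for x in range(4) for y in range(4)]
frame, shaded = render_tiled(fragments, tile=2)
print(shaded, "pixels shaded for", len(fragments), "submitted fragments")
```

In this toy scene the shading work equals the number of screen pixels regardless of how many layers are submitted, which is the property the tile based approach is after.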
With all the benefits associated with a tile based renderer, one may wonder why every graphics processor out there does not utilize this method of rendering. In most cases, it is the amount of work required to get a tile based renderer working that keeps a product from being produced. Designing a tile based renderer requires a completely different chip design compared to immediate mode renderers. Even Imagination has experienced their share of problems with early tile based rendering chips. Luckily, it seems that after some trial and error, they have tile based rendering down on the Kyro II chip.
Instead of going completely tile based, it seems that the big 3 (3dfx, NVIDIA, and ATI) had been going after hidden surface removal methods of their own. Whether or not these optimizations, which have been focused around the z-buffer, will provide the bandwidth relief that is needed is still up in the air. If you ask NVIDIA or ATI, they will claim that optimizations are the way to go. Imagination, on the other hand, says there is no way that these optimizations can compare to the memory bandwidth savings provided by a tile based architecture.
Despite the advantages that a tile based system offers, the method has come under fire recently. Most notably, the lead programmer at Epic Games, Tim Sweeney, recently mentioned that implementing a T&L subsystem on a tile based renderer was next to impossible.
We approached STMicroelectronics, the producer of the Kyro II chip, and asked them for their response to the statement. They assured us that the idea that T&L support on a tile based rendering platform is not possible is completely untrue. In addition, the folks at STMicroelectronics said that they have a meeting with Tim at GDC, hinting that they will prove Tim incorrect with a demonstration of a tile based system with T&L. Note, however, that the Kyro II still lacks T&L support.
The Chip
The latest generation tile based renderer has been named Kyro II and is based off the PowerVR Series 3 technology. You may recall, last June we looked at the original Kyro processor. Let's see what the second incarnation of this chip entails.
First off, the Kyro II is not actually a completely new chip at all. The only difference between the original Kyro and the Kyro II seems to lie in clock speed, as both are PowerVR Series 3 based. Interestingly enough, the Kyro II boasts 15 million transistors, up from the 12 million on the original Kyro. When we asked STMicroelectronics why the Kyro II features more transistors than the original Kyro, they responded by saying that the additional transistors were added to allow the Kyro II to hit a higher clock speed. Exactly how more transistors result in a higher clock speed, STMicroelectronics could not explain. Our best guess is that they've lengthened the pipeline and reduced the amount of work that is done in each step of the pipeline, much like Intel did with the Pentium 4. This is only a guess, however, and we have no way of confirming or denying that this is the case.
Regardless of how they reached a higher clock speed, there is no question that the Kyro II beats the pants off the original Kyro when it comes to clock speed. Up from a 125 MHz core, the Kyro II now features a 175 MHz clock speed. Since the core clock and the memory clock of the PowerVR Series 3 chips are synchronous, the memory clock is 175 MHz as well. It is interesting to note that originally the Kyro II was specced at a 166 MHz core and memory clock but tests conducted with board manufacturer Hercules/Guillemot showed that the chips could hit 175 MHz reliably.
The way that STMicroelectronics could assure a higher clock speed was by implementing a process shrink. The original Kyro chip was based on a 0.25 micron fabrication process, which resulted in limited clock speed. The new Kyro II is produced on one of STMicroelectronics' 0.18 micron fabs, not only shrinking die size but also increasing maximum clock speed while simultaneously decreasing power consumption. In fact, the new Kyro II running at 175 MHz dissipates only 4 watts of heat. Compare this to the 0.18 micron GeForce2 GTS with its 25 million transistors and 8 to 9 watts heat dissipation and one can see where simpler is better.
Like the original Kyro, the Kyro II features a 128-bit SDR memory bus that can support 16MB, 32MB, or 64MB configurations. Although SDR memory is typically associated with slow performance, this is not the case with tile based renderers. Since the memory bandwidth required to render a complex scene with a tile based renderer is drastically less than that required on an immediate mode rendering system, extreme amounts of memory bandwidth are not required. Therefore, it is likely that the Kyro II in its current configuration would actually not benefit at all by moving to a more spacious DDR memory bus.
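For reference, the peak bandwidth of the bus described above follows directly from its width and clock. The helper below is our own back-of-the-envelope sketch (the function name is hypothetical); it counts one transfer per clock for SDR and two for DDR, and uses decimal gigabytes.

```python
# Peak memory bandwidth from bus width and clock (theoretical maximum only).

def peak_bandwidth_gb_s(bus_bits, clock_mhz, transfers_per_clock=1):
    """Return peak bandwidth in GB/s (1 GB = 10**9 bytes).

    transfers_per_clock: 1 for SDR, 2 for DDR.
    """
    bytes_per_transfer = bus_bits / 8
    return bytes_per_transfer * clock_mhz * 1e6 * transfers_per_clock / 1e9


kyro2_sdr = peak_bandwidth_gb_s(128, 175)       # Kyro II as shipped
same_bus_ddr = peak_bandwidth_gb_s(128, 175, 2)  # hypothetical DDR variant
print(f"SDR: {kyro2_sdr:.1f} GB/s, DDR at the same clock: {same_bus_ddr:.1f} GB/s")
```

The DDR figure is purely hypothetical, included only to show what the same bus would offer at the same clock.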
Also like the Kyro, the Kyro II features two pixel pipelines capable of processing a single texture unit per clock. Once again, the Kyro II does not need additional pipelines in order to keep up with a card with more pipelines and texture units per clock because of the card's efficiency. There is no question, however, that adding additional texture units per pipe or increasing the pipelines will result in a substantial increase in performance, as memory bandwidth will not be holding the card back.
Missing from the Kyro II feature set is a T&L engine. Claiming that the current generation of CPUs is far better at T&L calculations than any graphics part can be, STMicroelectronics chose to leave T&L off the Kyro II. Perhaps STMicroelectronics is referring to the fact that the non-programmable T&L we've seen on GeForce2 and Radeon cards has not been taken much advantage of, but the programmable unit of the GeForce3 will be much more useful and much more powerful. Whether or not this will hurt the Kyro, we will see in our CPU scaling tests, but it certainly would not hurt to put a T&L engine on the chip. Look for the next PowerVR chip to feature such an engine.
The Kyro II still boasts what STMicroelectronics calls Internal True Color which promises to make 16-bit color gameplay look better. As a result of texturing and z-buffering being performed on-chip, they can be done in full 32-bit color without the large performance penalty that traditional architectures must incur. Further, the internal 32-bit rendering occurs regardless of the frame buffer's color depth. The penalty that most architectures incur for 32-bit rendering is a result of memory bandwidth constraints that are in turn a result of the constant z-buffer accesses and unnecessary overdraw. In an ideal world, with infinite memory bandwidth, traditional 3D architectures would not slow down at all when rendering in 32-bit color.
So why not render in 32-bit mode all the time? If it were that simple, the KYRO probably would always operate in 32-bit mode. The fact remains that a 32-bit frame buffer and textures still take up twice as much memory as 16-bit ones. While the KYRO is able to render each tile on-chip, it is still necessary to put the completed tile in the frame buffer and also to read textures from memory, so the memory footprint (not bandwidth) requirements for 32-bit color are still double what they are for 16-bit, regardless of the rendering architecture in use.
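The footprint argument is simple arithmetic. The sketch below is our own illustration (the function name is hypothetical) and counts a single frame buffer only, ignoring textures and any additional buffers.

```python
# Memory footprint of one frame buffer at a given resolution and color depth.

def frame_buffer_bytes(width, height, bits_per_pixel):
    """Bytes needed to store one frame at the given color depth."""
    return width * height * bits_per_pixel // 8


for bpp in (16, 32):
    size = frame_buffer_bytes(1024, 768, bpp)
    print(f"1024x768 at {bpp}-bit: {size / 2**20:.1f} MB")
```

Whatever the rendering architecture, the 32-bit buffer is exactly twice the size of the 16-bit one.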
The obvious question is then, why not use 16-bit frame buffers with 32-bit internal rendering all the time? As the screen shots below show, full 32-bit still looks better since the 16-bit image is dithered down from the internal 32-bit. This is especially apparent where fine gradients in color appear on screen. Note that there is a significant reduction in dithering for the 16-bit image of the Kyro compared to most cards' 16-bit rendering.
Screen shots: 16-bit shown above, 32-bit below. The images are JPEG compressed and thus have some quality loss compared to the originals. For the full effect, the images should be viewed full screen.
The raw specs of the Kyro II processor are rather unimpressive. With a 175 MHz clock and two pixel pipelines that each process one texture per clock, the Kyro II's raw fillrate is only 350 megapixels per second. The fillrate number is actually much closer to the original NVIDIA RIVA TNT2 Ultra than any current generation graphics processor. As we have seen in the past, numbers can be very misleading.
Rather than use the raw fillrate number of 350 megapixels per second, one has to take into account the overdraw that we discussed before. There are two ways to arrive at an effective fill rate number that takes into account overdraw. Either look at the number of pixels actually rendered on screen in a given amount of time or look at the number of pixels that would be rendered in the case of an immediate mode rendering engine.
With the first way of looking at the situation, on a tile rendering architecture such as that used in the Kyro II, the number of pixels rendered on screen will match the total number of pixels rendered. Thus, the effective fill rate here is the same as the theoretical fill rate. Since an immediate mode renderer often calculates up to 4 times the information necessary to render a scene, one can essentially divide the theoretical fill rate number of these cards by the amount of overdraw to arrive at the effective fill rate. Throughout the rest of this article, this will be the "effective fill rate" we are referring to.
For marketing reasons, it's much easier for STMicroelectronics to push the second method of arriving at an effective fill rate number. This means taking the amount of overdraw (which is eliminated on the Kyro II's tile based rendering system) and multiplying by the theoretical fill rate of the tile based rendering card in order to get an "effective" fill rate. The latter is what STMicroelectronics chose to do, giving the Kyro II an effective fill rate number. The idea is to arrive at a fill rate number that can be directly compared to that of an immediate mode renderer. Assuming an overdraw of 4, which is considered a bit high by many in the industry, the Kyro II earns an effective fill rate of 1400 megapixels per second, far above the GeForce2 GTS's 800 megapixel per second rating. Assuming a more conservative overdraw estimate of around 3, the Kyro II boasts 1050 megapixels per second, a number that is still very impressive.
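The two ways of counting can be written out as a short sketch. The function names are ours, and the overdraw values are the estimates quoted above, not measurements.

```python
# Two ways to state an "effective" fill rate, in megapixels per second.

def raw_fill_rate(clock_mhz, pixels_per_clock):
    """Theoretical fill rate of the chip alone."""
    return clock_mhz * pixels_per_clock


def displayed_fill_rate(raw, overdraw):
    """First method: pixels that actually reach the screen on an
    immediate mode renderer, which shades every layer of overdraw."""
    return raw / overdraw


def immediate_mode_equivalent(raw, overdraw):
    """Second (marketing) method: the immediate mode fill rate a tile based
    renderer would need to match, since it skips hidden pixels entirely."""
    return raw * overdraw


kyro2 = raw_fill_rate(175, 2)  # two pipelines at 175 MHz
print(kyro2)                                    # raw Kyro II figure
print(immediate_mode_equivalent(kyro2, 4))      # overdraw of 4
print(immediate_mode_equivalent(kyro2, 3))      # overdraw of 3
print(round(displayed_fill_rate(800, 3)))       # GeForce2 GTS, first method
```

Either way the comparison comes out the same: with an overdraw of 3, the GeForce2 GTS's 800 megapixel per second rating shrinks to roughly 267 displayed megapixels per second against the Kyro II's 350, or equivalently the Kyro II's 1050 stands against the GTS's 800.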
The Kyro II also boasts 8-layer multitexturing in a single pass. Since texturing is performed on-chip, multitexturing becomes much more efficient in certain circumstances. Consider the GeForce 2 GTS, which can apply 2 texels to 4 pixels in a single pass. If the number of textures for a single pixel exceeds 2, then the GeForce 2 GTS will have to render the pixel in two passes. Those two passes mean that geometry data must be sent again for the second pass. On the other hand, the Kyro II is capable of applying up to 8 textures to a pixel in a single pass. This is another way in which tile rendering reduces memory bandwidth requirements. Note that this does not mean that the Kyro II can apply 8 textures in a single clock - in fact it can only do one texture per pixel in a single clock.
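The difference between passes and clocks is easy to mix up, so here is a small sketch of both. The function names are ours; the per-pass and per-clock limits are the ones quoted above.

```python
import math

# Passes: how many times geometry must be submitted for one pixel.
# Clocks: how many cycles one pipeline spends texturing that pixel.

def passes_needed(textures, max_textures_per_pass):
    """Rendering passes required to apply the given number of texture layers."""
    return math.ceil(textures / max_textures_per_pass)


def clocks_per_pixel(textures, textures_per_clock):
    """Clock cycles one pipeline needs to apply all texture layers to a pixel."""
    return math.ceil(textures / textures_per_clock)


for layers in (2, 3, 8):
    print(layers, "layers:",
          "GeForce2 GTS passes =", passes_needed(layers, 2),
          "| Kyro II passes =", passes_needed(layers, 8),
          "| Kyro II clocks =", clocks_per_pixel(layers, 1))
```

The Kyro II still spends one clock per texture layer; what it avoids is resending geometry for additional passes.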
The Kyro II still contains the original's 300 MHz RAMDAC. Added this time around was S3TC texture compression support. This feature was left out of the initial Kyro release because licensing of the technology from S3 was not completed in time for the Kyro's introduction.
The Triangle: Imagination, STMicroelectronics, and Hercules/Guillemot
The relationship between the companies involved in bringing the Kyro II to market is quite complex. It is important to understand what role each company plays in the manufacturing triangle.
At the base of the triangle is Imagination Technologies. A British company, it is Imagination that actually created the technology behind the PowerVR chips. Pioneering the tile based rendering scheme, they have continued to work towards perfecting this type of technology. Although they hold the rights to the PowerVR technology, they physically make no chips or boards.
This is where STMicroelectronics comes in. STMicroelectronics actually licenses the PowerVR technology from Imagination and owns the rights to produce PowerVR chips. This partnership started with the PowerVR Series 2, where STMicroelectronics produced these chips for use not only in the desktop market but also in the Sega Dreamcast console.
It was STMicroelectronics that produced the chips used in original Kyro boards. The problem was that, although STMicroelectronics is a manufacturing powerhouse, they do not have the resources to develop and market boards. As a result of this, STMicroelectronics has turned to 3rd party manufacturers to build board level products that use the Kyro series of chips. The problem that they encountered with the original Kyro processor was that no major card maker agreed to produce and sell Kyro based boards. This made it extremely difficult to find Kyro cards, let alone one from a major board manufacturer. Without retail or OEM presence, PowerVR technology was destined to fail, no matter how good it was.
Finally, Hercules/Guillemot comes into the picture. In a partnership announced last Friday, Hercules/Guillemot agreed to produce Kyro based products for the retail market. The partnership promises that the Kyro II will be sold side by side with NVIDIA based cards, alongside Hercules/Guillemot's popular 3D Prophet series. In addition to the 3D Prophet 4500, Hercules/Guillemot's Kyro II based card, the company will also produce and sell a board with the original Kyro, the 3D Prophet 4000, outfitted with 32MB of memory.
This partnership is very important for STMicroelectronics because, although the Kyro graphics processor performed extremely well for its price, poor retail availability, product delays, and buggy drivers prevented it from being a force in the video card market. By signing an agreement with Hercules/Guillemot, the 2nd largest retailer of NVIDIA based products, STMicroelectronics gains the experience and market presence that can only be provided by a major card manufacturer. In addition, STMicroelectronics worked hand in hand with Hercules/Guillemot's driver team and was able to build a new driver base without any problems. Finally, the agreement means that products will actually come on time: the 3D Prophet 4500 is slated for an end of March or beginning of April ship date.
Keep in mind that STMicroelectronics does not have an exclusive agreement with Hercules/Guillemot, so it is probably just a matter of time before we see other Kyro II based boards on the market. STMicroelectronics would not name any other names at this time, but it is safe to assume that manufacturers that had a product using the original Kyro will follow with a Kyro II card in the near future. However, for now, it is Hercules/Guillemot or bust.
The Card
As mentioned before, the result of STMicroelectronics' partnership with Hercules/Guillemot is the 3D Prophet 4500. The card, which is shown here in prerelease form, is expected to change only in board layout by the time it hits market. Our prerelease card is a modified STMicro reference design that will most likely be shrunk by the time the product hits shelves.
Judging by the size of the chip, the Kyro II graphics processor is rather small, although no die sizes have been given out by STMicroelectronics. This is no surprise since the Kyro II features a mere 15 million transistors, compared to the 25 million transistor GeForce2 GTS.
On the 3D Prophet 4500, the Kyro II core is cooled by a blue circular heatsink/fan, reminiscent of the Blue Orb. As we mentioned previously, the Kyro II only dissipates 4 watts of heat, placing it at the heat level of the GeForce2 MX, a chip that we have seen run not only without a fan but without cooling of any sort. Hercules/Guillemot claims that the heatsink/fan, which is bonded to the core via thermal tape, is necessary to get the chip up to the 175 MHz speed it will be selling at with the 3D Prophet 4500. If this really is the case, expect competing Kyro II products to feature only a heatsink or no cooling at all with a core clock running at 166 MHz.
The pre-production card we had featured 64MB of memory, as this is the only configuration the 3D Prophet 4500 will be sold in. We have seen in the past that the jump from 32MB to 64MB of video memory makes very little to no difference in real game performance, although it can come in handy in texture heavy situations. It is most likely the case that Hercules/Guillemot chose to go with 64MB of memory simply as a selling point: many users who see a 64MB card on the shelf right next to a 32MB one will assume that the one with more memory is better.
The memory on our board was contained in eight 8MB Hyundai SDR SDRAM chips rated at 5.0 ns. This means that the chips are rated up to 200 MHz, well above the 175 MHz clock speed that the memory operates at. As production begins to ramp up, it will not be surprising to see these 5.0 ns chips replaced with 5.5 ns ones, given that the yields on 5.5 ns memory are good.
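The relationship between a memory chip's nanosecond rating and its rated clock is just a reciprocal. The helper below is our own quick sketch (the function name is hypothetical) for checking the figures above.

```python
# Convert a memory chip's cycle-time rating (ns) to its rated clock (MHz).

def ns_to_mhz(cycle_time_ns):
    """A cycle time of t nanoseconds corresponds to 1000 / t MHz."""
    return 1000.0 / cycle_time_ns


MEMORY_CLOCK_MHZ = 175  # Kyro II memory clock quoted in this article

for rating in (5.0, 5.5):
    rated = ns_to_mhz(rating)
    headroom = rated - MEMORY_CLOCK_MHZ
    print(f"{rating} ns -> {rated:.1f} MHz rated, {headroom:.1f} MHz of headroom")
```

Both ratings sit above the 175 MHz the card needs, which is why a swap to 5.5 ns chips would be plausible.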
The front of the board reveals two silk screens for chips not placed on our board. The first, smaller silk-screen is for placement of a Silicon Image TMDS controller for flat panel support. The Kyro II does not feature an internal TMDS, so flat panel support had to be provided by a third party manufacturer.
The second, larger silk-screen, located more centrally, is for a Chrontel video-out chip. The board will also be sold with optional video-out that will include this Chrontel chip as well as S-video and composite-out ports.
Pricing for the board is very competitive, with the base 3D Prophet 4500, sold with 64MB of memory, coming in at a suggested retail price of $149.99. The same card with TV-out is slated to cost $20 more, bringing the price of the card up to $169.99.
With what Hercules/Guillemot promises will be extreme performance, which we will look at for ourselves in a moment, one has to wonder if the company is starting to compete with itself. As mentioned before, Hercules/Guillemot is the 2nd largest producer of NVIDIA based cards. As it stands now, their very popular 3D Prophet II MX is priced at around $100 and the 3D Prophet II GTS is now around $230. Falling between these two price points, the base 3D Prophet 4500 gets dangerously close to the 3D Prophet II MX's price range. Hercules/Guillemot said they did not see a problem with this, as "There is nothing like it [the 3D Prophet 4500] for its price." As the card prices drop, we do expect Hercules/Guillemot to move further away from their powerful GeForce2 MX line, replacing it with the Kyro II based 4500. This will be especially true as the 3D Prophet 4000, based on the original Kyro, begins to hit shelves at around $79.
Lastly, we imagine that many of you are wondering about the name, as we sure did. Does the 3D Prophet 4500 sound reminiscent of another recently released board? Perhaps the Voodoo4 4500 comes to mind. Well, we asked Hercules/Guillemot about this and the tongue in cheek response was that 4500 was actually STMicroelectronics' internal code name for the Kyro II. True or not, we believe there is a bit more to the name than just that, although we are puzzled why any company would want to name an item after a product that ultimately failed.
The Drivers
The driver set that came with our Hercules/Guillemot 3D Prophet 4500 was extremely compact. With settings contained in no more than 2 tabs, with very few sub items, it was easy to find and set any property of the card.
The display tab contained information regarding the current monitor setup, allowed for altering of screen position, and provided a gamma correction slider. Not as fancy as we have seen on some tweaked NVIDIA drivers (ELSA comes to mind), but sufficient for the vast majority of users' needs.
The "advanced" features of the card were accessible via the "3d Optimisation" tab. The above screen shows settings for D3D game play. There is a set of two preset profiles at the top of the D3D screen, one for "Speed" and one for "Quality." The screen above shows what pops up if the "Custom" mode is selected.
The OpenGL settings were very similar to the D3D settings, with presets defined.
The Test
Windows 98 SE Test System
Hardware
CPU(s) | AMD Athlon-C (Thunderbird) 1.0GHz (133MHz)
Motherboard(s) | ASUS A7V133
Memory | 128MB PC133 Corsair SDRAM (Micron -7E Chips)
Hard Drive | IBM Deskstar DPTA-372050 20.5GB 7200 RPM Ultra ATA 66
CDROM | Phillips 48X
Video Card(s) | 3dfx Voodoo4 5500 AGP 64MB; ATI Radeon 32MB DDR; Hercules/Guillemot 3D Prophet 4500; NVIDIA GeForce2 Ultra 64MB DDR
Ethernet | Linksys LNE100TX 100Mbit PCI Ethernet Adapter
Software
Operating System | Windows 98 SE
Video Drivers |
Benchmarking Applications
Gaming | idSoftware Quake III Arena demo001.dm3
Quake III Arena Performance
At 640x480x32 we see two aspects of a card, for the most part. First off is how well the drivers are performing. Historically, poor drivers have led to poor performance at low resolutions. The lack of a T&L engine is also often blamed for poor performance at low resolutions.
In the case of the Kyro II, the drivers do not seem to be holding the card back. In addition, the card does not seem to flinch at all when put up against T&L enabled cards such as the ATI Radeon DDR and the GeForce2 MX. The Kyro II performs very strongly for a first showing. Could the drivers and lack of T&L still be holding the card back, even with its very strong 640x480x32 performance? Let's see what other resolutions show.
Indeed, if you thought the Kyro II's performance was impressive before, you are in for a shock with the 1024x768x32 performance. Not stressing a nonexistent T&L engine and not held back at all by early drivers, the Kyro II is able to actually outperform the popular (and more expensive) NVIDIA GeForce2 GTS by a noticeable amount.
Scoring slightly over 100 FPS in Quake III Arena, the Kyro II beats the GeForce 2 GTS by 5%. It also beats the comparatively priced GeForce2 MX by 80% and the Radeon DDR by 24%. The performance of the card, even this early on into the testing, was enough to do more than just catch our eye. We were frankly shocked when we saw that the Kyro II based Hercules/Guillemot 3D Prophet 4500 performed better than the 64MB GeForce2 GTS and is priced almost $90 less. In fact, the Kyro II came within 11% of the powerful NVIDIA GeForce2 Pro.
Not much changes as the resolution increases in Quake III Arena. At 1600x1200x32, the Kyro II holds its position right between the GeForce2 GTS and the GeForce2 Pro. Running only 11% slower than the GeForce2 Pro, the Kyro II looks like it is primed to take low cost video cards to a new level. If the Kyro II's Quake III Arena performance is any indication of the potential of tile based rendering, immediate mode renderers should run for the hills.
MDK2 Performance
It is here, in MDK2 at 640x480x32, that the Kyro II is punished for its lack of T&L. One can clearly see that cards with T&L, all but the Kyro II and the Voodoo5 5500 in this group, perform much better on the whole at this low resolution. Unfortunately, we will most likely have to wait until the next generation PowerVR chip is released before we see this number jump, as it cannot be fixed via a simple driver update.
As the resolution moves up, the Kyro II is not penalized as much for the lack of T&L as fill rate becomes more of a limiting factor. Here the Kyro II steadily beats the GeForce2 MX once again, but this time falls short of the GeForce2 Pro that it once had its sights on. The Kyro II is beaten by the GeForce2 Pro in this case by 72%, whereas in Quake III Arena it only fell 11% short of the GeForce2 Pro's speed. Once again, we will have to wait for a T&L capable PowerVR chip before we see its MDK2 performance increase significantly.
MDK2 at 1600x1200x32 tells pretty much the same story as at 1024x768x32: the Kyro II is held back by its lack of T&L. Not to say that performance in this case is not impressive given its price - the Kyro II still beats out the GeForce2 MX, this time by a huge 74%. There is no question that the extra $30 or so that the Kyro II costs is worth it by far.
Unreal Tournament Performance
In Unreal Tournament at 640x480x32, the Kyro II does what some would consider the impossible, especially for a card in the sub $150 price range: it beats the $340 GeForce2 Ultra.
The Kyro II does the same thing in the average frame rate results, meaning that not only is its minimum frame rate at 640x480x32 higher than that of any other card, so is its average frame rate.
As the resolution increases, the Kyro II cannot maintain its top of chart position. The performance of the card still remains very strong, falling into a group that consists of the GeForce2 GTS 64MB and the Radeon DDR 64MB.
The average frame rate of the card at 1024x768x32 reflects the results observed in the minimum frame rate measurements.
The Kyro II does not perform as well in Unreal Tournament at 1600x1200x16 as we have seen it do in the past. Here, the Kyro II still handily beats the GeForce2 MX, the Radeon SDR, and the Radeon DDR 64MB, but falls short of the GeForce2 series cards.
The average frame rate results here mirror the minimum frame rate results.
Serious Sam Performance - Fill Rates
This review marks the addition of two new benchmarks into the AnandTech video card testing lineup. First we will take a look at Serious Sam, a game developed by Croteam that is currently in beta test form. The game, which features very impressive benchmarking features, can not only give the frame rates of game playback, but also the sustained frame rates as well as fill rates.
One tool that the Serious Sam engine possesses is the ability to measure the actual fill rate of a card. This is something that we previously had not been able to do, meaning that we relied simply on manufacturers' numbers. As the above shows, those numbers are extremely misleading.
Remember how we mentioned before that when overdraw is taken into account, the effective fill rate of a conventional video card is essentially the theoretical fill rate divided by the overdraw amount? Well, the Serious Sam tests show exactly this. Take the GeForce2 Ultra, for example. Theoretically, this card has a 1000 megapixel per second fill rate given its clock speed and rendering pipelines. What we see in actuality, however, is that the GeForce2 Ultra is only able to fill 375 megapixels per second. This means that given the synthetic Serious Sam fill rate tests, the GeForce2 Ultra is only 37.5% effective. One can attribute this to overdraw as well as memory bandwidth limitations.
The Kyro II, on the other hand, features what many would consider a lowly 350 megapixel per second fill rate. However, when the tests are run, the Kyro II scores a fill rate that is only 22 megapixels per second less than the GeForce2 Ultra. Coming out at 352.89 megapixels per second, the Kyro II's effective fill rate matches its theoretical fill rate, something we cannot say about any other card on the market. According to the Serious Sam benchmarks, the Kyro II is actually 100% efficient.
This is quite exciting to see. Previous fill rate numbers have been misleading, to say the least. The Kyro II's tile based rendering architecture, however, opens a new path in fill rates where effective fill rates actually match theoretical ones. Can you imagine a Kyro II card that actually featured the theoretical fill rate of 1000 megapixels per second that the GeForce2 Ultra features? Out of this world would be the only way to describe it.
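The efficiency figures quoted above are simple ratios of measured to theoretical fill rate. The following is a minimal sketch of that arithmetic using only the numbers given in the text; the function names are our own, and the implied overdraw calculation is a simplification that attributes the entire shortfall to overdraw while ignoring memory bandwidth.

```python
# Fill rates quoted in the review, in megapixels per second:
# (theoretical, measured in the Serious Sam synthetic test).
CARDS = {
    "GeForce2 Ultra": (1000.0, 375.0),
    "Kyro II": (350.0, 352.89),
}


def fill_rate_efficiency(theoretical, measured):
    """Measured fill rate as a fraction of the theoretical fill rate."""
    return measured / theoretical


def implied_overdraw(theoretical, measured):
    """Overdraw factor implied by effective = theoretical / overdraw.

    Simplifying assumption: the whole gap is blamed on overdraw,
    even though memory bandwidth limits contribute as well.
    """
    return theoretical / measured


for name, (theoretical, measured) in CARDS.items():
    efficiency = fill_rate_efficiency(theoretical, measured)
    print(f"{name}: {efficiency:.1%} efficient")
```

Under these assumptions, the GeForce2 Ultra's result corresponds to an implied overdraw factor of roughly 2.7, while the Kyro II's measured rate slightly exceeds its rated figure.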
The focus of the Serious Sam synthetic benchmark should be to observe fill rate efficiency, as the benchmark assumes a certain amount of overdraw that may or may not be present in actual game play. With this in mind, let's see how the Kyro II fares in the multitexture fill rate benchmark in terms of efficiency.
The fact that the Kyro II is able to hit 175 megapixels per second shows once again that the Kyro II is 100% efficient. In the case of the multitexture benchmark, it seems that 2 textures are applied to the polygons being rendered. This would bring the theoretical fill rate of the Kyro II down to 175 megapixels per second, exactly the score it is able to get.
As we pointed out before, the ability to reach 100% efficiency in this synthetic test says a great deal about the promise of tile based rendering. Theoretical fill rates no longer have to remain out of reach, as the Kyro II and its tile based architecture allow effective fill rates to match theoretical ones.
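The halving of the fill rate in the multitexture test can be sketched as follows. This is an illustration only: it assumes each pipeline applies one texture per clock, so that every additional texture layer costs an additional pass, and the function and parameter names are hypothetical.

```python
def multitexture_fill_rate(single_texture_rate, texture_layers, textures_per_pass=1):
    """Pixel fill rate when every pixel needs `texture_layers` textures.

    Assumption: the pipeline applies `textures_per_pass` textures per
    clock, so extra layers require extra passes over the same pixel.
    """
    passes = -(-texture_layers // textures_per_pass)  # ceiling division
    return single_texture_rate / passes


# Kyro II: 350 megapixels per second single textured, 2 textures per pixel.
print(multitexture_fill_rate(350.0, 2))  # 175.0
```

A part that could apply two textures per pass would, under the same model, keep its full pixel rate in this test, which is why multitexture results are best compared in terms of efficiency rather than raw numbers.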
Serious Sam Performance - Game Play
The Kyro II fares well at 640x480x32 in Serious Sam. Once again, a T&L engine is really the best way to boost the Kyro II's performance at low resolutions. Let's see how it does when we up the ante a bit.
The $149.99 Kyro II once again does the impossible, beating the $340 GeForce2 Ultra, not just by a slight margin, but by an impressive 6%. Showing the complexity that must be present in Serious Sam in the form of overdraw, the Kyro II, with its 100% effectiveness in the Serious Sam synthetic fill rate benchmark, rises to the top of the charts. It is clear that tile based rendering systems do have their place in modern day 3D rendering, and that place lies above today's immediate mode renderers.
Serious Sam at 1600x1200x32 results in a similar conclusion: not only is the Kyro II a great card for the price, it is also the fastest in Serious Sam. Running 10% faster than the GeForce2 Ultra, the Kyro II's efficiency pays off in a big way. Given the price, the performance of the Kyro II is nothing short of breathtaking.
Mercedes-Benz Truck Racing Performance
The second new game we added to our benchmark suite is Mercedes-Benz Truck Racing. We thought it was important that we add a second, and more recent D3D game into the lineup. Before we head to the benchmarks, it is necessary to point out a few things.
First off, the asterisk next to the Voodoo5 5500 is there to show that the Voodoo5 5500 was tested with the 24-bit texture option off. Failure to turn this check box off on the Voodoo5 5500 resulted in extremely poor performance regardless of the resolution.
The second thing to point out is that it was in Mercedes-Benz Truck Racing that we experienced some driver problems with the Kyro II. During the benchmark, we noticed textures that were supposed to be hidden actually appearing in some frames during playback. As a picture is worth a thousand words...
In the first picture above, one can see tire tracks that should clearly be behind the truck. The second picture shows brake lights, which once again should not be visible in the rendered scene, coming through the front of the truck.
Looks like the Kyro II still needs some driver work. Regardless of the rendering problems, let's see how the card performs in this game.
The Kyro II takes a position it is not used to assuming when in MB Truck Racing at 640x480x32. This time around, we can attribute the poor performance not only to the lack of T&L but also to the driver problem described above.
Unfortunately, not much changes for the Kyro II when the resolution is increased to 1024x768x32. Keep in mind that these results should be taken with more than just a grain of salt, as the demo playback was clearly not working 100% correctly with the Kyro II drivers.
Finally, at 1600x1200x32, the Kyro II remains near the bottom of the charts. Still beating out the Radeon SDR as well as the Voodoo5 5500, the Kyro II could be limited by its broken drivers. Let's hope STMicroelectronics can fix this problem in the near future.
FSAA Image Quality and Performance
The Kyro II is able to do up to 4x super-sampling FSAA; however, the drivers provided with our card only allowed FSAA mode to be turned on under OpenGL. Let's see how the Kyro II looks with FSAA on and off under Serious Sam.
The results are comparable to NVIDIA's FSAA 2x2 image quality. Note the smooth edges of the door when in FSAA mode. For the full effect, it is necessary to view the images full screen.
Now turning to Serious Sam to provide us with FSAA scores, let's see how the Kyro II does when having to render each frame at four times the resolution.
It is clear that the Kyro II maintains its performance advantage in Serious Sam with FSAA enabled. Once again, the Kyro II is able to outperform the GeForce2 Ultra, this time by 12%. Running at 60 frames per second, the Kyro II is nothing short of amazing.
Of the cards that could make it up to 1024x768x32 with FSAA 4x enabled, the Kyro II once again comes out on top. Beating the GeForce2 Ultra by 25%, the Kyro II is extremely fast.
The Kyro II is able to perform so well with FSAA enabled because of the efficiency of the card. In a traditional card, when a scene is rendered at 4 times the resolution to provide 4x FSAA, memory bandwidth is the limiting factor. With the Kyro II, however, memory bandwidth is not a limitation, allowing the card to perform very well when FSAA is turned on.
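To get a feel for how quickly 4x super-sampling multiplies the rendering workload, here is a rough sketch of the pixel counts involved. The 60 frames per second target and the overdraw factor are illustrative assumptions rather than measured values, and the function names are our own.

```python
def supersampled_pixels(width, height, factor):
    """Pixels rendered per frame with `factor`x super-sampling FSAA."""
    return width * height * factor


def required_fill_rate_mpix(width, height, factor, fps, overdraw=1.0):
    """Fill rate in megapixels per second needed to sustain `fps`.

    `overdraw` is a hypothetical average number of times each pixel is
    drawn; a renderer that only draws visible pixels would stay near 1.0.
    """
    return supersampled_pixels(width, height, factor) * fps * overdraw / 1e6


for width, height in [(640, 480), (1024, 768)]:
    pixels = supersampled_pixels(width, height, 4)
    rate = required_fill_rate_mpix(width, height, 4, fps=60)
    print(f"{width}x{height} with 4x FSAA: {pixels:,} pixels per frame, "
          f"{rate:.1f} megapixels per second at 60 FPS")
```

On a conventional card, every one of those extra pixels also passes through external memory and is further multiplied by whatever overdraw the scene contains, which is why bandwidth becomes the limiting factor.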
16-bit vs 32-bit Performance
As a result of the Kyro II's internal 32-bit rendering, we were not surprised to find that this card actually had the least performance increase when going from 32-bit to 16-bit. Doing this comparison, however, is a bit unfair, since the 16-bit quality of the Kyro II is noticeably better than the 16-bit quality of competing cards. It is unfortunate, however, that almost no speed can be gained by falling back on 16-bit color mode. At least the speed that is gained will look better than the alternatives.
Same story when going from 1600x1200x32 to 1600x1200x16. The only card to show less of a performance increase than the Kyro II is the ATI Radeon DDR, which gains a mere 2 FPS by decreasing color depth.
Conclusion
We are all extremely lucky that Kyro II based products will not only be available, but also easy to get by the end of March or the beginning of April. The performance of the Kyro II based 3D Prophet 4500 is nothing short of stunning given its price: a mere $149.99.
It has been a while since we have had a truly high powered graphics card dip below the $200 price mark. In the past, stripped-down versions of higher performance parts were sold to cost-conscious consumers, oftentimes leaving them with sub-par performance. The Kyro II changes all that.
With its tile based rendering algorithm, the Kyro II provides blazing fast performance considering the price and was actually able to beat products costing almost $200 more than a Kyro II based board. Throughout the benchmarks, the Kyro II based 3D Prophet 4500 simply dominated everything else in its price range. The Kyro II was ready and able to tackle any game we sent its way.
Our one concern lies with the display errors we experienced in Mercedes-Benz Truck Racing. Although the visual abnormalities were only noticed in this benchmark, it is possible that other games may be affected by the improper rendering. We can only hope that STMicroelectronics recognizes the problem and provides a quick fix, as this would make the Kyro II near perfect.
Regardless of the driver problem encountered, the Kyro II is truly amazing and is a true testament to the potential of tile based rendering. Recall that in order to get this high level of performance, all STMicroelectronics had to do was increase the clock speed of the original Kyro, and it becomes clear that tile based rendering has an extreme amount of potential.
The Kyro II based 3D Prophet 4500 brings high end gaming performance to a segment that it has not seen in quite some time. As an inexpensive 3D solution, a Kyro II based product may be just what the doctor ordered until a fully programmable graphics processor becomes necessary, an event that will most likely take at least six to eight months. We look forward to what those six to eight months will do to the tile based PowerVR series, but we know that now a Kyro II based card is the card of choice for those out there with even a slight constraint on their budget.