Original Link: https://www.anandtech.com/show/2009



Introduction

Last week, we took a first look at the new PhysX add-in physics accelerator from AGEIA. After our article was published, AGEIA released an update to their driver that addresses some of the framerate issues in Ghost Recon Advanced Warfighter. While our main focus this time around will be on BFG's retail part, we will explore the effectiveness of this patch and go a little further in-depth with the details behind our performance analysis.


In addition to the BFG retail PhysX card and Ghost Recon update, we will take a look at a few demos that require the PhysX card to run. While there aren't any games scheduled to come out in the near future that will take this new technology to the extreme, it will be nice to get a glimpse into the vision AGEIA has for the future. Getting there will certainly be a hard road to travel. Until more games come out that support the hardware, we certainly can't recommend PhysX to anyone but the wealthy enthusiasts who enjoy the novelty of hardware for hardware's sake. Even if PhysX significantly enhances the experience of a few games right now, it will be a tough sell to most users until there is either much wider software support, good games which require the hardware, or a killer app with a PhysX hardware accelerated feature that everyone wants to have.

As for games which will include PhysX hardware support, the only three out as of this week are Tom Clancy's Ghost Recon Advanced Warfighter (GRAW), Rise of Nations: Rise of Legends (ROL) and City of Villains (COV). Rise of Legends came out last week, and we have been extensively testing it. Unfortunately, PhysX hardware support will only be added in an upcoming patch for which we have no real ETA.

We worked very hard to test City of Villains, and we finally succeeded in creating a repeatable benchmark. The specific content in City of Villains which supports the AGEIA PhysX PPU (physics processing unit) is a series of events called the Mayhem Missions. This is a very small subset of the game consisting of timed (15-minute) missions. Currently these missions are being added in Issue 7, which is still on the test server and not ready for primetime. Full support for PhysX was included on the test server as of May 10th, so we have benchmarks and videos available.

Before we jump into the numbers, we are going to take a look at the BFG card itself. As this is a full retail part, we will give it a full retail workup: power, noise, drivers, and pricing will all be explored. Our investigations haven't turned up an on-chip or on-board thermistor, so we won't be reporting heat for this review. Our power draw numbers and the size of the heat sink lead us to believe that heat should not be a big issue for PhysX add-in boards.



BFG PhysX and the AGEIA Driver

Let us begin with the BFG PhysX card itself. The specs are the exact same as the ASUS card we previewed. Specifically, we have:

130nm PhysX PPU with 125 million transistors
128MB GDDR3 @ 733MHz Data Rate
32-bit PCI interface
4-pin Molex power connector


The BFG card has a bonus: a blue LED behind the fan. Our BFG card came in a retail box, pictured here:


Inside the box, we find CDs, power cables, and the card itself:



As we can see here, BFG opted to go with Samsung's K4J55323QF-GC20 GDDR3 chips. There are 4 chips on the board, each of which is 4 banks of 2Mb x 32 RAM (32MB). The chips are rated at 2ns, giving a maximum clock speed of 500MHz (1GHz data rate), but the memory clock speed used on current PhysX hardware is only 366MHz (733MHz data rate). Lower-than-rated clock speeds may have been chosen to save on power and hit a lower thermal envelope. It is also possible that a lower clock speed allows board makers to be more aggressive with chip timings if latency is a larger concern than bandwidth for the PhysX hardware. This is just speculation at this point, but such an approach is certainly not beyond the realm of possibility.
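To put those memory specs in perspective, here is a quick sketch of the peak bandwidth they imply. The 128-bit bus width is our inference from the four 32-bit chips; the data rates come from the figures above.

```python
# Peak memory bandwidth implied by the board's GDDR3 configuration.
# Bus width inferred from 4 chips x 32 bits each (an assumption on our part).
BUS_WIDTH_BITS = 4 * 32
BYTES_PER_TRANSFER = BUS_WIDTH_BITS // 8  # 16 bytes moved per data-rate cycle

def bandwidth_gb_s(data_rate_mhz: float) -> float:
    """Peak bandwidth in GB/s for a given effective (double-pumped) data rate."""
    return data_rate_mhz * 1e6 * BYTES_PER_TRANSFER / 1e9

print(bandwidth_gb_s(733))   # as clocked on the BFG card: ~11.7 GB/s
print(bandwidth_gb_s(1000))  # at the chips' rated 1GHz data rate: 16.0 GB/s
```

In other words, running the memory below its rated speed leaves roughly a quarter of the chips' potential bandwidth on the table, which supports the idea that bandwidth is not the board's primary concern.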


The BFG card sells for about $300 at major online retailers, but can be found for as low as $280. The ASUS PhysX P1 Ghost Recon Edition is bundled with GRAW for about $340, while the BFG part does not come with any PhysX accelerated games. It is possible to download a demo of CellFactor now, which does add some value to the product, but until we see more (and much better) software support, we will have to recommend that interested buyers take a wait and see attitude towards this part.

As for software support, AGEIA is constantly working on their driver and pumping out newer versions. The driver interface is shown here:



There isn't much to the user side of the PhysX driver. We see an informational window, a test application, a diagnostic tool to check or reset hardware, and a help page. There are no real "options" to speak of in the traditional sense. The card itself really is designed to be plugged in and forgotten about. This does make it much easier on the end user under normal conditions.

We also tested the power draw and noise of the BFG PhysX card. Here are our results:

Noise (in dB)
Ambient (PC off): 43.4
No BFG PhysX: 50.5
BFG PhysX: 54.0


The BFG PhysX Accelerator does audibly add to the noise. Of course, the noise increase is nowhere near as bad as listening to an ATI X1900 XTX fan spin up to full speed.

Idle Power (in Watts)
No Hardware: 170
BFG PhysX: 190


Load Power without Physics Load
No Hardware: 324
BFG PhysX: 352


Load Power with Physics Load
No Hardware: 335
BFG PhysX: 300


At first glance these results can be a bit tricky to understand. The load tests were performed with our low quality Ghost Recon Advanced Warfighter physics benchmark. Our test "without Physics Load" is taken before we throw the grenade and blow up everything, while the "with Physics Load" reading is made during the explosion.

Yes, system power draw (measured at the wall with a Kill-A-Watt) decreases under load when the PhysX card is being used. This is made odder by the fact that the power draw of the system without a physics card increases during the explosion. Our explanation is quite simple: The GPU is the leading power hog when running GRAW, and it becomes starved for input while the PPU generates its data. This explanation fits in well with our observations on framerate under the games we tested: namely, triggering events which use PhysX hardware in current games results in a very brief (yet sharp) drop in framerate. With the system sending the GPU less work to do per second, less power is required to run the game as well. While we don't know the exact power draw of the PhysX card itself, it is clear from our data that it doesn't pull nearly the power that current graphics cards require.
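The counterintuitive result is easier to see when the wall-socket deltas are laid out directly. This is simple arithmetic on the readings above:

```python
# Wall-socket power readings (watts) from our Kill-A-Watt, with and
# without the BFG PhysX card installed.
readings = {
    "idle":              {"no_card": 170, "physx": 190},
    "load, no physics":  {"no_card": 324, "physx": 352},
    "load, physics":     {"no_card": 335, "physx": 300},
}

for state, r in readings.items():
    delta = r["physx"] - r["no_card"]
    print(f"{state}: {delta:+d} W with the card installed")
```

The card adds 20-28W at the wall in the first two states, but during the explosion the PhysX-equipped system actually draws 35W less than the system without the card: the starved GPU saves more power than the PPU consumes.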



Benchmarking Physics

We've had a lot of responses about the benchmarking procedures we used in our first PhysX article. We would like to clear up what we are trying to accomplish with our tests, and explain why we are doing things the way we are. Hopefully, by opening up a discussion of our approach to benchmarking, we can learn how to best serve the community with future tests of this technology.

First off, average FPS is a good measure of full system performance under games. Depending on how the system responds to the game over multiple resolutions, graphics cards and CPU speeds, we can usually get a good idea of the way the different components of a system impact an application's performance.

Unfortunately, when a new and underused product (like a physics accelerator) hits the market, the sharp lack of applications that make use of the hardware presents a problem to consumers attempting to evaluate its capabilities. In the case of AGEIA's PhysX card, the inability to test applications running with a full complement of physics effects in software mode really hampers our ability to draw solid conclusions.

In order to fill in the gaps in our testing, we would usually look towards synthetic benchmarks or development tools. At this point, the only synthetic benchmark we have is the boxes demo that is packaged with the AGEIA PhysX driver. The older tools, demos and benchmarks (such as 3DMark06) that use the PhysX SDK (formerly named Novodex) are not directly supported by the hardware (they would need to be patched somehow to enable support if possible).

Other, more current, demos will not run without hardware in the system (like CellFactor). The idea in these cases would be to stress the hardware as much as possible to find out what it can do. We would also like to find out how code running on the PhysX hardware compares to code running on a CPU (especially in a multiprocessor environment). Being able to control the number and type of physics objects to be handled would allow us to get a better idea of what we can expect in the future.

To fill in a couple of gaps, AGEIA states that the PhysX PPU is capable of handling over 533,000 convex object collisions per second and 3x as many sphere collisions per second. This is quite difficult to relate back to real world performance, but it appears to be more work than a CPU or GPU could perform per second.
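To make AGEIA's quoted figures slightly more concrete, here is the per-frame budget they imply at 60 FPS. The arithmetic is ours, not AGEIA's, and real workloads mix collision types:

```python
# AGEIA's quoted throughput figures for the PhysX PPU.
convex_per_sec = 533_000
sphere_per_sec = 3 * convex_per_sec  # AGEIA quotes 3x the convex rate

# At a 60 FPS target, the per-frame collision budget works out to:
TARGET_FPS = 60
convex_per_frame = convex_per_sec / TARGET_FPS
sphere_per_frame = sphere_per_sec / TARGET_FPS

print(f"{sphere_per_sec:,} sphere collisions per second")
print(f"~{convex_per_frame:,.0f} convex collisions per 60 FPS frame")
print(f"~{sphere_per_frame:,.0f} sphere collisions per 60 FPS frame")
```

Nearly nine thousand convex collisions per frame is far beyond what any current game attempts, which underscores how much headroom the hardware claims to have.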

Of course, there is no replacement for actual code, and (to the end user) hardware is only as good as the software that runs on it. This is the philosophy by which we live. We are dedicated first and foremost to the enthusiast who spends his or her hard earned money on computer hardware, and there is no substitute for real world performance in evaluating the usefulness of a tool.

Using FPS to benchmark the impact of PhysX on performance is not a perfect fit, but it isn't as bad as it could be. Frames per second (in an instantaneous sense) is one divided by the time it takes to render a single frame. We call this the frametime. One divided by an average FPS is the average time it takes for a game to produce a finished frame. This takes into account the time it takes for a game to take in input, update game logic (with user input, AI, physics, event handling, script processing, etc.), and draw the frame via the GPU. Even though a single frame needs to travel the same path from start to finish, things like queuing multiple frames for rendering to the GPU (usually 3 at most) and multithreaded game programming are able to hide some of the overhead. Throw PhysX into the mix, and ideally we can offload some of this work somewhere else.
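The FPS-to-frametime relationship is simple, but it is worth seeing in numbers, since equal FPS drops do not cost equal amounts of time:

```python
def frametime_ms(fps: float) -> float:
    """Average time to produce one finished frame, in milliseconds."""
    return 1000.0 / fps

# A dip from 60 FPS to 30 FPS adds as much frametime as a dip
# from 30 FPS to 20 FPS takes away headroom twice over:
print(frametime_ms(60))  # ~16.7 ms per frame
print(frametime_ms(30))  # ~33.3 ms per frame
print(frametime_ms(20))  # 50.0 ms per frame
```

This is why the brief, sharp framerate drops we observe matter more than the averages suggest: a momentary dip to 10 FPS means a single frame took 100 ms, which is very visible as a stutter.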

Here are some examples of how frametime can be affected by a game. These are very limited examples and don't reflect the true complexity of game programming.

CPU limited situations:
CPU: |------------ Game logic ------------||----
GPU: |---- Graphics processing ----|       |----

The GPU must wait on the CPU to set up the next frame before it can start rendering. In this case, PhysX could help by reducing the CPU load and thus frametime.

Severely GPU limited situations:
CPU: |------ Game Logic ------|             |---
GPU: |-------- Graphics processing --------||---

The CPU can start work on the next frame before the GPU finishes, but any work after three frames ahead must be thrown out. In the extreme case, this can cause lag between user input and the graphics being displayed. In less severe cases, it is possible to keep the CPU more heavily loaded while the frametime still depends on the GPU alone.
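The two diagrams above can be boiled down to a toy model: with the CPU and GPU pipelined, steady-state frametime is governed by whichever stage is slower. The millisecond figures below are hypothetical, chosen only to illustrate the two cases:

```python
# Toy model of a pipelined CPU/GPU frame: the slower stage sets the pace.
# All numbers are hypothetical illustrations, not measurements.
def steady_frametime(cpu_ms: float, gpu_ms: float) -> float:
    """Steady-state frametime when CPU and GPU work overlaps fully."""
    return max(cpu_ms, gpu_ms)

PHYSICS_OFFLOAD_MS = 8  # hypothetical CPU time freed by moving physics to a PPU

# CPU-limited case: offloading physics directly reduces frametime...
print(steady_frametime(30, 18))                       # 30 ms
print(steady_frametime(30 - PHYSICS_OFFLOAD_MS, 18))  # 22 ms

# ...but in a GPU-limited case the same offload changes nothing:
print(steady_frametime(15, 28))                       # 28 ms
print(steady_frametime(15 - PHYSICS_OFFLOAD_MS, 28))  # 28 ms
```

The model also shows why "free" extra effects are hard to deliver: any PPU-related stall that lengthens either stage (driver overhead, PCI transfers, object creation) immediately becomes the new bottleneck.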

In either case, as is currently being done in both City of Villains and Ghost Recon Advanced Warfighter, the PhysX card can ideally be added to create additional effects without adding to frametime or CPU/GPU load. Unfortunately, the real world is not ideal, and in both of these games we see an increase in frametime for at least a couple frames. There are many reasons we could be seeing this right now, but it seems to not be as much of a problem for demos and games designed around the PPU.

In our tests of PhysX technology in the games which currently make use of the hardware, multiple resolutions and CPU speeds have been tested in order to determine how the PhysX card factors into frametime. For instance, it was very clear in our initial GRAW test that the game was CPU limited at low resolutions because the framerate dropped significantly when running on a slower processor. Likewise, at high resolutions the GPU was limiting performance because the drop in processor speed didn't affect the framerate in a very significant way. In all cases, after adding the PhysX card, we were easily able to see that frametime was most significantly limited by either the PhysX hardware itself, AGEIA driver overhead, or the PCI bus.

Ideally, the PhysX PPU will not only reduce the load on the CPU (or GPU) by unloading the processing of physics code, but will also give developers the ability to perform even more physics calculations in parallel with the CPU and GPU. This solution absolutely has the potential to be more powerful than moving physics processing to the GPU or a second core on a CPU. Not only that, but the CPU and GPU will be free to allow developers to accomplish ever more complex tasks. With current generation games becoming graphics limited on the GPU (even in multi-GPU configurations), it seems counterintuitive to load it even more with physics. Certainly this could offer an increase in physics realism, but we have yet to see the cost.



City of Villains (Beta) Tests

Update: We would like to clarify that the beta version of City of Villains was stated as non-optimized in the release notes. The test server is open to any CoV subscriber and it does support PhysX hardware, but it is very possible for performance improvements to be made to the implementation before the production release of the Issue 7 patch. In fact, we have been told by AGEIA that performance on the final code will absolutely be better than on the test server. We look forward to testing and reporting on PhysX support under production CoV code/servers as soon as possible.

As with any MMOG, City of Villains has been difficult to test. The fact that PhysX hardware is only supported in a very specific area of gameplay makes it even more complicated. Luckily, we found a way around these issues. As City of Villains makes use of instanced areas for missions (multiple parties entering a single zone will each play the game in their own copy of that area), we were able to eliminate any outside influence from other players and go on a mission solo. Also key in our ability to test the Mayhem Missions (the part of City of Villains in which the PhysX hardware is used) is the fact that it is currently on the test server.

Normally, before a gamer gets a Mayhem Mission, he or she will have to do a bunch of other missions, level up a bunch of times, and unlock a contact called a "broker." The broker gives you access to "newspaper missions." After doing 3 of the newspaper missions, a broker will offer you a special job: a heist. In the future, and on the test server, this is where Mayhem Missions will be available. Each Mayhem Mission, once begun, has a 15 minute time limit. This limit can be extended by destroying things in the city, but there is no way to get enough time to actually benchmark multiple configurations of hardware. Doing 3 other newspaper missions and getting another Mayhem Mission isn't an option because this takes a huge amount of time (and the missions can be different every time as well).

Fortunately, NCSoft offers the ability to copy your character to the test server at any time, multiple times. Our solution was to do enough missions to get to a point where we could be offered a mayhem mission (even though the missions are currently only on the test server). Then we copy our character over to the test server a bunch of times. Now we are able to accept the same mission any time we want.

This is great for testing a feature like PhysX, but doing a full test of the game is still hard. Working with a team fighting a huge number of enemies is a key element in the game. It's something that just can't be reliably repeated barring the assembly of an AnandTech guild built solely for choreographing, reenacting, and benchmarking scenes hundreds of times. For now, this is all we need. We included a screenshot of the game options with and without a PhysX card installed:




We created a couple videos to show the difference between hardware accelerated and software physics. Here are the numbers that resulted from our tests.

City of Villains

City of Villains

What we see clearly shows that the PhysX card is putting quite a strain on performance. Yes, we get more mail or packing peanuts spewing forth from our rampant destruction. Yes, we do think it looks great with more stuff going on. But we are really not happy with the performance we are seeing here. It appears that our GPU is not overly loaded at these resolutions, as the difference between 800x600 and 1600x1200 gives very little in the way of performance gain without PhysX hardware installed. This implies that the bottleneck in performance is in the rest of the system. This is confirmed by the fact that performance drops considerably when running the test on a much slower CPU.

Whether on a low resolution with a fast CPU or a high resolution with a slow CPU, the PhysX hardware gives us low average framerates, very low minimum instantaneous framerates, and adds a bit of stutter to the movement of the game. Unlike the Ghost Recon test we showed last week, playing the game with the PhysX card feels much less satisfying. Multiple frames seem to take a long time to render (rather than just one or two), giving movement a choppy feel.

That being said, it is very important that we point out the fact that we are benchmarking code running on a test server. NCSoft makes it clear that code running on this server is not optimized and players may see degraded performance. We hope very sincerely that the performance issues will be resolved; right now it just isn't worth it.

So why include these numbers at all if it's on a test server? Well, the fact that they back up the results we saw in GRAW last week does lend some credibility to the tests. What we seem to be seeing is that games which use the PhysX processor to tack on special effects physics take a performance hit when those effects are triggered. Of course, that only holds if the driver didn't fix the performance issues we saw in Ghost Recon.



Ghost Recon Advanced Warfighter Tests

And the short story is that the patch released by AGEIA when we published our previous story didn't really do much to fix the performance issues. We did see an increase in framerate from our previous tests, but the results are less impressive than we were hoping to see (especially with regard to the extremely low minimum framerate).

Here are the results from our initial test, as well as the updated results we collected:

Ghost Recon Advanced Warfighter

Ghost Recon Advanced Warfighter

Ghost Recon Advanced Warfighter

Ghost Recon Advanced Warfighter

There is a difference, but it isn't huge. We are quite impressed with the fact that AGEIA was able to release a driver so quickly after performance issues were made known, but we would like to see better results than this. Perhaps AGEIA will have another trick up their sleeves in the future as well.

Whatever the case, after further testing, it appears our initial assumptions are proving more and more correct, at least with the current generation of PhysX games. There is a bottleneck in the system somewhere near and dear to the PPU. Whether this bottleneck is in the game code, the AGEIA driver, the PCI bus, or on the PhysX card itself, we just can't say at this point. The fact that a driver release did improve the framerates a little implies that at least some of the bottleneck is in the driver. The implementation in GRAW is quite questionable, and a game update could help to improve performance if this is the case.

Our working theory is that there is a good amount of overhead associated with initiating activity on the PhysX hardware. This idea is backed up by a few observations we have made. First, the slowdown occurs right as particle systems or objects are created in the game. After the creation of the PhysX accelerated objects, framerates seem to smooth out. The demos we have which use the PhysX hardware for everything physics related don't seem to suffer the same problem when blowing things up (as we will demonstrate shortly).

We don't know enough at this point about either the implementation of the PhysX hardware or the games that use it to be able to say what would help speed things up. It is quite clear that there is a whole lot of breathing room for developers to use. Both the CellFactor demo (now downloadable) and the UnrealEngine 3 demo Hangar of Doom show this fact quite clearly.



Playing Demos on PhysX

Even though we can't benchmark CellFactor or Hangar of Doom in any useful way, they can't be left out when talking about the usefulness of PhysX hardware. It is very clear that using the AGEIA PhysX technology in a game can yield some impressive results. It is just as clear that current production games, while adding compelling visual effects, suffer a performance penalty that is difficult to justify.

What we don't know is just how much more physics a PhysX card can do than the CPU or GPU already in every computer. We just haven't found a real world scenario in which to test the same large physics load on both the CPU and the PPU. None of the games that support PhysX include the ability to enable advanced physics features in the absence of hardware. The one small demo we do have that can run in either hardware or software mode does show a good improvement with hardware, but this test diagnostic app isn't designed as a performance analysis tool (nor is it a real world example of anything).


Now that the CellFactor demo is downloadable, there is a little more value in picking up the hardware. Even though there is only one level to play with, the CellFactor demo is quite enjoyable in a multiplayer situation. It's not $300 USD worth of goodness, but it is a step in the right direction. It is rather impressive on a technical level, but with the full version of the game nowhere near release, the success of PhysX can't rely on CellFactor. We have a short video (3.7MB) available, although you might prefer the 400 MB video available on the CellFactor web site.


Hangar of Doom is a demo based on Epic's UnrealEngine 3. This engine will power Unreal Tournament 2007, as well as a whole host of other games. Currently, UT2007 won't be requiring the PhysX hardware, but that shouldn't stop licensees from being able to take full advantage of it. While this demo isn't as complex as CellFactor, it demonstrates some neat ideas about the destructibility of objects in a game (planes fall apart when shot down). Again, we have a short video (2.1MB) available for download.

If you would like to try grabbing all six videos (13MB including the two from the original PhysX article) using a BitTorrent client, you may find that to be a faster solution (depending on how many people are seeding the files). Just download the torrent file if you're interested.

Unfortunately, even though these demos are very interesting and compelling, developers are not targeting levels of interactivity on this scale for the near future. With the current multiplayer trend, it doesn't make sense for developers to allow gameplay to rely on hardware that many users won't have, as gameplay can't differ from player to player in a multiplayer environment. Effects are a different story, and thus the first games to support PhysX do have a tacked on feel to them.

Truly innovative uses of the AGEIA's technology are out there, but we are stuck with a chicken and egg problem. Publishers don't want to require the hardware until a large install base exists, and end users won't buy the hardware until a good number of titles support it.



Final Words

Our first few weeks playing with PhysX have been a bit of a mixed bag. On one hand, the technology is really exciting, game developers are promising support for it, two games already benefit from it, and the effects in supported games and demos do look good. On the other hand, only two games really currently support the hardware, the extent of the added content isn't worth the price of the hardware, promises can be broken, we've observed performance issues, and hardware is only really as good as the software that runs on it.

Playing the CellFactor demo for a while, messing around in the Hangar of Doom, and blowing up things in GRAW and City of Villains is a start, but it is only a start. As we said before, we can't recommend buying a PPU unless money is no object and the games which do support it are your absolute favorites. Even then, the advantages of owning the hardware are limited and questionable (due to the performance issues we've observed).

Seeing City of Villains behave in the same manner as GRAW gives us pause about the capability of near term titles to properly support and implement hardware physics support. The situation is even worse if the issue is not in the software implementation. If spawning lots of effects on the PhysX card makes the system stutter, then it defeats the purpose of having such a card in the first place. If similar effects could be possible on the CPU or GPU with no less of a performance hit, then why spend $300?

Performance is a large issue, and without more tests to really get under the skin of what's going on, it is very hard for us to know if there is a way to fix it or not. The solution could be as simple as making better use of the hardware while idle, or as complex as redesigning an entire game/physics engine from the ground up to take advantage of the hardware features offered by AGEIA.

We are still excited about the potential of the PhysX processor, but the practicality issue is not one that can be ignored. The issues are twofold: can developers properly implement support for PhysX without impacting gameplay while still making enhancements compelling, and will end users be able to wait out the problems with performance and variety of titles until there are better implementations in more games?

From a developer standpoint, PhysX hardware would provide a fixed resource. Developers love fixed resources as one of the most difficult aspects of PC game design is targeting a wide range of system requirements. While it will be difficult to decide how to best use the hardware, once the decision is made, there is no question about what type of physics processing resources will be afforded. Hopefully this fact, combined with the potential for expanded creativity, will keep game developers interested in using the hardware.

As an end user, we would like to say that the promise of upcoming titles is enough. Unfortunately, it is not by a long shot. We still need hard and fast ways to properly compare the same physics algorithm running on a CPU, a GPU, and a PPU -- or at the very least, on a (dual/multi-core) CPU and PPU. More titles must actually be released and fully support PhysX hardware in production code. Performance issues must not exist, as stuttering framerates have nothing to do with why people spend thousands of dollars on a gaming rig.

Here's to hoping everything magically falls into place, and games like CellFactor are much closer than we think. (Hey, even reviewers can dream... right?)
