Original Link: https://www.anandtech.com/show/2352
Unreal Tournament 3 CPU & High End GPU Analysis: Next-Gen Gaming Explored
by Anand Lal Shimpi & Derek Wilson on October 17, 2007 3:35 AM EST - Posted in
- GPUs
It's been a long time coming, but we finally have Epic's first Unreal Engine 3 based game out on the PC. While the final version of Unreal Tournament 3 is still a little further out, last week's beta release kept us occupied over the past several days as we benchmarked the engine behind Rainbow Six: Vegas, Gears of War and Bioshock.
Epic's Unreal Engine 3 has been bringing us some truly beautiful, next-generation game titles, and it is significantly more demanding on the CPU and GPU than Valve's Source engine. While far from the impossible-to-run beast that Oblivion was upon its release, UE3 is still more stressful on modern day hardware than most of what we've seen thus far.
The Demo Beta
Although Unreal Tournament 3 is due out before the end of the year, what Epic released is a beta of the UT3 Demo and thus it's not as polished as a final demo. The demo beta has the ability to record demos but it can't play them back, so conventional benchmarking is out. Thankfully Epic left in three scripted flybys that basically take a camera and fly around the levels in a set path, devoid of all characters.
Real world UT3 performance will be more strenuous than what these flybys show, but they're the best we can muster for now. The final version of UT3 should have full demo playback functionality, with which we'll be able to provide better performance analysis. The demo beta also only ships with medium quality textures, so the final game can be even more stressful/beautiful if you so desire.
The flybys can run for an arbitrary period of time; we standardized on 90 seconds for each flyby in order to get repeatable results while still keeping the tests manageable to run. There are three flyby benchmarks that come bundled with the demo beta: DM-ShangriLa, DM-HeatRay and vCTF-Suspense.
As their names imply, the ShangriLa and HeatRay flybys are of the Shangri La and Heat Ray deathmatch levels, while the vCTF-Suspense is a flyby of the sole vehicle CTF level that comes with the demo.
Our GPU tests were run at the highest quality settings and with the -compatscale=5 switch enabled, which puts all detail settings at their highest values.
Our CPU tests were run at the default settings without the compatscale switch as we're looking to measure CPU performance and not GPU performance.
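For reference, here is a rough sketch of how runs like these could be scripted. Only the map names and the -compatscale=5 switch come from the settings described above; the install path, the ?causeevent=FlyThrough map options, and the -seconds and -nosound switches are our assumptions based on common Unreal Engine 3 benchmarking conventions, not confirmed demo beta syntax.

```python
# Rough sketch of scripting the three 90-second flyby runs.
# ASSUMPTIONS: the install path, the "?causeevent=FlyThrough?quickstart=1"
# map options, and the -seconds/-nosound switches are illustrative guesses
# based on typical Unreal Engine 3 benchmarking practice; only the map names
# and -compatscale=5 come from the test settings described above.
import subprocess

UT3_EXE = r"C:\Program Files\Unreal Tournament 3 Demo\Binaries\UT3Demo.exe"  # assumed path
FLYBYS = ["DM-ShangriLa", "DM-HeatRay", "vCTF-Suspense"]

for level in FLYBYS:
    cmd = [
        UT3_EXE,
        f"{level}?causeevent=FlyThrough?quickstart=1",  # assumed flyby trigger
        "-seconds=90",       # assumed: end each run after a fixed 90 seconds
        "-nosound",          # assumed: remove audio as a source of variance
        "-compatscale=5",    # from the article: all detail settings maxed (GPU tests only)
    ]
    subprocess.run(cmd, check=True)
```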
The Test
Test Setup
| CPU | Intel Core 2 Extreme QX6850 (3.33GHz, 4MB L2, 1333MHz FSB) |
| Motherboard | Intel: Gigabyte GA-P35C-DS3R / AMD: ASUS M2N32-SLI Deluxe |
| Video Cards | AMD Radeon HD 2900 XT, AMD Radeon X1950 XTX, NVIDIA GeForce 8800 Ultra, NVIDIA GeForce 8800 GTX, NVIDIA GeForce 8800 GTS 320MB, NVIDIA GeForce 7900 GTX |
| Video Drivers | AMD: Catalyst 7.10 / NVIDIA: 163.75 |
| Hard Drive | Seagate 7200.9 300GB, 8MB cache, 7200RPM |
| RAM | 2x1GB Corsair XMS2 PC2-6400 4-4-4-12 |
| Operating System | Windows Vista Ultimate 32-bit |
UT3 Teaches us about CPU Architecture
For our first real look at Epic's Unreal Engine 3 on the PC, we've got a number of questions to answer. First and foremost we want to know what sort of CPU requirements Epic's most impressive engine to date commands.
Obviously the GPU side will be more important, but it's rare that we get a brand new engine to really evaluate CPU architecture with, so we took this opportunity to do just that. While we've had other UE3-based games in the past (e.g. Rainbow Six: Vegas, Bioshock), this is the first Epic-created title at our disposal.
The limited benchmarking support of the UT3 Demo beta unfortunately doesn't lend itself to being the best CPU test. The built-in flybys don't have much in the way of real-world physics: the CPU spends its time calculating spinning weapons and the path of the camera flying around the level, with no explosions or damage to take into account. The final game may have a different impact on CPU usage, but we'd expect things to get more CPU-intensive, not less, in real world scenarios. We'll do the best we can with what we have, so let's get to it.
Cache Scaling: 1MB, 2MB, 4MB
One thing we noticed about the latest version of Valve's Source engine is that it is very sensitive to cache sizes and memory speed in general, which is important to realize given that there are large differences in cache size between Intel's three processor tiers (E6000, E4000 and E2000).
The Pentium Dual-Core chips are quite attractive these days, especially thanks to how overclockable they are. If you look back at our Midrange CPU Roundup you'll see that we fondly recommend them, especially when mild overclocking gives you the performance of a $160 chip out of a $70 one. The problem is that if newer titles are more dependent on larger caches then these smaller L2 CPUs become less attractive; you can always overclock them, but you can't add more cache.
To see how dependent Unreal Engine 3 and the UT3 demo are on low latency memory accesses, we ran 4MB, 2MB and 1MB L2 Core 2 processors at 1.8GHz to compare performance scaling.
From 1MB to 2MB there's a pretty hefty 12 - 13% increase in performance at 1.8GHz, but the difference from 2MB to 4MB is slightly more muted at 4 - 8.5%. An overall 20% increase in performance simply due to L2 cache size on Intel CPUs at 1.8GHz is impressive. We note the clock speed simply because the gap will only widen at higher clock speeds; faster CPUs are more data hungry and thus need larger caches to keep their execution units adequately fed.
In order to close the performance deficit, you'd have to run a Pentium Dual-Core at almost a 20% higher frequency than a Core 2 Duo E4000, and around a 35% higher frequency than a Core 2 Duo E6000 series processor.
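For clarity, here's how the individual cache steps compound into that overall figure; this is just our arithmetic on the midpoints of the ranges quoted above, sketched in Python:

```python
# Compounding the reported cache scaling steps (midpoints of the quoted ranges).
gain_1mb_to_2mb = 0.125    # ~12-13% going from 1MB to 2MB L2
gain_2mb_to_4mb = 0.0625   # ~4-8.5% going from 2MB to 4MB L2

total_gain = (1 + gain_1mb_to_2mb) * (1 + gain_2mb_to_4mb) - 1
print(f"1MB -> 4MB L2 at 1.8GHz: ~{total_gain:.1%}")  # ~19.5%, i.e. the overall ~20% figure
```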
FSB Scaling: 1066MHz, 1333MHz
It's not all about on-die cache with UT3; we wanted to see if the L2 cache dependency also extended to the need for a fast memory subsystem. Intel's Core 2 CPUs still rely on an aging front side bus to make the journey out to main memory, so we toyed with increasing the FSB from 1066MHz up to 1333MHz to see how large an impact it had on UT3.
Our original investigations into FSB performance showed that the move to 1333MHz wasn't a big deal, yielding low single-digit performance improvements in our usual tests.
Unreal Tournament 3 appears to be no different: even with four cores consuming data at 2.66GHz, we're looking at a 3% increase in performance on average. The 1333MHz FSB isn't really necessary; while it would make more of a difference at smaller cache sizes, Intel doesn't offer CPUs with small caches and 1333MHz FSBs anyway.
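To put the two bus speeds in perspective, the peak theoretical bandwidth works out as follows (our arithmetic for Intel's 64-bit FSB, not a measurement):

```python
# Peak theoretical bandwidth of Intel's 64-bit (8-byte wide) front side bus.
for fsb_mts in (1066, 1333):
    peak_gbs = fsb_mts * 8 / 1000  # million transfers/s * 8 bytes per transfer
    print(f"{fsb_mts}MHz FSB: ~{peak_gbs:.1f} GB/s peak")
# ~8.5 GB/s vs ~10.7 GB/s: a 25% bandwidth increase that buys only ~3% more
# performance here, so raw bandwidth clearly isn't the bottleneck.
```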
Large caches? Absolutely. Faster FSB? Not necessary.
Multi-Core Gaming is Upon Us
Epic's Unreal Engine 3 is well threaded; it has to be in order to run on the multi-core Microsoft and Sony consoles, and multi-core PC users benefit as a result.
Intel has been waiting for a real reason to push quad-core gaming for a while now; will UT3 be that reason?
We don't often look at single-core performance given how cheap dual-core CPUs are today, but it's important to look at where we've come from over the past couple of years.
Going from one to two cores gives us an impressive 60% increase in performance on average; if we look back at our first dual-core processor review, none of our gaming tests showed any performance increase from one to two cores. Going from 0% to 60% in two years isn't bad at all.
The performance improvement from 2 to 4 cores isn't anywhere near as impressive, but still reasonable. In our first two tests we see a 9% increase and the third one gives us a 20% boost, for an average 13% jump in performance.
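As a rough illustration of what those numbers imply, the snippet below averages the quad-core gains and, purely as an idealized reference, applies Amdahl's law to the 60% dual-core speedup; the Amdahl estimate assumes a single, perfectly parallel workload and is our back-of-the-envelope math, not something the flyby data can confirm:

```python
# Our arithmetic on the core scaling results above.
quad_gains = [0.09, 0.09, 0.20]                 # per-flyby gains going from 2 to 4 cores
avg_quad_gain = sum(quad_gains) / len(quad_gains)
print(f"Average 2 -> 4 core gain: ~{avg_quad_gain:.0%}")   # ~13%

# Idealized reference only: a 1.6x speedup from a second core implies a ~75%
# parallel fraction under Amdahl's law, which would predict a much bigger jump
# from four cores than the ~13% we actually measured -- a hint that the flybys
# run into other limits well before four cores are fully utilized.
dual_core_speedup = 1.60
p = 2 * (1 - 1 / dual_core_speedup)             # solve 1/((1-p) + p/2) = 1.6 for p
ideal_quad_speedup = 1 / ((1 - p) + p / 4)
print(f"Parallel fraction: ~{p:.0%}, ideal 4-core speedup: {ideal_quad_speedup:.2f}x")
```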
If 3D games follow the same trend that we've seen over the past two years, it'll be another two years from now before we really see significant performance increases from quad-core processors. If in 2009 we hardly bother with dual-core chips because quad-core is so prevalent, you'll not hear any complaining from us.
Who Cares about Clock Speeds?
So far we've figured out that UT3 likes large caches and sees a huge benefit from two cores (and a minor improvement from four), but what about raw clock speed? We took an unlocked Intel Core 2 Duo processor and ran it at 333MHz increments from 2.0GHz up to 3.33GHz, plotting performance vs. frequency on the chart below in all three flybys:
At 1024 x 768, a reasonably CPU-bound resolution, the curve isn't as steep as you'd expect. Over a 66.5% increase in clock frequency, overall performance goes up less than 28%. Things like L2 cache size and microprocessor architecture seem to matter more here than raw clock speed.
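Put another way (our back-of-the-envelope arithmetic on the figures above):

```python
# How much of each extra clock cycle actually shows up as frame rate.
clock_gain = 3.33 / 2.00 - 1     # +66.5% clock frequency (2.0GHz -> 3.33GHz)
perf_gain = 0.28                 # <28% higher average frame rate at 1024 x 768
efficiency = perf_gain / clock_gain
print(f"~{efficiency:.0%} of the added clock speed translates into performance")  # ~42%
```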
AMD vs. Intel - Clock for Clock
Now it's time to tackle the touchy subject: how do AMD and Intel stack up to one another? First off, let's look at identical clock speeds to compare architectures.
At 3.0GHz, granted at a CPU-bound resolution, Intel holds a 26 - 31% performance advantage over AMD. Intel's Core 2 processors have historically done better clock for clock than AMD's K8, so this isn't too much of a surprise, but it's an important reference point nonetheless.
We then cranked up the resolution to 1920 x 1200, and increased the world detail slider up to 5 to give us a more realistic situation for this clock speed comparison. The results were a bit surprising:
Despite this being a mostly GPU-bound scenario, Intel still managed a 9% performance advantage over AMD at 3.0GHz. We suspect that something fishy is going on: the test is quite GPU-bound, yet going from Intel to AMD yields a noticeable performance drop.
We looked at a 3.0GHz Athlon 64 X2 and compared it to its closest Intel price competitor, the Core 2 Duo E6550 (2.33GHz) at our high res settings:
The Intel performance advantage drops to 7% on average, but it's much larger than it should be given that we're dealing with a GPU-bound scenario. Note that the difference between 2.33GHz and 3.0GHz on Intel is next to nothing, confirming the GPU-limited case, so we're dealing with an Unreal Engine 3 issue related either to the AMD CPUs or to the nForce 590 SLI chipset/drivers we used. We've let Epic know, but for now it looks like UT3 definitely prefers Intel's Core 2, even when GPU-bound.
Overall CPU Comparison
Because of the AMD performance issues we've encountered, even the Athlon 64 X2 6400+ isn't really competitive in our overall CPU tests. The 6400+ is marginally faster than the Core 2 Duo E4500, despite being priced higher than the E6750.
On the Intel side, the sweet spot for performance looks to be the Core 2 Duo E6550, or if you want to go cheaper, the E4500. Remember what we discovered about the impact of L2 cache on performance: you need around 20% more clock speed to make up for a 2MB L2 deficit on Intel's CPUs, and about 35% to make up for a 3MB deficit.
High End GPU Performance
While a few titles based on the Unreal Engine 3 have already made their way onto the scene, the detail and atmosphere of Unreal Tournament 3 really show off what developers can do with this engine if they take the time to understand it. Between Epic's own titles and licensed games like Bioshock, the company has made a solid statement that Unreal Engine 3 is an excellent choice for leading-edge graphics.
For this test, we are looking at performance scaling on high end video cards ($300+) across multiple resolutions and on multiple maps. We will absolutely be revisiting this game with midrange hardware and multi-GPU configurations. In this analysis, we focus on the performance of the Suspense capture the flag map flyby: it is the most graphically intense flyby we have, and the other two maps we tested exhibited similar relative performance between cards.
With our high end hardware, we've singled out the 1920 x 1200 results, as this is very likely to be the resolution paired with one of these parts.
The NVIDIA GeForce 8800 GTX and Ultra both outperform the AMD Radeon HD 2900 XT, which is to be expected: the 2900 XT costs much less. But the performance gap here is not huge, and the 2900 XT gets major points for that. It handily outperforms its direct competition, the 8800 GTS (both 640MB and 320MB perform nearly identically). Not surprisingly, the X1950 XTX bests the 7900 GTX, and both of these parts perform worse than the 8800 GTS cards.
If we look at the scaling graph for Suspense, we can see that the GTS cards remain above 40fps even at 2560x1600. This is quite impressive, especially for the low memory GTS, but we do have to keep in mind that this is a flyby in a demo version of the game and we may see changes in performance between now and the final version.
Also intriguing is that the high end NVIDIA hardware seems to become CPU limited below 1600x1200, which is why AMD's Radeon HD 2900 XT actually outperforms the 8800 Ultra at 1280x1024. The 8800 Ultra does scale very well with resolution, while the 7900 GTX drops off quickly and underperforms throughout the test.
While the rest of this data is very similar to what we've already presented, we did go to the trouble of running the numbers. In order to present a complete picture of what we've seen on the less demanding levels, here is the rest of our data:
Final Words
We're just getting started with our UT3 performance analysis, but already there are some interesting conclusions to be had. Quite possibly the biggest takeaway from this comparison is the dramatic improvement in multi-threaded game development over the past couple of years. Just two years ago none of our game benchmarks were multi-threaded; today, with the latest and greatest from Epic, we're seeing huge gains from one to two cores and promising improvements when moving to four.
Quad-core gaming is still years away from being relevant (much less a requirement), but the industry has come a tremendous distance in an honestly very short period of time. We're more likely to have multi-threaded games these days than 64-bit versions of those titles, mostly thanks to the multi-core architectures in both the Xbox 360 and PlayStation 3. Like it or not, much of PC game development is being driven by consoles; the numbers are simply higher on that side of the fence (even though the games themselves look better on this side).
On the GPU side, NVIDIA of course does quite well with the 8800 lineup, but the real surprise is how competitive AMD is with the Radeon HD 2900 XT. There may be some hope yet for that architecture, if AMD could only bring out a midrange part that actually offered compelling performance...