Original Link: https://www.anandtech.com/show/4444/amd-llano-notebook-review-a-series-fusion-apu-a8-3500m
The AMD Llano Notebook Review: Competing in the Mobile Market
by Jarred Walton & Anand Lal Shimpi on June 14, 2011 12:01 AM EST
What Took So Long?
AMD announced the acquisition of ATI in 2006. By 2007 AMD had a plan for CPU/GPU integration and it looked like this. The red blocks in the diagram below were GPUs, the green blocks were CPUs. Stage 1 was supposed to be dumb integration of the two (putting a CPU and GPU on the same die). The original plan called for AMD to release the first Fusion APU sometime in 2008-2009. Of course that didn't happen.
Brazos, AMD's very first Fusion platform, came out in Q4 of last year. At best AMD was two years behind schedule, at worst three. So what happened?
AMD and ATI both knew that designing CPUs and designing GPUs were incredibly different tasks. CPUs, at least for AMD back then, were built on a five year architecture cadence. Designers used tons of custom logic and hand layout in order to optimize for clock speed. In a general purpose microprocessor instruction latency is everything, so optimizing to lower latency wherever possible was top priority.
GPUs on the other hand come from a very different world. Drastically new architectures ship every two years, with major introductions made yearly. Very little custom logic is employed in GPU design by comparison; the architectures are highly synthesizable. Clock speed is important but it's not the end all be all. GPUs get their performance from being massively parallel, and you can always hide latency with a wide enough machine (and a parallel workload to take advantage of it).
The manufacturing strategy is also very different. Remember that at the time of the ATI acquisition, only ATI was a fabless semiconductor company; AMD still owned its own fabs. ATI was used to building chips at TSMC, while AMD was fabbing everything in Dresden at what would eventually become GlobalFoundries. While the folks at GlobalFoundries have done their best to make their libraries portable for existing TSMC customers, it's not as simple as showing up with a chip design and having it work on the first go.
As much sense as AMD made when it talked about the acquisition, the two companies that came together in 2006 couldn't have been more different. The past five years have really been spent trying to make the two work together both as organizations and as architectures.
The result really holds a lot of potential and hope for the new, unified AMD. The CPU folks learn from the GPU folks and vice versa. Let's start with APU refresh cycles. AMD CPU architectures were updated once every four or five years (K7 1999, K8 2003, K10 2007) while ATI GPUs received substantial updates yearly. The GPU folks won this battle as all AMD APUs are now built on a yearly cadence.
Chip design is also now more GPU inspired. With a yearly design cadence there's a greater focus on building easily synthesizable chips. Time to design and manufacture goes down, but so do maximum clock speeds. Given how important clock speed can be to the x86 side of the business, AMD is going to be taking more of a hybrid approach where some elements of APU designs are built the old GPU way while others use custom logic and more CPU-like layout flows.
The past few years have been very difficult for AMD but we're at the beginning of what may be a brand new company. Without the burden of expensive fabs and with the combined knowledge of two great chip companies, the new AMD has a chance but it also has a very long road ahead. Brazos was the first hint of success along that road and today we have the second. Her name is Llano.
The Llano A-Series APU
Although Llano is targeted solely at the mainstream, it is home to a number of firsts for AMD. This is AMD's first chip built on a 32nm SOI process at GlobalFoundries, it is AMD's first microprocessor to feature more than a billion transistors, and as you'll soon see it's the first platform with integrated graphics that's actually worth a damn.
AMD is building two distinct versions of Llano, although only one will be available at launch. There's the quad-core, or big Llano, with four 32nm CPU cores and a 400 core GPU. This chip weighs in at 1.45 billion transistors, nearly 50% more than Sandy Bridge. Around half of the chip is dedicated to the GPU however, so those are tightly packed transistors resulting in a die size that's only 5% larger than Sandy Bridge.
CPU Specification Comparison
CPU | Manufacturing Process | Cores | Transistor Count | Die Size
AMD Llano 4C | 32nm | 4 | 1.45B | 228mm²
AMD Llano 2C | 32nm | 2 | 758M | ?
AMD Thuban 6C | 45nm | 6 | 904M | 346mm²
AMD Deneb 4C | 45nm | 4 | 758M | 258mm²
Intel Gulftown 6C | 32nm | 6 | 1.17B | 240mm²
Intel Nehalem/Bloomfield 4C | 45nm | 4 | 731M | 263mm²
Intel Sandy Bridge 4C | 32nm | 4 | 995M | 216mm²
Intel Lynnfield 4C | 45nm | 4 | 774M | 296mm²
Intel Clarkdale 2C | 32nm | 2 | 384M | 81mm²
Intel Sandy Bridge 2C (GT1) | 32nm | 2 | 504M | 131mm²
Intel Sandy Bridge 2C (GT2) | 32nm | 2 | 624M | 149mm²
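As a quick sanity check on the "nearly 50% more transistors, only 5% larger die" comparison, the short Python sketch below recomputes those ratios and the resulting transistor density from the table above. The helper names are ours for illustration; only the transistor counts and die sizes come from the table.

```python
# Transistor counts (in millions) and die sizes (in mm^2), copied from the table above.
chips = {
    "AMD Llano 4C": (1450, 228),
    "Intel Sandy Bridge 4C": (995, 216),
    "AMD Deneb 4C": (758, 258),
    "Intel Gulftown 6C": (1170, 240),
}


def density(name):
    """Millions of transistors per square millimetre for one chip."""
    transistors_m, die_mm2 = chips[name]
    return transistors_m / die_mm2


def percent_more(a, b):
    """How much larger a is than b, expressed in percent."""
    return (a / b - 1.0) * 100.0


llano_t, llano_d = chips["AMD Llano 4C"]
snb_t, snb_d = chips["Intel Sandy Bridge 4C"]

transistor_gap = percent_more(llano_t, snb_t)  # Llano vs. Sandy Bridge transistor count
die_gap = percent_more(llano_d, snb_d)         # Llano vs. Sandy Bridge die area

print(f"Transistor count gap: {transistor_gap:.1f}%")
print(f"Die size gap: {die_gap:.1f}%")
for name in chips:
    print(f"{name}: {density(name):.2f}M transistors/mm^2")
```

The density figures make the point about tightly packed GPU transistors concrete: by this measure big Llano packs noticeably more transistors per square millimetre than quad-core Sandy Bridge, even though both are 32nm parts.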
Given the transistor count, big Llano has a deceptively small amount of cache for the CPU cores. There is no large catch-all L3 and definitely no shared SRAM between the CPU and GPU, just a 1MB private L2 cache per core. That's more L2 cache than either the 45nm quad-core Athlon II or Phenom II parts.
Intel's Sandy Bridge die is only ~20% GPU
The little Llano is a 758 million transistor dual-core version with only 240 GPU cores. Cache sizes are unchanged; little Llano is just a smaller version for lower price points. Initially both quad- and dual-core parts will be serviced by the same 1.45B transistor die. Defective chips will have unused cores fused off and will be sold as dual-core parts. This isn't anything unusual; AMD, Intel and NVIDIA all use die harvesting as part of their overall silicon strategy. The key here is that in the coming months AMD will introduce a dedicated little Llano die to avoid wasting fully functional big Llano parts on the dual-core market. This distinction is important as it indicates that AMD isn't relying on die harvesting in the long run but rather has a targeted strategy for separate market segments.
Architecturally AMD has made some minor updates to each Llano core. AMD is promising more than a 6% increase in instructions executed per clock (IPC) for the Llano cores vs. their 45nm Athlon II/Phenom II predecessors. The increase in IPC is due to the larger L2 cache, larger reorder and load/store buffers, new divide hardware, and improved hardware prefetchers.
On average I measured around a 3% performance improvement at the same clock speed as AMD's 45nm parts. Peak performance improved by up to 14%; however, most of the gains were down in the 3-5% range. This is arguably the biggest problem that faces Llano. AMD's Phenom architecture debuted in 2007 and was updated in 2009. Llano's cores have been sitting around for the past 3-4 years with only a mild update while Intel has been through two tocks in the same timeframe. A ~6% increase in IPC isn't anywhere near close enough to bridge the gap left by Nehalem and Sandy Bridge.
Note that this comparison is without AMD's Turbo Core enabled, but more on that later.
The GPU
While the Llano CPU cores may be in need of a major overhaul, Llano's GPU is as new as it gets. Technically based off of AMD's Redwood core (Radeon HD 5570) with some enhancements, Llano's GPU is codenamed Sumo.
The DX11 GPU features five SIMD arrays, each with 80 cores for a total of 400 shader processors. Similar to the updates we saw with this year's Northern Islands GPUs, Sumo does add UVD3 support to the Redwood architecture. Of course since Sumo shares the same die as the Llano CPU cores it is built on GlobalFoundries' 32nm process, making this the first AMD GPU fabbed at GlobalFoundries and not TSMC.
For everything behind the memory controller Sumo is virtually identical to Redwood. Where Sumo differs is in its memory interface. Although Llano is AMD's first performance oriented APU, it's still constrained by a 128-bit wide DDR3 memory interface. That dual-channel memory interface has to be shared by all four Llano cores as well as the Sumo GPU and as a result, arbitration is very important.
AMD shared a few choice details about the Llano memory controller architecture. To begin, AMD guarantees more than 30GB/s of bandwidth is available between the GPU and the memory controller—in other words, the path from GPU to the memory controller won't become a bottleneck. The GPU/memory controller link (i.e. within the APU die) can apparently scale up to as much as 50GB/s to support future APUs with even faster memory interfaces. Note that unlike previous integrated graphics solutions, there is no support for dedicated external memory—this is a pure shared memory architecture.
Second, and most importantly, AMD can dynamically prioritize memory bandwidth between the CPU and GPU. In most cases, when both processors are heavily consuming data, the GPU is given priority over the CPU. Given today's workloads, prioritizing the GPU for memory accesses makes sense when it's running full tilt. The chances of you stressing all four CPU cores and running at full GPU memory bandwidth requirements are pretty slim today.
With 400 shader processors behind a shared 128-bit DDR3 memory interface, the upper bound for Sumo performance is the Radeon HD 5570. In practice, you should expect performance to be noticeably lower since the GPU does have to share its precious memory bandwidth with up to four x86 CPU cores.
The mobile version of Llano supports up to DDR3-1600 while the desktop parts can run at up to DDR3-1866. Maximum memory capacities are 32GB and 64GB for notebooks and desktops, respectively.
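To put the shared memory interface in perspective, here is a minimal Python sketch of the theoretical peak bandwidth of a 128-bit DDR3 interface at the speeds mentioned above, compared against the 30GB/s figure AMD quotes for the GPU-to-memory-controller link. It assumes the nominal transfer rate in the DDR3 name (e.g. 1600 MT/s for DDR3-1600) and ignores real-world efficiency losses, so treat the results as upper bounds.

```python
BUS_WIDTH_BITS = 128   # dual-channel DDR3: two 64-bit channels
GPU_LINK_GBS = 30.0    # AMD's quoted minimum for the GPU <-> memory controller path


def peak_bandwidth_gbs(transfers_mt_s, bus_width_bits=BUS_WIDTH_BITS):
    """Theoretical peak bandwidth in GB/s (10^9 bytes per second)."""
    bytes_per_transfer = bus_width_bits // 8
    return transfers_mt_s * 1e6 * bytes_per_transfer / 1e9


# Nominal transfer rates implied by the DDR3 speed grades named in the article.
speeds_mt_s = {"DDR3-1333": 1333, "DDR3-1600": 1600, "DDR3-1866": 1866}

for name, rate in speeds_mt_s.items():
    bw = peak_bandwidth_gbs(rate)
    fits = "within" if bw < GPU_LINK_GBS else "exceeds"
    print(f"{name}: {bw:.1f} GB/s peak ({fits} the 30 GB/s GPU link)")
```

Even the fastest supported desktop memory tops out just under 30GB/s, which is consistent with the claim that the on-die GPU link won't be the bottleneck; the DRAM interface is.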
Llano has a total of 24 PCIe Gen 2 lanes at its disposal. Sixteen of those lanes can be used for external graphics. Four of the lanes can be used for devices that need low latency/high bandwidth access to the APU itself (e.g. Gigabit ethernet). The remaining four lanes are used to connect the APU to its sole partner in crime: the Fusion Controller Hub.
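For a rough sense of what those lane groups are worth, the sketch below applies the standard PCIe Gen 2 figures (5 GT/s per lane with 8b/10b encoding, i.e. about 500MB/s per lane in each direction) to the lane split described above. The group labels are our own shorthand, and protocol overhead beyond line encoding is ignored.

```python
GT_PER_S = 5.0          # PCIe Gen 2 raw signalling rate per lane, in GT/s
ENCODING = 8 / 10       # 8b/10b line encoding: 8 payload bits per 10 bits on the wire

# Lane allocation as described in the article (labels are our own shorthand).
lane_groups = {
    "external graphics": 16,
    "low-latency devices": 4,
    "Fusion Controller Hub link": 4,
}


def group_bandwidth_gbs(lanes):
    """Peak payload bandwidth per direction, in GB/s, for a group of Gen 2 lanes."""
    gbits_per_lane = GT_PER_S * ENCODING
    return lanes * gbits_per_lane / 8


total_lanes = sum(lane_groups.values())
print(f"Total lanes: {total_lanes}")
for name, lanes in lane_groups.items():
    print(f"{name}: x{lanes}, {group_bandwidth_gbs(lanes):.1f} GB/s per direction")
```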
AMD is particularly proud of the display output configurations supported by Llano. The possible combinations are listed below:
Chipsets
AMD will offer two Fusion Controller Hubs (FCHs) as options for Llano: A70M and A60M. The only difference between the two is in their support for USB 3.0; the A70M has four USB 3.0 ports while the A60M has none.
Both FCHs support 6Gbps SATA and perform just as well as AMD's 8-series chipset (or Intel's Z68) with a high performance SSD. USB 3.0 performance is also comparable to 3rd party solutions we've seen deployed on motherboards already.
Power Gating
With 1.45 billion transistors on die, Llano relies on extensive power gating to keep power consumption in check. The APU is split into two independent power islands: the CPU and the GPU. The memory controller and North Bridge both live on the GPU's power island. Each island has its own independent voltage source.
Everything from an individual CPU core to the entire GPU or virtually the entire APU package can be power gated. AMD provided photon recombination images to show the impact power gating the GPU can have on leakage current:
Although not depicted above, Llano can also fully power gate the x86 CPU cores or both the CPU and GPU if the entire APU is in a deep sleep state. Being able to completely power gate CPU cores or the GPU is an important part of enabling the next major feature of Llano: Turbo Core.
Turbo Core
All processors whether CPUs, GPUs or APUs have to be designed to strict thermal and power limits. OEMs need to know exactly what sort of chassis they'll be able to build around these chips and as a result the chip vendors provide guidance in the form of specifications, including the chip's thermal design point (TDP).
In the old days of microprocessors things were simple. You had a single core that ran all the time and it consumed all of the available thermal budget allocated for that core. AMD and Intel eventually enabled dynamic clock frequencies which let your single core underclock itself when it wasn't being used, which helped reduce power and extend battery life. Then came the multi-core era.
CPUs couldn't just start putting out twice as much heat now that they had two cores; instead, each core had to consume less power. The chip guys achieved this by running the cores at lower frequencies and voltages than they did in the single-core days. Two cores paved the way to four cores, which meant another reduction in clock speed per core. Sure we got much better multi-threaded performance, but for single-threaded applications performance wasn't as great as it could be. Users had to make a tradeoff: good multi-threaded performance or good single-threaded performance; you couldn't have both. Until power gating came along that is.
Without power gating you can never really shut off power to an idle core. The transistors aren't switching but power is still dissipated thanks to leakage current. Remember that transistors don't simply stop conducting electricity when they're off. The smaller they get, the more leaky our beloved transistors become. Power gating lets you physically block the flow of current to the transistors that are being gated, so when they're off, they're actually off. With an idle core shut off, now you have the extra TDP headroom to run any active cores at higher frequencies.
Intel does this with a technology it calls Turbo Boost. Intel looks at current draw and thermal sensors spread out all over the chip and determines when it has the available thermal headroom to turbo up any active cores. AMD implements a similar technology in Llano (and previously in their hex-core desktop parts) called Turbo Core.
I say similar but not identical because AMD's approach differs in a very important way. While Intel looks at current draw and temperature data, AMD looks at workload. Each activity within the Llano APU is assigned a certain power weight (e.g. an integer multiply is known to require a certain amount of power). Llano is aware of the operations it's currently working on and based on the weights associated with these operations it comes up with a general estimate of its power consumption on a per core basis. I mention this is an estimate because it correlates digital activity to power consumption; it doesn't actually measure power consumption.
Based on the number of events and their individual weights, AMD estimates the power consumption of each core and determines how much TDP headroom exists in the system. If the OS is requesting the highest p-state from the CPU and there's available TDP headroom, Llano will turbo up any active cores up to a maximum frequency. Like Sandy Bridge, Llano is able to temporarily exceed the APU's maximum TDP if it determines that the recent history of power consumption has been low enough that it'll take a while for the APU to ramp up to any thermal limits.
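The activity-based approach described above can be sketched in a few lines of Python. This is only an illustration of the idea, not AMD's implementation: the event names, the per-event energy weights, the 18W figure for the rest of the APU, and the all-or-nothing jump to maximum frequency are all hypothetical. Only the 35W TDP and the 1.5GHz/2.4GHz clocks come from the A8-3500M spec, and the sketch leaves out the ability to temporarily exceed TDP based on recent power history.

```python
# Hypothetical energy cost per event, in picojoules (NOT AMD's real weights).
EVENT_WEIGHTS_PJ = {"int_mul": 900, "fp_op": 1400, "load_store": 600}


def estimate_core_power_w(event_counts, interval_s):
    """Estimate core power from digital activity: sum(count * weight) / time.

    This correlates activity with power; nothing is actually measured.
    """
    energy_pj = sum(EVENT_WEIGHTS_PJ[event] * n for event, n in event_counts.items())
    return energy_pj * 1e-12 / interval_s


def pick_frequency_ghz(core_events, interval_s, tdp_w, other_power_w,
                       base_ghz, max_ghz, wants_top_pstate):
    """Return (frequency, headroom) for the active cores.

    core_events holds one event-count dict per core; a power gated core is
    represented by an empty dict and contributes no estimated power.
    """
    core_power_w = sum(estimate_core_power_w(ev, interval_s) for ev in core_events)
    headroom_w = tdp_w - other_power_w - core_power_w
    if wants_top_pstate and headroom_w > 0:
        return max_ghz, headroom_w
    return base_ghz, headroom_w


# One busy core over a 1 ms window; the other three cores are power gated.
busy = {"int_mul": 2_000_000, "fp_op": 1_000_000, "load_store": 3_000_000}
one_core = pick_frequency_ghz([busy, {}, {}, {}], 1e-3, tdp_w=35, other_power_w=18,
                              base_ghz=1.5, max_ghz=2.4, wants_top_pstate=True)
four_cores = pick_frequency_ghz([busy] * 4, 1e-3, tdp_w=35, other_power_w=18,
                                base_ghz=1.5, max_ghz=2.4, wants_top_pstate=True)
print("one busy core:", one_core)
print("four busy cores:", four_cores)
```

With these made-up weights a single busy core leaves plenty of estimated headroom and runs at the turbo clock, while four busy cores push the estimate past the TDP and stay at the base clock. The real hardware presumably steps through intermediate frequencies rather than choosing between two values.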
One major limit of Llano's Turbo Core is that the GPU can't turbo up in the event of the CPU cores being idle. Only the CPU cores can turbo up if they have available headroom. I suspect future versions of Llano will probably enable GPU Turbo Core as well.
It's unclear to me at this point what shortcomings or advantages exist for AMD's Turbo Core method vs. Intel's Turbo Boost. At the bare minimum the two are finally comparable although they use different approaches to attain a similar end result. AMD doesn't yet have a method of actually displaying Turbo Core frequencies, unfortunately, so we're operating a bit blind at this point. Over time I hope to have a better idea of how AMD's solution stacks up.
Introducing Mobile Llano
Anand has provided our coverage of Llano’s architecture and he’ll have a preview of desktop performance, but he’s leaving the mobile coverage to me (Jarred). At a high level, the breakdown of Llano is really quite simple: take a K10.5 series CPU core (dual- or quad-core), pair it up with a DX11 capable GPU core similar to AMD’s Redwood line (5600/5600M or 6500M), and then mix in power gating and Turbo Core; bake everything in a 32nm process and you’ve got Llano. Easier said than done, of course, as K10.5 parts previously used a 45nm process while Redwood used 40nm, so AMD had plenty of work to do before they could realize the simplistic overview I just described; the result is what matters, though, so let’s break out our spoons and see how the pudding tastes. Here’s the overview of the mobile A-series APUs launching today.
AMD A-Series Fusion APUs for Notebooks
APU Model | A8-3530MX | A8-3510MX | A8-3500M | A6-3410MX | A6-3400M | A4-3310MX | A4-3300M
CPU Cores | 4 | 4 | 4 | 4 | 4 | 2 | 2
CPU Clock (Base/Max) | 1.9/2.6GHz | 1.8/2.5GHz | 1.5/2.4GHz | 1.6/2.3GHz | 1.4/2.3GHz | 2.1/2.5GHz | 1.9/2.5GHz
L2 Cache (MB) | 4 | 4 | 4 | 4 | 4 | 2 | 2
Radeon Model | HD 6620G | HD 6620G | HD 6620G | HD 6520G | HD 6520G | HD 6480G | HD 6480G
Radeon Cores | 400 | 400 | 400 | 320 | 320 | 240 | 240
GPU Clock (MHz) | 444 | 444 | 444 | 400 | 400 | 444 | 444
TDP | 45W | 45W | 35W | 45W | 35W | 45W | 35W
Max DDR3 Speed | DDR3-1600 / DDR3L-1333 | DDR3-1600 / DDR3L-1333 | DDR3-1333 / DDR3L-1333 | DDR3-1600 / DDR3L-1333 | DDR3-1333 / DDR3L-1333 | DDR3-1333 / DDR3L-1333 | DDR3-1333 / DDR3L-1333
There are two different power envelopes for Llano right now: 35W and 45W. The former models end with an M while the latter end in MX. Don’t let the relatively high TDPs fool you, as similar to Intel we’re looking at maximum TDP while idle and low-load TDP will be far lower. Based on battery life, it appears that the entire test notebook consumes around 7.42W at idle. By comparison, a slightly larger dual-core SNB notebook consumes around 7.68W when idle, so we’re very close to parity at idle. As noted earlier, all APU models come with 1MB L2 cache per core, and Turbo Core allows for cores to clock up to higher values under the right circumstances. That could prove important, as clock-for-clock K10.5 cores can’t hope to keep up with Sandy Bridge, and Sandy Bridge parts are already clocking significantly higher.
On the CPU side of the equation, there are currently only dual-core and quad-core parts, so tri-core appears dead (or at least MIA for now). The other part of the APU is the GPU cores, and here there are three options. The A6 and A8 APUs are both quad-core, but A6 has 320 Radeon cores clocked at 400MHz compared to 400 cores at 444MHz—so the 6620G is potentially 40% faster. A4 APUs trim the GPU further, with 240 cores clocked at 444MHz, and they’re the dual-core parts. The 6620G could be up to 67% faster than 6480G, under the right circumstances. As Anand mentioned, right now all of the A-series APUs are coming from the “big Llano” die, but in the future we’ll see the A4 production shift to “little Llano” instead of using harvested die.
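The "potentially 40%" and "up to 67%" figures fall out of a simple cores-times-clock estimate, which the Python sketch below reproduces. It treats raw shader throughput as proportional to core count multiplied by GPU clock, which is an idealization: memory bandwidth, which all three GPUs share with the CPU, is ignored. The 80-cores-per-SIMD figure is taken from the GPU section earlier in the article.

```python
# (Radeon cores, GPU clock in MHz) for the three Fusion GPUs in the table above.
gpus = {
    "HD 6620G": (400, 444),
    "HD 6520G": (320, 400),
    "HD 6480G": (240, 444),
}

CORES_PER_SIMD = 80  # each SIMD array has 80 cores, per the GPU section


def shader_throughput(name):
    """Idealized throughput proxy: cores x clock (arbitrary units)."""
    cores, clock_mhz = gpus[name]
    return cores * clock_mhz


def advantage_pct(fast, slow):
    """Percent advantage of one GPU over another under the cores-x-clock model."""
    return (shader_throughput(fast) / shader_throughput(slow) - 1.0) * 100.0


for name, (cores, clock_mhz) in gpus.items():
    print(f"{name}: {cores // CORES_PER_SIMD} SIMDs at {clock_mhz}MHz")

print(f"6620G vs 6520G: {advantage_pct('HD 6620G', 'HD 6520G'):.1f}% faster")
print(f"6620G vs 6480G: {advantage_pct('HD 6620G', 'HD 6480G'):.1f}% faster")
```

Because the 6480G runs at the same 444MHz as the 6620G, its deficit comes purely from the disabled SIMDs, while the 6520G loses on both core count and clock.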
Vision and Radeon Branding
For 2011, AMD is simplifying their Vision branding with Llano, skipping the Premium, Ultimate, and Black modifiers and instead referring to the APU. Vision E2 refers to the dual-core E-series APUs, while the A4, A6, and A8 lines correlate directly with the A-series APUs. The Radeon brand continues as an important asset, so there will be sticker options to promote quad-core and dual-core CPUs with Radeon graphics. What about the Dual Graphics, though?
With the integrated GPU finally able to approach the performance of midrange mobile GPUs, AMD is making a return to hybrid CrossFire (IGP and a dGPU working together), though the official name is now apparently “Radeon Dual Graphics” or just "Dual Graphics"; we’ve also heard it referred to as “Asymmetrical CrossFire”, and we’ll use any of these terms throughout this article.
We first saw an attempt at hybrid CrossFire with the HD 2400 and the 790 chipset, and later that extended to HD 3400 cards, but it never really impressed as it was limited to desktops and you could still get far better performance by spending an extra $10 to upgrade from a 3400 to a 3600 dGPU. The 6620G fGPU is several times more powerful than the old HD 4250 IGP, making CrossFire potentially useful, especially on laptops where the power savings from shutting off the dGPU are very significant.
With Radeon Dual Graphics, AMD introduces more brands. The various Fusion GPUs (fGPUs) only work in CrossFire with specific discrete GPUs (dGPUs), with nearly all of the 6400M, 6600M, and 6700M line eligible, giving rise to several new Radeon names. If you start with a base of a Radeon HD 6620G and add a Radeon HD 6770M to it, the resulting combination is now called a Radeon HD 6775G2. Pair it with a 6750M and you get a 6755G2. The entirety of the list is depicted in the slide above. For now these names are just going to be listed on the notebook spec sheet; the drivers themselves will report the actual GPU you have driving the panel you're connected to. AMD is still working out the right way to expose these names through software to avoid confusion.
AMD’s Llano Mobile Test Platform
Similar to our Sandy Bridge Notebook, AMD shipped us a test notebook that likely will not actually hit the market. It’s also early hardware, as we haven’t received anything from the usual suspects, but performance and battery life should be representative of what we’ll see in shipping hardware. There’s still room for BIOS, firmware, and driver optimizations, so if anything we’d expect some scores to even improve from what we’re reporting, but for now we can get a starting point for what to expect from shipping Llano laptops and notebooks. Our test notebook is manufactured by Compal, and we understand there was a very limited production run, so what we’ve got is an existing shell with a new motherboard, slapped together for preview articles. Here are the specifications of our test system.
AMD Llano Notebook Specifications
Processor | AMD A8-3500M (4x1.5GHz, 2.4GHz Turbo, 32nm, 4x1MB L2, 35W)
Chipset | AMD A70M
Memory | 2x2GB DDR3-1333 (Max 2x4GB)
Graphics | AMD Radeon HD 6620G 1GB DDR3 (400 Radeon Cores, 444MHz); AMD Radeon HD 6630M 1GB DDR3 (480 Radeon Cores, 485MHz/1.6GHz Core/RAM clocks); Dual Radeon HD 6690G2 (Asymmetrical CrossFire)
Display | 14.0-inch LED Matte 16:9 1366x768
Hard Drive(s) | Hitachi Travelstar 7K500 250GB 7200RPM SATA 3Gbps Hard Disk
Optical Drive | Blu-ray/DVDRW Combo Drive
Networking | Gigabit Ethernet (Realtek RTL8168/8111); 802.11b/g/n (Broadcom)
Audio | Realtek ALC269 HD audio; Stereo speakers; Headphone and microphone jacks
Battery | 6-Cell, 58Wh battery
Front Side | Flash reader
Left Side | 1 x USB 3.0; HDMI 1.4a; Ethernet; VGA; Exhaust vent; AC adapter port
Right Side | Headphone/microphone jacks; 2x USB 2.0; Optical drive; Kensington lock
Back Side | -
Operating System | Windows 7 Home Premium 64-bit SP1
Dimensions | 13.5" x 9.5" x 1.3-1.5" (WxDxH)
Weight | 4.78 lbs
Extras | Webcam; Flash reader (MMC, SD/Mini SD, MS/Duo/Pro/Pro Duo); USB 3.0
AMD equipped this laptop with their highest performance 35W part, the A8-3500M. That gives us four cores running at a nominal 1.5GHz, all 400 Radeon Cores clocked at 444MHz, and the potential for Turbo Core to take the CPU as high as 2.4GHz. Here’s where we run into our first snag, unfortunately: apparently there’s no software currently available that will report the actual real-time core speeds for the CPU or GPU. Turbo Core appears to be working in some cases, but we don’t know how fast the CPU cores are running. We’ll see the results in the benchmarks in a moment, but for now it appears that the Llano Turbo Core isn’t quite as aggressive as Sandy Bridge’s Turbo Boost.
One interesting aspect of the test notebook is that it comes equipped with both the integrated Fusion GPU (fGPU) and an HD 6630M discrete GPU (dGPU). The 6630M is a Turks core with 480 Radeon cores clocked at 485MHz (well, this GPU is clocked at 485; the specs for 6630M are actually 500MHz), with 1GB of DDR3-800 memory. We'll see what happens when we enable Dual Radeon later.
The rest of the notebook specs are pretty much what you’d expect. The hard drive is a 250GB 7200RPM model from Hitachi, so performance won’t be quite as good as the latest 500GB+ models and it won’t come anywhere near SSD levels. Networking is present and accounted for, with both Gigabit Ethernet and 2.4GHz 802.11n WiFi. The optical drive is Blu-ray capable (despite the DVDRW face plate in the pictures), and there’s even a USB 3.0 port.
We could discuss the build quality, keyboard, and screen quality, but there’s no real point in doing so on a laptop that won’t see full production. The keyboard is the “floating island” style commonly seen in Acer builds, which Compal apparently manufactures, and the LCD is a matte panel for a change (but still low contrast). The overall build quality isn’t bad, but we expect to see better retail builds from Acer, ASUS, HP, Lenovo, and others so we won’t spend any more time discussing the specifics of this laptop other than to note that it has a reasonable 58Wh battery and a 14” LCD. Expected pricing is $500 for laptops with A4 APUs, $600 for A6 APUs, and $700+ for the A8 series. Adding a discrete GPU like the 6630M (and thus enabling Asymmetrical CrossFire) should tack on another ~$100.
AMD is quoting “over eight hours” of battery life, but that’s highly dependent on what you’re doing as well as battery capacity. Since that’s going to be one of the major improvements with Llano, we’re going to start there.
Battery Life: All Day Computing
AMD makes a point of their mobile offerings (A/C/E-series APUs) all offering “all day computing”, with a note that “all day” is defined as eight hours or more. While that’s easy to do with a gigantic battery, doing so with the typical 48/56Wh batteries in mainstream laptops is a lot more difficult. One of their test notebooks apparently manages around 10.5 hours (best-case) with a 62Wh battery, compared to 6.5 hours for a similar Core i5-2410M laptop. Without specifics on all the settings, we’ll just say that our results for “similar” laptops don’t show nearly the disparity AMD achieved, but the important point is that AMD is finally competitive in battery life.
We ran our usual series of battery life tests, with the LCDs set for ~100 nits (70% brightness for the Llano laptop). We shut off WiFi for the idle test and mute audio; the Internet test is run over WiFi and repeatedly loads four tabs of content every minute, again with audio muted; finally, the H.264 playback result is done with a set of earbuds connected and WiFi disabled. Here’s how the Llano laptop stacks up to some recently reviewed laptops—you can compare Llano with other laptops in Mobile Bench.
Starting with pure battery life, only three laptops consistently offer longer battery life than the Llano system: the ASUS U41JF, MSI’s X370, and the quad-core Sandy Bridge notebook. Also, the ASUS K53E boasts better battery life in the H.264 playback test, which for whatever reason is a test where SNB has proved particularly potent. Intel’s DXVA decode may be efficient, but it's also possible it's doing less work; we're running the test again with all of AMD's video enhancement features turned off. [Update: I retested with all the AMD video enhancement features disabled, and battery life didn't change, so Intel is simply more efficient at H.264 decoding with SNB.]
Back to the discussion of battery life: all three of the laptops that beat Llano have the advantage of slightly to moderately higher battery capacities, so the comparison isn’t entirely fair. Let’s level the playing field by looking at relative battery life.
Rather amazing is that Llano actually rises to the top of the charts in the Idle test, and it’s only slightly behind the competition in the other two tests. Considering the X370 is equipped with an E-350 APU, the fact that Llano is even close is surprising. While we should note that the X370 wasn’t the most efficient of the E-350 laptops we’ve tested, we also need to point out that the 13.3” LCD is a lot closer to the 14” panel in the Llano notebook than the 11.6” panels used in the Sony YB and HP dm1z. The dual-core SNB notebook still leads in the H.264 test, and considering it has a 15.6” panel we’d say that relative battery life is very similar between the two.
We also want to talk about AMD’s claims of “all day battery life”. If we accept their definition of 8+ hours, the test laptop doesn’t actually hit that mark in our idle test. We did run the same test again at 40% LCD brightness (around 60 nits) and managed eight hours exactly, but that’s in an absolutely best-case test. For Internet surfing, which represents a more useful metric, the best way to get 8+ hours is demonstrated by ASUS’ U41JF: stuff in a higher capacity battery!
Rounding out the battery life discussion, we also tested battery life while looping 3DMark06 at native resolution (1366x768). This represents a realistic 3D gaming scenario, and Llano still managed a respectable 161 minutes. Considering graphics performance is a healthy step up from what Intel’s HD 3000 offers and that AMD manages double the battery life under gaming situations compared to the K53E, mobile gaming is clearly a win.
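The battery figures quoted in this review can be tied together with some simple arithmetic, sketched in Python below. It uses the test notebook's 58Wh battery, the roughly 7.42W idle draw estimated earlier, the 161-minute 3DMark06 result, and AMD's own 10.5-hour claim on a 62Wh battery. The one assumption is that the full rated capacity is usable, which real batteries rarely deliver exactly, so these are ballpark numbers.

```python
def runtime_hours(capacity_wh, avg_power_w):
    """Hours of runtime for a battery drained at a constant average power."""
    return capacity_wh / avg_power_w


def avg_power_w(capacity_wh, runtime_min):
    """Average system power implied by draining a battery in runtime_min minutes."""
    return capacity_wh / (runtime_min / 60.0)


BATTERY_WH = 58.0     # test notebook battery capacity
IDLE_POWER_W = 7.42   # whole-notebook idle draw estimated in the article
ALL_DAY_HOURS = 8.0   # AMD's definition of "all day"

idle_hours = runtime_hours(BATTERY_WH, IDLE_POWER_W)
needed_wh = ALL_DAY_HOURS * IDLE_POWER_W          # capacity needed for 8 hours at idle
gaming_power_w = avg_power_w(BATTERY_WH, 161)     # 3DMark06 loop lasted 161 minutes
amd_claim_w = avg_power_w(62.0, 10.5 * 60)        # AMD's best case: 10.5 hours on 62Wh

print(f"Idle runtime at 7.42W: {idle_hours:.2f} hours")
print(f"Capacity needed for 8 hours at idle: {needed_wh:.2f} Wh")
print(f"Average draw while looping 3DMark06: {gaming_power_w:.1f} W")
print(f"Average draw implied by AMD's 10.5 hour claim: {amd_claim_w:.2f} W")
```

The numbers show why the test notebook falls just short of the eight-hour mark at idle: it would need only slightly more than its 58Wh. They also show that AMD's best-case claim implies a whole-system draw around 6W, well below what we measured at ~100 nits.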
Overall, for the first time in a long time, AMD is able to offer battery life that competes with and even exceeds what Intel offers with their current mainstream offerings. There are of course a bunch of lower power Intel CPUs we could discuss, but looking at the 35W TDP parts the combination of 32nm and power gating has brought AMD back into the discussion. Even more interesting is that you should be able to get something like our test laptop for $600, possibly less, compared to dual-core SNB i5 laptops that start at $700. But then, perhaps Core i5 isn’t the best comparison for quad-core Llano, despite what AMD might like to say? Let’s move on to general performance and gaming discussions before we decide which mobile part is the “best”.
Application Performance, Round One: PCMark 7
If the battery life was a pleasant change of pace, general application performance unfortunately remains a weak spot for AMD. Remember that Llano uses a tweaked K10.5 architecture for the CPU portion of the core, and while L2 cache per core is doubled relative to the previous generation quad-core Phenom parts, clock speeds and IPC (instructions per clock) still appear much lower than what Intel offers. I had hoped to see Turbo Core come into play here, which makes the comparison with Toshiba’s A660D a good starting point. That notebook has a Phenom II X4 P920 (quad-core 1.6GHz) with HD 5650M graphics, so the Llano A8-3500M has very similar specs.
Before we get to the graphs, let me make a quick note that not all laptops have been tested in all applications/games. Most of the systems have been shipped back to the manufacturer, so our newer benchmarks are going to have omissions (e.g. PCMark 7). In the gaming charts later in the review, we’ll have even more omissions, and many of the slower GPUs/IGPs will only be tested at our “low” settings.
With that out of the way, let’s start our application performance comparison with PCMark 7, our only all-inclusive benchmark for laptops right now. We’ve run all of the benchmark suites in the hopes of providing a better look at overall performance; however, outside of the “Computation” suite all of the tests have a storage element. That means any system with an SSD (like the quad-core SNB unit) will boast a massive advantage over the competition. The Computation suite also has an interesting footnote in that it supports Intel’s Quick Sync for video encoding, which again gives SNB systems a massive performance advantage. You can read more about the specific suites in PCMark 7 in their whitepaper. We’ll also have two results for Llano going forward: one for using the fGPU (6620G) and a second for using the dGPU (6630M).
And here’s our first hint that Llano may not be the home run so many were hoping to see from AMD. All of the SNB laptops are still a healthy step up from Llano in overall PCMarks; the K53E leads by 43%, and systems with quad-core SNB are faster still. Llano might appear to at least surpass the previous generation Arrandale i5-520M in Dell’s E6410, but the storage subsystem in that laptop is a particularly slow 160GB HDD and that skews the results. Then again, the overclocked Arrandale i3-380M in the ASUS U41JF falls short of Llano, so AMD is at the very least competitive with Arrandale.
Since we’re not on a level playing field as far as storage, we won’t comment too much more here, but I do have SSD-based testing complete for four of the notebooks, and once I’ve swapped in an SSD for Llano we’ll have a follow-up article. Let’s move on to application testing round two, where we’ll look at some tests where we eliminate the storage bottleneck.
Applications, Round Two: Treading Water
This time we have a more interesting competitor to look at: the Toshiba A660D. AMD says Turbo Core works at speeds of up to 2.4GHz on the A8-3500M, but we have no way of monitoring the actual CPU clocks right now. (CPU-Z if you’re wondering shows a constant 1.5GHz, but AMD says that utility doesn’t currently detect the proper clocks.) When we compare performance results between the Llano notebook and the A660D, we definitely see some differences in performance. Some of that may come from the added L2 cache and other architectural tweaks, but Cinebench R10 in particular shows a healthy 17% performance increase, even with a base clock that’s 7% lower. In the multi-threaded Cinebench result, the lead drops to 10%, which correlates well with how we’d expect Turbo Core to work. PCMark Vantage is still heavily influenced by the storage subsystem, and the storage score of 2950 on the A660D versus 3791 on the Llano suggests the Toshiba HDD is a significant bottleneck.
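One way to read the Cinebench numbers above is to ask what clock a Phenom II-class core would need in order to match them. The sketch below does that arithmetic under a loud assumption: performance scales linearly with clock speed and IPC is identical between the two chips. Since the added L2 cache and other tweaks likely contribute as well, treat the result as a rough upper bound on the average clock, not a measurement. The function name is ours, and we are taking the 17% figure as the single-threaded result.

```python
P920_GHZ = 1.6      # Phenom II X4 P920 clock (from the text)
A8_BASE_GHZ = 1.5   # A8-3500M base clock (from the text)
A8_TURBO_GHZ = 2.4  # AMD's stated Turbo Core ceiling (from the text)

def implied_clock(speedup, ref_ghz=P920_GHZ):
    """Clock needed to explain `speedup` over the P920.

    Assumes identical IPC and linear clock scaling, which is an
    illustration-only simplification.
    """
    return ref_ghz * (1.0 + speedup)

# Cinebench R10 leads over the A660D quoted in the paragraph above.
for label, speedup in [("single-threaded", 0.17), ("multi-threaded", 0.10)]:
    print(f"{label}: ~{implied_clock(speedup):.2f}GHz implied")
```

Both implied clocks land between the 1.5GHz base and the 2.4GHz ceiling, and the multi-threaded figure is the lower of the two, which is the pattern you would expect if Turbo Core boosts less when all four cores are loaded.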
Looking at other laptops and tests that isolate CPU performance, Llano suddenly starts to struggle. The Arrandale i5-520M offers 92% higher single-threaded performance in Cinebench R10 and 48% better single-threaded performance in R11.5; multi-threaded performance also goes to Arrandale, with a 23% lead in R10 and 17% lead in R11.5. x264 also gives Arrandale a decent lead, with i5-520M 17% faster in the first pass and 29% faster in the more intense second pass. The overclocked i3-380M in ASUS’ U41JF tells a similar tale, and both of these laptops are running processors from early last year. When we shift to Sandy Bridge, even without looking at the quad-core parts AMD’s CPU performance is tenuous. The i5-2520M is anywhere from 50 to 150 percent faster depending on which test we look at; even if we toss out the older Cinebench R10 single-threaded result of 150%, R11.5 gives the 2520M a 94% lead. In general, then, a moderate dual-core Sandy Bridge i5-series processor looks to be at least 30% faster, so quad-core Llano really only competes with Core i3 and its lower, non-Turbo clocks.
None of the results here are particularly surprising; K10.5 even at 32nm is still largely the same performance. AMD has focused this round of upgrades more on reducing power consumption than on increasing performance, and that’s a perfectly reasonable approach for a mobile CPU. Most of us probably aren’t doing 3D rendering, CAD/CAM, or unassisted video transcoding on our laptops anyway. It would still be great to see AMD offer up an equivalent to Intel’s Quick Sync; they have the better GPU architecture, but a dedicated video encoder like Quick Sync can clearly pay dividends. Outside of that one deficit the reality is that Llano is still plenty fast. Slapping an SSD into Llano will make more of a difference than upgrading an HDD-based Llano laptop to Core i5, so if you’re looking for an inexpensive laptop that can do everything most users need, Llano is very appealing.
Fusion GPUs: A Long-Awaited Upgrade to IGPs Everywhere
During our conversations with AMD, at one point they mentioned that they prefer not to use the term “IGP” anymore since they consider it a derogatory term. I asked what we should call Llano’s graphics and they said AMD officially refers to it as the “Fusion GPU” (fGPU), so that's what we'll use going forward. Regardless of what we call it, though, there’s no doubt that the 6620G fGPU is a dramatic upgrade to the old HD 4250; in fact, the 6620G should also boast significantly better performance than Intel’s HD 3000…provided the CPU core doesn’t become a bottleneck. Let’s start with 3DMark comparisons to see just where Llano falls. Again, we have the 6620G and 6630M Llano setups tested, but now we’re adding CrossFire to the mix.
I’m including all of the 3DMark iterations to provide a broad view of graphics potential. The latest 3DMark11 release seems to be almost purely GPU-limited, but of course it requires DX11 support and thus many of the other laptops (including Intel’s IGP) fail to run it. 3DMark Vantage’s Performance defaults are about as demanding, and Llano comes out 40-50% ahead of Sandy Bridge’s HD 3000. Of course, Arrandale completely falls on its face in the Performance test, generating a result of just 161, but AMD’s old HD 4250 is only marginally better with a score of 238. Remove some of the demands with the Vantage Entry-Level preset and Sandy Bridge starts to close the gap, with the quad-core 2820QM actually coming out ahead of Llano. Things that make you go hmm….
Things don’t get any better when we look at Asymmetrical CrossFire (ACF) from Llano. 3DMark11 comes in a whopping 50% faster than the 6630M dGPU, or 78% faster than the fGPU. If that performance boost showed up in our games, things would be great, but unfortunately it doesn’t. AMD informed us just yesterday that only DX10 or DX11 games and applications will even work with ACF, so perhaps that explains why we see little to no benefit in 3DMark03/05/06. The Vantage Performance preset shows a respectable 38% increase vs. the dGPU and 68% over the fGPU, but on Entry-Level it’s only 11-14% faster, and in 3DMark03 the dGPU actually scored lower than the fGPU.
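The two ACF percentages quoted for each test also pin down how far the dGPU sits ahead of the fGPU on its own. Here is a quick sketch of that derivation, using only the percentages from the paragraph above (the function name is ours, and rounding in the quoted figures carries through to the result):

```python
def dgpu_over_fgpu(acf_vs_dgpu, acf_vs_fgpu):
    """Given ACF's lead over each GPU, return the dGPU's lead over the fGPU.

    ACF = dGPU * (1 + a) and ACF = fGPU * (1 + b), so
    dGPU / fGPU = (1 + b) / (1 + a).
    """
    return (1.0 + acf_vs_fgpu) / (1.0 + acf_vs_dgpu) - 1.0

# Leads quoted in the text: (vs. dGPU, vs. fGPU)
print(f"3DMark11: {dgpu_over_fgpu(0.50, 0.78):.1%}")
print(f"Vantage Performance: {dgpu_over_fgpu(0.38, 0.68):.1%}")
```

That works out to roughly 19% in 3DMark11 and 22% in Vantage Performance, so in these synthetic tests the 6630M by itself is only about a fifth faster than the 6620G.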
If we were to stop our analysis of graphics performance right now, I suspect there would be a lot of confusion. Llano’s fGPU is anywhere from being equal to HD 3000 to 50% faster; Asymmetrical CrossFire is either a boon or a bust. So which is it? This is why we only place a minor emphasis on 3DMarks; let’s get to some actual gaming benchmarks.
Fusion GPU Takes on Gaming
For our gaming tests, we’ll start with our Low and Medium detail gaming benchmarks. We’ll save Asymmetrical CrossFire and High detail gaming for the next page. Note that we run all of the Low and Medium tests using DX9/DX10 modes, even on games that support DX11. The reason is simple: in nearly every game with DX11 support, enabling it proves too taxing for anything but the fastest discrete GPUs, or in other cases the graphics quality difference is negligible (Civilization V, Metro 2033, and Total War: Shogun 2 fall into this category). When we refresh our list of games later this year, we might start testing DX11 more often, but for now we’ll stick with DX9/10 on mainstream laptop testing.
Low Detail Gaming
Medium Detail Gaming
The age-old adage is that if you want a good gaming experience, you need to put more money into the graphics subsystem. With Llano, we need to modify that and add a corollary that you can trade a faster CPU for a better IGP/fGPU and end up with acceptable gaming performance. The 6620G is the first integrated GPU that can actually keep pace with the midrange discrete GPUs (at least on laptops—desktop GPUs are a different story). The Llano A8-3500M comes out ahead of AMD’s previous P920 + HD 5650 in many of the results, while A8-3500M + HD 6630M adds anywhere from 3-40% and averages 24% faster than the 6620G.
If we look at the competition, A8-3500M is anywhere from -3.5% to 167% faster than Intel’s HD 3000 with dual-core SNB, running everything at our Low presets. The sole victory for Intel comes in the lightly-threaded StarCraft II where Intel can really flex its Turbo Boost muscles. On the other end of the spectrum, HD 3000 turns in extremely poor results in Civilization V, Mafia II, and Metro 2033—games where Llano is at least playable. On average, the A8-3500M is 50% faster than HD 3000 at Low settings; move up to our Medium settings and Llano is 76% faster on average, with leads in every title ranging from 36% (StarCraft II is again the worst showing for AMD) to as much as 204% (Civilization V).
Bring the older Arrandale into the picture and things get even more lopsided. Never mind the fact that Arrandale’s HD Graphics are unable to break 30FPS in most of our test games at minimum detail (StarCraft II being the one exception); at our Low presets, A8-3500M puts Arrandale to shame, with performance anywhere from 57 to 472 percent faster and 223% faster on average. Obviously, you don’t want to try gaming on Arrandale’s IGP, which is where laptops like the ASUS U41JF come into play. You can pick up the U41JF for just over $800, but while the CPU is certainly faster, gaming performance with the GT 425M is only 15% faster than the stock A8-3500M on average, with Llano pulling wins in Civ5, Metro 2033, and TWS2 at Medium detail.
As a final note on gaming performance, while the A8-3500M isn’t clocked particularly high, there’s still more performance on tap in many games. Switching over to the 6630M dGPU improves performance by an average of 20% over the fGPU. A few titles only show an incremental performance increase (Metro 2033 and Mafia II); the biggest performance gains come in DiRT 2 and Total War: Shogun 2, with performance increases of 40%/35% respectively at low detail and 20%/25% at medium detail.
The target price of $700 for A8 laptops could make for a reasonably powerful and inexpensive gaming laptop, and if it’s like current AMD notebooks I suspect we’ll see A8 laptop prices dip into the low $600s. $800 for A8 Llano with the 6630M becomes a more difficult proposition, considering it would butt up squarely against laptops like the U41JF. The larger battery would give ASUS (and Intel) the lead in battery life, while gaming performance would be largely a wash. Depending on how much of a threat Intel deems Llano to be, we could see SNB laptops similar to the U41JF push pricing down, but for now Llano certainly fills a popular market niche.
High Detail Gaming and Asymmetrical CrossFire Misfire
Update, 8/10/2011: Just to let you know, AMD managed to get me a new BIOS to address some of the rendering issues I experienced with CrossFire. As you'll read below, I had problems in several titles, and I still take exception with the "DX10/11 only" approach. I can name dozens of good games out there that are DX9-only that released in the past year. Anyway, the updated BIOS has at least addressed the rendering errors I noticed, so retail Asymmetrical CrossFire laptops should do better. With that disclaimer out of the way, here's my initial experience from two months back.
So far, the story for Llano and gaming has been quite good. The notebook we received comes with the 6620G fGPU along with a 6630M dGPU, though, and AMD has enabled Asymmetrical CrossFire...sort of. The results for ACF in 3DMarks were interesting if only academic, so now we're going to look at how Llano performs with ACF enabled and running at our High detail settings (using an external LCD).
Just a warning before we get to the charts: this is preproduction hardware, and AMD informed us (post-review) that they stopped worrying about fixing BIOS issues on this particular laptop because it isn't going to see production. AMD sent us an updated driver late last week that was supposed to address some of the CrossFire issues, but in our experience it didn’t help and actually hurt in a few titles. Given that the heart of the problem is in the current BIOS, that might also explain why Turbo Core doesn't seem to be working as well as we would expect.
AMD also notes that the current ACF implementation only works on DX10/11 games, and at present that's their plan going forwards as the majority of software vendors state they will be moving to DX10/11. While the future might be a DX10/11 world, the fact is that many recent titles are still DX9 only. Even at our "High" settings, five of our ten titles are tested in DX9 mode (DiRT 2, L4D2, Mafia II, Mass Effect 2, and StarCraft II—lots of twos in there, I know!), so they shouldn't show any improvement...and they don't. Of those five titles, four don't have any support for DX10/11 (DiRT 2 being the exception), and even very recent, high-profile games are still shipping in DX9 form (e.g. Crysis 2, though a DX11 patch is still in the works). Not showing an improvement is one thing, but as we'll see in a moment, enabling CrossFire mode actually reduces performance by 10-15% relative to the dGPU. That's the bad news. The good news is that the other half of the games show moderate performance increases over the dGPU.
If that doesn't make the situation patently clear, CrossFire on our test unit is largely not in what we consider a working state. With that out of the way, here are the results we did manage to cobble together:
Given this is preproduction hardware that won't see a store shelf, the above results are almost meaningless. If ACF can provide at least a 30% increase on average, like what we see in TWS2, it could be useful. If it can't do at least 30%, it seems like switchable graphics with an HD 6730M would be less problematic and provide better performance. The only takeaway we have right now is that ACF is largely not working on this particular unit. Shipping hardware and drivers should be better (they could hardly be worse), but let's just do a quick discussion of the results.
If we just look at games with DX10/11 enabled, the story isn't too bad. Not accounting for the rendering issues noted below, ACF is able to boost performance by an average of 24% over the dGPU at our High settings. We didn’t include the Low and Medium results for ACF on the previous page for what should be obvious reasons, but if the results at our High settings are less than stellar, Low and Medium settings are even less impressive. Trimming our list of titles to three games (we tested TWS2 and STALKER in DX9 mode at our Low and Medium settings), ACF manages to average a 1% performance increase over the dGPU at Low and a 14% increase at Medium, but Civ5 still had to contend with rendering errors and Metro 2033 showed reduced performance.
In terms of rendering quality, ACF is very buggy on the test system; the default BIOS settings initially resulted in corrupted output for most games and 3D apps, but even with the correct settings we still encountered plenty of rendering errors. Civilization V only had one GPU rendering everything properly while units were missing on the other GPU, so you’d get a flicker every other frame with units appearing/disappearing. At higher detail settings, the corruption was even more severe. STALKER: Call of Pripyat and Total War: Shogun 2 also had rendering errors/flickering at higher quality settings. Since we didn't enable DX10/11 until our High defaults, right when ACF is supposed to start helping is where we encountered rendering issues.
Just to be clear: none of this means that Asymmetrical CrossFire is a bad idea; it just needs a lot more work on the drivers and BIOS. If/when we get a retail notebook that includes Asymmetrical CrossFire support, we’ll be sure to revisit the topic. Why ACF isn’t supported in DX9 is still a looming question, and AMD’s drivers need a much better interface for managing switchable graphics profiles. A list of all supported games with a central location to change all the settings would be a huge step up from the current UI, and users need the ability to enable/disable CrossFire support on a per-game basis if AMD wants anyone to actually use ACF. We also hope AMD rethinks their “only for DX10/DX11 modes” stance; CrossFire has worked with numerous DX9 games in the past, and what we’d like to see is ACF with the same list of supported games as regular CrossFire. If nothing else, having ACF enabled shouldn't reduce performance in DX9 titles.
In summary: we don't know if ACF will really help that much. We tested Asymmetrical CrossFire on what is, at best, beta hardware and drivers, and it didn't work very well. We want it to work, and the potential is certainly there, but we'll need to wait for a better test platform. To be continued....
AMD’s Llano Platform: Contending for your Mobile Dollar
When we first heard about Llano, it sounded like a good idea but we had concerns it might be too little too late. Core 2 was already beating AMD in the mobile sector, and since then we’ve had Arrandale and then Sandy Bridge. What was once a performance and battery life deficit has grown to a gaping chasm, and returning yet again to the aging K10/K10.5 architecture—which is a reworking of K8—felt like AMD’s mobile platforms were going to continue their history of stagnation. This is an important sector as well, as many businesses are shifting to completely mobile PCs and laptops are now outselling desktops. What we get with Llano is in some cases better than we were hoping for and in others not enough, but make no mistake: Llano is really all about the mobile sector.
The power and battery life optimizations are the best evidence of this: Llano offers roughly triple the battery life of the previous generation Danube platform, all while providing similar to superior CPU performance and a dramatic upgrade to graphics performance. From that perspective, Llano is a clear win for AMD, allowing their less expensive notebooks to finally offer competitive battery life with superior graphics. If you do a lot of complex CPU calculations (and you can’t or won’t switch to GPGPU computations), Intel’s Sandy Bridge processors are still faster than Llano, oftentimes by a large amount. However, not everyone needs a quad-core Sandy Bridge notebook for $1000+. That’s where AMD hopes to come into the picture, offering a viable entry-level gaming notebook that can handle all the other mundane tasks you might want for under $700.
What we can’t really comment on is how gaming potential and performance will scale up and down with the rest of the Llano lineup. The A8-3500M is very likely one of the best A-series offerings, with the full 400 Radeon cores and four CPU cores. The A6 series has similar quad-core clock speeds, but the fGPU is trimmed down to 320 cores and the clock drops from 444MHz to 400MHz—so the HD 6520G provides 72% of the compute power of the 6620G we’ve looked at today. In a similar vein, dual-core processors aren’t completely dead yet, as Intel continues to prove with their i3/i5 series parts. Unfortunately, with the A4 Llano parts you get higher clocked dual-core with only 240 Radeon cores—the 6480G has 60% of the compute power of 6620G. If the fGPU is largely bandwidth limited, the drop in computation performance may not matter, but where the A8-3500M can generally handle medium detail 1366x768 gaming, A6 will likely require a few lowered settings to hit 30FPS and A4 will mostly fill the role of minimum detail 768p gaming.
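The 72% and 60% figures above follow from treating peak shader throughput as proportional to core count times clock speed. Below is a minimal sketch of that arithmetic. It assumes (the review does not state this) that the A4's 6480G runs at the same 444MHz as the 6620G, and it ignores memory bandwidth entirely, so it says nothing about real frame rates.

```python
# Relative peak shader throughput, modeled as cores x clock.
# Core counts and the 444/400MHz clocks come from the text; the
# 444MHz clock for the 6480G is an assumption made for illustration.
FGPUS = {
    "HD 6620G (A8)": (400, 444),
    "HD 6520G (A6)": (320, 400),
    "HD 6480G (A4)": (240, 444),  # clock assumed, not from the review
}

def relative_compute(cores, mhz, ref_cores=400, ref_mhz=444):
    """Return throughput as a fraction of the reference fGPU (the 6620G)."""
    return (cores * mhz) / (ref_cores * ref_mhz)

for name, (cores, mhz) in FGPUS.items():
    print(f"{name}: {relative_compute(cores, mhz):.0%}")
```

If the fGPU turns out to be bandwidth limited, real-world gaps between the three parts could be smaller than these ratios suggest.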
The other interesting takeaway with Llano is that Brazos has just become far less interesting for many of us. Double the performance of Atom still isn’t enough, and now it’s only a bit more money to double or triple CPU performance while gaming (graphics) performance is two to four times faster than E-350. I’m pretty much content to say that I have no interest in Atom—even Cedar Trail—outside of tablets and smartphones, and Brazos while better is in a similar position. Those who like 10” netbooks are welcome to disagree, but that’s really the only stronghold where Llano and Sandy Bridge can’t quite compete—and Intel is even encroaching on that market with their new Ultrabook platform. Intel looks set to leave Atom out of the laptop race going forward, shifting it to tablets and other fanless designs, and Llano looks set to push Brazos into a similar niche. That’s fine with me, since in a couple more years we’re likely to see performance equal to or better than today’s Llano on tablets and smartphones.
As usual, your choice of laptop will once more come down to deciding what you really want/need. If you want maximum performance with reasonable battery life, Intel’s quad-core Sandy Bridge parts matched with NVIDIA’s Optimus-enabled GPUs are the best way to get there, but you’ll pay quite a bit more for the privilege. If you’re willing to forego battery life, Sandy Bridge with discrete-only AMD or NVIDIA graphics will power the fastest notebooks you can currently find, but they’re bulky, heavy, and expensive. It’s when you start talking about moderate priced laptops that Llano becomes important.
Some people will try to tell you that AMD will sell you more CPU cores than Intel for a lower price, but unlike desktop parts, mobile Llano cores don’t clock high enough to consistently outperform dual-core Intel processors. Even in heavily-threaded benchmarks where quad-core CPUs can shine, dual-core i5 processors are still typically 30% faster than the A8-3500M. Instead of selling you more CPU cores for less money, what AMD is now selling is substantially better graphics for less money. Home theater enthusiasts might find a use for such parts as well, but really the purpose of GPUs is simple: they’re for playing games. Until and unless GPGPU can take off and provide some killer apps, businesses and non-gaming folk alike will be better served by Intel’s processors—unless you want to save $100 to $200.
If you’re after a good all-around laptop for $500-$600, Llano should have just what you need; and for gaming, it will likely power some of the best sub-$700 gaming capable laptops you're going to find right now (short of fire-sales and refurbished laptops). For those interested, the only viable gaming notebook (e.g. with at least HD 5650M/6530M or GT420M/520M GPU) we can find for under $700 with an Intel CPU is the MSI CX640 at $650. Hopefully we'll see Llano offerings drop into the sub-$600 range with A8 APUs.
Now if you want to have your cake and eat it too, the APU to wait for would be Trinity. Due out somewhere in the 2012-2013 timeframe, Trinity combines a Bulldozer-derived architecture with AMD's next-generation GPU architecture. Third time's the charm, right?