Original Link: https://www.anandtech.com/show/2879



At its recent financial analyst day, AMD disclosed processor and platform roadmaps for 2010 and 2011. As the target audience consisted mainly of financial analysts, the presentations focused more on AMD’s strategy and competitiveness than on technical accuracy. We had a conference call with John Fruehe and Phil Hughes of AMD and we tried to find out what the new server CPU roadmap means for our readers, the IT professionals who actually configure and buy these servers.

Compared to the mobile and desktop market, AMD is doing relatively well in the server and HPC market. The early delivery of the six-core Opteron (codenamed Istanbul) enabled Cray to build the fastest supercomputer in the world (at least for Q4 2009). It's called the Cray XT5-HE “Jaguar” with 224162 cores, good for almost 1.76 million GFlops. The Opteron EE made heads turn in the low power cloud computing market, and the six-core Opteron is a good price/performance alternative in the rest of the server world. And last but not least, the 4-socket 84xx Opterons are the unchallenged champions in the quad socket world.

Nevertheless, AMD’s position in the server and HPC market is seriously threatened. An impressive 95 out of the top 500 supercomputers contain Intel's "Nehalem-EP" Xeon 5500 processors. Intel’s star has been rising fast in the HPC market since the introduction of the Intel Xeon 5500. Intel’s Nehalem EX is almost ready to attack the quad socket market. And there's more.

AMD created a very “cool” niche market with the 40W ACP (60W TDP) Opteron EE. Large power limited datacenters bought these CPUs in quantities of several thousand (and more!) at once. Just a few months ago, Intel also introduced a 45 Watt Xeon L3426 at 1.86 GHz based on their Lynnfield core (LGA1156 socket). Considering that AMD’s ACP numbers are rather optimistic and Intel’s TDPs are rather pessimistic, the 8-thread quadcore 1.86 GHz L3426 ($284) makes the six-core 1.8 GHz Opteron 2419EE look expensive ($989). The former can push its clock up to 3.2 GHz under single threaded loads, and is thus a really interesting option if your application has a significant part of non-parallel code.
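To put the price gap in perspective, here is a minimal sketch that divides the list prices quoted above by core and thread counts. The class and field names are purely illustrative, and the Opteron's thread count assumes one hardware thread per core.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cpu:
    name: str
    price_usd: float
    cores: int
    threads: int

    def price_per_core(self) -> float:
        return self.price_usd / self.cores

    def price_per_thread(self) -> float:
        return self.price_usd / self.threads


# Prices and core counts as quoted in the text above.
xeon = Cpu("Xeon L3426", price_usd=284, cores=4, threads=8)
# Assumption: the Opteron runs one thread per core (no SMT).
opteron = Cpu("Opteron 2419 EE", price_usd=989, cores=6, threads=6)

for cpu in (xeon, opteron):
    print(f"{cpu.name}: ${cpu.price_per_core():.2f} per core, "
          f"${cpu.price_per_thread():.2f} per thread")
```

This only compares list prices; it says nothing about per-core performance or real power draw, which is where the ACP versus TDP caveat matters.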

So far AMD has countered Intel’s higher “per core” performance with 50% more cores. Indeed, the six-core Opteron can keep up with the Xeon 5500 in quite a few applications. But Intel is readying a slightly improved six-core version of the Xeon 5500 series called Westmere-EP in the first half of 2010. Being a 32 nm high-K dielectric CPU, the six-core Westmere-EP will offer about the same power consumption with six cores under load as the quadcore Xeon 5500 (Nehalem EP). At idle, Westmere-EP will consume less (14 to 22% less leakage). Westmere-EP’s architecture is identical to that of the Nehalem EP, with the exception of a 50% larger L3 cache (12 instead of 8 MB) and support for special AES instructions.

AMD's Answer

It was hardly noticeable but AMD made a historic step forward in September 2009 with the introduction of its own server chipsets. For the first time, AMD is a real server platform supplier, in control of both the CPU and chipset. The previous AMD server platform was mostly based on NVIDIA's nForce 3600 Pro. The nForce 3600 gave some system administrators quite a few headaches, especially in combination with VMware’s ESX. VMware’s ESX installed flawlessly on all Intel platforms we have tried so far, but it was unpredictable whether or not an nForce board would work with ESX. Of course, the added value of a tier one OEM is that they sort these things out and offer you a driver + hardware platform that is certified for ESX and others. So you could say that this was a non-issue for HP, Sun and Dell buyers (I have hardly seen any IBM Opteron based servers in the wild). Still, it is good to see that AMD is now completely responsible for and in charge of its own server platform.

Below you find the specs of AMD’s northbridge server chipsets:
 
 
And next the southbridge chip. 
 
 
 

At the moment, the impact of the “Fiorano” or SR56xx chipsets is negligible. Most server vendors are preparing the servers based on the C32 socket and G34 socket and don’t feel like investing in the socket-F server platform which is at the end of its long road. Only Tyan and Supermicro, which focus mostly on the HPC market, offer servers based on the AMD SR5690 chipset right now.



Server CPUs in 2010

AMD’s best core in 2010 is a slightly improved revision of the current six-core Opteron “Istanbul” with the following additions:

• Finally a “real” C1E state which reduces power for each core that is idling
• Support for DDR-3

In theory, DDR-3 1333 offers 66% higher bandwidth, but in practice the Stream benchmark does not measure more than a 25% boost in bandwidth. The latency of going off-die is about the same. That means that the performance increase in most server applications will not be tangible. Only the most bandwidth intensive HPC applications will get a boost of 10 to 20%.
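The 66% figure can be reproduced from peak transfer rates. The sketch below is only a back-of-the-envelope check: it assumes the baseline is DDR2-800 (the text above does not name the baseline) and a 64-bit (8-byte) memory channel, and the function name is illustrative.

```python
def peak_bandwidth_gbs(mega_transfers_per_s: float, bus_bytes: int = 8) -> float:
    """Theoretical peak bandwidth of one memory channel in GB/s."""
    return mega_transfers_per_s * 1e6 * bus_bytes / 1e9


# Assumption: the comparison baseline is DDR2-800.
ddr2_800 = peak_bandwidth_gbs(800)
ddr3_1333 = peak_bandwidth_gbs(1333)

theoretical_gain = ddr3_1333 / ddr2_800 - 1
measured_gain = 0.25  # upper bound of the Stream result quoted above

print(f"DDR2-800:  {ddr2_800:.2f} GB/s per channel")
print(f"DDR3-1333: {ddr3_1333:.2f} GB/s per channel")
print(f"Theoretical gain: {theoretical_gain:.1%}")
print(f"Share of the theoretical gain seen in Stream: "
      f"{measured_gain / theoretical_gain:.0%} at most")
```

Under these assumptions, well under half of the theoretical gain shows up in the Stream measurement.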

Currently, AMD's six-core Opteron can match the performance of Intel’s quadcore Xeon 5500 at the same clockspeed in some important server applications: OLAP databases, virtualization and web applications. Intel’s best Xeon wins with a significant margin in OLTP, ERP and rendering. A large part of the HPC market is a lost cause: a quadcore Intel Xeon 5570 at 2.93 GHz is about twice as fast as an AMD Opteron 2389 at 2.9 GHz. The fact that we could not find any Opteron 2435 results in LS-Dyna is another indication of what to expect: the 10-20% higher performance in HPC applications will not be a large step forward.

Intel is going to increase performance by 20-30% per CPU (50% more cores), while AMD’s CPUs will see only marginal increases. So basically, Intel’s performance advantage is going to grow by 20 to 30%, except in HPC workloads where it is already running circles around the competition. Not an enviable position to be in for AMD.

Suppose that you are the strategic brain behind AMD. The competition offers better “per chip” and “per core” performance. The last thing you want to do is to offer the same kind of server platform. If a six-core Opteron (“Lisbon”) goes head to head with a six-core Xeon (“Westmere-EP”), it will not be pretty: the Intel chip will beat the AMD chip in performance and performance/watt (remember, Westmere-EP is a 32 nm CPU). Despite this, AMD found some clever ways to make their server platforms interesting…

Cheaper 4-Socket Servers

 

“Know your enemies and know yourself”.

In which usage scenarios are Intel’s offerings less compelling? The Nehalem-EX is a powerful platform, but it is also a completely different one from the “Westmere-EP” platform. The Nehalem-EX's most important market is the 4-socket/8-socket x86 market, where about 400,000 servers are sold per year, or about 5% of the total x86 server market. It is also a pretty complex platform with two I/O hubs and 16 (!) memory buffer chips on a 4-socket board. The Nehalem-EX platform does not only want to conquer the high end 4 and 8-socket x86 server market, it also wants to convince the more paranoid RISC and Itanium buyers:

 
 
 
AMD uses the same building blocks for its midrange 4-socket platform as it does for the high-end 2-socket platform and calls it the G34 infrastructure. The consequence is that the RAS features stay the same, and as a result, AMD cannot completely compete with the Nehalem EX platform when it comes to RAS. But that is not really a problem, as some of the "high-end" RAS features aren't used by 98% of the x86 crowd who buy the more expensive 2-socket and 4-socket servers. To compete with the 8 core/16 thread Nehalem EX, AMD puts two DDR3 Istanbuls together, which communicate via a HyperTransport link, and calls it a twelve core Opteron 6100 (Socket G34). A server based on the Opteron 6100 can probably come close to the performance of the lower-end and midrange Nehalem EX, but it is a lot cheaper to design and produce. The disadvantage is that it only has 12 DIMM slots per CPU, while the Nehalem EX has 16 DIMM slots per CPU.

Our first impression is that AMD will find it hard to win the high end database and ERP market. The quadcore Nehalem 5500 already outperforms the six-core Opteron “Istanbul” by a large margin (30-50%). The Opteron 6100 also has 50% more cores than its rival, but it is likely that a “native octalcore” will scale a bit better than a two times 6-core design. For the virtualization market, the larger number of DIMM slots is an advantage for the Nehalem EX. At first sight, it looks like it will be pretty tough for AMD to regain market share in this part of the server market.

Expensive 2-Socket Servers

When it comes to expensive 2-socket servers, AMD's positioning is cunning. In the midrange we will find servers with sixteen Opteron cores (2 sockets x 2 quad-core dies per socket) offering 8 memory channels and 24 DIMM slots. Performance will probably be “close enough” to the Westmere-EP servers, which can only offer six memory channels and 18 DIMM slots. The extra memory bandwidth might make a dual Opteron 6100 attractive to the HPC folks, while the larger number of DIMM slots together with a competitive price may very well convince the virtualization market.
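The platform totals above can be tallied with a small sketch. The per-socket channel counts and the three DIMMs per channel are not stated directly; they are inferred from the quoted totals (8 channels and 24 slots versus 6 channels and 18 slots), and the class name is illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoSocketPlatform:
    name: str
    sockets: int
    dies_per_socket: int
    cores_per_die: int
    channels_per_socket: int  # inferred from the quoted totals
    dimms_per_channel: int    # inferred from the quoted totals

    @property
    def cores(self) -> int:
        return self.sockets * self.dies_per_socket * self.cores_per_die

    @property
    def memory_channels(self) -> int:
        return self.sockets * self.channels_per_socket

    @property
    def dimm_slots(self) -> int:
        return self.memory_channels * self.dimms_per_channel


opteron_6100 = TwoSocketPlatform("Opteron 6100 (2 x quad-core die)", 2, 2, 4, 4, 3)
westmere_ep = TwoSocketPlatform("Westmere-EP", 2, 1, 6, 3, 3)

for p in (opteron_6100, westmere_ep):
    print(f"{p.name}: {p.cores} cores, {p.memory_channels} channels, "
          f"{p.dimm_slots} DIMM slots")

extra_channels = opteron_6100.memory_channels / westmere_ep.memory_channels - 1
print(f"Opteron 6100 channel advantage: {extra_channels:.0%}")
```

Channel count is only a proxy for bandwidth: the real advantage depends on memory speed and on how well each memory controller is utilized.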

Midrange and Budget 2-Socket Servers

 
When it comes to the midrange of the 2-socket market, AMD has no choice: it must compete on price. There is no way an Opteron 4100 (“Lisbon”, socket C32) is going to be competitive with Westmere-EP at the same clockspeed. As we noted before, the former will be a few percent faster than the current six-core Opteron “Istanbul”, while the Westmere chip is at least 20% faster than its famous older brother. The fastest Lisbons are probably not even going to be able to keep up with the low-clocked six-core Westmere-EPs. So the Opteron 4100 and “San Marino” platform have only one mission: to be a lot cheaper than the low-end Westmere servers. To increase the performance/watt ratio, the San Marino servers do not support the 105-137W SE CPUs. This ensures that the server vendors do not have to overbuild the voltage regulators and PSU, which in turn lowers the overall power a server consumes when running with 75W ACP parts.

Ultra Low Power Server

AMD had some success in the ultra low power market and clearly wants more. The “Adelaide” platform is the successor to the power optimized “Kroner” platform. Low power memory and chipset, voltage regulators and PSUs that only support low power Opterons: every component is tuned for low power. Remarkably, the ACP of the Opteron 4100 EE is lowered to a very low 35W, or less than 6W per core. AMD feels these CPUs offer an excellent alternative to the VIA Nano and Intel Atom based servers. Instead of running one small website on an Intel Atom based server, AMD hopes that ISPs will prefer to run 6 websites on a container based solution. So each website would get its own 6 Watt core which is much more powerful than the best Intel Atom CPUs.
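The per-core power budget is simple division, sketched below. It assumes the 35W ACP is spread evenly over six cores and ignores the share drawn by the L3 cache and northbridge, so the real per-core figure is lower still. The function name is illustrative.

```python
def watts_per_core(package_watts: float, cores: int) -> float:
    """Evenly divide a package power figure over its cores."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    return package_watts / cores


# 35W ACP and six cores, as quoted above for the Opteron 4100 EE.
opteron_4100_ee = watts_per_core(35, 6)
print(f"Opteron 4100 EE: {opteron_4100_ee:.2f} W ACP per core")
```

Keep in mind that ACP is AMD's own metric and is not directly comparable to a TDP figure.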

Upgrade to Bulldozer

AMD’s C32 and G34 platforms will be upgradeable to the new Valencia and Interlagos CPUs which are both based on the Bulldozer core.
 
 
 

We will discuss this core in more detail later, but here are some extra tidbits we managed to find out:

• Two integer clusters share fetch and decode logic but have their own dedicated Instruction and Data cache
• Integer clusters cannot be shared between threads: integer cores act like a Chip Multi Processing (CMP) CPU.
• The extra integer core (schedulers, D-cache and pipelines) adds only 5% die space
• L1-caches are similar to Barcelona/Shanghai (64 KB 2-way? Not confirmed)
• Up to 4 modules share an L3-cache and Northbridge
• Two times 4 Bulldozer modules (2 x 8 "cores" or 16 cores) are about 60 to 80% faster than the twelve core Opteron 6100 CPU in SPECInt_rate.
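The last bullet implies a per-core gain, which the rough sketch below works out. It assumes SPECInt_rate throughput can be divided evenly over the cores and it ignores clock speed and memory differences, so treat the result as an estimate rather than a measured figure. The function name is illustrative.

```python
def per_core_speedup(chip_speedup: float, new_cores: int, old_cores: int) -> float:
    """Per-core throughput ratio implied by a chip-level throughput ratio."""
    return chip_speedup * old_cores / new_cores


# 2 x 4 modules x 2 integer cores = 16 cores, versus the twelve-core Opteron 6100.
bulldozer_cores = 2 * 4 * 2
opteron_6100_cores = 12

# "60 to 80% faster" means 1.6x to 1.8x the throughput.
for chip_speedup in (1.6, 1.8):
    ratio = per_core_speedup(chip_speedup, bulldozer_cores, opteron_6100_cores)
    print(f"{chip_speedup:.1f}x per chip -> {ratio:.2f}x per core")
```

In other words, only part of the claimed gain comes from the 33% increase in core count; under these assumptions the rest would have to come from each integer core being faster.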

With Bulldozer, AMD finally seems to have designed an aggressive integer core. Since the introduction of the Intel Woodcrest in 2006, Intel’s CPUs have been offering superior integer crunching performance per core. Since integer performance determines the performance of 90-95% of the server applications out there, this is a big deal.

Conclusion

Intel has a very strong product lineup for each segment of the market: the massive octalcore Nehalem EX for the “mission-critical” high-end, the six-core Westmere-EP for the midrange and the “Lynnfield” based Xeons for the low power market. But AMD doesn't roll over willingly: it breaks all market segment rules and shatters some (artificial?) boundaries. That will result in some very interesting opportunities for the server buyers in 2010.

So which products are worth watching or waiting for? The G34 Opteron 6100 will find a home in 48-core servers, and these servers should be a cheaper alternative to the 32-core Nehalem EX servers in the high-end. We are not completely convinced that performance and RAS features will be compelling enough to sway the typical Nehalem EX buyers (OLTP, ERP) towards an AMD Opteron server. That is our first impression, but we will give AMD the benefit of the doubt of course.

We are much more enthusiastic about AMD’s high-end 2-socket platform. The fact that you will be able to buy a relatively cheap (compared to 4-socket solutions) 2-socket server with two quad channel octal cores is very attractive and a great strategic move by AMD. A platform with 16 cores (or 24 if you like) and 24 DIMM slots might attract quite a lot of typical 2-socket “virtualization consolidation” server buyers.

The other really compelling offer to the market might be the Adelaide platform, depending on how high the premium is that AMD wants for its EE Opterons. AMD has been asking pretty high prices for its lowest power Opterons, clearly targeting the "Facebooks" and "Googles" of the world. But if AMD is going after the Intel Atom server market, it may mean that it's going to offer some low power products in price ranges that are interesting to the rest of us.
