Original Link: https://www.anandtech.com/show/4349/infrastructure-as-a-service-benchmarking-cloud-computing



Finding a Home for Your Website

Building and deploying a heavy-duty web service from the ground up is a long and costly process. At the IT section of AnandTech, we mostly focus on the fun part of the process: choosing and buying a server. However, there is much more to it. Designing the software and taking care of cooling, networking, security, availability, patching, and performance is a lot of work. Add all that labor to the capital expenditure on your servers and it is clear that doing everything yourself is a huge financial risk.

These days, almost everybody outsources a part of this process. The most basic form is colocation: you rely on a hosting provider for the internet bandwidth and access, the electricity, and the rack space; you take control of the rest of the process. A few steps higher are unmanaged dedicated hosting services, where the hosting provider takes care of all the hardware and networking. You get full administrative access to the server (for example, root access on Linux), which means you are responsible for the security and maintenance of your own dedicated box.

The next step is to outsource that part too. With managed hosting services you won’t get full control, but the hosting provider takes care of almost everything: you only have to worry about the look and content of your web service. The Service Level Agreement (SLA) guarantees the quality of service that you get.

The problem with managed and unmanaged hosting services is that they are in many cases too restrictive and don't offer enough control. If performance is lacking, for example, the hosting provider often points to the software configuration while the customer feels that the hardware and network might be the problem. It is also quite expensive to enable the web server to scale to handle peak loads, and high availability may come at a premium.

Cloud Hosting

Enter cloud hosting. Many feel that cloud computing is just old wine in new bottles, but cloud hosting is an interesting evolution. A good cloud hosting offering starts by building on a clustered hosting solution: instead of relying on one server, we get the high availability and load balancing capabilities of a complete virtualized cluster.

Virtualization allows the management software to carve up the cluster any way the customers like--choose the number of CPUs, RAM and storage that you want and make your own customized server; if you need more resources for a brief period, the cluster can provide this in a few seconds and you only pay for the time that you actually use this extra capacity. Best of all, cloud hosting allows you to set up a new server in less than an hour. Cloud hosting, or Infrastructure as a Service (IaaS), is definitely something new. Technically it is evolutionary, but from the customer point of view it offers a kind of flexibility that is revolutionary.
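
To make that flexibility concrete, here is a minimal Python sketch of what such a resource request boils down to. The ServerSpec class and the hourly rates are our own hypothetical illustration, not Terremark's actual API or pricing:

    from dataclasses import dataclass

    @dataclass
    class ServerSpec:
        vcpus: int       # number of virtual CPUs
        ram_gb: int      # memory in GB
        storage_gb: int  # disk space in GB

    def hourly_cost(spec, cpu_rate=0.05, ram_rate=0.02, storage_rate=0.0002):
        """Pay-per-use: you are billed only for what you allocate, and only
        for the hours the allocation exists (all rates are made up)."""
        return (spec.vcpus * cpu_rate
                + spec.ram_gb * ram_rate
                + spec.storage_gb * storage_rate)

    web = ServerSpec(vcpus=2, ram_gb=4, storage_gb=50)
    peak = ServerSpec(vcpus=8, ram_gb=16, storage_gb=50)  # brief traffic peak
    print(f"normal: ${hourly_cost(web):.4f}/h, peak: ${hourly_cost(peak):.4f}/h")

The point of the model: scaling up for a peak is just a bigger spec, billed per hour, instead of a new hardware purchase.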

There is a downside to the whole cloud IaaS story: most of the information about the subject is so vague and fluffy that it is nearly useless. What exactly are you getting when you start up an Amazon instance or build your own cloud on the Terremark Enterprise Cloud?

As always, we don't care much about the marketing fluff; we're more interested in benchmarking in true AnandTech style. We want to know what kind of performance we get when we buy a certain amount of resources. Renting 5GB of RAM is pretty straightforward: it means that our applications should be able to use up to 5GB of RAM. But what about 5GHz--what does that mean? Is that 5GHz of nostalgic Pentium goodness, or is it 5GHz of the newest complex, out-of-order, integrated memory controller, billion-transistor CPU monsters? We hope to provide some answers with our investigations.
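
A quick back-of-the-envelope illustration (in Python) of why "5GHz" alone tells you little; per-core IPC differences between CPU generations are ignored here, which is exactly the problem:

    # The same aggregate clock can come from very different cores.
    configs = {
        "ten 500MHz cores (Pentium-class)": (10, 0.5),
        "two 2.5GHz cores (modern)": (2, 2.5),
    }
    for name, (cores, ghz) in configs.items():
        print(f"{name}: {cores * ghz:.1f}GHz aggregate")
    # Both print 5.0GHz, yet single-threaded speed differs by a factor of
    # five -- before even counting the IPC gains of newer microarchitectures.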



Terremark’s Enterprise Cloud

Terremark's Enterprise Cloud is based on the "Infinistructure" platform, a mix of servers, storage, and networking with a self-service portal called digitalOps. The software part of the Infinistructure Utility Computing platform is built upon VMware's vSphere virtualization platform and makes use of vMotion, load balancing, and the Distributed Resource Scheduler (DRS).

You can create a server in a few minutes using the templates that Terremark provides ("Create Server"). This instructional movie gives a quick overview of how this is done. It is a simple and straightforward process that takes 10-15 minutes at most.

The digitalOps environment interface has three tabs: Resources, Devices, and Network. The Devices tab lists all the virtual machines you have created so far. The virtual machines can be sorted/grouped as you can see below.

Creating servers from the Terremark templates (using the "Create Server" option) was very easy, but the "Create Blank Servers" option was a different story: it was not possible to mount our own ISOs. According to Terremark, the next release will improve this, as you will be able to import your own Open Virtualization Format (OVF) virtual machines; installing from your own ISOs will still not be possible.

The Resources tab gives you the real "cloud computing" or "IaaS" feel. Your "cloud data center" is a collection of a specified amount of processor GHz, gigabytes of memory, and storage space. You also get a summary of how much of those resources you have used over the past 24 hours.



More Terremark Enterprise Cloud Details

Another typical IaaS feature is “bursting”. You can see that there is a “disable/enable” burst button. If you enable bursting, you allow your virtual machines to use more than the purchased GHz or RAM space. Of course you pay a premium for the extra resources, but only for the time you really need that extra power. Terremark guarantees that you get a surplus of 20% in all circumstances. If you do not limit your burst capability, you can get what is left in the vSphere resource pool. In our case, we got up to 24GHz, up from the original 5GHz (reserved) and 10GHz (limit).
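
The burst arithmetic, as we understand it, in a short Python sketch; the 5/10/24GHz numbers come from our account, while the function itself is our own illustration rather than Terremark's code:

    RESERVED_GHZ = 5.0   # capacity guaranteed at the base price
    LIMIT_GHZ = 10.0     # hard cap with bursting disabled
    MIN_BURST = 1.20     # at least 20% extra, our reading of the guarantee

    def available_ghz(burst_enabled, pool_spare_ghz):
        """CPU capacity our VMs can draw on. With bursting enabled we get
        at least 20% above the limit, plus whatever is currently left over
        in the parent vSphere resource pool."""
        if not burst_enabled:
            return LIMIT_GHZ
        return max(LIMIT_GHZ * MIN_BURST, LIMIT_GHZ + pool_spare_ghz)

    print(available_ghz(False, 0.0))   # 10.0 -> capped at the limit
    print(available_ghz(True, 14.0))   # 24.0 -> the peak we observed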

The networking part is explained here. You can see the internal, external, and public IP addresses. A basic firewall is available.

You can connect to the console of each virtual machine via the Cisco AnyConnect SSL VPN client. The final tab in the Environment section is Network; site-to-site VPNs are also possible. Depending on where you live, logging in to digitalOps will connect you to one of Terremark's European or American data centers. Our virtual servers were located in the Amsterdam (the Netherlands) data center; US customers will typically connect to the Miami, Washington DC, Dallas, or Santa Clara data centers.

The Terremark Enterprise Cloud became available in the US at the end of 2008; in the first half of 2009, it was also made available to European customers.



The Hardware Behind the Enterprise Cloud

The actual hardware in the Enterprise Cloud is of course a moving target: Terremark claims that every cluster is renewed every three years. In the beginning of 2009, the Enterprise Cloud was based on IBM x3850 M2 servers and SANs built on Fibre Channel IBM System Storage N6040 arrays, which would mean that the early Enterprise Cloud was powered by highly clocked Xeon 7400 CPUs. The US-based data centers also use quad-Xeon HP DL580 and quad-Opteron HP DL585 servers.

The infrastructure we tested in Amsterdam is currently based on the Xeon X7542, the highest clocked "Nehalem-EX" processor. This six-core Xeon runs at 2.67GHz and can Turbo Boost to 2.8GHz.

It is interesting that Terremark chose this particular Xeon. First, we have shown that servers based on the fastest Xeon 7500s come with a much lower performance/watt ratio than the Xeon 5600 and Opteron 6100 series, so it looks like Terremark decided in favor of raw performance and high availability over power and cost. Second, the Xeon X7542 was not available until Q1 2010, so it is likely that we tested one of the newer, higher performance parts of the Terremark Enterprise Cloud. Since all server clusters are replaced every three years, chances are slim that you will end up with the old Xeon 7400 based servers.

From the customer point of view, that is good news. We don't have to pay the power bill, and the GHz power comes from a raging bull rather than a squawking chicken. If we order 5GHz, it is more likely that this comes from two 2.5GHz cores instead of ten 500MHz cores. Thus, we expect good CPU performance from this cloud--as you can imagine, we would not be fans of a Sparc T3 based cloud for CPU intensive loads (though it might fare better for network intensive loads).

A valid concern in a cloud computing environment is that a possible attacker can be on the same network as you behind the same firewall. In other words, somebody could buy some server space to try and attack you. Terremark uses VLAN network partitioning and PCI-compliant firewalls to ensure security.



Benchmarking the Terremark Cloud

We wanted to compare the virtual IaaS servers of the Terremark Enterprise Cloud with a virtualized physical server, because that is the decision you will have to make: will you deploy your application on a server in your own local data center, or will you deploy to a virtual server in an IaaS environment?

The "In House" Reference Machine

Nowadays most applications find a home inside a virtual machine on top of a hypervisor. Since the Terremark servers have Intel Xeon 7500s inside, we decided to use a reference machine based on the same platform: the QSSC-S4R, equipped with four Xeon X7560 CPUs running at 2.26GHz. On top of this server we ran vSphere 4.1 Update 1, based upon the 64-bit ESX 4.1.0 b348481 hypervisor.

CPU           4x Intel Xeon X7560 at 2.26GHz
RAM           16x 4GB Samsung Registered DDR3-1333 (running at 1066MHz)
Motherboard   QCI QSSC-S4R 31S4RMB00B0
Chipset       Intel 7500
BIOS version  QSSC-S4R.QCI.01.00.S012,031420111618
PSU           4x Delta DPS-850FB A S3F E62433-004 850W

Typically, a group of virtual machines shares the CPU, memory, and storage resources that have been allocated to their "resource pool", so we tested the "in house" machine in two ways. In the first benchmark run, virtual machines were only limited by the number of virtual CPUs they were given: the one OLAP virtual machine got eight virtual CPUs, and our three web servers each got two virtual CPUs. With 32 physical cores available, that means the OLAP machine is able to use up to eight physical cores and each web server up to two physical cores.

In the second benchmark setup, we limited the virtual machines (14 virtual CPUs in total) to a resource pool of 10GHz of CPU power. This is similar to the Terremark setup (as well as other "cloud" setups), which also uses resource pools to make optimal use of the underlying hardware. After all, it is costly to reserve hardware resources that are not being used.
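
The capacity math for both runs, as a small Python sketch (core clock from the spec table above; the calculation itself is just an illustration):

    CORE_GHZ = 2.26                                  # Xeon X7560 base clock
    vcpus = {"OLAP": 8, "web1": 2, "web2": 2, "web3": 2}

    total_vcpus = sum(vcpus.values())                # 14 vCPUs in total
    print(f"Run 1 (vCPU-limited): up to {total_vcpus * CORE_GHZ:.1f}GHz")

    POOL_LIMIT_GHZ = 10.0                            # run 2: resource pool cap
    print(f"Run 2 (resource pool): capped at {POOL_LIMIT_GHZ:.1f}GHz")

Run 1 works out to roughly 31.6GHz of allocatable CPU power, which is the upper limit we refer to in the results below.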

The Terremark Virtual Server Infrastructure

We reserved 5GHz (with a 10GHz limit) of CPU power, 10GB of RAM, and 215GB of storage space in the Terremark Enterprise Cloud. We tested this IaaS cloud in two ways. First, we disabled the burst function, which means that we are limited to a maximum of 10GHz of CPU power. Second, we enabled the burst function. In that case, the Terremark infrastructure will offer extra CPU power, but the amount of processing power made available to your server depends on how heavily the Terremark cluster is loaded at that moment. Terremark guarantees that 20% extra resources are available in all circumstances, and during our tests we saw up to 24GHz made available to us.



vApus Mark II

vApus Mark II is our newest benchmark suite that tests how well servers cope with virtualizing "heavy duty" applications; we've previously explained the benchmark methodology. However, we made a few changes to make the benchmark suitable for a cloud environment. The OLTP test, the freely available "Calling Circle" test from the Oracle Swingbench suite, was not included: it requires SSDs or a large number of SAS drives, which would make running it on rented hardware costly. Thus, our scores are not directly comparable to those of other servers we have tested in the past, but the chart below uses the same test setup for all servers.

[Chart: vApus Mark II throughput scores]

It is little surprise that our reference server offers the best performance. We have four VMs requesting the power of 14 virtual CPUs, so the server has ample resources to satisfy this request. As a result, 14 physical cores at 2.26GHz are allocated, good for 31.6GHz of CPU power. This is our upper limit.

Next is the Terremark cluster in burst mode. Our resource pool was limited to 10GHz, but bursting gave us an extra 80% of CPU power on average. The result is that the Terremark cluster delivers about 70% of the throughput of the "in house" server. That is pretty good: we pay for only 10GHz most of the time, and although the extra 80% comes at a premium, we pay for it only when we actually need it.

Finally, let us compare the two most similar setups: the "in house" server with a 10GHz resource pool and the Terremark servers with the same limitation. Once again, the Terremark virtual servers achieve about 70% of the throughput. That is not superb, but it's not bad either. Even if Terremark ensures that every 10GHz of allocated CPU power is backed by real physical processing power, the Terremark cluster has to manage many more virtual machines, so its overhead is higher than on our test machine, which only has to manage our own test VMs.



Response Time

Our current virtualization stress tests focus on measuring the maximum throughput of a certain server CPU or platform. As a result, we try to maximize CPU load by simulating high numbers of concurrent users. In other words, the concurrencies are quite a bit higher than what the (virtual or physical) machine can cope with, in order to attain 95-100% CPU load. Consequently, the response times are inflated well above what would be acceptable in the real world. Still, it is interesting to get an idea of the response times of our own server versus the cloud server.
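
A simple queueing illustration (a plain M/M/1 model, not our benchmark's internals) of why response times balloon once utilization approaches 100%:

    # In an M/M/1 queue, mean response time = service_time / (1 - utilization).
    service_time_ms = 20.0      # hypothetical per-request service time
    for util in (0.50, 0.80, 0.95, 0.99):
        print(f"utilization {util:.0%}: ~{service_time_ms / (1 - util):.0f} ms")
    # 40, 100, 400, 2000 ms: near saturation, response times are inflated
    # far beyond what the same server would show under realistic load.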

Our vApus Mark II test starts off with 400 concurrent users and ends with 800. Even 400 concurrent users cause a very high CPU load on most machines (80-90%), but that should still give us a "worst case" response time scenario. In the next graph we list the response time at the lowest concurrency. Terremark's data center was in Amsterdam and we had an 11 to 20 ms round trip delay from our lab, so to be fair to Terremark you should deduct 11 to 20 ms from the Terremark numbers.
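
Applied as code, the correction is trivial; the measured value below is hypothetical, while the 11-20ms round trip is what we measured from our lab to Amsterdam:

    measured_ms = 250.0          # hypothetical response time from the chart
    rtt_min_ms, rtt_max_ms = 11.0, 20.0

    # Deducting the network round trip isolates the server-side latency:
    print(f"server-side: {measured_ms - rtt_max_ms:.0f} to "
          f"{measured_ms - rtt_min_ms:.0f} ms")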

[Chart: vApus Mark II response times]

Both the "in house" server with the 10GHz resource pool and the 10GHz Terremark "cloud server" get hit very hard by 400 concurrent MS SQL Server connections, but the 10GHz resource pool we get from the Terremark cluster is less powerful: it needs up to 85% more time to respond. The reason for that is twofold.

First, while the limit of the resource pool is 10GHz, only 5GHz is reserved. So depending on how heavily the parent resource pool is loaded by other VMs, we probably get somewhere between 5 and 10GHz of CPU power. Second, there is some extra overhead from the firewalls, routers, and load balancers between us and the VM; for safety and security reasons, Terremark's infrastructure is quite a bit more complex than our testing environment, and that could add a few tens of milliseconds as well.
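
The first effect is easy to model. In the sketch below (our own illustration, with a simple linear contention assumption) the usable capacity slides from the 10GHz limit down to the 5GHz reservation as the parent pool fills up:

    RESERVED_GHZ, LIMIT_GHZ = 5.0, 10.0

    def effective_ghz(pool_load):
        """Capacity between reservation and limit; pool_load of 0.0 means
        an idle parent pool, 1.0 a fully contended one (linear assumption)."""
        return RESERVED_GHZ + (LIMIT_GHZ - RESERVED_GHZ) * (1.0 - pool_load)

    for load in (0.0, 0.5, 1.0):
        print(f"pool load {load:.0%}: ~{effective_ghz(load):.1f}GHz")
    # 10.0, 7.5, 5.0 -> under contention only the reserved 5GHz is certain.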

We also noticed that some parts of the Terremark cluster were using slightly older drivers (vmxnet2 instead of vmxnet3, for example), which might cost a few CPU cycles. But we are nitpicking, and Terremark told us it is just a matter of time before the drivers are updated to the latest version.

And there is good news too. If you are willing to pay the premium for "bursting", the Terremark cluster scales very well and delivers response times similar to those of an "in house" server. That is the whole point of cloud computing: pay for and use peak capacity only when you actually need it.



Conclusion

Cloud computing is here to stay, there is no doubt about it. Terremark's Enterprise Cloud delivers on the cloud computing promises: pay only for what you use, handle peaks without oversizing your infrastructure, build a server cluster with a few mouse clicks on a self-service portal, and stop worrying about cooling, electricity, or complex security matters. However, Infrastructure as a Service should not be a "next... next" point-and-click adventure; there is some serious planning involved.

It is important to size your applications properly so you have an idea how much capacity you should reserve. If you reserve too little, you will pay a large premium as your virtual infrastructure will use the bursting feature a lot; if you reserve too much, you're overspending on unused capacity. We still have to perform a thorough cost analysis, but it seems that if you size your applications well, cloud computing is an attractive alternative.
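
A toy sizing model in Python (all rates made up) shows the trade-off: reserve too little and the burst premium dominates, reserve too much and you pay for idle capacity.

    BASE_RATE = 1.0      # cost per reserved GHz-hour (hypothetical)
    BURST_PREMIUM = 1.5  # burst GHz-hours cost 50% extra (hypothetical)

    def monthly_cost(reserved_ghz, hourly_demand):
        cost = 0.0
        for demand in hourly_demand:
            cost += reserved_ghz * BASE_RATE             # always pay reservation
            burst = max(0.0, demand - reserved_ghz)      # overflow into bursting
            cost += burst * BASE_RATE * BURST_PREMIUM
        return cost

    # 720 hours: mostly 4GHz of demand, with 30 peak hours needing 16GHz
    demand = [4.0] * 690 + [16.0] * 30
    for reserved in (2.0, 5.0, 16.0):
        print(f"reserve {reserved:4.1f}GHz: {monthly_cost(reserved, demand):7.0f}")
    # With these made-up rates, reserving 5GHz beats both under-reserving
    # (2GHz, heavy burst premiums) and over-reserving (16GHz, idle capacity).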

Terremark's Enterprise Cloud does not have all the features an "in house" environment has (like mounting an ISO from your own laptop), but setting up a new server is very intuitive. In fact, it takes only a few tens of minutes to set up a complete virtualized data center. That is a big bonus for development oriented companies that want to set up testing and staging environments very quickly.

The bigger benefit is that when it comes to responding to traffic spikes, the Terremark Enterprise Cloud delivers. In our testing it was able to offer 100% extra CPU power as needed, and at times even more. Enabling bursting is not cheap, but your virtualized infrastructure will be capable of keeping up with heavy traffic, as the Enterprise Cloud scales quickly and well. One of the reasons is that the Enterprise Cloud is based on highly clocked, high power Xeons. We would definitely consider the Enterprise Cloud if your application is mission critical and gets very spiky traffic; the fact that you can set up extra servers almost instantly is an added bonus.

Finally, I would like to thank Tijl Deneut for his invaluable assistance.
