
  • DanNeely - Monday, December 15, 2014 - link

    "Users should also note that only one [motherboard molex plug] needs to be connected when 3+ PCIe devices are used to help boost power. I quizzed them on SATA power connectors instead, or a 6-pin PCIe, however the response was not enthusiastic."

    I can understand them not liking the idea of using a PCIe cable if they don't need more than a single 12V pin for extra power, because a lot of users wouldn't have one spare without resorting to a kludgy Molex-to-PCIe adapter; but what's the problem with using a SATA power plug?
  • themeinme75 - Tuesday, December 16, 2014 - link

    I find this interesting: the Molex spec is 60 watts on the 12V line, even though you can probably draw 75+ safely, and SATA would give 54 watts on the 12V line. For a motherboard that costs $500+, aimed at users planning multiple GPUs, I think you can expect a power supply with plenty of connections.
  • themeinme75 - Tuesday, December 16, 2014 - link

    http://www.moddiy.com/pages/Power-Supply-Connector...
  • wolrah - Tuesday, December 16, 2014 - link

    My guess is it's because the Molex connector is significantly more durable and any modern system is sure to have one available.

    A home server would be a pretty logical role for this board, so you might have already devoted all 12 of the SATA power connectors most power supplies ship with to running hard drives. Likewise, as you note, a workstation could easily use up its PCIe power connections with GPUs.

    That leaves the Molex as the one of the three common power connectors most likely to still be free.
  • ddriver - Monday, December 15, 2014 - link

    Price : US (Newegg) ->
    Search Terms: "x99 ws-e 10g"
    We have found 0 items that match "x99 ws-e 10g".
  • Ian Cutress - Monday, December 15, 2014 - link

    While the product is officially announced, it doesn't seem to have filtered through yet, hence we don't know the pricing. When it gets to Newegg, hopefully that link will show it.
  • ShieTar - Monday, December 15, 2014 - link

    There are a few offers in Europe already, for >700€ (>900$):

    http://preview.tinyurl.com/mduqtgu
  • akula2 - Thursday, December 18, 2014 - link

    Thanks for mentioning the price. I'll settle for Asus X99-E WS boards.
  • Pcgeek21 - Monday, December 15, 2014 - link

    Were jumbo frames used for the 10GBASE-T testing? They would need to be enabled both inside the VMs and in ESXi's virtual switches (if they were used). My recollection is that jumbo frames were created to deal with the problems you encountered with CPU usage on 10Gb links.
  • Jammrock - Monday, December 15, 2014 - link

    You can achieve 10Gb speeds (~950 MB/s to 1.08 GB/s real-world) on a single point-to-point transfer if you have the right hardware and know how to configure it. Out of the box... not likely. The following assumes all your network hardware is 10Gb and jumbo-frame capable, with both enabled.

    1. You need a source that can sustain ~1GB/s reads and a destination that can sustain ~1GB/s writes. A couple of high end PCIe SSD cards, RAID'ed SSDs or a RAMdisk can pull it off, and that's about it.

    2. You need a protocol that supports TCP multi-channel. SMB3, when both source and destination are SMB3 capable (Win8+/2012+), does this by default. Multi-threaded FTP can. I think NFS can, but I'm not 100% certain...
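
    As a quick sanity check on the Windows side, something like this confirms multichannel hasn't been switched off (a minimal sketch assuming the in-box SMB cmdlets on Win8+/2012+):

    # SMB multichannel is on by default; verify neither end has disabled it
    Get-SmbClientConfiguration | Select-Object EnableMultiChannel
    Get-SmbServerConfiguration | Select-Object EnableMultiChannel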

    3. You need RSS (Receive Side Scaling), LSO (Large Send/Segment Offloading), TCP window scaling (auto-tuning) and TCP Chimney (for Windows), and optionally RSC (Receive Side Coalescing), set up and configured properly.

    Even modern processors cannot handle 10Gb worth of reads on a single processor core, so RSS needs to be set up with a minimum of 4 physical processor cores (RSS doesn't work on Hyperthreaded logical cores), possibly 8 depending on the processor, to distribute the receive load across multiple cores. You can do this via PowerShell (Windows) with the Set-NetAdapterRss cmdlet.

    # example command for a 4-physical-core proc w/ Hyperthreading (0,2,4,6 are physical, 1,3,5,7 are logical... pretty much a rule of thumb)
    Set-NetAdapterRss -Name "<adapter name>" -NumberOfReceiveQueues 4 -BaseProcessorNumber 0 -MaxProcessorNumber 6 -MaxProcessors 4 -Enabled $true

    LSO is set in the NIC drivers and/or PowerShell. This allows Windows/Linux/whatever to create a large packet (say 64KB-1MB) and let the NIC hardware handle segmenting the data to the MSS value. This lowers processor usage on the host and makes the transfer faster since segmenting is faster in hardware and the OS has to do less work.
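
    A minimal sketch of checking and enabling LSO with the in-box NetAdapter cmdlets (the adapter name is a placeholder; the driver GUI exposes the same switch):

    # show the current LSO state, then turn it on for IPv4 and IPv6
    Get-NetAdapterLso -Name "<adapter name>"
    Enable-NetAdapterLso -Name "<adapter name>" -IPv4 -IPv6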

    RSC is set in Windows or Linux and on the NIC. This does the opposite of LSO. Small chunks are received by the NIC and made into one large packet that is sent to the OS. Lowers processor overhead on the receive side.
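
    Same idea for RSC, sketched with the matching cmdlets (adapter name again a placeholder):

    # show and enable Receive Segment Coalescing on the receiving box
    Get-NetAdapterRsc -Name "<adapter name>"
    Enable-NetAdapterRsc -Name "<adapter name>"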

    While TCP Chimney gets a bad rap in the 1Gb world, it shines in the 10Gb world. Set it to Automatic in Windows 8+/2012+ and it will only enable on 10Gb networks under certain circumstances.
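
    A sketch of the PowerShell knob on Win8/2012-era builds (netsh "int tcp set global chimney=automatic" does the same thing):

    # let Windows decide when to offload full TCP connections to the NIC
    Set-NetOffloadGlobalSetting -Chimney Automatic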

    TCP window scaling (auto-tuning in the Windows world) is an absolute must. Without it the TCP windows will never grow large enough to sustain high throughput on a 10Gb connection.
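
    It should already be on, but it is easy to verify (a sketch; if it ever shows Disabled, "netsh int tcp set global autotuninglevel=normal" restores the default):

    # check the receive-window auto-tuning level per TCP setting profile
    Get-NetTCPSetting | Select-Object SettingName, AutoTuningLevelLocal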

    4. Enable 9K jumbo frames (some people say no, some say yes...really depends on hardware, so test both ways).
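
    On most drivers this is exposed as an advanced adapter property; a sketch (the registry keyword and the exact value, 9014 vs 9000, vary by driver, so list the property first):

    # list the jumbo-related advanced properties, then set the jumbo packet size
    Get-NetAdapterAdvancedProperty -Name "<adapter name>" -DisplayName "Jumbo*"
    Set-NetAdapterAdvancedProperty -Name "<adapter name>" -RegistryKeyword "*JumboPacket" -RegistryValue 9014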

    5. Use a 50GB file or larger. You need time for the connection to ramp up before you reach max speeds. A 1GB file is way too small to test a 10Gb connection. To create a dummy file in Windows use fsutil: fsutil file createnew E:\Temp\50GBFile.txt 53687091200

    This will normally get you into the 900 MB/s range on modern hardware and fast storage. LSO and TCP Chimney make tx faster; RSS/RSC make rx faster. TCP multi-channel and auto-tuning give you 4-8 fast data streams (one for each RSS queue) on a single link. The end result is real-world 10Gb data transfers.

    While 1.25GB/s is the theoretical maximum, that is not the real world max. 1.08GB/s is the fastest I've gone on a single data transfer on 10Gb Ethernet. That was between two servers in the same blade chassis (essentially point-to-point with no switching) using RAM disks. You can't really go much faster than that due to protocol overhead and something called bandwidth delay product.
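
    If you want to ballpark the bandwidth-delay product yourself, a back-of-the-envelope sketch (the RTT here is an assumed LAN figure, not a measurement):

    # window needed to keep the pipe full = bandwidth x round-trip time
    $linkBitsPerSec = 10e9     # 10Gb Ethernet
    $rttSec         = 0.0001   # ~0.1ms LAN round trip (assumed)
    ($linkBitsPerSec * $rttSec) / 8   # ~125000 bytes, so the TCP window must stay >= ~125KB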
  • Ian Cutress - Monday, December 15, 2014 - link

    Hi Jammrock, I've added a link in the main article to this comment - it is a helpful list of information for sure.

    For some clarification, our VMs were set up for RAMDisk-to-RAMDisk operation, but due to only having UDIMMs on hand, the size of our RAMDisks was limited. Because of our internal use without a switch, not a lot else was changed in the operation, making it more of an out-of-the-box type of test. There might be scope for ASRock to apply some form of integrated software to help optimise the connection. If possible I might farm out this motherboard to Ganesh for use in future NAS reviews, depending on his requirements.
  • staiaoman - Monday, December 15, 2014 - link

    Wow. Such a concise summary of what to do in order to achieve high-speed network transfers... something so excellent shouldn't just be buried in the comments on AnandTech (although if it has to be in the comments of a site, Anand or STH.com are clearly the right places ;-P). Thanks Jammrock!!
  • Hairs_ - Monday, December 15, 2014 - link

    Excellent comment, but it just underlines what a ridiculously niche product this is.

    Anyone running workloads like this surely isn't doing it with build-it-yourself equipment over a home office network?

    While this sort of article is no doubt full of interesting concepts for the reviewer to research, it doesn't help 99% of builders or upgraders out there.

    Where are the budget/midrange Haswell options? Given the fairly stagnant nature of the AMD market, what about an article on long-term reliability? Both are things that might actually be of interest to the majority of buyers.

    Nope, another set of ultra-niche motherboard reviews for those spending several hundred dollars.

    The reviews section on Newegg is more use as a resource at this stage.
  • Harald.1080 - Monday, December 15, 2014 - link

    It's not that complicated.
    We set up 2 Xeon E5 single-socket machines with ESXi 5.1, some guests on both machines, an 800€ 10G switch, and as the NAS backup machine a Xeon E3 with 2 Samsung 840 Pros in RAID 0 as a fast cache in front of a fast RAID 5 disk system. NFS. All 3 machines with Intel single-port 10G. Jumbo frames.

    Linux VM guest A to the other host's VM guest B with RAM disk: 1 GB/s from the start.
    VMware hosts to the NAS (the Xeon E3 NFS system) with SSD cache: 900 MB/s write; without cache: 20 MB/s.

    Finally, we used VMDK disk tools to copy snapshotted disks for backup. Faster than a file copy.

    I think doing the test on the SAME MACHINE is a bad idea. Interrupt handlers will have a big effect on the results. What about queues?
  • shodanshok - Tuesday, December 16, 2014 - link

    I had a similar experience on two Red Hat 6 boxes using Broadcom's NetXtreme II BCM57810 10 Gb/s chipset. The two boxes are directly connected by a Cat 6e cable, and the 10GBASE-T adapters are used to synchronize two 12x 15K disk arrays (sequential read > 1.2 GB/s).

    RSS is enabled by default, and so are TSO and the like. I manually enabled jumbo frames on both interfaces (9K MTU). Using both netperf and iperf, I recorded ~9.5 Gb/s (1.19 GB/s) on UDP traffic and slightly lower (~9.3 Gb/s) on TCP traffic.

    Jumbo frames really made a big difference. A properly working TCP window scaling algorithm is also a must-have (I had two 1 Gb/s NICs with very low DRBD throughput - this was due to bad window-scaling decisions from the Linux kernel when using a specific Ethernet chip driver).

    Regards.
  • jbm - Saturday, December 20, 2014 - link

    Yes, the configuration is not easy, and you have to be careful (e.g. if you want to use SMB multichannel over several NICs, you need to have them in separate subnets, and you should make sure that the receive queues for the NICs are not on the same CPU cores). Coincidentally, I configured a couple of servers for Hyper-V at work recently which use Intel 10Gb NICs. With two 10Gb NICs, we get live migration speeds of 2x 9.8 Gb/s, so yes - it does work in real life.
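
    For anyone wiring this up, a minimal sketch of the kind of checks involved (assuming the in-box SMB and NetAdapter cmdlets; NIC names are placeholders):

    # confirm SMB multichannel is actually using both NICs and that they report as RSS-capable
    Get-SmbMultichannelConnection
    # then make sure the two NICs' RSS base processors don't overlap
    Get-NetAdapterRss -Name "<NIC 1>", "<NIC 2>" | Format-List Name, BaseProcessorNumber, MaxProcessors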
  • Daniel Egger - Monday, December 15, 2014 - link

    > The benefits of 10GBase-T outside the data center sound somewhat limited.

    Inside the data center the benefits are even more limited, as there's usually no problem running fibre, which is easier to handle, takes less volume, uses less power and allows for more flexibility -- heck, it even costs less! No sane person would ever use 10GBase-T in a datacenter.

    The only place where 10GBase-T /might/ make sense is in a building where one needs cross-room connectivity but cannot run fibre; but you'd better hope for good Cat.7 wiring and have the calibration protocol ready in case you feel the urge to sue someone because it doesn't work reliably...
  • gsvelto - Monday, December 15, 2014 - link

    There's also another aspect that hasn't been covered by the review: the reason 10GBase-T is so slow when used by a single user (or when dealing with small transfers, e.g. NFS with small files) is that its latency is *horrible* compared to Direct Attach SFP+. A single hop over an SFP+ link can take as little as 0.3µs, while one should expect at least 2µs per 10GBase-T link, and it can be higher.

    This is mostly due to the physical encoding (which requires the smallest transferable physical frame to be 400 bytes, IIRC) and the heavy DSP processing needed to extract the data bits from the signal. Both per-port price and power are also significantly higher.

    In short, if you care about latency or small-packet transfers, 10GBase-T is not for you. If you can't afford SFP+ then go for aggregated 1GBase-T links; they'll serve you well, give you lower latency, and provide redundancy as the cherry on top.
  • shodanshok - Tuesday, December 16, 2014 - link

    This is very true, but it really depends on the higher-level protocol you want to use over it.

    IP over Ethernet is *not* engineered for latency. Try to ping your localhost (127.0.0.1) address: on RHEL 6.5 x86-64 running on top of a Xeon E5-2650 v2 (8 cores at 2.6 GHz, with the performance governor selected and no heavy processes running), RTT times are about 0.010 ms, or about 10 µs. One-way sending is about half that, at 5 µs. Adding 2 µs is surely significant, but hardly a world-changer.

    This is for a localhost connection with a powerful processor and no other load. On a moderately loaded, identical machine, the localhost RTT latency increases to ~0.03 ms, or 15 µs one way. RTT from one machine to another ranges from 0.06 ms to 0.1 ms, or 30-50 µs for one-way traffic. As you can see, the 2-4 µs imposed by the 10GBase-T encoding/decoding rapidly fades away.

    IP creators and stack writers know that. They integrated TCP window scaling, jumbo frames and the like to overcome that very problem. Typically, when very low latency is needed, some lightweight protocol is used *on top of* these low-latency optical links. Heck, even PCI-E, with its sub-µs latency, is often too slow for some kinds of workloads. For example, some T-series SPARC CPUs include 10Gb Ethernet links right in the CPU package, using a dedicated low-latency internal bus, but using classical IP schemes on top of these very fast connections will not give you much gain over more pedestrian 10GBase-T Ethernet cards...

    Regards.
  • gsvelto - Tuesday, December 16, 2014 - link

    Where I worked we had extensive 10G SFP+ deployments with ping latency measured in single-digit µs. The latency numbers you gave are for pure throughput-oriented, low-CPU-overhead transfers and are obviously unacceptable if your applications are latency sensitive. Obtaining those numbers usually requires tweaking your power-scaling/idle governors as well as kernel offloads. The benefits you get are very significant on a number of loads (lots of small files over NFS, for example), and 10GBase-T can be a lot slower on those workloads. But as I mentioned in my previous post, 10GBase-T is not only slower, it's also more expensive, more power hungry and has a minimum physical transfer size of 400 bytes. So if your load is composed of small packets and you don't have the luxury of aggregating them (because latency matters), then your maximum achievable bandwidth is greatly diminished.
  • shodanshok - Wednesday, December 17, 2014 - link

    Sure, packet size plays a far bigger role for 10GBase-T than for optical (or even copper) SFP+ links.

    Anyway, the pings tried before were for relatively small IP packets (physical size = 84 bytes), which are way smaller than typical packet sizes.

    For message-passing workloads SFP+ is surely a better fit, but for MPI it is generally better to use more latency-oriented protocol stacks (if I'm not mistaken, InfiniBand uses a lightweight protocol stack for this very reason).

    Regards.
  • T2k - Monday, December 15, 2014 - link

    Nonsense. CAT6a or even CAT6 would work just fine.
  • Daniel Egger - Monday, December 15, 2014 - link

    You're missing the point. Sure, Cat.6a would be sufficient (it's hard to find Cat.7 sockets anyway, but the cabling used nowadays is mostly Cat.7-specced, not Cat.6a), but the problem is ending up with properly balanced wiring that is capable of reliably establishing such a link. Also, copper cabling deteriorates over time, so the measurement protocol might not be worth much by the time you try to establish a 10GBase-T connection...

    Cat.6 is only usable with special qualification (TIA TSB-155-A) over short distances.
  • DCide - Tuesday, December 16, 2014 - link

    I don't think T2k's missing the point at all. Those cables will work fine - especially for the target market for this board.

    You also had a number of other objections a few weeks ago, when this board was announced. Thankfully most of those have already been answered in the excellent posts here. It's indeed quite possible (and practical) to use the full 10GBase-T bandwidth right now, whether making a single transfer between two machines or serving multiple clients. At the time you said this was *very* difficult, implying no one would be able to take advantage of it. Fortunately, ASRock's engineers understood the (very attainable) potential better than that. Hopefully now the market will embrace it, and we'll see more boards like this. Then we'll once again see network speeds that can keep up with everyday storage media (at least for a while).
  • shodanshok - Tuesday, December 16, 2014 - link

    You are right, but the familiar RJ45 connectors and cables can be a strong motivation to go with 10GBase-T in some cases. For a quick example: one of our customers bought two Dell R720xd boxes to use as virtualization hosts. The first R720xd is the active one, while the second is a hot standby, constantly synchronized using DRBD. The two boxes are directly connected with a simple Cat 6e cable.

    As the final customer was in charge of both the physical installation and routine hardware maintenance, familiar networking equipment such as RJ45 ports and cables was strongly favored.

    Moreover, it is expected that within 2 die shrinks 10GBase-T controllers will become cheap and low-power enough to be integrated pervasively, similar to how 1GBase-T replaced the old 100 Mb standard.

    Regards.
  • DigitalFreak - Monday, December 15, 2014 - link

    Don't know why they went with 8 PCI-E lanes for the 10Gig controller. 4 would have been plenty.

    1 PCI-E 3.0 lane is 1 GB per second (x4 = 4 GB/s). 10Gig max is 1.25 GB per second, and dual port = 2.5 GB per second. Even with overhead you'd still never saturate an x4 link. They could have used the extra x4 for something else.
  • The Melon - Monday, December 15, 2014 - link

    I personally think it would be a perfect board if they replaced the Intel X540 controller with a Mellanox ConnectX-3 dual QSFP solution so we could choose between FDR IB and 40/10/1Gb Ethernet per port.

    Either that or simply a version with the same slot layout and drop the Intel X540 chip.

    Bottom line though is no matter how they lay it out we will find something to complain about.
  • Ian Cutress - Tuesday, November 1, 2016 - link

    The controller is PCIe 2.0, not PCIe 3.0. You need to use a PCIe 3.0 controller to get PCIe 3.0 speeds.
  • eanazag - Monday, December 15, 2014 - link

    I am assuming we are talking about the free ESXi Hypervisor in the test setup.

    SR-IOV (IOMMU) is not an enabled feature on ESXi with the free license. What this means is that networking is going to tax the CPU more heavily. Citrix XenServer does support SR-IOV in the free product - which is all free now; you just pay for support. This is a consideration when weighing the results of the testing methodology used here.

    Another good way to test 10GbE is using iSCSI, where the server side is a NAS and the single client is where the disk is attached. The iSCSI LUN (hard drive) needs to have an SSD involved, or it can just be 3 spindle HDDs in RAID 5. You can use disk-test software to drive the benchmarking. If you opt to use XenServer with Windows as the iSCSI client, have the VM connect directly to the NAS instead of going through XenServer to the iSCSI LUN, because you will hit a performance cap from VM to host with the typical added disk within Xen. This is in the older 6.2 version; Creedence is not fully out of beta yet, and I have done no testing on it, though the changes it contains are significant for performance.

    About two years ago I was working on coming up with the best iSCSI setup for VMs using HDDs in RAID and SSDs as caches. I was using Intel X540-T2s without a switch. I was working with NexentaStor and Sun/Oracle Solaris as iSCSI target servers running on physical hardware, Xen, and VMware. I encountered some interesting behavior in all cases. VMware's sub-storage yielded better hard drive performance. I kept running into an artificial performance limit because of the Windows client and how Xen handles the disks it provides. The recommendation was to add the iSCSI disk directly to the VM, as the limit wouldn't show up there. VMware still imposed a performance ding (>10% hit) on my setup. Physical hardware had the best performance on the NAS side.
  • AngelosC - Wednesday, January 7, 2015 - link

    They could have tested it on Linux KVM with SR-IOV or just run iperf on Linux between the 2 interfaces.

    They ruined the test.
  • eanazag - Monday, December 15, 2014 - link

    Okay, so the use case of a board like this is network-attached storage using iSCSI or SMB3. That network storage has to be able to perform above 1GbE bandwidth for a single stream. 1 GbE = 1000 Mbps = ~125 MBps, not counting overhead. Any single SSD these days can outperform a 1GbE connection.

    If you're considering this board, there are a couple of articles by Johan on AnandTech, a few years old, about 10GbE performance. They cover why it is worth it. I did the legwork and found them.

    http://www.anandtech.com/show/4014/10g-more-than-a...
    http://www.anandtech.com/show/2956/10gbit-ethernet...
  • extide - Monday, December 15, 2014 - link

    At the end of the day, I still think I'd rather have the X99 Extreme11.
  • tuxRoller - Monday, December 15, 2014 - link

    How is the DPC measurement made? Average (which?), worst case, or just once?
  • Ian Cutress - Tuesday, November 1, 2016 - link

    Peak (worst value) during our testing period, which is usually a minute at 'idle'
  • TAC-2 - Tuesday, December 16, 2014 - link

    Either there's something wrong with your test of the NICs or there is a problem with this board. I've been using 10GBase-T for years now; even with default settings I can push 500-1000 MB/s using Intel NICs.
  • AngelosC - Wednesday, January 7, 2015 - link

    I reckon they were not testing this board's most important feature properly.

    The reviewer makes it sound like they don't know how to test…
  • jamescox - Tuesday, December 16, 2014 - link

    This seems more like a marketing thing; who will actually buy this? Given the current technology, it seems much better to buy a discrete card if you actually need 10Gb Ethernet.

    The feature I would like to see come down to the consumer market is ECC memory. I have had memory start to develop errors after installation. I always run exhaustive memory tests when building a system (memtest86 or another hardware-specific test). I did not have any stability issues; I only noticed that something was wrong when I found that recently written files were corrupted. Almost everything passes through system memory at some point, so why is it okay for it not to be ECC protected? Given how far system memory is from the CPU (behind L3 cache, and soon L4 with stacked memory), the speed is actually less important. Everything should be ECC protected.

    There may be some argument that GPU memory doesn't need to be ECC if it is just being used for display, since errors will only result in display artifacts. I am not sure that is actually the case anymore, though, given what GPUs are being used for. Can a single-bit error in GPU memory cause a system crash? I may have to start running GPU memory tests as well.
  • petar_b - Thursday, December 18, 2014 - link

    ASRock is solely targeting users who need a 10G network. If the network card were a discrete option the price would be lower and they would target a wider audience. I like the two PLXes, as I can attach all kinds of network, SAS and GPU cards. PLX and ASRock quality is the reason I use their mobos.

    Regarding ECC memory for GPUs, I don't agree there. If the GPU is used to do math with OpenCL, then avoiding memory errors is very important.
  • akula2 - Thursday, December 18, 2014 - link

    Avoiding memory errors is extremely important in my case, since I churn tons of science and engineering work through Nvidia Titan Black, Quadro and Tesla cards. AMD did an amazing job with the FirePro W9100 cards too.
  • koekkoe - Wednesday, December 17, 2014 - link

    One usage scenario: iSCSI storage (especially when also used for booting) greatly benefits from 10G, because on 1G you're limited to 125 MB/s, and big 16/24-disk arrays like EqualLogic can easily saturate even 10G bandwidth.
  • petar_b - Thursday, December 18, 2014 - link

    The Extreme11 used an LSI SAS controller, which was an awesome feature; I would happily pay for a decent controller instead of slow Marvell SATA ports - each time we add one more SATA disk, the overall disk transfer speed drops significantly. Thanks to the LSI we can have 8 SATA SSDs on SAS and they all perform at 400 MB/s even when used simultaneously. The Marvell was dropping as low as 50 MB/s with 8 SSDs used simultaneously. How lame.
  • akula2 - Thursday, December 18, 2014 - link

    I didn't prefer that board either -- not everything should be integrated, from a hardware scalability and fallback point of view. I'd prefer to build from a board such as the Asus X99-E WS without filling it up completely and eventually choking it!
  • atomt - Saturday, December 20, 2014 - link

    "It doesn't increase your internet performance"

    I beg to differ. 10Gbps internet is available for residential connections in my area. :-D
  • AngelosC - Wednesday, January 7, 2015 - link

    Several things bother me with this review:
    1) Did I miss it, or is there really no mention of how the VMs were accessing the X540? Was it running SR-IOV? Or VMXNET3? What network drivers were loaded in the VMs?
    2) 10GbE is the major selling point of the mobo, but it was only tested using "LAN Speed Test" with the results summarized in a simple chart? I suggest you could also have tested using netperf or iperf, showing results from other OSes like CentOS, and the performance difference between UDP and TCP streams. If you just create packets and send them, then receive packets and discard them (as iperf3 does), you probably wouldn't have run into the problem of having to place a file on a RAM disk, among other issues. And if you ran iperf on Linux, you could have run it on bare metal, taking the VM overhead out of the equation.
    3) For the sake of correctness, would you please clarify whether it was an X540-AT2 or X540-BT2?

    To be frank, this review is below the standard I'd expect from AnandTech.
