12 Comments
JFish222 - Tuesday, May 12, 2015 - link
Admittedly too lazy to search myself, but I would be interested to know what physical cabling Avago are pushing to extend the PCIe fabric between external hosts, and what limitations the medium will impose. (I'm thinking of how this would compete with InfiniBand and NICs from a price/performance standpoint.)
Ian Cutress - Tuesday, May 12, 2015 - link
Within a rack, you can just use a backplane. Technically you can go between racks with the right cards and optical cabling (which Avago also produce) without a translation layer, so I was told.
MrSpadge - Tuesday, May 12, 2015 - link
Indeed, this could be very impressive! Consider communication between small, tightly packed compute nodes. They don't need a regular LAN made for long distances; a quick, high-performance hop over a few tens of centimetres would be all they need, maybe even less depending on clever chassis design.

This could also help AMD combat NVLink. It doesn't make the pipe into a single CPU any wider, but for a system with multiple GPUs and/or CPU sockets it could provide massive benefits.
MrSpadge - Tuesday, May 12, 2015 - link
And thanks for reporting such non-gaming topics, AT!
olderkid - Tuesday, May 12, 2015 - link
Imagine marrying this to a hyper-converged platform like Nutanix: super-fast I/O to multiple hosts for the distributed file system, lightning-fast vMotion and Storage vMotion. This is going to be some really cool stuff.
Spirall - Tuesday, May 12, 2015 - link
My long-time dream of turning a 1 Gb/s home network into a "home HPC cluster" of about 10 GB/s (with an external PEX9700 switch and some PCIe boards) may finally come true.
zipcube - Wednesday, May 13, 2015 - link
You can do that now; 40 Gbit InfiniBand cards are going for less than $100 these days, and you can find 36-port switches for $700. Older 10 and 20 Gbit InfiniBand gear is even cheaper. All of my home lab hosts are connected with 40 Gbit IB.
immortalex - Thursday, May 14, 2015 - link
As far as I'm concerned, and based on my tests, you can't push 20 Gbps (not even close to 10 Gbps!) over a 20 Gbit IB switch by using IP over InfiniBand (IPoIB). To achieve 10/20/40 Gbps you need to bypass the kernel, to avoid generating lots of system calls and taxing the CPU. You need to use the native InfiniBand "verbs" programming interface to make full use of an InfiniBand HCA and initiate data transfers from user space.
This greatly limits its usefulness for most apps, because they rely on the TCP/IP stack.
So yes, it's possible. But if you expect a plug-and-play scenario where you get 10 or 20 Gbps from your PC to your home fileserver with low-cost IB HCAs, switches and cables - I don't think so.
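For illustration, a minimal sketch of what the "verbs" suggestion above looks like in code, assuming the libibverbs development headers and an RDMA-capable HCA are present (compile with gcc -libverbs). It only opens a device and registers a buffer for DMA; a complete transfer would additionally need a queue pair and an out-of-band exchange of connection details.

/* Minimal libibverbs sketch: open an HCA and register a buffer for
 * user-space DMA. Setup only; no data is actually transferred here. */
#include <infiniband/verbs.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    /* Enumerate RDMA-capable devices and open the first one. */
    int num = 0;
    struct ibv_device **devs = ibv_get_device_list(&num);
    if (!devs || num == 0) {
        fprintf(stderr, "no RDMA devices found\n");
        return 1;
    }
    struct ibv_context *ctx = ibv_open_device(devs[0]);
    if (!ctx) { fprintf(stderr, "ibv_open_device failed\n"); return 1; }
    printf("using device %s\n", ibv_get_device_name(devs[0]));

    /* A protection domain groups the resources created below. */
    struct ibv_pd *pd = ibv_alloc_pd(ctx);
    if (!pd) { fprintf(stderr, "ibv_alloc_pd failed\n"); return 1; }

    /* Register (pin) a buffer so the HCA can DMA directly to/from it,
     * with no per-transfer system calls once work requests are posted. */
    size_t len = 1 << 20;
    void *buf = calloc(1, len);
    struct ibv_mr *mr = ibv_reg_mr(pd, buf, len,
                                   IBV_ACCESS_LOCAL_WRITE |
                                   IBV_ACCESS_REMOTE_READ |
                                   IBV_ACCESS_REMOTE_WRITE);
    if (!mr) { fprintf(stderr, "ibv_reg_mr failed\n"); return 1; }
    printf("registered %zu bytes, lkey=0x%x rkey=0x%x\n", len, mr->lkey, mr->rkey);

    /* Completions are polled from user space via a completion queue. */
    struct ibv_cq *cq = ibv_create_cq(ctx, 16, NULL, NULL, 0);
    if (!cq) { fprintf(stderr, "ibv_create_cq failed\n"); return 1; }

    /* Tear everything down. */
    ibv_destroy_cq(cq);
    ibv_dereg_mr(mr);
    free(buf);
    ibv_dealloc_pd(pd);
    ibv_close_device(ctx);
    ibv_free_device_list(devs);
    return 0;
}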
zipcube - Thursday, May 14, 2015 - link
That's correct; your max theoretical with 40 Gbit IB is 32 Gbit usable. I'm usually getting in the ~26 Gbit range using SRP/iSER and NFS-RDMA based storage targets backed by either SSDs or a conventional disk pool with a relatively large (400 GB enterprise flash) buffer. I've seen higher rates with RDS/RDMA-enabled apps, though. I notice ESXi 6 has a whole RDMA framework built in now, so I'm looking forward to playing with that. My cards (Mellanox ConnectX-2) aren't supported in Windows 2012 for SMB Direct, so I haven't had a chance to mess with it much, unfortunately.
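For reference, the 32 Gbit figure comes from QDR InfiniBand's 8b/10b line coding: each of the four lanes signals at 10 Gbit/s but carries 8 data bits per 10 bits on the wire, so 4 x 10 Gbit/s x 8/10 = 32 Gbit/s of usable bandwidth before protocol overhead.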
wishgranter - Wednesday, May 13, 2015 - link
OK, so does that mean we will eventually see PCI-E riser cards for this?
phoenix_rizzen - Thursday, May 14, 2015 - link
3rd paragraph, 3rd sentence: Should "The PLX9700 series" be "The PEX9700 series"? L-->E
James5mith - Saturday, May 16, 2015 - link
"but the heart of these features lies in the ability to have multiple nodes access data quickly within a specific framework without having to invest in expensive technologies such as Infiniband."Unless you are somehow talking expense in the sense of protocol overhead, I have to argue that point. Infiniband is one of the cheapest high-performance fabrics currently available, cheaper than 10GbE, 40GbE, and I would have to imagine purely on a scale basis, cheaper than customized native PCIe networking solutions.