8 Comments
YaleZhang - Monday, December 21, 2015 - link
Can you achieve the same thing with PCI bridges? You can have up to 255 PCI buses, so you could host about 128 GPUs on the leaf nodes. It seems the main limitation of using PCI bridges would be that the topology has to be a tree, which will be a bottleneck if there's no data locality.
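(For a rough sense of that ~128-GPU figure, here is an illustrative sketch - not from the article or the comment - that counts PCI bus numbers in a tree of switches under a simplified model: one bus number for each switch's internal bus, one per downstream link, and no extra buses for endpoints. The fan-outs are arbitrary examples, and real systems also reserve bus numbers for hot-plug, multi-function devices, etc.)

```python
# Illustrative only: simplified model of PCI bus-number consumption
# in a tree of PCIe switches.

def count_buses(fanouts):
    """Return (bus numbers consumed below a link, leaf endpoints).
    An empty fan-out tuple means a GPU endpoint sits on that link."""
    if not fanouts:
        return 0, 1
    fanout, rest = fanouts[0], fanouts[1:]
    buses, leaves = 1 + fanout, 0   # switch internal bus + downstream link buses
    for _ in range(fanout):
        b, l = count_buses(rest)
        buses += b
        leaves += l
    return buses, leaves

if __name__ == "__main__":
    # Two-level switch trees under one root port:
    # bus 0 (root) + bus 1 (link to the top switch) + everything below it.
    for fanouts in [(8, 8), (16, 8)]:
        below, gpus = count_buses(fanouts)
        print(f"switch fan-outs {fanouts}: {gpus} GPUs, "
              f"{2 + below} of the 256 available bus numbers (0-255)")
    # -> (8, 8):  64 GPUs,  83 bus numbers
    # -> (16, 8): 128 GPUs, 163 bus numbers, under this simplified model
```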
Loki726 - Monday, December 21, 2015 - link
I think that part of the problem is GPU driver performance and correct functionality with that many devices, but you can get quite far with bridges.
SleepyFE - Tuesday, December 22, 2015 - link
The GPU is more power hungry and less flexible. With an FPGA you program an ASIC onto it, making it suit your needs better.
SaberKOG91 - Tuesday, December 22, 2015 - link
Application Specific Processor, not ASIC. ASICs are full-custom chip designs. Technically an FPGA is an ASIC that can be programmed to perform the specific functionality an application requires, but never at the speed or power efficiency of a full-custom ASIC.
Vatharian - Monday, December 21, 2015 - link
If the main host dies, you have to shut down the whole tree. In a distributed, host-less environment you're mostly taking single nodes offline for very short periods. Also, under high concurrency PCIe tends to congest, and DRAM throughput is a big limitation, especially when work running on one CPU needs data that lives on another CPU's DRAM/NUMA node. That can introduce a lot of randomness into latency and further decrease performance.
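(A common mitigation for the NUMA effect described above is to pin each worker to the CPUs that are local to the PCIe device it drives. A minimal Linux-only sketch, assuming the usual sysfs attributes are present; the device address below is a placeholder, not something from the article.)

```python
# Minimal Linux-only sketch: pin the current process to the CPUs local to
# a given PCIe device, so host code and DMA buffers stay on the same NUMA
# node as the device. The device address is a placeholder.
import os

def cpus_local_to_pci_device(bdf):
    """Parse /sys/bus/pci/devices/<bdf>/local_cpulist (e.g. '0-7,16-23')
    into a set of CPU ids."""
    path = f"/sys/bus/pci/devices/{bdf}/local_cpulist"
    with open(path) as f:
        spec = f.read().strip()
    cpus = set()
    for part in spec.split(","):
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus

if __name__ == "__main__":
    bdf = "0000:81:00.0"              # placeholder GPU address
    local = cpus_local_to_pci_device(bdf)
    os.sched_setaffinity(0, local)    # restrict this process to NUMA-local CPUs
    print(f"pinned to NUMA-local CPUs: {sorted(local)}")
```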
ddriver - Monday, December 21, 2015 - link
Yay, finally. With PCIe switch chips being so affordable, I've been wondering how long it would take before someone figured out you can build a crazy fast, complex-topology interconnect on a budget. I've been looking into using PCIe switches to build supercomputers out of affordable single-socket motherboards: snap in a decent CPU and two GPUs for compute, use the other PCIe slots to connect to other nodes, and there you have it - no need for crazy expensive proprietary interconnects, no need for crazy expensive components.
agentd - Saturday, December 26, 2015 - link
Have you seen the Avago (formerly PLX) PEX9700 series switches?
extide - Monday, December 21, 2015 - link
In the article you mention you could combine the BI and the BN, but that wouldn't make sense - then you'd have a server with a CPU and an accelerator, right where we started. However, it makes sense to combine the CN and the BI - that way you don't need the extra layer of BI nodes.
Am I misunderstanding this, or was that a typo?