12 Comments
osteopathic1 - Monday, November 19, 2018 - link
If I fill it with cards, will it run Crysis?

coder543 - Monday, November 19, 2018 - link
> This is achieved using Broadcom 9797-series PLX chips, splitting each PCIe x16 root complex from each processor into five x16 links

320 / 5 = 64 real PCIe lanes total.
AMD Epyc processors offer 128 PCIe lanes, right? It just seems like poor planning to start a project like this on a processor platform that only offers 64 PCIe lanes as the foundation, so... half the actual host throughput they could have had just by choosing Epyc.
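To put rough numbers on that, here's a minimal back-of-the-envelope sketch in Python; the lane counts come from the quoted article text and the comment above, and the ~985 MB/s-per-lane figure is the standard PCIe 3.0 approximation (8 GT/s with 128b/130b encoding):

```python
# Rough one-direction bandwidth totals for the lane counts discussed above.
# The 64 host lanes follow from the 320 / 5 math quoted from the article;
# 985 MB/s per lane is the usual PCIe 3.0 approximation.

PCIE3_GBPS_PER_LANE = 0.985  # GB/s per lane, per direction

xeon_host_lanes = 64   # what the dual-Xeon build exposes through the PLX switches
epyc_host_lanes = 128  # what an Epyc platform offers natively
slot_lanes = 320       # downstream total implied by the 320 / 5 figure above

for label, lanes in [("Xeon host uplink", xeon_host_lanes),
                     ("Epyc host lanes", epyc_host_lanes),
                     ("Slot-side total", slot_lanes)]:
    print(f"{label:<17} {lanes:>3} lanes  ~{lanes * PCIE3_GBPS_PER_LANE:.0f} GB/s")

print(f"Host oversubscription on the Xeon build: {slot_lanes // xeon_host_lanes}:1")
```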
phoenix_rizzen - Monday, November 19, 2018 - link
If they split the mobo in half and used single-CPU setups, they'd have 256 full PCIe lanes to play with (128 on each half). Then just cluster the two systems together to create a single server image. :)

Or just go with a dual-CPU setup with 3-way PCIe switches (one x16 link split into three x16 links) instead of the 5-way of the Xeon setup.
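For comparison, a quick sketch of how those alternative layouts work out on paper; the host lane counts and switch fan-outs are the ones floated in this thread (assuming Epyc's 128 lanes for the dual-CPU option), and the slot counts simply fall out of the arithmetic rather than describing any real board:

```python
# Slot count and host oversubscription for each layout mentioned in the thread,
# assuming every switch uplink and every slot is x16. Purely illustrative.

def layout(name: str, host_lanes: int, fanout: int) -> None:
    uplinks = host_lanes // 16   # x16 root ports available for switch uplinks (or direct slots)
    slots = uplinks * fanout     # x16 slots downstream
    print(f"{name:<36} {host_lanes:>3} host lanes -> {slots:>2} x16 slots, {fanout}:1 oversubscription")

layout("Dual Xeon, 5-way switches", 64, 5)            # the configuration in the article
layout("Dual Epyc, 3-way switches", 128, 3)           # phoenix_rizzen's second suggestion
layout("2x single-socket Epyc, no switches", 256, 1)  # two clustered single-CPU halves
```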
Kevin G - Monday, November 19, 2018 - link
Not necessarily. There are GPU-to-GPU transactions that can be performed through the PCIe bridge chip at the full x16 PCIe bandwidth.

There is also room for potential multipathing using those Avago/Broadcom/PLX chips, as they support more than one root complex. That would permit all five 9797 chips to have an x8 PCIe 3.0 link to each Xeon. Furthermore, the multipathing also works toward each slot: the 16 electrical lanes could come as two sets of 8 lanes from different 9797 chips. GPU-to-GPU communications would not need to touch the root complexes in the Xeon CPUs at all; each 9797's uplink to the Xeons would be used exclusively for main memory access. Those 9797 chips are very flexible in terms of how they can be deployed.
As for Epyc, it still doesn't have enough PCIe lanes to do away with the bridge chips here. At best, the extra lanes would permit some high-speed networking (dual 100 Gbit Ethernet) on the motherboard while keeping the same arrangement of bridge chips. However, Epyc could do this with a single socket instead of two (though a two-socket solution would still leverage the bridge chips in a similar fashion).
The coming Rome version of Epyc may offer more flexibility in how the IO is handled, due to the centralization of IO in the package (i.e. a 96 + 96 PCIe lane dual-socket configuration may be possible). Avago/Broadcom/PLX 9797 chips would still be necessary, but with 16 full lanes of bandwidth from each socket to six of those bridge chips.
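To make the GPU-to-GPU point concrete, here's a minimal illustration using the usual PCIe 3.0 per-lane figure; note that the number of GPUs sharing each switch uplink is a placeholder, since the thread doesn't pin down exactly how the board wires its switches:

```python
# Peer-to-peer traffic routed inside a 9797 switch runs at the full link width
# and never touches the Xeon root complex; only host-memory traffic contends
# for the switch's uplink. Per-direction PCIe 3.0 numbers, all approximate.

PCIE3_GBPS_PER_LANE = 0.985

p2p_within_switch = 16 * PCIE3_GBPS_PER_LANE  # GPU <-> GPU through the switch, x16
uplink_per_switch = 16 * PCIE3_GBPS_PER_LANE  # switch -> host, x16 (or 2 multipathed x8 links)

gpus_per_switch = 4  # placeholder, not taken from the article
host_share = uplink_per_switch / gpus_per_switch

print(f"P2P inside one switch:      {p2p_within_switch:.1f} GB/s")
print(f"Uplink shared with host:    {uplink_per_switch:.1f} GB/s")
print(f"Worst-case host BW per GPU: {host_share:.1f} GB/s")
```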
Spunjji - Friday, November 23, 2018 - link
That was a well-written comment and nothing you said was inaccurate, but it doesn't fundamentally change the mathematics of 128 PCIe lanes being better than 64. You'd benefit from either fewer bridge chips, greater bandwidth, or some combination thereof.

HStewart - Tuesday, November 20, 2018 - link
I would think some smart person could make a system that uses a Broadcom 9797-series PLX chip to extend the PCI Express lanes - this technology could make the number of PCIe lanes almost meaningless. In theory, you could take the system of your choice and add more PCI Express lanes.

Someone could, in theory, take an external TB3 system and support more video cards on it.
Kevin G - Tuesday, November 20, 2018 - link
Performance will start to dwindle as you go from PCIe switch to PCIe switch. Latency will increase and host bandwidth will become an ever-increasing bottleneck. Sure, you can leverage a 9797 chip to provide eleven x8 PCIe 3.0 slots for GPUs, but performance is going to be chronically limited by the x4 PCIe 3.0 uplink that Thunderbolt 3 provides. The only niche where this would make sense would be mining.

rahvin - Tuesday, April 2, 2019 - link
These chips don't magically create bandwidth, they share the bandwidth. Something like this makes sense in a compute situation (which is what this is designed for), where you aren't bouncing tons of data back and forth, but it would fall on its face in any other situation where high-speed, high-bandwidth access to each card was needed.

This is designed for compute clusters that aren't passing a lot of data to or from the PCIe cards. Don't expect to see this type of solution used elsewhere, particularly your Thunderbolt example, as it would provide no benefit.
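The arithmetic behind that Thunderbolt objection, as a quick sketch; the eleven-slot figure is Kevin G's example above, and the per-lane number is the usual PCIe 3.0 approximation:

```python
# Every card in an external TB3 enclosure shares the enclosure's x4 PCIe 3.0
# uplink back to the host, so per-GPU host bandwidth collapses as cards are added.

PCIE3_GBPS_PER_LANE = 0.985

tb3_uplink = 4 * PCIE3_GBPS_PER_LANE    # the x4 PCIe 3.0 link TB3 provides
gpus = 11                               # Kevin G's eleven x8 slots example
dedicated_x8 = 8 * PCIE3_GBPS_PER_LANE  # what a direct x8 slot would give each card

print(f"TB3 uplink total:              {tb3_uplink:.1f} GB/s")
print(f"Fair share per GPU (11 GPUs):  {tb3_uplink / gpus:.2f} GB/s")
print(f"Direct x8 slot, for reference: {dedicated_x8:.1f} GB/s")
```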
abufrejoval - Tuesday, November 20, 2018 - link
Two comments for the price of one:

1. Looks like a crypto mining setup being repurposed.
2. Sure looks like Avago's smart IP-grabbing efforts are paying off big time in this design.
Does anyone know, btw, if the PLX design teams are still busy at work doing great things for PCIe 4.0, or is Avago just cashing in?
Cygni - Tuesday, November 20, 2018 - link
Nobody would bother with pricey PLX chips to run full x16 slots for crypto.