31 Comments
romrunning - Tuesday, June 16, 2020 - link
Note: the chart needs an update - the PM6 is missing the 30TB size under the 1 DWPD model.
Billy Tallis - Tuesday, June 16, 2020 - link
Thanks. Copy and paste error on my part.
danbob999 - Tuesday, June 16, 2020 - link
What's the point of 12G and 24G SAS over NVMe?
DanNeely - Tuesday, June 16, 2020 - link
One link versus multiple PCIe lanes. It's why big storage vendors are pushing so hard for PCIe 5.0 (and then 6.0): a single 5.0 lane has as much bandwidth as a 3.0 x4 link, letting them cram a lot more drives onto a single mobo without needing PLXes to expand the nominal PCIe lane count.
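For reference, a rough back-of-the-envelope sketch of the per-lane numbers behind that claim (approximate usable rates after encoding overhead; exact figures vary by platform):

```python
# Approximate usable per-lane PCIe bandwidth in GB/s after encoding overhead
# (PCIe 3.0/4.0/5.0 use 128b/130b; 6.0 moves to PAM-4 signaling with FLIT-based FEC).
PER_LANE_GBPS = {"3.0": 0.985, "4.0": 1.969, "5.0": 3.938, "6.0": 7.877}

def link_bandwidth(gen: str, lanes: int) -> float:
    """Rough usable bandwidth of a PCIe link in GB/s."""
    return PER_LANE_GBPS[gen] * lanes

print(link_bandwidth("3.0", 4))  # ~3.9 GB/s for a PCIe 3.0 x4 link
print(link_bandwidth("5.0", 1))  # ~3.9 GB/s for a single PCIe 5.0 lane
```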
schujj07 - Tuesday, June 16, 2020 - link
The need for PLX is only if you use Xeon hosts. Using 2nd Gen Epyc, a dual socket host can fully address 24x PCIe 4.0 x4 NVMe SSDs and still have 4x PCIe 4.0 x16 slots for dual port 100Gbps NICs. A single socket 2nd Gen Epyc can do the same, just with 2x x16 slots.
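A quick tally of the lane budget described above, assuming the dual-socket platform is configured for the 160 usable PCIe 4.0 lanes mentioned further down the thread:

```python
# Rough lane-budget tally for the dual-socket 2nd Gen Epyc example above.
nvme_lanes = 24 * 4       # 24 U.2 drives at PCIe 4.0 x4 each -> 96 lanes
nic_slot_lanes = 4 * 16   # four x16 slots for dual-port 100GbE NICs -> 64 lanes
total_needed = nvme_lanes + nic_slot_lanes

# A 2P Rome platform can be configured to expose up to 160 PCIe 4.0 lanes
# by dedicating fewer SerDes lanes to the inter-socket links.
lanes_available = 160
print(total_needed, total_needed <= lanes_available)  # 160 True
```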
DigitalFreak - Tuesday, June 16, 2020 - link
Some companies want to stick with Intel, so...
Deicidium369 - Tuesday, June 16, 2020 - link
Ice Lake SP offers 128x PCIe4 lanes in a dual socket config - launching soon... and PCIe4 will become mainstream.
AnarchoPrimitiv - Friday, June 19, 2020 - link
It's still only half the lanes of a single Epyc CPU, which has 128 PCIe 4.0 lanes.
amnesia0287 - Saturday, June 20, 2020 - link
A dual socket Epyc will also still have 128 lanes...
Deicidium369 - Tuesday, June 16, 2020 - link
PLX switches are used in large arrays (EMC) and are host independent. Dual-socket Intel Ice Lake SP will also support 128 PCIe4 lanes - and will finally kick off the PCIe4 storage era.
schujj07 - Tuesday, June 16, 2020 - link
Wow, it has only taken Intel 3 years to catch up on PCIe lanes, with their dual socket matching the AMD single socket. A 2nd Gen dual socket Epyc can be configured to allow for 160 PCIe 4.0 lanes.
danbob999 - Tuesday, June 16, 2020 - link
you understand that at some point you need some sort of link (most likely PCIe) to interface between the CPU and the SAS 24G controller, right?
Deicidium369 - Tuesday, June 16, 2020 - link
Enterprises use large arrays - from companies like EMC. They are not building their own - and the jump from PCIe5 to PCIe6 is a much heavier lift than 3 to 4 or 4 to 5.
Santoval - Tuesday, June 16, 2020 - link
Why, due to the switch to PAM-4 and the resulting need for extensive error correction? I think the very high signal clocks of PCIe 5.0 will not be a piece of cake either.
schujj07 - Tuesday, June 16, 2020 - link
More and more companies are moving away from the traditional SAN to hyperconverged infrastructure.
MenhirMike - Tuesday, June 16, 2020 - link
I'm not sure if NVMe (even with U.2 connections) supports things like SAS expanders/backplanes (connecting many drives to a single port, albeit with shared bandwidth) or multipath (one drive connected to two controllers, for redundancy in case a controller dies).
In fact, I'm trying to figure out if there's a way to "externalize" an NVMe SSD, and it seems that most options are limited to only 4 or so drives in an internal PCIe card. Maybe I'm missing a standard here, but taking an NVMe drive and connecting it externally seems to only be possible through Thunderbolt.
Maybe NVMe-oF is the standard that I'm missing, but in any case, SAS as a connection standard offers storage flexibility and redundancy that I can't seem to find for NVMe yet.
MenhirMike - Tuesday, June 16, 2020 - link
(As a sidenote, if someone knows how to connect 50-ish NVMe drives to a single system, hints are appreciated.)
amnesia0287 - Saturday, June 20, 2020 - link
NVMe backplanes and expanders exist; they work using PCIe switches. The main problem is that not a single vendor I am aware of right now will sell any NVMe hardware that isn't part of a complete build. You can't just buy a backplane or a chassis, you MUST buy "fully configured" servers. The number of devices is still "limited", but for example, you can drive 32 NVMe drives over x8 with this card: https://www.broadcom.com/products/storage/host-bus...
But getting the expanders that actually allow you to do so is quite hard right now unless you are ready to spend fat stax.
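To put that fan-out in perspective, a rough sketch of the oversubscription involved, assuming PCIe 4.0 and a full x4 per drive (the linked card's actual configuration may differ):

```python
# Rough oversubscription math for a switch-based HBA: 32 NVMe drives behind an x8 uplink.
PCIE4_PER_LANE_GBPS = 1.969              # approx. usable GB/s per PCIe 4.0 lane

drives = 32
downstream_lanes = drives * 4            # 128 lanes if every drive gets a full x4
uplink_lanes = 8

oversubscription = downstream_lanes / uplink_lanes   # 16:1
shared_bw = uplink_lanes * PCIE4_PER_LANE_GBPS       # ~15.8 GB/s shared by all drives
per_drive_share = shared_bw / drives                 # ~0.49 GB/s if all drives are busy at once

print(oversubscription, round(shared_bw, 1), round(per_drive_share, 2))
```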
schujj07 - Tuesday, June 16, 2020 - link
Enterprise NVMe SSDs have a direct connection to the host CPU, so there isn't a need for controller cards. When it comes to adding a lot of SSDs, the biggest issue is physical space rather than protocol. Using PLX switches as mentioned before, you can have more SSDs connected to a single host. This is the reason that Samsung has made M.3 and Intel has made the "Ruler." The idea is to be able to have more than the 24x 2.5" drives in a single 2U host.
Here is some information on NVMe-oF: https://community.mellanox.com/s/article/what-is-n...
bigvlada - Tuesday, June 16, 2020 - link
Old "desktop" tower cases can accommodate 50+ drives. 3,5" slot can accept 2*2,5" drives and 5,25" slot can accommodate six (low height ones). I have an Cooler Master ATCS 840 case wich has 6*3,5" slots and 6*5,25" slots. Because the case is so big (google Cooler Master 53GHz machine with five mini itx boards in it) I could probably cram around 70 drives if the CPU is cooled by air.Billy Tallis - Tuesday, June 16, 2020 - link
Packing 7mm SATA drives that densely is possible, but even 7mm U.2 drives tend to have substantially higher power draw on account of delivering much higher performance. U.2 enclosures that are denser than 15mm per drive are basically unheard of, but if one existed you would certainly hear it from a long ways off due to all the fan noise.
Billy Tallis - Tuesday, June 16, 2020 - link
Some NVMe drives (such as the CM6) include dual-port capability, where they can do multi-path with x2+x2 links rather than operating as a single x4 link. PCIe switches are the equivalent of SAS expanders. Broadcom's PLX and Microchip's Microsemi are the two main providers of big PCIe switches, with lane counts up to 96-100 lanes. So port counts are lower than the largest SAS expanders, but total bandwidth is way higher.
Thunderbolt isn't the only PCIe cabling option, and isn't used in servers. External connections between a server and a PCIe JBOF are usually done with SFF-8644: https://www.serialcables.com/product-category/pcie...
When it was first introduced, NVMe definitely was at a disadvantage to SAS in terms of enterprise-oriented features. But the NVMe ecosystem has caught up and SAS has very few remaining advantages.
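To make the port-count versus bandwidth tradeoff above concrete, here is a rough comparison of a hypothetical 96-lane PCIe 4.0 switch against a hypothetical 48-port 12Gb SAS expander (both are illustrative configurations, not specific products):

```python
# Comparing fan-out vs. aggregate bandwidth: a 96-lane PCIe 4.0 switch vs. a
# 48-port 12Gb SAS expander (illustrative configurations; real products vary).
PCIE4_PER_LANE_GBPS = 1.969
SAS_12G_PER_PORT_GBPS = 1.2

# PCIe switch: reserve a x16 uplink, split the rest into x4 drive ports.
switch_lanes = 96
uplink_lanes = 16
drive_ports_pcie = (switch_lanes - uplink_lanes) // 4                     # 20 drive ports
downstream_bw_pcie = (switch_lanes - uplink_lanes) * PCIE4_PER_LANE_GBPS  # ~158 GB/s

# SAS expander: more ports, but far less aggregate device-side bandwidth.
expander_ports = 48
downstream_bw_sas = expander_ports * SAS_12G_PER_PORT_GBPS                # ~57.6 GB/s

print(drive_ports_pcie, round(downstream_bw_pcie))    # 20 158
print(expander_ports, round(downstream_bw_sas, 1))    # 48 57.6
```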
MenhirMike - Tuesday, June 16, 2020 - link
Thanks for the info! I'll keep an eye out, but SFF-8644 is a good lead! (no pun intended)
kobblestown - Wednesday, June 17, 2020 - link
"PCIe switches are the equivalent of SAS expanders"In my understanding, SAS expanders implement circuit switching whereas PCIe switches are packet switched. They have similar role but fulfill it in a different way. A SAS expander will reserve a route through the topology for the entire duration of a transaction (although I think the route might be different for different transactions). So when you mostly send information downstream, say, during a write operation, the uplink channel can not be used for some independent read operation. Whereas with NVMe each packet is independent. This makes them quite different in my opinion.
Obviously, the future belongs to NVMe. Especially over fabrics - the protocol is much less chatty.
Billy Tallis - Wednesday, June 17, 2020 - link
SAS-4 does add dynamic channel multiplexing, which somewhat relaxes the restrictions of the circuit-switched model, but I'm not sure it does anything to address the poor utilization of full-duplex bandwidth that you refer to. If only they would share the spec publicly, instead of just vaguely describing the features in press releases.
Deicidium369 - Tuesday, June 16, 2020 - link
2 different drives here - the SAS12 (existing) and SAS24 (new), and then the NVMe drives. Most SAS drives are mechanical - with the SAS24, building out with SAS24 SSDs (these are NOT NVMe drives) offers a huge benefit over mech drives, from both the speed of an SSD and the increase in bandwidth.
So - SAS12 and SAS24 ARE NOT NVMe - with these you can use port expanders just like you can with SAS12 and SATA - they need to be SAS24 - but can use expanders.
Billy Tallis - Tuesday, June 16, 2020 - link
Clarification: SAS is fully backwards-compatible, so these new 24G SAS drives will work fine (albeit slower) when connected to 12Gb SAS hosts or expanders. Alternately, 12Gb SAS drives work fine connected to 24G SAS expanders, and the expander can talk to the host adapter at 24G speeds even if the individual drives are all slower than 24G. So 24G SAS gives expanders more headroom for aggregating bandwidth from individual drives.
And a lot of newer SAS HBAs and RAID cards are tri-mode, supporting SATA, SAS and NVMe drives all through U.3 ports, but usually with much tighter restrictions on how many NVMe drives can be connected. So as 24G SAS rolls out, it will be increasingly common to find it deployed in a way that allows a given drive bay to hold either a SAS or NVMe SSD.
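A rough illustration of that aggregation headroom, using approximate per-lane throughput (~1.2 GB/s for 12Gb SAS, ~2.4 GB/s for 24G SAS) and a hypothetical expander with a 4-lane wide link back to the HBA:

```python
# Approximate per-lane SAS throughput after encoding overhead (GB/s).
SAS_12G_PER_LANE = 1.2
SAS_24G_PER_LANE = 2.4

# Hypothetical expander with a 4-lane wide link back to the HBA.
uplink_lanes = 4
uplink_12g = uplink_lanes * SAS_12G_PER_LANE   # ~4.8 GB/s
uplink_24g = uplink_lanes * SAS_24G_PER_LANE   # ~9.6 GB/s

# How many 12Gb SAS SSDs (each topping out around 1.2 GB/s) the uplink can feed at full speed.
print(round(uplink_12g / SAS_12G_PER_LANE))  # 4 drives before a 12G-wide uplink saturates
print(round(uplink_24g / SAS_12G_PER_LANE))  # 8 drives before a 24G-wide uplink saturates
```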
Santoval - Tuesday, June 16, 2020 - link
I had no idea a new version of SAS had been developed. I thought SAS was still (and would remain) at 12 Gbit/s.
deil - Wednesday, June 17, 2020 - link
Everybody is crying about how QLC/TLC is bad for longevity, and the next second someone makes a 30 TB drive with 1 DWPD. Overprovisioning is probably like 2x or 3x, but for everyone who needs insane read speed and capacity without a lot of writes, this 30 TB thing with QLC (which would be 40TB raw) is insane.
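For a sense of what 1 DWPD means in absolute terms, a quick back-of-the-envelope calculation, assuming a 30.72 TB drive and a typical 5-year warranty period:

```python
# Back-of-the-envelope endurance for a 30.72 TB drive rated at 1 DWPD,
# assuming a typical 5-year warranty period.
capacity_tb = 30.72
dwpd = 1
warranty_years = 5

total_writes_pb = capacity_tb * dwpd * 365 * warranty_years / 1000
print(round(total_writes_pb, 1))  # ~56.1 PB of total writes over the warranty
```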
MenhirMike - Wednesday, June 17, 2020 - link
QLC is basically the SSD version of hard drive SMR, except that it doesn't suck as terribly as SMR does. It's a nice compromise between price and capacity for read-heavy workloads - I wouldn't mind an 8-10 TB QLC SSD for a reasonable price ($600 or less), though I realize that's still somewhat off.
schujj07 - Wednesday, June 17, 2020 - link
These 1 DWPD SSDs are ideal as the capacity tier in hyperconverged datacenters. Since almost all the writes are to the caching SSD, the "low" endurance drives are used for reads. That will hide the latency issues of QLC and still give you a very powerful vSAN.