
  • frbeckenbauer - Tuesday, May 11, 2021 - link

    So basically it's their version of IBM OMI
  • Arsenica - Tuesday, May 11, 2021 - link

    Just like the article says!!
  • Ithaqua - Wednesday, May 12, 2021 - link

    So basically the 2021 version of a LIM 4 memory board.
    All things old are new again.
  • Spunjji - Tuesday, May 11, 2021 - link

    Interesting stuff. I can think of a few use cases for another level of memory that slots into the memory hierarchy below system DRAM in terms of bandwidth and latency, but with additional capacity. Huge databases would be an obvious candidate.

    I'm sure there will also be use cases where bandwidth is so critical to performance that filling a server's PCIe allocation with lower-capacity versions of this could give a system a measurable performance boost.
  • deil - Tuesday, May 11, 2021 - link

    God-level equipment for read replicas?
    PCIe 5.0 and DDR5 - it will take a while before we see this in action.
  • koaschten - Tuesday, May 11, 2021 - link

    Well, DDR5 mass production started in March 2021, but AMD at least will only adopt it with Zen 4. Off the top of my head I can't recall if Intel has stated which platform will support DDR5. Possibly Alder Lake-S?
  • ET - Tuesday, May 11, 2021 - link

    AMD will introduce DDR5 with Rembrandt (which is Zen3 based), probably in Q1 2022, going by their normal mobile release cycle. For desktop, who knows...

    And yes, Alder Lake S will support DDR5.
  • Yojimbo - Tuesday, May 11, 2021 - link

    I'm surprised AMD are putting DDR5 in Rembrandt. Is that rumor from a good source? Cezanne is PCIe 3, for example. DDR5 is going to be more expensive than DDR4 in the beginning. So if Rembrandt will be DDR5 I assume it will only be for high-end laptop designs and it will exist alongside Cezanne in their lineup. Perhaps it will be like Alder Lake, in that it will be both DDR4 and DDR5-capable. In that case we couldn't be sure we'd actually see any DDR5 designs at launch.
  • Santoval - Wednesday, May 12, 2021 - link

    It's probably a bit more than a rumor, since it is based on a leaked AMD roadmap. For Rembrandt they list support for LPDDR5, DDR5 and PCIe 4.0, among other things (click the link below). No less than 12 RDNA2 compute units are also rumored for the top Rembrandt variants (AMD needs to compete with Xe and its successors), up from 8 Vega CUs in the previous four (or is it five? I've lost count of how long AMD's APUs have been stuck with Vega...) APU generations.
    According to the same table Rembrandt is to be released about a quarter or more before the Zen 4 based (desktop) Raphael, so the former might be released in Q4 2021, not Q1 2022.
    Of course the table might well be fabricated and/or AMD's plans might change and they ditch DDR5 support for Rembrandt.
    https://videocardz.com/newz/amd-ryzen-9-6900h-remb...
  • Santoval - Wednesday, May 12, 2021 - link

    p.s. (Rembrandt is Zen 3 based, not Zen 4 based, so it may well be released a quarter or two before Raphael).
  • flashmozzg - Tuesday, May 11, 2021 - link

    LPDDR5 != DDR5
  • Yojimbo - Tuesday, May 11, 2021 - link

    Alder Lake, to be released in 2021 (latest rumors say November), will have PCIe 5. It can support either DDR5 or DDR4 - but not both at once, from what I understand - so we will have to wait and see what consumer options are actually available from motherboard manufacturers.
  • Santoval - Wednesday, May 12, 2021 - link

    Most likely Alder Lake-S and above (the HEDT version, assuming there will be one) will get DDR5, Alder Lake-H will get LPDDR5 or DDR4, and the low-power U/Y variants will probably support LPDDR5 (whether exclusively or not will depend on LPDDR5's projected prices in the second half of the year).
  • Tomatotech - Tuesday, May 11, 2021 - link

    Interesting idea 2.

    For comparison, fast PCIe 4.0 SSDs do about 8GB/s, and presumably PCIe 5.0 SSDs will do about 16-20GB/s in the real world.

    This has a theoretical max of 32 GT/s = 256GB/s but will not get anywhere near that. Still, it might get to around 64 or 128 GB/s for bulk transfer. The real advantage of DRAM over PCIe will be the god-like small file and random access speeds eg for databases.

    The fastest HDDs got to around 500 KB/s (IIRC) for random. Good SSDs currently get to around 100 MB/s for random single queue, 200MB/s for random with deep queues. Optane gets to 500/700 MB/s for an arm and a leg.

    If this can do random access at DRAM speeds, i.e. several GB/s, for around the same price as Optane, it could be a winner. As it's over PCIe it doesn't need to be top-speed DDR5 RAM. It could use older, slower, and cheaper DDR4 and still be a winner. Maybe they're using DDR5 for lower heat, lower power use, and higher capacity?

    Let's consider possible costs. Mainstream memory runs at about $5 per GB, so 256GB of this would cost around $1,200 for the DRAM alone (once DDR5 has become mainstream); then double that for being rare and server-level, so it looks like this will cost about $2,500 for 256GB. For a proper server capacity, maybe $10,000 for 1TB.
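    Writing that arithmetic out as a quick sketch in C (the $/GB figure and the 2x "server premium" are just the guesses above, not anything announced):

    #include <stdio.h>

    int main(void) {
        double usd_per_gb = 5.0;        /* guessed mainstream DDR5 price */
        double capacity_gb = 256.0;
        double server_premium = 2.0;    /* guessed markup for a niche server part */

        double dram_cost = usd_per_gb * capacity_gb;           /* ~$1,280 of raw DRAM (rounded to ~$1,200 above) */
        double module_cost = dram_cost * server_premium;       /* ~$2,500 for the module */
        double per_tb = module_cost * (1024.0 / capacity_gb);  /* ~$10,000 per TB */

        printf("DRAM: $%.0f, module: $%.0f, per TB: $%.0f\n",
               dram_cost, module_cost, per_tb);
        return 0;
    }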

    I'm not so sure this will compete with Optane after all. Optane runs at about $1,000 per TB, considerably cheaper. But there are companies willing to pay for DRAM in the TB levels for their systems.
  • mode_13h - Tuesday, May 11, 2021 - link

    > For a proper server capacity, maybe $10,000 for 1TB.

    Think bigger. Modern server CPUs already support multi-TB of direct-attached RAM.
  • Tomatotech - Tuesday, May 11, 2021 - link

    They do yes, but I hear actually installing that much RAM can be a pain. (I don't have any direct experience at that level.) This CXL drive offers a quick way to plug in an extra few TB of almost DRAM level memory.
  • Calin - Wednesday, May 12, 2021 - link

    With servers, you don't plug in extra memory. You buy them fully populated, and by the time you need more memory you throw them out and buy new systems (faster processors, faster I/O, lower power for the same performance, higher performance in the same power/cooling budget, more cores for more virtual machines, better security i.e. no/less Spectre/...).
    And 512/768GB of RAM would be standard fare for Intel processors without the "extra memory" tax, while 2TB would be accessible to both AMD and Intel.
    And the "extra memory" Intel tax, while expensive, pales in comparison to the cost of 2TB of server RAM.
    This "CXL" memory is also slower, and sits behind another level of indirection (OS support, HW support, ...) - so it would either need to be much cheaper in order to replace normal RAM, or be used as a tier beyond normal memory.
  • mode_13h - Wednesday, May 12, 2021 - link

    I thought Ice Lake SP didn't have an "extra memory" tax.
  • schujj07 - Monday, May 17, 2021 - link

    It all depends on the length of your lease. We started with 512GB of RAM in some servers on a multi-year lease. A little over a year later our workloads changed and additional RAM was required for these servers. At that point I went in and took out the 16x 32GB RDIMMs and put in 16x 64GB LRDIMMs.

    Depending on your Intel CPU there is a RAM tax, but that has changed with Ice Lake.

    While CXL memory will be slower than DRAM, it will most likely be faster than Optane RAM and available to anyone, not just Intel.
  • Santoval - Wednesday, May 12, 2021 - link

    "If this can do random access at DRAM speeds, i.e. several GB/s for around the same price as Optane, it could be a winner."
    I strongly doubt the IOPS will be anywhere close to those of DRAM in DIMM slots. CXL or not, the PCIe link will still be the latency bottleneck. PCIe 5.0 slots will also need to move even closer to the CPU than PCIe 4.0 slots, otherwise their signal integrity will take a severe hit. PCIe 6.0 switches to PAM4 encoding, but PCIe 5.0 still uses NRZ encoding. All else being equal, since PCIe 5.0 simply doubles PCIe 4.0's clocks, the signal will be twice as weak (all else will *not* be equal, since PCIe 5.0 motherboards will be optimized for PCIe 5.0, their PCBs will probably have even more layers etc., but you get my point).

    CXL.mem devices like the above are intended for capacity and bandwidth (mostly capacity), not latency-critical, high-IOPS workloads, since latency will be poorer. How much poorer? I have no idea; it might be anywhere between 10 and 100 times worse (I would guess 30 to 50 times worse). What's certain is that since this device uses DDR5, the bottleneck will be the PCIe link, not the memory. In contrast, the bottleneck of non-volatile Optane DIMMs lies in the Optane media itself.
  • schujj07 - Thursday, May 13, 2021 - link

    32 GT/s ≠ 256 GB/sec. Each lane does 32 GT/s (32 Gbit/s); 32 Gbit/s * 16 lanes = 512 Gbit/s, and divided by 8 bits per byte that is 64 GB/sec. That is roughly the throughput of dual-channel DDR4-4000, just with more overhead and longer connections. Realistically, figure that number would drop by 10-20% from the theoretical maximum.
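    For anyone who wants to redo that arithmetic, here's a quick sketch in C; the 128b/130b line-encoding factor is the only thing added beyond the numbers above, and the 10-20% protocol overhead remains a rough guess:

    #include <stdio.h>

    int main(void) {
        double rate_gt = 32.0;            /* PCIe 5.0 raw rate per lane, GT/s */
        int lanes = 16;                   /* x16 link, as in the article */
        double encoding = 128.0 / 130.0;  /* 128b/130b line encoding */

        double raw_gbit = rate_gt * lanes;               /* 512 Gbit/s on the wire */
        double data_gbyte = raw_gbit * encoding / 8.0;   /* ~63 GB/s per direction */

        printf("raw: %.0f Gbit/s, usable: %.1f GB/s per direction\n",
               raw_gbit, data_gbyte);
        return 0;
    }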
  • mode_13h - Thursday, May 13, 2021 - link

    That's per-direction, FWIW. So, depending on the CXL protocol and their implementation, it's conceivable you could hit 2x that, if your reads and writes are somewhat balanced.

    I agree with your approach, however. Since there are so many caveats around bi-dir throughput, we should just focus on the uni-dir numbers and hope bi-dir is higher.
  • schujj07 - Friday, May 14, 2021 - link

    When talking about bandwidth, you never state that a 64GB/sec bi-directional link is a 128GB/sec connection. The stated bandwidth is always given as the uni-directional number; however, you might say it is 64GB/sec bi-directional.
  • Exotica - Tuesday, May 11, 2021 - link

    When PCIe 6.0 drops, this will have even more bandwidth to work with. And with CXL 2.0, persistent memory could be coming to the mainstream.
  • Tomatotech - Tuesday, May 11, 2021 - link

    A weird wafer-munching wizard wrote about CXL 2.0 over here:

    https://www.anandtech.com/show/16227/compute-expre...
  • DanNeely - Tuesday, May 11, 2021 - link

    Well this is a blast from the past. Back in the 80s you could add more RAM to your PC over the ISA bus, e.g.:

    https://www.lo-tech.co.uk/wiki/Lo-tech_1MB_RAM_Boa...
  • Lucky Stripes 99 - Tuesday, May 11, 2021 - link

    Same with your Commodore Amiga and a Zorro bus card. Some were even combo cards that also included a SCSI controller and a space to mount a 3.5" hard drive (a so-called "hard card").

    When I bought an M.2 NVMe PCIe expansion card a couple years ago, I had a bit of a "what's old is new again" feeling from my old computer days.
  • watersb - Wednesday, May 12, 2021 - link

    The new DRAM modules could be called "QuadRam"... It has a nice ring to it...
  • abufrejoval - Tuesday, May 11, 2021 - link

    Just reminds me how normal it used to be to have RAM in extra boards.

    S-100 systems always used separate memory boards, the Apple ][ gave you an extra 16KB with the Language Card, and my 1.5MB Intel Above Board came with an 80287 and Windows 1.0 as goodies ("who would ever need more than 640K in a PC?").

    I never really noticed when they stopped putting RAM on add-on cards, but I have noted how HPC code has spent decades trying to hide where the memory actually is. With the number of memory tiers constantly increasing, CXL will help some code adapt itself to take advantage of potential gains, but I don't envy those who have to write and validate the code that makes it happen.
  • Duncan Macdonald - Tuesday, May 11, 2021 - link

    High bandwidth may be possible, but what is the latency? How long will it take a CPU to read a location in one of these modules vs. reading a location in directly attached memory?
    From the various CPU tests that have been done here, ordinary random RAM access is on the order of 80ns - how much overhead will the CXL/PCIe interface and protocols add?
  • Tomatotech - Tuesday, May 11, 2021 - link

    A wild wafer-munching wizard appears! He answers your questions here:

    https://www.anandtech.com/show/16227/compute-expre...
  • mode_13h - Tuesday, May 11, 2021 - link

    Uh, that's a good background read, but it doesn't specifically address the issue of latency (other than with regard to the optional point-to-point encryption feature).

    That's not fully accurate. There's a throwaway line:

    "CXL 2.0 is still built upon the same PCIe 5.0 physical standard, which means that there aren’t any updates in bandwidth or latency ..."

    However, my understanding is that the design of the CXL protocol does actually reduce latency vs. PCIe, even though they share the same PHY layer.

    I think it's still going to be bad enough that you wouldn't forego direct-attached DRAM. I see these plug-in modules as being useful for caching and specifically as a memory pool shared between multiple CPUs (and other accelerators).
  • mode_13h - Tuesday, May 11, 2021 - link

    > a memory pool shared between multiple CPUs (and other accelerators).

    I mean specifically for holding shared data.
  • back2future - Tuesday, May 11, 2021 - link

    It was called RAMdrive or i-RAM: https://en.wikipedia.org/wiki/RAM_drive
  • pjcamp - Tuesday, May 11, 2021 - link

    Wow! A memory expansion board! That takes me back.

    https://en.wikipedia.org/wiki/Expanded_memory
    https://en.wikipedia.org/wiki/Extended_memory

    I did my dissertation on a PC AT with one of these and Word for Windows, before the Windows it ran on even existed. Word was bundled with a runtime version of Windows. I still had to save the document every 4 or 5 pages since the disk swapping became intolerable.
  • Toadster - Tuesday, May 11, 2021 - link

    DEVICE=C:\Windows\HIMEM.SYS
    DOS=HIGH,UMB
    DEVICE=C:\Windows\EMM386.EXE NOEMS
  • mode_13h - Tuesday, May 11, 2021 - link

    LOL.

    IIRC, HIMEM was all about unlocking that extra 64k at the end of the 20-bit address range. EMM was a paging-based hack to access > 1 MB addresses without having to run in full-blown 32-bit mode (of which there were 2, IIRC). And configuring your AUTOEXEC.BAT and CONFIG.SYS to play the latest DOS game eventually became a black art.

    Let's not even get started on IRQ and DMA channel conflicts...
  • MrEcho - Tuesday, May 11, 2021 - link

    This would be great for VM servers. RAM is always an issue when running a lot of VMs.
  • Kamen Rider Blade - Tuesday, May 11, 2021 - link

    Why can't the 3 major associations get along and work together?

    CXL (Compute Express Link)
    https://en.wikipedia.org/wiki/Compute_Express_Link

    Gen-Z
    https://en.wikipedia.org/wiki/Gen-Z

    OpenCAPI
    https://en.wikipedia.org/wiki/Coherent_Accelerator...

    Serialized Memory Interface could do a world of good for the future of computing.
    It could replace the current parallel interface to DRAM.
  • Billy Tallis - Tuesday, May 11, 2021 - link

    Last I heard, CXL and Gen-Z were working together, with the aim that CXL would be used more for direct-attached stuff like this within a single system, and Gen-Z would be more for interconnects between systems within the same rack or between racks.

    OpenCAPI seems to still be at odds with the other two, especially CXL. CAPI has been around a lot longer, but CXL seems to have garnered the support of everyone but IBM.
  • Kamen Rider Blade - Tuesday, May 11, 2021 - link

    But OpenCAPI & its OMI interface seem to be about the connection between the CPU memory controller & RAM.

    It does this by changing out the old parallel interface for a new serialized one in OMI.
  • Wereweeb - Wednesday, May 12, 2021 - link

    How long until non-serial DRAM basically becomes a fat L4 cache?
  • Billy Tallis - Thursday, May 13, 2021 - link

    I doubt it'll ever be handled as a cache by the hardware. Operating systems are probably going to want the different pools of DRAM exposed as separate NUMA nodes.
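    As a rough sketch of what that could look like to software, assuming (purely hypothetically) that the expander shows up as NUMA node 2 on a Linux box with libnuma available:

    #include <numa.h>    /* link with -lnuma */
    #include <stddef.h>
    #include <stdio.h>

    int main(void) {
        if (numa_available() < 0) {
            fprintf(stderr, "no NUMA support on this system\n");
            return 1;
        }

        /* Hypothetical layout: nodes 0-1 are CPU-attached DRAM,
           node 2 is the CXL memory expander. */
        int cxl_node = 2;
        size_t sz = (size_t)1 << 30;  /* 1 GiB */

        /* Place a large, latency-tolerant buffer on the expander. */
        void *buf = numa_alloc_onnode(sz, cxl_node);
        if (buf == NULL) {
            fprintf(stderr, "allocation on node %d failed\n", cxl_node);
            return 1;
        }

        /* ... use buf for caches, column stores, etc. ... */

        numa_free(buf, sz);
        return 0;
    }

    From the shell, numactl --membind=2 ./your_app would do much the same thing, again assuming node 2 is where the expander lands.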
  • mode_13h - Thursday, May 13, 2021 - link

    I don't see these memory pools becoming a de facto standard. I think they have a few specific use cases:
    * storage caching
    * in-memory DBs
    * sharing data among multiple CPUs/accelerators
  • guswillard - Monday, May 17, 2021 - link

    This seems to have some potential. I don't have much background on the OS and kernel side of things, but I'm trying to join some dots. How will the system see this type of memory - will it appear as contiguous memory under /dev/mem or some other location? For example, if I run a 'dmidecode' command on a system with this memory attached, what type will it be? How will the OS map this memory? I'd appreciate any pointers to existing reading material.
  • mode_13h - Tuesday, May 18, 2021 - link

    > How will the system see this type of memory

    I assume it'll get mapped into the system's address space. However, I think the OS won't treat it the same as direct-attached memory, by default. There would probably be a special driver for it, or at least some way to explicitly allocate it.
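    As a minimal sketch of what "explicitly allocate it" could look like, assuming (and this is only an assumption) that the kernel exposes the device as a device-DAX character node, the way it already does for some other non-system-RAM memory; the path and size below are made up:

    #include <fcntl.h>
    #include <stddef.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <unistd.h>

    int main(void) {
        const char *path = "/dev/dax0.0";   /* hypothetical device-DAX node */
        size_t sz = (size_t)1 << 30;        /* map 1 GiB of it */

        int fd = open(path, O_RDWR);
        if (fd < 0) { perror("open"); return 1; }

        void *mem = mmap(NULL, sz, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        if (mem == MAP_FAILED) { perror("mmap"); close(fd); return 1; }

        /* From here the region is plain load/store-addressable memory
           from the application's point of view. */

        munmap(mem, sz);
        close(fd);
        return 0;
    }

    The alternative (and maybe more likely) model is that it just shows up as another NUMA node, per Billy Tallis's comment above, and ordinary allocators can be steered onto it.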
  • AustinTechie - Monday, May 17, 2021 - link

    Based on the images, and the x16 interface shown, this is not a U.2 device... it is an EDSFF E3.S device using the new EDSFF PCIe interface.

    Based on this, the current standard for EDSFF E3.S is a max of 40W, and if they go to an EDSFF E3.L then they will have a 70W envelope.
