
Original Link: https://www.anandtech.com/show/15848/storage-matters-xbox-ps5-new-era-of-gaming
Storage Matters: Why Xbox and Playstation SSDs Usher In A New Era of Gaming
by Billy Tallis on June 12, 2020 9:30 AM EST - Posted in
- Storage
- SSDs
- Microsoft
- Sony
- Consoles
- NVMe
- Xbox Series X
- PlayStation 5

A new generation of gaming consoles is due to hit the market later this year, and the hype cycle for the Xbox Series X and Playstation 5 has been underway for more than a year. Solid technical details (as opposed to mere rumors) have been slower to arrive, and we still know much less about the consoles than we typically know about PC platforms and components during the post-announcement, pre-availability phase. We have some top-line performance numbers and general architectural information from Microsoft and Sony, but not quite a full spec sheet.
The new generation of consoles will bring big increases in CPU and GPU capabilities, but we get that with every new generation and it's no surprise when console chips get the same microarchitecture updates as the AMD CPUs and GPUs they're derived from. What's more special with this generation is the change to storage: the consoles are following in the footsteps of the PC market by switching from mechanical hard drives to solid state storage, but also going a step beyond the PC market to get the most benefit out of solid state storage.
Solid State Drives were revolutionary for the PC market, providing immense improvements to overall system responsiveness. Games benefited mostly in the form of faster installation and level load times, but fast storage also helped reduce stalls and stuttering when a game needs to load data on the fly. In recent years, NVMe SSDs have provided speeds that are on paper several times faster than what is possible with SATA SSDs, but for gamers the benefits have been muted at best. Conventional wisdom holds that there are two main causes to suspect for this disappointment: First, almost all games and game engines are still designed to be playable off hard drives because current consoles and many low-end PCs lack SSDs. Game programmers cannot take full advantage of NVMe SSD performance without making their games unplayably slow on hard drives. Second, SATA SSDs are already fast enough to shift the bottleneck elsewhere in the system, often in the form of data decompression. Something aside from the SSD needs to be sped up before games can properly benefit from NVMe performance.
Microsoft and Sony are addressing both of those issues with their upcoming consoles. Game developers will soon be free to assume that users will have fast storage, both on consoles and on PCs. In addition, the new generation of consoles will add extra hardware features to address bottlenecks that would be present if they were merely mid-range gaming PCs equipped with cutting-edge SSDs. However, we're still dealing with powerful hype operations promoting these upcoming devices. Both companies are guilty of exaggerating or oversimplifying in their attempts to extol the new capabilities of their next consoles, especially with regards to the new SSDs. And since these consoles are still closed platforms that aren't even on the market yet, some of the most interesting technical details are still being kept secret.
The main source of official technical information about the PS5 (and especially its SSD) is lead designer Mark Cerny. In March, he gave an hour-long technical presentation about the PS5 and spent over a third of it focusing on storage. Less officially, Sony has filed several patents that apparently pertain to the PS5, including one that lines up well with what's been confirmed about the PS5's storage technology. That patent discloses numerous ideas Sony explored in the development of the PS5, and many of them are likely implemented in the final design.
Microsoft has taken the approach of more or less dribbling out technical details through sporadic blog posts and interviews, especially with Digital Foundry (who also have good coverage of the PS5). They've introduced brand names for many of their storage-related technologies (e.g. "Xbox Velocity Architecture"), but in too many cases we don't really know anything about a feature other than its name.
Aside from official sources, we also have leaks, comments and rumors of varying quality, from partners and other industry sources. These have definitely helped fuel the hype, but with regards to the console SSDs in particular, these non-official sources have produced very little in the way of real technical details. That leaves us with a lot of gaps that require analysis of what's possible and probable for the upcoming consoles to include.
What do we know about the console SSDs?
Microsoft and Sony are each using custom NVMe SSDs for their consoles, albeit with different definitions of "custom". Sony's solution aims for more than twice the performance of Microsoft's solution and is definitely more costly even though it will have the lower capacity. Broadly speaking, Sony's SSD will offer similar performance to the high-end PCIe 4.0 NVMe SSDs we expect to see on the retail market by the end of the year, while Microsoft's SSD is more comparable to entry-level NVMe drives. Both are a huge step forward from mechanical hard drives or even SATA SSDs.
| Console SSD Confirmed Specifications | Microsoft Xbox Series X | Sony Playstation 5 |
|---|---|---|
| Capacity | 1 TB | 825 GB |
| Speed (Sequential Read) | 2.4 GB/s | 5.5 GB/s |
| Host Interface | NVMe | PCIe 4.0 x4 NVMe |
| NAND Channels | ? | 12 |
| Power | 3.8 W | ? |
The most important and impressive performance metric for the console SSDs is their sequential read speed. SSD write speed is almost completely irrelevant to video game performance, and even when games perform random reads it will usually be for larger chunks of data than the 4kB blocks that SSD random IO performance ratings are normally based upon. Microsoft's 2.4GB/s read speed is 10–20 times faster than what a mechanical hard drive can deliver, but falls well short of the current standards for high-end consumer SSDs which can saturate a PCIe 3.0 x4 interface with at least 3.5GB/s read speeds. Sony's 5.5GB/s read speed is slightly faster than currently-available PCIe 4.0 SSDs based on the Phison E16 controller, but everyone competing in the high-end consumer SSD market has more advanced solutions on the way. By the time it ships, the PS5 SSD's read performance will be unremarkable – matched by other high-end SSDs – except in the context of consoles and low-cost gaming PCs that usually don't have room in the budget for high-end storage.
Sony has disclosed that their SSD uses a custom controller with a 12-channel interface to the NAND flash memory. This seems to be the most important way in which their design differs from typical consumer SSDs. High-end consumer SSDs generally use 8-channel controllers and low-end drives often use 4 channels. Higher channel counts are more common for server SSDs, especially those that need to support extreme capacities; 16-channel controllers are common and 12 or 18 channel designs are not unheard of. Sony's use of a higher channel count than any recent consumer SSD means their SSD controller will be uncommonly large and expensive, but on the other hand they don't need as much performance from each channel in order to reach their 5.5GB/s goal. They could use any 64-layer or newer TLC NAND and have adequate performance, while consumer SSDs hoping to offer this level of performance or more with 8-channel controllers need to be paired with newer, faster NAND flash.
The 12-channel controller also leads to unusual total capacities. A console SSD doesn't need any more overprovisioning than typical consumer SSDs, so 50% more channels should translate to about 50% more usable capacity. The PS5 will ship with "825 GB" of SSD space, which means we should see each of the 12 channels equipped with 64GiB of raw NAND, organized as either one 512Gbit (64GB) die or two 256Gbit (32GB) dies per channel. That means the nominal raw capacity of the NAND is 768GiB or about 824.6 (decimal) GB. The usable capacity after accounting for the requisite spare area reserved by the drive is probably going to be more in line with what would be branded as 750 GB by a drive manufacturer, so Sony's 825GB is overstating things by about 10% more than normal for the storage industry. It's something that may make a few lawyers salivate.
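The arithmetic behind that capacity figure is simple enough to check. A quick sketch (the per-channel die arrangement is the one described above):

```cpp
#include <cstdio>

int main() {
    // Layout described above: 12 channels, 64 GiB of raw NAND per channel
    // (one 512Gbit die or two 256Gbit dies per channel).
    const double kGiB = 1024.0 * 1024.0 * 1024.0;   // binary gigabyte
    const double kGB  = 1000.0 * 1000.0 * 1000.0;   // decimal gigabyte

    double raw_bytes = 12 * 64 * kGiB;              // 768 GiB of raw NAND
    printf("Raw NAND: %.0f GiB = %.1f decimal GB\n",
           raw_bytes / kGiB, raw_bytes / kGB);      // prints 768 GiB = 824.6 GB

    // A retail drive would normally reserve a slice of that as spare area,
    // which is why the usable capacity would more typically be sold as ~750 GB.
    return 0;
}
```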
It's probably worth mentioning here that it is unrealistic for Sony to have designed their own high-performance NVMe SSD controller, just like they can't do a CPU or GPU design on their own. Sony had to partner with an existing SSD controller vendor and commission a custom controller, probably assembled largely from pre-existing and proven IP, but we don't know who that partner is.
Microsoft's SSD won't be pushing performance at all beyond normal new PC levels now that OEMs have moved beyond SATA SSDs, but a full 1TB in a PC priced similarly to consoles would still be a big win for consumers. Multiple sources indicate that Microsoft is using an off-the-shelf SSD controller from one of the usual suspects (probably the Phison E19T controller), and the drive itself is built by a major SSD OEM. However, they can still lay claim to using a custom form factor and probably custom firmware.
Neither console vendor has shared official information about the internals of their SSD aside from Sony's 12-channel specification, but the capacities and performance numbers give us a clue about what to expect. Sony's pretty much committed to using TLC NAND, but Microsoft's lower performance target is down in the territory where QLC NAND is an option: 2.4GB/s is a bit more than we see from current 4-channel QLC drives like the Intel 665p (about 2GB/s) but much less than 8-channel QLC drives like the Sabrent Rocket Q (rated 3.2GB/s for the 1TB model). The best fit for Microsoft's expected performance among current SSD designs would be a 4-channel drive with TLC NAND, but newer 4-channel controllers like the Phison E19T should be able to hit those speeds with the right QLC NAND. Either console could conceivably get a double-capacity version in the future that uses QLC NAND to reach the same read performance as the original models.
DRAMless, but that's OK?
Without performance specs for writes or random reads, we cannot rule out the possibility of either console SSD using a DRAMless controller. Including a full-sized DRAM cache for the flash translation layer (FTL) tables on a SSD primarily helps performance in two ways: better sustained write speeds when the drive's full enough to require a lot of background work shuffling data around, and better random access speed when reading data across the full range of the drive. Neither of those really fits the console use case: very heavily read-oriented, and only accessing one game's dataset at a time. Even if game install sizes end up being in the 100-200GB range, at any given moment the amount of data used by a game won't be more than low tens of GB, and that is easily handled by DRAMless SSDs with a decent amount of SRAM on the controller itself. Going DRAMless seems very likely for Microsoft's SSD, and while it would be very strange in any other context to see a 12-channel DRAMless controller, that option does seem to be viable for Sony (and would offset the cost of the high channel count).
The Sony patent mentioned earlier goes in depth on how to make a DRAMless controller even more suitable for console use cases. Rather than caching a portion of the FTL's logical-to-physical address mapping table in on-controller SRAM, Sony proposes making the table itself small enough to fit in a small SRAM buffer. Mainstream SSDs have a ratio of 1 GB of DRAM for each 1 TB of flash memory. That ratio is a direct consequence of the FTL managing flash in 4kB chunks. Having the FTL manage flash in larger chunks directly reduces the memory requirements for the mapping table. The downside is that small writes will cause much more write amplification and be much slower. Western Digital sells a specialized enterprise SSD that uses 32kB chunks for its FTL rather than 4kB, and as a result it only needs an eighth the amount of DRAM. That drive's random write performance is poor, but the read performance is still competitive. Sony's patent proposes going way beyond 32kB chunks to using 128MB chunks for the FTL, shrinking the mapping table to mere kilobytes. That requires the host system to be very careful about when and where it writes data, but the read performance that gaming relies upon is not compromised.
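The effect of mapping granularity on table size is easy to estimate. The sketch below assumes 4-byte mapping entries, which is typical for consumer SSD FTLs but not confirmed for any of these designs:

```cpp
#include <cstdio>

int main() {
    const double capacity_bytes = 825e9;   // PS5's advertised 825 GB
    const double entry_bytes    = 4;       // assumed size of one mapping entry

    // Conventional FTL: one entry per 4 kB of flash -> roughly 1 GB of DRAM per 1 TB
    double entries_4k = capacity_bytes / 4096;
    printf("4 kB chunks:   %.0fM entries, ~%.0f MB of mapping table\n",
           entries_4k / 1e6, entries_4k * entry_bytes / 1e6);

    // Coarse-grained FTL along the lines of Sony's patent: one entry per 128 MB chunk
    double entries_128m = capacity_bytes / (128.0 * 1024 * 1024);
    printf("128 MB chunks: %.0f entries, ~%.1f kB of mapping table\n",
           entries_128m, entries_128m * entry_bytes / 1e3);
    return 0;
}
```

The first case lands at roughly 800 MB of table, in line with the usual 1 GB per 1 TB rule of thumb; the second shrinks to a few tens of kilobytes, small enough to live entirely in on-controller SRAM.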
In short, while the Sony SSD should be very fast for its intended purpose, I'm going to wager that you really wouldn't want it in your Windows PC. The same is probably true to some extent of Microsoft's SSD, depending on their firmware tuning decisions.
Expandability
Both Microsoft and Sony are providing expandability for the NVMe storage of their upcoming consoles. Microsoft's solution is to re-package their internal SSD into a custom removable form factor reminiscent of what consoles used back when memory cards were measured in MB instead of TB and before USB flash drives were ubiquitous. Since it uses all the same components, this expansion card will be functionally identical to the internal storage. The downside is that Microsoft will control the supply and probably pricing of the cards; currently Seagate is the only confirmed partner for selling these proprietary expansion cards.
Sony is taking the opposite approach, giving users access to a standard M.2 PCIe 4.0 slot that can accept aftermarket upgrades. The requirements aren't entirely clear: Sony will be doing compatibility testing with third-party drives in order to publish a compatibility list, but they haven't said whether drives not on their approved list will be rejected by the PS5 console. To make it onto Sony's compatibility list, a drive will need to mechanically fit (i.e. no excessively large heatsink) and offer at least as much performance as Sony's custom internal SSD. The performance requirements mean no drive currently available at retail will qualify, but the situation will be very different next year.
Balancing The System With Other Hardware Features
The biggest technological advantage consoles have over PCs is that consoles are a fully-integrated fixed platform specified by a single manufacturer. In theory, the manufacturer can ensure that the system is properly balanced for the use case, something PC OEMs are notoriously bad at. Consoles generally don't have the problem of wasting a large chunk of the budget on a single high-end component that the rest of the system cannot keep up with, and consoles can more easily incorporate custom hardware when suitable off-the-shelf components aren't available. (This is why the outgoing console generation didn't use desktop-class CPU cores, but dedicated a huge amount of the silicon budget to the GPUs.)
By now, PC gaming has thoroughly demonstrated that increasing SSD speed has little or no impact on gaming performance. NVMe SSDs are several times faster than SATA SSDs on paper, but for almost all PC games that extra performance goes largely unused. In part, this is due to bottlenecks elsewhere in the system that are revealed when storage performance is fast enough to no longer be a serious limitation. The upcoming consoles will include a number of hardware features designed to make it easier for games to take advantage of fast storage, and to alleviate bottlenecks that would be troublesome on a standard PC platform. This is where the console storage tech actually gets interesting, since the SSDs themselves are relatively unremarkable.
Compression: Amplifying SSD Performance
The most important specialized hardware feature the consoles will include to complement storage performance is dedicated data decompression hardware. Game assets must be stored on disk in a compressed form to keep storage requirements somewhat reasonable. Games usually rely on multiple compression methods—some lossy compression methods specialized for certain types of data (e.g. audio and images), and some lossless general-purpose algorithm, but almost everything goes through at least one compression method that is fairly computationally complex. GPU architectures have long included hardware to handle decoding video streams and support simple but fast lossy texture compression methods like S3TC and its successors, but that leaves a lot of data to be decompressed by the CPU. Desktop CPUs don't have dedicated decompression engines or instructions, though many instructions in the various SIMD extensions are intended to help with tasks like this. Even so, decompressing a stream of data at several GB/s is not trivial, and special-purpose hardware can do it more efficiently while freeing up CPU time for other tasks. The decompression offload hardware in the upcoming consoles is implemented on the main SoC, so it can unpack data after it has traversed the PCIe link from the SSD, placing the uncompressed data in the main RAM pool shared by the GPU and CPU cores.
Decompression offload hardware like this isn't found on typical desktop PC platforms, but it's hardly a novel idea. Previous consoles have included decompression hardware, though nothing that would be able to keep pace with NVMe SSDs. Server platforms often include compression accelerators, usually paired with cryptography accelerators: Intel has done such accelerators both as discrete peripherals and integrated into some server chipsets, and IBM's POWER9 and later CPUs have similar accelerator units. These server accelerators are more comparable to what the new consoles need, with throughput of several GB/s.
Microsoft and Sony each have tuned their decompression units to fit the performance expected from their chosen SSD designs. They've chosen different proprietary compression algorithms to target: Sony is using RAD's Kraken, a general-purpose algorithm which was originally designed to be used on the current consoles with relatively weak CPUs but vastly lower throughput requirements. Microsoft focused specifically on texture compression, reasoning that textures account for the largest volume of data that games need to read and decompress. They developed a new texture compression algorithm and dubbed it BCPack in a slight departure from their existing DirectX naming conventions for texture compression methods already supported by GPUs.
| Compression Offload Hardware | Microsoft Xbox Series X | Sony Playstation 5 |
|---|---|---|
| Algorithm | BCPack | Kraken (and ZLib?) |
| Maximum Output Rate | 6 GB/s | 22 GB/s |
| Typical Output Rate | 4.8 GB/s | 8–9 GB/s |
| Equivalent Zen 2 CPU Cores | 5 | 9 |
Sony states that their Kraken-based decompression hardware can unpack the 5.5GB/s stream from the SSD into a typical 8-9 GB/s of uncompressed data, but that can theoretically reach up to 22 GB/s if the data was redundant enough to be highly compressible. Microsoft states their BCPack decompressor can output a typical 4.8 GB/s from the 2.4 GB/s input, but potentially up to 6 GB/s. So Microsoft is claiming slightly higher typical compression ratios, but still a slower output stream due to the much slower SSD, and Microsoft's hardware decompression is apparently only for texture data.
The CPU time saved by these decompression units sounds astounding: the equivalent of about 9 Zen 2 CPU cores for the PS5, and about 5 for the Xbox Series X. Keep in mind these are peak numbers that assume the SSD bandwidth is being fully utilized—real games won't be able to keep these SSDs 100% busy, so they wouldn't need quite so much CPU power for decompression.
The storage acceleration features on the console SoCs aren't limited to just compression offload, and Sony in particular has described quite a few features, but this is where the information released so far is really vague, unsatisfying and open to interpretation. Most of this functionality seems to be intended to reduce overhead, handling some of the more mundane aspects of moving data around without having to get the CPU involved as often, and making sure the hardware decompression process is invisible to the game software.
DMA Engines
Direct Memory Access (DMA) refers to the ability for a peripheral device to read and write to the CPU's RAM without the CPU being involved. All modern high-speed peripherals use DMA for most of their communication with the CPU, but that's not the only use for DMA. A DMA Engine is a peripheral device that exists solely to move data around; it usually doesn't do anything to that data. The CPU can instruct the DMA engine to perform a copy from one region of RAM to another, and the DMA engine does the rote work of copying potentially gigabytes of data without the CPU having to do a mov (or SIMD equivalent) instruction for every piece, and without polluting CPU caches. DMA engines can also often do more than just offload simple copy operations: they commonly support scatter/gather operations to rearrange data somewhat in the process of moving it around. NVMe already has features like scatter/gather lists that can remove the need for a separate DMA engine to provide that feature, but the NVMe commands in these consoles are acting mostly on compressed data.
Even though DMA engines are a peripheral device, you usually won't find them as a standalone PCIe card. It makes the most sense for them to be as close to the memory controller as possible, which means on the chipset or on the CPU die itself. The PS5 SoC includes a DMA engine to handle copying around data coming out of the compression unit. As with the compression engines, this isn't a novel invention so much as a feature missing from standard desktop PCs, which means it's something custom that Sony has to add to what would otherwise be a fairly straightforward AMD APU configuration.
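As a rough illustration of what "offloading a copy" means in practice, here is a generic, entirely hypothetical descriptor format of the kind a driver might hand to a DMA engine. Nothing here reflects Sony's or AMD's actual hardware interfaces; it just shows the kind of work the CPU describes once instead of performing itself:

```cpp
#include <cstdint>
#include <vector>

// Hypothetical scatter/gather descriptor list for a DMA engine.
struct DmaSegment {
    uint64_t src_addr;   // physical address to read from
    uint64_t dst_addr;   // physical address to write to
    uint32_t length;     // bytes to copy for this segment
};

struct DmaJob {
    std::vector<DmaSegment> segments;  // gather list: scattered sources and destinations
    uint32_t completion_tag;           // reported back when the whole job is done
};

void submit_to_dma_engine(const DmaJob& job) {
    // Placeholder: real hardware would receive the descriptor list's address via a
    // doorbell register, then stream gigabytes of data without further CPU work
    // and without polluting the CPU caches.
    (void)job;
}
```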
IO Coprocessor
The IO complex in the PS5's SoC also includes a dual-core processor with its own pool of SRAM. Sony has said almost nothing about the internals of this: Mark Cerny describes one core as dedicated to SSD IO, allowing games to "bypass traditional file IO", while the other core is described simply as helping with "memory mapping". For more detail, we have to turn to a patent Sony filed years ago, and hope it reflects what's actually in the PS5.
The IO coprocessor described in Sony's patent offloads portions of what would normally be the operating system's storage drivers. One of its most important duties is to translate between various address spaces. When the game requests a certain range of bytes from one of its files, the game is looking for the uncompressed data. The IO coprocessor figures out which chunks of compressed data are needed and sends NVMe read commands to the SSD. Once the SSD has returned the data, the IO coprocessor sets up the decompression unit to process that data, and the DMA engine to deliver it to the requested locations in the game's memory.
Since the IO coprocessor's two cores are each much less powerful than a Zen 2 CPU core, they cannot be in charge of all interaction with the SSD. The coprocessor handles the most common cases of reading data, and the system falls back to the OS running on the Zen 2 cores for the rest. The coprocessor's SRAM isn't used to buffer the vast amounts of game data flowing through the IO complex; instead this memory holds the various lookup tables used by the IO coprocessor. In this respect, it is similar to an SSD controller with a pool of RAM for its mapping tables, but the job of the IO coprocessor is completely different from what an SSD controller does. This is why it will be useful even with aftermarket third-party SSDs.
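Pulling those pieces together, the read path described by the patent might look something like the following sketch. Every name, structure, and the 64 kB chunk size here is hypothetical, chosen only to illustrate the translation step:

```cpp
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical sketch of the translation work described in Sony's patent.
struct CompressedChunk {
    uint64_t flash_lba;          // where the compressed data lives on the SSD
    uint32_t compressed_bytes;
    uint32_t uncompressed_bytes;
};

struct AssetMap {
    // chunks[i] covers uncompressed bytes [i*CHUNK, (i+1)*CHUNK) of the asset
    static constexpr uint64_t CHUNK = 64 * 1024;
    std::vector<CompressedChunk> chunks;
};

// Lookup tables like this are plausibly what the coprocessor's SRAM holds.
std::unordered_map<uint32_t, AssetMap> asset_tables;

void service_read(uint32_t asset_id, uint64_t offset, uint64_t length) {
    const AssetMap& map = asset_tables.at(asset_id);
    uint64_t first = offset / AssetMap::CHUNK;
    uint64_t last  = (offset + length - 1) / AssetMap::CHUNK;
    for (uint64_t i = first; i <= last; i++) {
        const CompressedChunk& c = map.chunks[i];
        // 1. queue an NVMe read of c.compressed_bytes starting at c.flash_lba
        // 2. point the hardware decompressor at the returned buffer
        // 3. program the DMA engine to land the uncompressed bytes in game memory
        (void)c;
    }
}
```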
Cache Coherency
The last somewhat storage-related hardware feature Sony has disclosed is a set of cache coherency engines. The CPU and GPU on the PS5 SoC share the same 16 GB of RAM, which eliminates the step of copying assets from main RAM to VRAM after they're loaded from the SSD and decompressed. But to get the most benefit from the shared pool of memory, the hardware has to ensure cache coherency not just between the several CPU cores, but also with the GPU's various caches. That's all normal for an APU, but what's novel with the PS5 is that the IO complex also participates. When new graphics assets are loaded into memory through the IO complex and overwrite older assets, it sends cache invalidation signals to any relevant caches—to discard only the stale data, rather than flush the entire GPU caches.
What about the Xbox Series X?
There's a lot of information above about the Playstation 5's custom IO complex, and it's natural to wonder whether the Xbox Series X will have similar capabilities or if it's limited to just the decompression hardware. Microsoft has lumped the storage-related technologies in the new Xbox under the heading of "Xbox Velocity Architecture".
Microsoft defines this as having four components: the SSD itself, the compression engine, a new software API for accessing storage (more on this later), and a hardware feature called Sampler Feedback Streaming. That last one is only distantly related to storage; it's a GPU feature that makes partially resident textures more useful by allowing shader programs to keep a record of which portions of a texture are actually being used. This information can be used to decide what data to evict from RAM and what to load next—such as a higher-resolution version of the texture regions that are actually visible at the moment.
Since Microsoft doesn't mention anything like the other PS5 IO complex features, it's reasonable to assume the Xbox Series X doesn't have those capabilities and its IO is largely managed by the CPU cores. But I wouldn't be too surprised to find out the Series X has a comparable DMA engine, because that kind of feature has historically shown up in many console architectures.
What To Expect From Next-gen Games
NVMe SSDs can easily be 50 times faster than hard drives for sequential reads and thousands of times faster for random reads. It stands to reason that game developers should be able to do things differently when they no longer need to target slow hard drives and can rely on fast storage. Workarounds for slow hard drive performance can be discarded, and new ideas and features that would never work on hard drives can be tried out. Microsoft and Sony are in pretty close agreement about what this will mean for the upcoming console generation, and they've touted most of the same benefits for end users.
Most of the game design changes enabled by abandoning hard drives will have little impact on the gaming experience from one second to the next; removing workarounds for slow storage won't do much to help frames per second, but it will remove some other pain points in the overall console experience. For starters, solid state drives can tolerate a high degree of fragmentation with no noticeable performance impact, so game files don't need to be defragmented after updates. Defragmentation is something most PC users no longer need to give even a passing thought to, but it's still an occasionally necessary (albeit automatic) maintenance process on current consoles.
Not as obsolete as you might have thought. But soon.
Since game developers no longer need to care so much about maintaining spatial locality of data on disk, it will also no longer be necessary for data that's reused in several parts of a game to be duplicated on several parts of the disk. Commonly re-used sounds, textures and models will only need to be included once in a game's files. This will have at least a tiny effect on slowing the growth of game install sizes, but it probably won't actually reverse that trend except where a studio has been greatly abusing the copy and paste features in their level editors.
Warnings to not turn off the console while a game is saving first showed up when consoles moved away from cartridges with built-in solid state storage, and those warnings continue to be a hallmark of many console games and half-assed PC ports. The write speeds of SSDs are fast enough that saving a game takes much less time than reaching for a power switch, so ideally those warnings should be reduced, if not gone for good.
But NVMe SSDs have write speeds that go far beyond even that requirement, and that allows changes in how games are saved. Instead of summarizing the player's progress in a file that is mere megabytes, new consoles will have the freedom to dump gigabytes of data to disk. All of the RAM used by a game can be saved to a NVMe SSD in a matter of seconds, like the save state features common to console emulators. If the static assets (textures, sounds, etc.) that are unmodified are excluded from the save file, we're back down to near instant save operations. But now the save file and in-use game assets can be simply read back into RAM to resume the whole game state in no more than a second or two, bypassing all the usual start-up and load work done by games.
Xbox Series X Quick Resume menu
The deduplication of game assets is a benefit that will trivially carry over to PC ports, and the lack of defragmentation is something PC gamers with SSDs have already been enjoying for years—and neither of those changes requires a cutting edge SSD. The instant save and resume capabilities can work just fine (albeit not quite so "instant") on even a SATA SSD, but implementing this well requires a bit of help from the operating system, so it may be a while before this feature becomes commonplace on PC games. (Desktop operating systems have long supported hibernate and restore of the entire OS image, but doing it per-application is unusual.)
But those are all pretty much convenience features that do not make the core game experience itself any richer. The reduction or elimination of loading screens will be a welcome improvement for many games—but many more games have already gone to great lengths to eliminate loading screens as much as possible. This most often takes the form of level design that obscures what would have been a loading screen. The player's movement and field of view are temporarily constrained, drastically reducing the assets that need to stay in RAM and allowing everything else to be swapped out, while retaining some illusion of player freedom. Narrow hallways and canyons, elevator and train rides, and airlocks—frequently one-way trips—are all standard game design elements to make it less obvious where a game world is divided.
Open-world games get by with fewer such design elements by making the world less detailed and restricting player movement speed so that no matter where the player chooses to move, the necessary assets can be streamed into RAM on the fly. With a fast SSD, game designers and players will both have more freedom. But some transition sequences will still be required for major scene changes. The consoles cannot reload the entire contents of RAM from one frame to the next; that will still take several seconds.
SSD As RAM?
Finally, we come to what may be the most significant consequence of making SSDs standard and required for games, but is also the most overstated benefit: Both Microsoft and Sony have made statements along the lines of saying that the SSD can be used almost like RAM. Whatever qualifiers and caveats those statements came with quickly get dropped by fans and even some press. So let's be clear here: the console SSDs are no substitute for RAM. The PS5's SSD can supply data at 5.5 GB/s. The RAM runs at 448 GB/s, *81 times faster*. The consoles have 16 GB of GDDR6 memory. If a game needs to use more than 16 GB to render a scene, framerates will drop down to Myst levels because the SSD is not fast enough. The SSDs are inadequate in both throughput and latency.
It's certainly possible for a level to use more than 16GB of assets, but not all on screen at once. The technical term here is working set: how much memory is really being actively used at once. What the SSD changes somewhat is the threshold for what can be considered active. With a fast SSD, the assets that need to be kept in DRAM aren't much more than what's currently on screen, and the game doesn't need to prefetch very far ahead. Textures for an object in the same room but currently off-screen may be safe to leave on disk until the camera starts moving in that direction, whereas a hard drive based system will probably need to keep assets for the entire room and adjacent rooms in RAM to avoid stuttering. This difference means an SSD-based console (especially with NVMe performance) can free up some VRAM and allow for some higher-resolution assets. It's not a huge change; it's not like the SSD increases effective VRAM size by tens of GBs, but it is very plausible that it allows games to use an extra few GB of RAM for on-screen content rather than prefetching off-screen assets. Mark Cerny has approximated it as saying the game now only needs to pre-fetch about a second ahead, rather than about 30 seconds ahead.
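Some rough, illustrative arithmetic shows how much the prefetch horizon matters. The ~100 MB/s hard drive figure below is an assumption on our part (and generous, since it ignores seeks); the 5.5 GB/s figure is Sony's rating:

```cpp
#include <cstdio>

int main() {
    const double hdd_gbps = 0.1;   // assumed sustained hard drive read, no seeks
    const double ssd_gbps = 5.5;   // PS5 SSD rated sequential read
    const double frame_s  = 1.0 / 60.0;

    printf("Data deliverable within one 60 fps frame: HDD ~%.1f MB, SSD ~%.0f MB\n",
           hdd_gbps * frame_s * 1000, ssd_gbps * frame_s * 1000);

    // With a ~1 second horizon the SSD can stage several GB of assets on demand;
    // a hard drive can only fetch ~3 GB across a whole 30 second horizon, so
    // everything the player *might* need in that window has to already sit in RAM.
    printf("Fetchable within the horizon: HDD ~%.0f GB in 30 s, SSD ~%.1f GB in 1 s\n",
           hdd_gbps * 30, ssd_gbps * 1.0);
    return 0;
}
```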
There's another layer to this benefit: partially resident textures have been possible on other platforms, but become more powerful now. What was originally developed for multi-acre ground textures can now be effectively used on much smaller objects. Sampler feedback allows the GPU to provide the application with more detailed and accurate information about which portions of a texture are actually being displayed. The game can use that information to only issue SSD read requests for those portions of the file. This can be done with a granularity of 128kB blocks of the (uncompressed) texture, which is small enough to allow for meaningful RAM savings by not loading texels that won't be used, while at the same time still issuing SSD reads that are large enough to be a good fit for SSD performance characteristics.
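A sketch of how feedback-driven streaming might be structured, using the 128kB tile granularity mentioned above. The types and function here are purely illustrative, not any actual DirectX or console API:

```cpp
#include <cstdint>
#include <vector>

// Hypothetical feedback-driven texture streamer.
struct TextureResidency {
    uint32_t texture_id;
    std::vector<bool> tile_resident;   // one flag per 128 kB uncompressed tile
};

struct TileRequest { uint32_t texture_id; uint32_t tile_index; };

// feedback[i] is set when the GPU reported sampling from tile i last frame.
std::vector<TileRequest> plan_reads(const TextureResidency& tex,
                                    const std::vector<bool>& feedback) {
    std::vector<TileRequest> reads;
    for (uint32_t i = 0; i < feedback.size(); i++) {
        if (feedback[i] && !tex.tile_resident[i]) {
            // Only touched-but-missing tiles generate SSD reads; each read is a
            // reasonably large (128 kB) chunk, which suits SSD performance.
            reads.push_back({tex.texture_id, i});
        }
    }
    return reads;
}
```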
Microsoft has stated that these capabilities add up to the effect of a 2x or 3x multiplier of RAM capacity and SSD bandwidth. I'm not convinced. Sure, a lot of SSD bandwidth can be saved over short timescales by incrementally loading a scene. But I doubt these features will allow the Series X with its ~10GB of VRAM to handle the kind of detailed scenery you could draw on a PC GPU with 24GB of VRAM. They're welcome to prove me wrong, though.
What's Necessary to Get Full Performance Out of a Solid State Drive?
The storage hardware in the new consoles opens up new possibilities, but on its own the hardware cannot revolutionize gaming. Implementing the new features enabled by fast SSDs still requires software work from the console vendors and game developers. Extracting full performance from a high-end NVMe SSD requires a different approach to IO than methods that work for hard drives and optical discs.
We don't have any next-gen console SSDs to play with yet, but based on the few specifications released so far we can make some pretty solid projections about their performance characteristics. First and foremost, hitting the advertised sequential read speeds will require keeping the SSDs busy with a lot of requests for data. Consider one of the fastest SSDs we've ever tested, the Samsung PM1725a enterprise SSD. It's capable of reaching over 6.2 GB/s when performing sequential reads in 128kB chunks. But asking for those chunks one at a time only gets us 680 MB/s. This drive requires a queue depth of at least 16 to hit 5 GB/s, and at least QD32 to hit 6 GB/s. Newer SSDs with faster flash memory may not require queue depths that are quite so high, but games will definitely need to make more than a few requests at a time to keep the console SSDs busy.
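Little's law gives a rough feel for why queue depth matters so much. Using the PM1725a's QD1 numbers from above, and assuming (unrealistically) that per-request latency stays constant as the drive gets busier:

```cpp
#include <cstdio>

int main() {
    const double request_bytes  = 128 * 1024;  // 128 kB sequential reads
    const double qd1_throughput = 680e6;       // 680 MB/s at queue depth 1
    const double qd1_latency    = request_bytes / qd1_throughput;  // ~0.19 ms per request

    // throughput ≈ queue_depth * request_size / latency, so the queue depth
    // needed to sustain a target throughput (at constant latency) is:
    const double target   = 5e9;               // 5 GB/s
    const double naive_qd = target * qd1_latency / request_bytes;
    printf("QD1 latency: %.2f ms; naive QD for 5 GB/s: %.1f\n",
           qd1_latency * 1000, naive_qd);
    // In practice per-request latency climbs as the drive gets busier, which is
    // why the real drive needed QD16 or more rather than the ~7 this predicts.
    return 0;
}
```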
The consoles cannot afford to waste too much CPU power on communicating with the SSDs, so they need a way for just one or two threads to manage all of the IO requests and still have CPU time left over for those cores to do something useful with the data. That means the consoles will have to be programmed using asynchronous IO APIs, where a thread issues a read request to the operating system (or IO coprocessor) but goes back to work while the request is being processed. And the thread will have to check back later to see if a request has been fulfilled. In the hard drive days, such a thread would go off and do several non-storage tasks while waiting for a read operation to complete. Now, that thread will have to spend that time issuing several more requests.
In addition to keeping queue depths up, obtaining full speed from the SSDs will require doing IO in relatively large chunks. Trying to hit 5.5 GB/s with 4kB requests would require handling about 1.4M IOs per second, which would strain several parts of the system with overhead. Fortunately, games tend to naturally deal with larger chunks of data, so this requirement isn't too much trouble; it mainly means that many traditional measures of SSD performance are irrelevant.
Microsoft has said extremely little about the software side of the Xbox Series X storage stack. They have announced a new API called DirectStorage. We don't have any description of how it works or differs from existing or previous console storage APIs, but it is designed to be more efficient:
DirectStorage can reduce the CPU overhead for these I/O operations from multiple cores to taking just a small fraction of a single core.
The most interesting bit about DirectStorage is that Microsoft plans to bring it to Windows, so the new API cannot be relying on any custom hardware and it has to be something that would work on top of a regular NTFS filesystem. Based on our experiences testing fast SSDs under Windows, they could certainly use a lower-overhead storage API, and it would be applicable to far more than just video games.
Sony's storage API design is probably intertwined with their IO coprocessor, but it's unlikely that game developers have to be specifically aware that their IO requests are being offloaded. Mark Cerny has stated that games can bypass normal file IO, a point he elaborated on a bit in an interview with Digital Foundry:
There's low level and high level access and game-makers can choose whichever flavour they want - but it's the new I/O API that allows developers to tap into the extreme speed of the new hardware. The concept of filenames and paths is gone in favour of an ID-based system which tells the system exactly where to find the data they need as quickly as possible. Developers simply need to specify the ID, the start location and end location and a few milliseconds later, the data is delivered. Two command lists are sent to the hardware - one with the list of IDs, the other centring on memory allocation and deallocation - i.e. making sure that the memory is freed up for the new data.
Getting rid of filenames and paths doesn't win much performance on its own, especially since the system still has to support a hierarchical filesystem API for the sake of older code. The real savings come from being able to specify the whole IO procedure in a single step instead of the application having to manage parts like the decompression and relocating the data in memory—both handled by special-purpose hardware on the PS5.
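As a purely hypothetical illustration, an ID-based, two-list request interface of the sort Cerny describes might look something like this. None of these names or structures are Sony's actual API:

```cpp
#include <cstdint>
#include <vector>

// Hypothetical ID-based read request: no filename or path, just an asset ID
// plus a byte range and a destination for the decompressed data.
struct ReadRequest {
    uint64_t asset_id;      // replaces filename + path
    uint64_t start_offset;  // within the asset's uncompressed data
    uint64_t end_offset;
    void*    destination;   // where the decompressed bytes should land
};

// The second command list handles memory allocation and deallocation.
struct MemoryOp {
    void*    address;
    uint64_t bytes;
    bool     allocate;      // true = reserve for incoming data, false = free old assets
};

void submit_io_batch(const std::vector<ReadRequest>& reads,
                     const std::vector<MemoryOp>& memory_ops) {
    // Placeholder: on real hardware both lists would be handed to the IO complex,
    // which reads, decompresses, and DMAs the data, then signals completion a few
    // milliseconds later.
    (void)reads; (void)memory_ops;
}
```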
For a more public example of what a modern high-performance storage API can accomplish, it's worth looking at the io_uring asynchronous API added to Linux last year. We used it on our last round of enterprise SSD reviews to get much better throughput and latency out of the fastest drives available. Where old-school Unix-style synchronous IO topped out at a bit less than 600k IOPS on our 36-core server, io_uring allowed a single core to hit 400k IOPS. Even compared to the previous asynchronous IO APIs in Linux, io_uring has lower overhead and better scalability. The API's design has applications communicating with the operating system in a very similar manner to how the operating system communicates with NVMe SSDs: pairs of command submission and completion queues that are accessible by both parties. Large batches of IO commands can be submitted with at most one system call, and no system calls are needed to check for command completion. That's a big advantage in a post-Spectre world where system call overhead is much higher. Recent experimentation has even shown that the io_uring design allows for shader programs running on a GPU to submit IO requests with minimal CPU involvement.
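For concreteness, here's a minimal sketch of that submission pattern using liburing (Linux 5.1+, link with -luring). Error handling is omitted and the file name is just a placeholder:

```cpp
#include <liburing.h>
#include <fcntl.h>
#include <unistd.h>

int main() {
    const int QD = 32, CHUNK = 128 * 1024;
    struct io_uring ring;
    io_uring_queue_init(QD, &ring, 0);

    int fd = open("asset.bin", O_RDONLY);            // placeholder game asset file
    char* bufs = new char[(size_t)QD * CHUNK];

    // Queue QD sequential 128 kB reads, then submit them all with one syscall.
    for (int i = 0; i < QD; i++) {
        struct io_uring_sqe* sqe = io_uring_get_sqe(&ring);
        io_uring_prep_read(sqe, fd, bufs + (size_t)i * CHUNK, CHUNK,
                           (off_t)i * CHUNK);
    }
    io_uring_submit(&ring);

    // The thread is free to do other work here; completions are reaped whenever
    // it checks back, and no syscall is needed if they have already arrived.
    for (int done = 0; done < QD; done++) {
        struct io_uring_cqe* cqe;
        io_uring_wait_cqe(&ring, &cqe);
        // cqe->res holds the byte count read (or a negative errno).
        io_uring_cqe_seen(&ring, cqe);
    }

    io_uring_queue_exit(&ring);
    close(fd);
    delete[] bufs;
    return 0;
}
```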
Most of the work relating to io_uring on Linux is too recent to have influenced console development, but it still illustrates a general direction that the industry is moving toward, driven by the same needs to make good use of NVMe performance without wasting too much CPU time.
Keeping Latency Under Control
While game developers will need to put some effort into extracting full performance from the console SSDs, there is a competing goal. Pushing an SSD to its performance limits causes latency to increase significantly, especially if queue depths go above what is necessary to saturate the drive. This extra latency doesn't matter if the console is just showing a loading screen, but next-generation games will want to keep the game running interactively while streaming in large quantities of data. Sony has outlined their plan for dealing with this challenge: their SSD implements a custom feature to support 6 priority levels for IO commands, allowing large amounts of data to be loaded without getting in the way when a more urgent read request crops up. Sony didn't explain much of the reasoning behind this feature or how it works, but it's easy to see why they need something to prioritize IO.
Loading a new world in 2.25 seconds as Ratchet & Clank fall through an inter-dimensional rift
Mark Cerny gave a hypothetical example of when multiple priority levels are needed: when a player is moving into a new area, lots of new textures may need to be loaded, at several GB per second. But since the game isn't interrupted by a loading screen, stuff keeps happening, and an in-game event (e.g. a character getting shot) may require data like a new sound effect to be loaded. The request for that sound effect will be issued after the requests for several GB of textures, but it needs to be completed before all the texture loading is done, because stuttering sound is much more noticeable and distracting than a slight delay in the gradual loading of fresh texture data.
But the NVMe standard already includes a prioritization feature, so why did Sony develop their own? Sony's SSD will support 6 priority levels, and Mark Cerny claims that the NVMe standard only supports "2 true priority levels". A quick glance at the NVMe spec shows that it's not that simple:
The NVMe spec defines two different command arbitration schemes for determining which queue will supply the next command to be handled by the drive. The default is a simple round-robin balancing that treats all IO queues equally and leaves all prioritization up to the host system. Drives can also optionally implement the weighted round robin scheme, which provides four priority levels (not counting one for admin commands only). But the detail Sony is apparently concerned with is that among those four priority levels, only the "urgent" class is given strict priority over the other levels. Strict prioritization is the simplest form of prioritization to implement, but such methods are a poor choice for general-purpose systems. In a closed, specialized system like a game console, it's much easier to coordinate all the software that's doing IO to avoid deadlocks and starvation, and much of the IO done by a game console comes with natural timing requirements.
This much attention being given to command arbitration comes as a bit of a surprise. The conventional wisdom about NVMe SSDs is that they are usually so fast that IO prioritization is unnecessary, and wasting CPU time on re-ordering IO commands is just as likely to reduce overall performance. In the PC and server space, the NVMe WRR command arbitration feature has been largely ignored by drive manufacturers and OS vendors—a partial survey of our consumer NVMe SSD collection only turned up two brands that have enabled this feature on their drives. So when it comes to supporting third-party SSD upgrades, Sony cannot depend on using the WRR command arbitration feature. This might mean they also won't bother to use it even when a drive has this feature, instead relying entirely on their own mechanism managed by the CPU and IO coprocessors.
Sony says that because off-the-shelf NVMe drives lack six priority levels, third-party drives will need slightly higher raw performance to match the real-world performance of Sony's drive: the six priority levels will have to be emulated on the host side, using some combination of CPU and IO coprocessor work. Based on our observations of enterprise SSDs (which are designed with more of a focus on QoS than consumer SSDs), holding 15-20% of performance in reserve typically keeps latency plenty low (about 2x the latency of an idle SSD) without any other prioritization mechanism, so we project that drives capable of 6.5GB/s or more should have no trouble at all.
Latency spikes as drives get close to their throughput limit
It's still a bit of a mystery what Sony plans to do with so many priority levels. We can certainly imagine a hierarchy of several priority levels for different kinds of data: Perhaps game code is the highest priority to load since at least one thread of execution will be completely stalled while handling a page fault, so this data is needed as fast as possible (and ideally should be kept in RAM full-time rather than loaded on the fly). Texture pre-fetching is probably the lowest priority, especially fetching higher-resolution mipmaps when a lower-resolution version is already in RAM and usable in the interim. Geometry may be a higher priority than textures, because it may be needed for collision detection and textures are useless without geometry to apply them to. Sound effects should ideally be loaded with latency of at most a few tens of milliseconds. Their patent mentions giving higher priority to IO done using the new API, on the theory that such code is more likely to be performance-critical.
Planning out six priority classes of data for a game engine isn't too difficult, but that doesn't mean it will actually be useful to break things down that way when interacting with actual hardware. Recall that the whole point of prioritization and other QoS methods is to avoid excess latency. Excess latency happens when you give the SSD more requests than it can work on simultaneously; some of the requests have to sit in the command queue(s) waiting their turn. If there are a lot of commands queued up, a new command added at the back of the line will have a long time to wait. If a game sends the PS5 SSD new requests at a rate totaling more than 5.5GB/s, a backlog will build up and latency will keep growing until the game stops requesting data more quickly than the SSD can deliver. When the game is requesting data at much less than 5.5GB/s, every time a new read command is sent to the SSD, it will start processing that request almost immediately.
So what's most important is limiting the number of requests that can pile up in the SSD's queues, and once that problem is solved, there's not much need for further prioritization. A single throttled queue for all the background, latency-insensitive IO commands should suffice, and then everything else can be handled with low latency.
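A sketch of that host-side throttling idea, with hypothetical types and a small, arbitrary in-flight cap for background reads:

```cpp
#include <cstdint>
#include <deque>

// Hypothetical host-side throttling: background prefetch reads are capped at a
// small number in flight, while urgent reads go straight to the drive, so they
// never wait behind a long backlog.
struct Read { uint64_t offset, length; };

class ThrottledSubmitter {
    std::deque<Read> background_;            // latency-insensitive prefetch work
    int background_inflight_ = 0;
    static const int kBackgroundLimit = 4;   // keeps the drive busy but the queue shallow

public:
    void submit_urgent(const Read& r)     { issue_to_ssd(r); }  // never queued here
    void submit_background(const Read& r) { background_.push_back(r); pump(); }

    void on_completion(bool was_background) {
        if (was_background) background_inflight_--;
        pump();
    }

private:
    void pump() {
        while (background_inflight_ < kBackgroundLimit && !background_.empty()) {
            issue_to_ssd(background_.front());
            background_.pop_front();
            background_inflight_++;
        }
    }
    void issue_to_ssd(const Read& r) { (void)r; /* hand off to the real IO API */ }
};
```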
Closing Thoughts
The transition of console gaming to solid state storage will change the landscape of video game design and development. A dam is breaking, and game developers will soon be free to ignore the limitations of hard drives and start exploring the possibilities of fast storage. It may take a while for games to fully utilize the performance of the new console SSDs, but there will be many tangible improvements available at launch.
The effects of this transition will also spill over into the PC gaming market, exerting pressure to help finally push hard drives out of low-end gaming PCs, and allowing gamers with high-end PCs to start enjoying more performance from their heretofore underutilized fast SSDs. And changes to the Windows operating system itself are already underway because of these new consoles.
Ultimately, it will be interesting to see whether the novel parts of the new console storage subsystems end up being a real advantage that influences the direction of PC hardware development, or if they end up just being interesting quirks that get left in the dust as PC hardware eventually overtakes the consoles with superior raw performance. NVMe SSDs arrived at the high end of the consumer market five years ago. Now, they're crossing a tipping point and are well on the way to becoming the mainstream standard for storage.