
Original Link: https://www.anandtech.com/show/7118/windows-81-and-vs2013-bring-gpu-computing-updates-to-direct3d-and-c-amp-
Windows 8.1 and VS2013 bring GPU computing updates to Direct3D and C++ AMP
by Rahul Garg on July 2, 2013 8:30 AM EST. Posted in:
- graphics
- GPUs
- Windows 8.1
- Compute
Windows 8.1 brings an incremental update to the driver model, WDDM 1.3, which enables new GPU computing functionality. One of the important pieces is the ability to "map default buffer" (which I will call MDB), which should be particularly interesting for compute shaders running on APUs/SoCs that combine the CPU and GPU on a single chip.
We can explain the feature as follows. On a typical discrete card, the GPU has its own onboard graphics memory. The application allocates buffers in GPU memory, and the shaders read and write data in this memory. Buffers allocated in GPU memory are called "default buffers" in Direct3D parlance. Let us assume the GPU shader has written some output that you want to read on the CPU. Currently this is done in multiple stages. First, the application allocates a "staging buffer", which the Direct3D driver places in a special area of system memory so that the GPU can transfer data between default buffers and staging buffers efficiently over the PCI Express bus. The GPU then copies the data from the default buffer to the staging buffer. Finally, the CPU issues a "map" command that lets it read from or write to the staging buffer. This multi-stage process is inefficient for APUs/SoCs where the GPU shares physical memory with the CPU. In Direct3D 11.2, the staging buffer and the extra copy operation will no longer be required on supported hardware, and the CPU will be able to access the GPU buffers directly. Thus, MDB will be a big win for many GPU computing scenarios on APUs/SoCs thanks to the reduced copy overhead.
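The difference between the two readback paths can be sketched in portable C++. This is a toy model only: `Buffer`, `Usage`, `Stats`, `map_for_read`, `read_sum_via_staging` and `read_sum_direct` are hypothetical names invented for illustration and are not Direct3D APIs. The sketch assumes a default buffer is CPU-mappable only when MDB is supported, and it counts the bytes moved by the extra staging copy.

```cpp
#include <cstddef>
#include <numeric>
#include <stdexcept>
#include <vector>

// Toy model only: these types are hypothetical and are NOT Direct3D APIs.
enum class Usage { Default, Staging };

struct Buffer {
    Usage usage;
    bool cpu_mappable;        // staging buffers: always; default buffers: only with MDB
    std::vector<float> data;  // stands in for the buffer's memory
};

struct Stats {
    std::size_t bytes_copied = 0;  // bytes moved by explicit buffer-to-buffer copies
};

// Stand-in for the CPU "map" step: return CPU-visible storage or fail.
const std::vector<float>& map_for_read(const Buffer& b) {
    if (!b.cpu_mappable) {
        throw std::runtime_error("buffer is not CPU-mappable");
    }
    return b.data;
}

// Classic path: copy default buffer -> staging buffer, then map the staging buffer.
float read_sum_via_staging(const Buffer& gpu_output, Stats& stats) {
    Buffer staging{Usage::Staging, true, gpu_output.data};  // models the extra copy
    stats.bytes_copied += staging.data.size() * sizeof(float);
    const std::vector<float>& view = map_for_read(staging);
    return std::accumulate(view.begin(), view.end(), 0.0f);
}

// MDB path: map the default buffer directly; no staging buffer, no copy.
float read_sum_direct(const Buffer& gpu_output) {
    const std::vector<float>& view = map_for_read(gpu_output);
    return std::accumulate(view.begin(), view.end(), 0.0f);
}
```

In this model the staging path always pays for a full copy of the buffer before the CPU can touch it, while the direct path pays nothing extra but fails outright when the default buffer is not mappable. That mirrors why an application would need to check for hardware support before relying on MDB.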
Intel recently rolled out its own extension, called InstantAccess, for Haswell. My understanding is that InstantAccess is a bit more general than MDB: InstantAccess allows mapping of textures as well as buffers, whereas D3D 11.2 only allows mapping of default buffers, not textures. Extensions similar to MDB are also common in OpenCL. Both Intel and AMD allow the CPU to read from and write to OpenCL GPU buffers. In addition, Intel exposes some ability for the GPU to read from and write to preallocated CPU memory, which as far as I know is not yet allowed in Direct3D. The efficiency of the different solutions is still an open question that we don't know much about. For example, AMD's OpenCL extension allows the CPU to access GPU memory on Llano, but CPU reads from GPU memory are very slow while writes are still pretty fast.
UPDATE: Intel confirmed support for MDB on Ivy Bridge onwards.
At this time, there is no official confirmation about which hardware will support MDB. My expectation is that MDB will likely be available on all recent single-chip CPU/GPU systems, such as AMD's Trinity and Kabini as well as Intel's Haswell and Ivy Bridge. AMD has already rolled out WDDM 1.3 drivers, but curiously those do not work on Llano and Zacate APUs, so I am a little pessimistic about whether those APUs will support this new feature. Microsoft, for its part, only stated that it expects the feature to be "broadly available" once WDDM 1.3 drivers are rolled out. I will update the article when we get official word from the vendors about hardware support status.
Apart from MDB, Microsoft has also added support for runtime shader linking. This will be quite useful for both compute and graphics shaders. The idea is that one can precompile shader functions beforehand and ship the compiled code, while linking is done at runtime. Separate compilation and linking have been available under CUDA 5 and OpenCL 1.2 as well. Runtime shader linking is a software feature and will be available on all hardware on Windows 8.1.
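As a rough analogy for the precompile-then-link idea, here is a minimal sketch in portable C++. It does not use the Direct3D linking API: `ShaderFn`, `FunctionLibrary`, `make_example_library` and `link_functions` are hypothetical names, and the "precompiled" functions are ordinary C++ callables standing in for compiled shader code. It only shows that the set and order of functions is chosen at runtime rather than at compile time.

```cpp
#include <functional>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for a precompiled shader function.
using ShaderFn = std::function<float(float)>;

// Hypothetical stand-in for a shipped library of precompiled functions.
struct FunctionLibrary {
    std::map<std::string, ShaderFn> functions;
};

// Build a small example library; in the analogy this happens ahead of time.
FunctionLibrary make_example_library() {
    FunctionLibrary lib;
    lib.functions["scale2"] = [](float x) { return x * 2.0f; };
    lib.functions["add1"] = [](float x) { return x + 1.0f; };
    return lib;
}

// "Link" at runtime: resolve each name and chain the functions in order.
ShaderFn link_functions(const FunctionLibrary& lib,
                        const std::vector<std::string>& names) {
    std::vector<ShaderFn> chain;
    for (const std::string& name : names) {
        auto it = lib.functions.find(name);
        if (it == lib.functions.end()) {
            throw std::runtime_error("unresolved function: " + name);
        }
        chain.push_back(it->second);
    }
    return [chain](float x) {
        for (const ShaderFn& f : chain) {
            x = f(x);
        }
        return x;
    };
}
```

Linking `{"scale2", "add1"}` and `{"add1", "scale2"}` from the same library yields two different combined functions without recompiling either piece, which is the benefit the paragraph above describes.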
C++ AMP, Microsoft's C++ extension for GPU computing, has also been updated in the upcoming VS2013. I think the biggest update is that C++ AMP programs will gain a shared memory feature on APUs/SoCs, where the compiler and runtime will be able to eliminate extra data copies between the CPU and GPU. This feature will be available only on Windows 8.1, and it is likely built on top of the "map default buffer" feature, as Microsoft's AMP implementation uses Direct3D under the hood. C++ AMP also brings some other nice additions, including enhanced texture support and better debugging abilities.
In addition to compute, Microsoft also introduced a number of graphics updates, such as tiled resources, but we will likely cover those separately. More information about the Direct3D changes can be found in the preliminary docs for D3D 11.2 and a talk at BUILD.