Nvidia Actively Working To Implement DirectX 12 Async Compute With Oxide Games In Ashes Of The Singularity
Whether Nvidia hardware supports DirectX 12 Async Compute has been a hot topic of debate, exacerbated by the company’s silence. Nvidia continues to keep quiet on the matter; despite the media’s best efforts to get the company to comment and elaborate, we have yet to see any official statement.
Fortunately, the vocal nature of Oxide Games’ engagement in this debate may help answer some of the lingering questions about Nvidia’s support for Async Compute. Over the past week we’ve seen several comments from a developer at Oxide Games about the DirectX 12 benchmark for Ashes Of The Singularity, in which the developer elaborated on several aspects of the benchmark and on why Nvidia GPUs seemingly struggled with it, or at least did not perform as gracefully as their AMD Radeon counterparts. The developer concluded that it came down to an advantage AMD GPUs possess over their Nvidia counterparts via a feature dubbed Asynchronous Shading/Shaders/Compute.
Nvidia Is Actively Working With Oxide Games To Implement DirectX 12 Async Compute Support For GeForce 900 Series GPUs In Ashes Of The Singularity
Yesterday the same developer issued a status update on DX12 Async Compute via a comment in the same vibrant Ashes Of The Singularity overclock.net thread.
Regarding Async compute, a couple of points on this. First, though we are the first D3D12 title, I wouldn’t hold us up as the prime example of this feature.
There are probably better demonstrations of it. This is a pretty complex topic and to fully understand it will require significant understanding of the particular GPU in question that only an IHV can provide. I certainly wouldn’t hold Ashes up as the premier example of this feature.
We actually just chatted with Nvidia about Async Compute, indeed the driver hasn’t fully implemented it yet, but it appeared like it was. We are working closely with them as they fully implement Async Compute. We’ll keep everyone posted as we learn more.
As we detailed in an in-depth editorial two days ago, Nvidia GTX 900 series GPUs do have the hardware capability to support asynchronous shading/compute. A question arises, however, as to whether that can only be achieved through heavy use of pre-emption and context switching, which adds substantial latency and defeats the purpose of the feature: to reduce latency and improve performance. AMD claims that this is indeed the case. Nvidia has not yet provided us with an answer to this question when asked, although the company has promised one, and as soon as that happens we will bring you an update.
In the meantime, the Oxide Games developer has stated that Nvidia is actively working with them to implement support for the feature.
Async Compute has not yet been fully implemented in Nvidia’s latest DirectX 12 ready drivers. We will make sure to revisit the benchmark once some form of Async Compute support is achieved on GTX 900 series GPUs to see what effect it may have on performance. We should point out, though, that older GeForce generations, prior to the 900 series, do not support asynchronous shading, so the feature should have no bearing on the performance of those cards.
DirectX 12 Asynchronous Compute: What It Is And Why It’s Beneficial
AMD has clearly been a far more vocal proponent of Async Compute than its rival. The company first put this hardware feature in the limelight last year and has directed even more attention to it this year as the launch of the DirectX 12 API drew closer. Prior to that, the technology remained largely out of sight.
Asynchronous Shaders/Compute, otherwise known as Asynchronous Shading, is one of the more exciting hardware features that DirectX 12 and Vulkan – as well as Mantle before them – expose. It allows tasks to be submitted to and processed by the shader units inside GPUs (what Nvidia calls CUDA cores and AMD dubs Stream Processors) simultaneously and asynchronously, in a multi-threaded fashion. In layman’s terms it’s similar to CPU multi-threading, what Intel dubs Hyper-Threading: it fills the gaps in the machine, making sure that as much of the hardware inside the chip as possible is kept busy to drive performance up, and that nothing is left idling.
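For the technically inclined, here is a minimal sketch of what this looks like under Direct3D 12, where async compute is exposed simply as a second, compute-only command queue next to the usual graphics queue. This is our own illustration, not code from Ashes Of The Singularity; error handling is omitted.

```cpp
#include <d3d12.h>
#include <wrl/client.h>
using Microsoft::WRL::ComPtr;

// Helper that creates a command queue of the requested type.
ComPtr<ID3D12CommandQueue> CreateQueue(ID3D12Device* device,
                                       D3D12_COMMAND_LIST_TYPE type)
{
    D3D12_COMMAND_QUEUE_DESC desc = {};
    desc.Type     = type; // DIRECT handles graphics and compute; COMPUTE is compute-only
    desc.Priority = D3D12_COMMAND_QUEUE_PRIORITY_NORMAL;
    ComPtr<ID3D12CommandQueue> queue;
    device->CreateCommandQueue(&desc, IID_PPV_ARGS(&queue));
    return queue;
}

// Two independent queues: the graphics queue renders the frame while the
// compute queue chews on separate work (post-processing, physics, etc.).
// Whether the two streams truly overlap on the shader units is up to the
// hardware and driver, which is precisely what the Async Compute debate is about.
void CreateAsyncQueues(ID3D12Device* device,
                       ComPtr<ID3D12CommandQueue>& graphicsQueue,
                       ComPtr<ID3D12CommandQueue>& computeQueue)
{
    graphicsQueue = CreateQueue(device, D3D12_COMMAND_LIST_TYPE_DIRECT);
    computeQueue  = CreateQueue(device, D3D12_COMMAND_LIST_TYPE_COMPUTE);
}
```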
One would have thought that, with thousands of shader units inside modern GPUs, proper multi-threading support would already exist in DX11. In fact, one could argue that comprehensive multi-threading is crucial to maximizing performance and minimizing latency. The truth, however, is that DX11 only supports basic multi-threading methods that cannot fully take advantage of those thousands of shader units. This meant that GPUs could never reach their full potential, as many of their resources were left untapped.
Multi-threaded graphics in DX11 does not allow multiple tasks to be scheduled simultaneously without adding considerable complexity to the design. As a result, a great number of GPU resources sit idle with nothing to process, because the command stream simply can’t keep up. This in turn means that GPUs can never be fully utilized, leaving a deep well of untapped performance and potential that programmers cannot reach.
Other complementary technologies attempted to improve the situation by enabling prioritization of important tasks over others. Graphics pre-emption allowed tasks to be prioritized, but, just like multi-threaded graphics in DX11, it did not solve the fundamental problem, as it could not let multiple tasks be submitted and handled simultaneously and independently of one another. A crude analogy: graphics pre-emption merely adds a traffic light to the road rather than an additional lane, as the short sketch below illustrates.
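To make the analogy concrete in Direct3D 12 terms (again our own illustration): queue priority is the traffic light, while the second queue shown earlier is the extra lane.

```cpp
// Prioritization alone, the "traffic light": a high-priority queue lets
// urgent work jump the line via pre-emption, but submissions are still
// handled one after another; nothing runs side by side.
D3D12_COMMAND_QUEUE_DESC highPriority = {};
highPriority.Type     = D3D12_COMMAND_LIST_TYPE_DIRECT;
highPriority.Priority = D3D12_COMMAND_QUEUE_PRIORITY_HIGH; // reorders work, does not parallelize it
```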
Out of this problem a solution was born, one that’s very effective and readily available to programmers with DX12, Vulkan and Mantle. It’s called Asynchronous Shaders and, just as we’ve explained above, it enables a genuine multi-threaded approach to graphics. Tasks are processed simultaneously and independently of one another, so that each of the thousands of shader units inside a modern GPU can be put to as much use as possible to drive performance.
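Continuing the hypothetical sketch from above, submission on the two queues might look like the following; g_graphicsList, g_computeList, g_fence and g_fenceValue are assumed to be created elsewhere, and only the submission pattern matters here.

```cpp
ID3D12CommandList* gfxLists[]     = { g_graphicsList.Get() };
ID3D12CommandList* computeLists[] = { g_computeList.Get() };

// Both calls return immediately; the GPU is free to run the two streams
// concurrently if the hardware genuinely supports async compute.
graphicsQueue->ExecuteCommandLists(1, gfxLists);
computeQueue->ExecuteCommandLists(1, computeLists);

// Synchronize only where a real dependency exists: the graphics queue
// waits on a fence that the compute queue signals once its results are ready.
computeQueue->Signal(g_fence.Get(), ++g_fenceValue);
graphicsQueue->Wait(g_fence.Get(), g_fenceValue);
```

The key design point is that the fence expresses the only ordering requirement; everything else is left for the hardware scheduler to overlap.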
To enable this feature, however, the GPU must be built from the ground up to support it. In AMD’s Graphics Core Next based GPUs it is enabled through the Asynchronous Compute Engines integrated into each chip. These structures are built into the silicon itself and serve as the multi-lane highway by which tasks are delivered to the stream processors.
Each ACE is capable of handling eight queues, and every GCN 1.2 based GPU has a minimum of eight ACEs, for a total of 64 queues. ACEs debuted with AMD’s first GCN (GCN 1.0) based GPU, code-named Tahiti, in late 2011; that chip had two Asynchronous Compute Engines. They were originally added to GPUs mainly to handle compute tasks, as they could not be leveraged through the graphics APIs of the time. Today, however, ACEs can take on a more prominent role in gaming through modern APIs such as DirectX 12, Vulkan and Mantle.
Earlier this year AMD debuted a demo of this hardware feature that showcased a performance improvement of 46%. So far, however, Nvidia has not talked much, if at all, about the feature, nor has it showcased a beneficial use case for it on its own hardware the way its rival has, which is understandably where most of the questions and the controversy stem from. The company has not been shy, however, about promoting other DX12 hardware features such as conservative rasterization and rasterizer ordered views, which is where we believe its DX12 focus has been directed.
Speaking of GPUs in general: modern architectures like GCN, which powers all of the new gaming consoles and AMD’s roster of graphics cards, and Maxwell, which powers Nvidia’s latest Tegra mobile processors and its roster of graphics cards, have grown to share far more similarities than differences, yet different hardware will always exhibit different architectural traits. There will always be something that one architecture does better than another. This diversity is dictated by the needs of the market and by the diversity of the minds through which the technology was conceived. The semantics will always be there, and while it can be fun to discuss them, looking at the whole picture is what yields substantial progress.