Mantle to bring PS4’s Asynchronous ‘fine-grain’ Compute to PC

At AMD’s recent APU13 developer conference in San Jose, California, Nixxes CEO Jurjen Katsman discussed the ways in which Mantle can help improve GPU performance, among them the scheduling of compute work in parallel with graphics. He pointed to the flagship Radeon R9 290X and noted that the bulk of its compute muscle goes unused under existing APIs, which only allow workloads to be queued synchronously. With Mantle’s optimizations, the idle portions of the GPU no longer go to waste, since graphics and compute work can be handled asynchronously (thanks, Techreport).

With Mantle . . . we can schedule compute work in parallel with the normal graphics work. That allows for some really interesting optimizations that will really help your overall frame rate and how . . . with less power, you can achieve higher frame rates.

What we’d see, for example—say we’re rendering shadow maps. There’s really not much compute going on. . . . Compute units are basically sitting there being idle. If, at the same time, we are able to do post-processing effects—say maybe even the post-processing from a previous frame, or what we could do in Tomb Raider, [where] we have TressFX hair simulations, which can be quite expensive—we can do that in parallel, in compute, with these other graphics tasks, and effectively, they can become close to zero cost.
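
Mantle’s actual API is not public, so the pattern Katsman describes can only be sketched in rough, hypothetical terms. The C++ below is a minimal illustration under that assumption: Queue, CommandBuffer and the record_* functions are invented placeholder names, not Mantle objects. The point is simply that work submitted on a separate compute queue can overlap a graphics pass that leaves the compute units idle.

```cpp
// Hypothetical sketch only -- none of these types or functions come from Mantle.
struct CommandBuffer { /* recorded GPU commands (placeholder) */ };

struct Queue {
    void submit(const CommandBuffer& /*cb*/) { /* hand recorded work to the GPU */ }
};

// Records the shadow-map pass: vertex shading and rasterization only, so most
// of the GPU's compute units sit idle while it runs.
CommandBuffer record_shadow_map_pass() { return {}; }

// Records compute work with no dependency on this frame's shadow maps, e.g.
// post-processing of the previous frame or a TressFX-style hair simulation.
CommandBuffer record_independent_compute() { return {}; }

void render_frame(Queue& graphics_queue, Queue& compute_queue)
{
    CommandBuffer shadows    = record_shadow_map_pass();
    CommandBuffer async_work = record_independent_compute();

    // With a single synchronous queue these would run back to back. Submitted
    // to separate queues, the GPU's compute front end can place the compute
    // dispatches on idle compute units while rasterization proceeds, so the
    // extra work costs close to zero wall-clock time.
    graphics_queue.submit(shadows);
    compute_queue.submit(async_work);

    // Any later pass that consumes the async results would wait on a fence or
    // event signalled by the compute queue (omitted here for brevity).
}
```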

PlayStation 4 lead architect Mark Cerny expressed a similar vision of GPU compute’s potential in interviews leading up to the console’s launch. He highlighted asynchronous fine-grain compute as a key feature that will give developers the headroom to explore the platform for years to come. Cerny explained that there are many moments during a game frame when graphics work takes a back seat, and that the PS4’s hardware has been designed to make the most of the under-utilized portions of the GPU at those times. Like Katsman, he cited the rendering of shadow maps as an example in his Q&A sessions with Gamasutra and Digital Foundry.

Mark Cerny

“If you look at the portion of the GPU available to compute throughout the frame, it varies dramatically from instant to instant. For example, something like opaque shadow map rendering doesn’t even use a pixel shader, it’s entirely done by vertex shaders and the rasterization hardware — so graphics aren’t using most of the 1.8 teraflops of ALU available in the CUs. Times like that during the game frame are an opportunity to say, ‘Okay, all that compute you wanted to do, turn it up to 11 now.’”

“If you look at how the GPU and its various sub-components are utilised throughout the frame, there are many portions throughout the frame – for example during the rendering of opaque shadowmaps – that the bulk of the GPU is unused. And so if you’re doing compute for collision detection, physics or ray-casting for audio during those times you’re not really affecting the graphics. You’re utilising portions of the GPU that at that instant are otherwise under-utilised. And if you look through the frame you can see that depending on what phase it is, what portion is really available to use for compute.”

Both Katsman and Cerny are describing the same technique: performing non-graphics tasks in parallel with rendering by putting the GPU’s otherwise idle compute resources to work.

Last month, we reported that AMD’s Hawaii GPU (R9 290/290X) features an expanded Asynchronous Compute Engine (ACE) pipeline for scheduling and distributing compute work in parallel, equivalent in depth to the ACE pipeline found in the PS4. With Mantle, AMD is looking to reap the same long-term benefits from its own GPUs, which could in theory allow optimizations to be shared between multiplatform games developed for the PS4 and for Mantle.
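
As a rough, hypothetical sketch of what several asynchronous compute queues buy a developer, the snippet below submits the kinds of independent per-frame jobs Cerny mentions (physics, audio ray-casting, carried-over post-processing) to separate queues; again, every type and function name here is an invented placeholder rather than anything from Mantle or the PS4 toolchain.

```cpp
// Hypothetical sketch only -- illustrative names, not a real API.
#include <vector>

struct CommandBuffer { /* recorded GPU commands (placeholder) */ };

struct ComputeQueue {
    void submit(const CommandBuffer& /*cb*/) { /* feed one asynchronous compute engine */ }
};

CommandBuffer record_physics_step()   { return {}; }  // collision detection / physics
CommandBuffer record_audio_raycasts() { return {}; }  // ray-casting for audio
CommandBuffer record_postprocessing() { return {}; }  // post-FX from the previous frame

void submit_frame_compute(std::vector<ComputeQueue>& queues)
{
    // Independent jobs go to separate queues so the hardware can start
    // whichever has work ready the moment compute units free up, e.g. while
    // shadow maps are being rasterized on the graphics queue.
    queues.at(0).submit(record_physics_step());
    queues.at(1).submit(record_audio_raycasts());
    queues.at(2).submit(record_postprocessing());
}
```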

Whether this holds any meaningful advantage for either AMD or Sony is anyone’s guess. Let us know what you think in the comments below.