
What your mother never told you about DXR (or RTX)


Seeing that RDNA2's ray-tracing implementation is a hot topic, and given all the strange ideas flying around in other threads, I've decided to open up some DXR documentation and code to see what this is all about.

TL;DR: RTX (or DXR) is not really ray-tracing; it should have been called Software-Defined Rendering, and NV 3xxx cards (and any future hardware) won't improve its performance much.

I will refer to NV RTX and DXR implementation using one term: DXR from now on.
To understand what DXR brings to the table we need to understand rasterization.

How does rasterization work?
You have your 3D scene in your game engine: it's a bunch of 3D objects organized in something called a scene graph. The scene graph makes it easy to get the objects you want to render in each frame.
And more importantly, the scene graph makes it easy to find out what you absolutely do not want to render at all (stuff behind the camera, stuff underground, etc.).
In indoor games scene graphs are even more important (e.g. BSP trees in first-person shooters) because you don't want to render parts of the level you cannot "see" right now even though they are in front of the camera (all the stuff behind doors and walls).
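To make the "stuff behind the camera" case concrete, here is a toy sketch of that culling step in Python. All the names here are invented for the example; real engines test bounding volumes against the whole view frustum while walking the scene graph, not a flat list.

```python
# Toy culling: keep only objects in front of the camera plane.
# A real engine would do frustum tests during scene-graph traversal.

def cull_behind_camera(objects, camera_pos, camera_forward):
    """Keep objects whose center lies in front of the camera."""
    visible = []
    for name, center in objects:
        # Vector from camera to object, dotted with the view direction:
        # a negative dot product means the object is behind the camera.
        to_obj = tuple(c - p for c, p in zip(center, camera_pos))
        dot = sum(a * b for a, b in zip(to_obj, camera_forward))
        if dot > 0:
            visible.append(name)
    return visible

# Camera at the origin looking down -Z; the rock sits behind it.
scene = [("tree", (0, 0, -5)), ("rock", (2, 0, 3)), ("bird", (0, 4, -1))]
print(cull_behind_camera(scene, (0, 0, 0), (0, 0, -1)))  # ['tree', 'bird']
```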

So, after you have decided what to render you send all these objects to the GPU for rendering.
The GPU takes these and runs all kinds of shaders that process your objects: vertex shaders, geometry shaders, tessellation/hull shaders and so on.
These shaders transform your objects in 3D space in a way that resembles how a real-world camera would see them: far objects are smaller, perspective lines are not parallel, and so on.
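The "far objects are smaller" part boils down to a perspective divide. A minimal sketch (real pipelines use 4x4 matrices and homogeneous coordinates, but the core idea is just this):

```python
# Minimal perspective projection: divide x and y by depth, so the
# farther a point is from the camera, the closer it lands to the center.

def project(point, focal_length=1.0):
    """Project a camera-space point onto the 2D image plane."""
    x, y, z = point
    # z is the distance in front of the camera; bigger z -> smaller image.
    return (focal_length * x / z, focal_length * y / z)

near = project((1.0, 1.0, 2.0))   # (0.5, 0.5)
far  = project((1.0, 1.0, 10.0))  # (0.1, 0.1) -- same point, 5x farther
print(near, far)
```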
After all the transformations are finished the rasterization kicks in.

It takes all the objects in the transformed space (usually called "camera space") and creates "fragments".
A fragment is defined like this: you divide the camera viewport rectangle into square chunks (usually matching the screen resolution, e.g. 1920x1080) and shoot a ray from each of them, straight out from the camera; when the ray touches an object, a fragment is created.
Rasterization also uses something called a depth buffer or "z-buffer". You see, when a ray touches some object we do not know if that object is the closest one to the camera, so we record the "length" of that ray for that fragment in a buffer.
If the next ray that touches some other object is shorter, that object is closer to the camera, so the older fragment is discarded and the new length is written down.
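The depth-test logic described above fits in a few lines. This is just an illustration of the bookkeeping, not how a GPU actually stores its buffers:

```python
# Sketch of z-buffering: per pixel, keep the smallest depth seen so far
# and discard any fragment that is farther away.

import math

WIDTH, HEIGHT = 4, 2
depth_buffer = [[math.inf] * WIDTH for _ in range(HEIGHT)]
color_buffer = [[None] * WIDTH for _ in range(HEIGHT)]

def write_fragment(x, y, depth, color):
    """Classic z-test: only the closest fragment per pixel survives."""
    if depth < depth_buffer[y][x]:
        depth_buffer[y][x] = depth
        color_buffer[y][x] = color

write_fragment(1, 0, 5.0, "wall")    # first hit: accepted
write_fragment(1, 0, 2.0, "chair")   # closer: replaces the wall
write_fragment(1, 0, 9.0, "sky")     # farther: discarded
print(color_buffer[0][1])  # chair
```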

Obviously there are a lot of optimizations in place: we do not really shoot rays, we compute the projection once (the rays are all identical and point in the same direction), we generate fragments from the objects rather than from the camera, and so on.

The fragments that were created are passed to pixel shaders for post-processing and then written to the output buffer (render target).
In the outdoor example there are some places in the camera rectangle where no objects are found, so we render a skybox there: essentially just a picture of the sky (maybe with some "weather"-related objects like clouds).

How does DXR work?
You assemble your scene into a BVH (bounding volume hierarchy) structure, which makes it easy to find ray/object intersections.
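BVH nodes are typically axis-aligned bounding boxes, and the test run at every node during traversal is the classic "slab" intersection. A sketch of just that node test (a real BVH would recurse into child nodes on a hit):

```python
# Ray vs. axis-aligned box ("slab" test): intersect the ray with the
# min/max planes on each axis and check the intervals overlap.

def ray_hits_aabb(origin, direction, box_min, box_max):
    """Return True if the ray enters the axis-aligned box."""
    t_near, t_far = 0.0, float("inf")
    for o, d, lo, hi in zip(origin, direction, box_min, box_max):
        if d == 0.0:
            if o < lo or o > hi:   # parallel ray outside this slab
                return False
            continue
        t0, t1 = (lo - o) / d, (hi - o) / d
        if t0 > t1:
            t0, t1 = t1, t0
        t_near, t_far = max(t_near, t0), min(t_far, t1)
        if t_near > t_far:         # intervals no longer overlap: miss
            return False
    return True

# A box sitting in front of the camera on the -Z axis:
print(ray_hits_aabb((0, 0, 0), (0, 0, -1), (-1, -1, -5), (1, 1, -3)))  # True
print(ray_hits_aabb((0, 0, 0), (0, 1, 0),  (-1, -1, -5), (1, 1, -3)))  # False
```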
You define special shaders:
- a ray generation shader that shoots rays into the scene
- a closest-hit shader that executes when a ray touches an object
- a miss shader that executes when a ray did not meet any objects
- an any-hit shader that decides whether the ray should stop or continue (used to implement transparency)
- an intersection shader that is essentially executed for each ray to create virtual geometry (it's a pretty complex topic; I will not dive into how the BVH can be altered for virtual geometry here)
And then you start the pipeline by calling the ray generation shader program.
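To see how those pieces fit together, here is a CPU sketch of the shader roles in Python. In the real API these are HLSL shaders bound to a ray-tracing pipeline state object; here plain functions stand in for them, and "the scene" is a single hard-coded sphere:

```python
# CPU stand-ins for the DXR shader roles: raygen shoots one ray per
# pixel, closest-hit fires on a hit, miss fires when nothing is hit.

import math

SPHERE = {"center": (0.0, 0.0, -3.0), "radius": 1.0}

def intersect_sphere(origin, direction):
    """Analytic ray/sphere test; returns hit distance or None."""
    oc = tuple(o - c for o, c in zip(origin, SPHERE["center"]))
    b = 2.0 * sum(o * d for o, d in zip(oc, direction))
    c = sum(o * o for o in oc) - SPHERE["radius"] ** 2
    disc = b * b - 4.0 * c  # direction is unit length, so a == 1
    if disc < 0:
        return None
    t = (-b - math.sqrt(disc)) / 2.0
    return t if t > 0 else None

def closest_hit_shader(t):
    return "sphere"          # a real shader would compute shading here

def miss_shader():
    return "sky"             # the miss shader paints the background

def ray_generation_shader(width, height):
    """One ray per pixel through a simple pinhole camera at the origin."""
    image = []
    for y in range(height):
        row = []
        for x in range(width):
            # Map the pixel center to [-1, 1] on each axis.
            px = (x + 0.5) / width * 2 - 1
            py = (y + 0.5) / height * 2 - 1
            d = (px, py, -1.0)
            length = math.sqrt(sum(v * v for v in d))
            d = tuple(v / length for v in d)
            t = intersect_sphere((0.0, 0.0, 0.0), d)
            row.append(closest_hit_shader(t) if t else miss_shader())
        image.append(row)
    return image

# The middle pixel hits the sphere; the edges fall through to the miss shader.
for row in ray_generation_shader(5, 3):
    print(row)
```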
Do you see what they did here?
They essentially created a programmable shader for each step of the rasterization pipeline above!
Now you can shoot rays however you want (and not just one ray per pixel, parallel to the camera).
You can generate fragments however you want, and not just by computing the z-buffer closest hit.
You can implement any sort of transparency by using the any-hit shader (in the rasterization pipeline you needed to draw transparent things in a specific order or use per-pixel lists).
You can create any hacky pipeline you want, and you can also do it for specific objects, for whole scene, for a specific effect, etc.
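The transparency point can be sketched too: instead of keeping only the closest hit, an any-hit-style shader can accumulate translucent surfaces and let the ray continue. The hit-record format here, (distance, color, alpha), is invented for the example:

```python
# Any-hit idea in miniature: blend transparent hits front-to-back and
# only stop the ray when it meets a fully opaque surface.

def trace_with_any_hit(hits, background=(1.0, 1.0, 1.0)):
    """Blend hits front-to-back; an opaque hit (alpha=1) stops the ray."""
    color = (0.0, 0.0, 0.0)
    remaining = 1.0  # how much of the ray's "energy" is still unblocked
    for _, hit_color, alpha in sorted(hits):  # sort front-to-back by distance
        color = tuple(c + remaining * alpha * h
                      for c, h in zip(color, hit_color))
        remaining *= 1.0 - alpha
        if remaining == 0.0:     # opaque surface: stop tracing
            break
    # Whatever energy is left falls through to the background (the miss case).
    return tuple(c + remaining * b for c, b in zip(color, background))

# Red glass (50% opaque) in front of an opaque blue wall:
hits = [(2.0, (1.0, 0.0, 0.0), 0.5), (5.0, (0.0, 0.0, 1.0), 1.0)]
print(trace_with_any_hit(hits))  # (0.5, 0.0, 0.5)
```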

What's the catch then?
The catch is that it's going to be slower than a tailored hardware solution.
But it all depends on how well you use it.
It's kind of like how programmable shaders started in the PS360 days.
The fixed transformation and lighting (T&L, I hope people remember it) pipeline was faster, but the programmable one was able to surpass it eventually.

From now on it mostly depends on the software. Yep, hardware will become faster, but don't count on faster RT hardware too much: most of the work will happen in shaders. How well those shaders are written, and how good future lighting algorithms turn out to be, is what will make things faster and better looking.