Programming Memoirs

3D (depth) composition of CUDA ray traced images with OpenGL rasterized images using CUDA Driver API

Depth composition of CUDA ray traced image with OpenGL rasterised object transformation gizmos

Ray tracing is a great method of generating synthetic images. It has many benefits over rasterization, the technique traditionally used e.g. in computer games. Ray tracing stays great up to the moment when you need to render, say, a line segment placed in your 3D space (one potentially occluded by other 3D objects).

Why would you want to ray trace a line or a line segment? Say you want to create some kind of transformation gizmo for your 3D objects which blends nicely into the scene, or a bounding box depicting the boundaries of an object, or include wireframes of meshes in your ray traced scene, or …  There are many potential uses.

You cannot just mathematically test for camera ray vs. line segment collision and expect the line segment to appear in the ray traced rendering. The chances of a camera ray exactly hitting an infinitely thin line segment are practically zero, so the segment will not be visible.

You can try simulating a line by drawing a thin cylinder, or an ‘x’ made of 2 quads. But this is far from an elegant solution. Additionally, your ‘line’ stops being a mathematical line, as it suddenly has a width. Thus it becomes thicker closer to the camera and thinner further away from it (assuming you use perspective projection).

Fortunately, there is a solution for drawing lines in ray traced content. It involves using some kind of rasterization-based renderer, such as OpenGL, to draw the lines (or any objects) separately from the ray tracing pass, and then performing a 3D composition of the two images.  The problem becomes even more interesting if you obtain the ray traced image using CUDA and the CUDA Driver API.

CUDA and OpenGL interoperability using CUDA Driver API

Before we compose CUDA ray traced content with OpenGL rasterised content, we need to obtain the former. If you’re a CUDA programmer you know that CUDA does not possess any native graphics output, and you have to take care of displaying the generated data yourself. One solution is to write the pixel data into an OpenGL Pixel Buffer Object (PBO), copy the content of the PBO to an OpenGL texture, and render a quad textured with that texture into the viewport. Fortunately for us, CUDA and OpenGL interoperability is quite robust and this can be done quite easily, even when using the lower-level CUDA Driver API.

// general initialisation

GLuint pbo; // Pixel buffer object
GLuint tex; // Texture
unsigned int width = 512; // horizontal render resolution
unsigned int height = 512; // vertical render resolution

// Init PBO
glGenBuffersARB(1, &pbo);
glBindBufferARB(GL_PIXEL_UNPACK_BUFFER_ARB, pbo); // bind before allocating storage
glBufferDataARB(GL_PIXEL_UNPACK_BUFFER_ARB, width*height*sizeof(GLubyte)*4, 0, GL_STREAM_DRAW_ARB);
glBindBufferARB(GL_PIXEL_UNPACK_BUFFER_ARB, 0); // unbind for now

// Init Texture
glGenTextures(1, &tex);
glBindTexture(GL_TEXTURE_2D, tex);
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA8, width, height, 0, GL_RGBA, GL_UNSIGNED_BYTE, NULL);
glBindTexture(GL_TEXTURE_2D, 0);

// Allocate CUDA graphics resource (notice that I'm using Driver API function calls)
CUgraphicsResource * cuda_pbo_resource = (CUgraphicsResource*) malloc(sizeof(CUgraphicsResource));
cuGraphicsGLRegisterBuffer(cuda_pbo_resource, // returns a handle to the CUDA resource
	pbo, // the OpenGL buffer object we register with CUDA
	CU_GRAPHICS_REGISTER_FLAGS_WRITE_DISCARD); // CUDA will only write to the buffer

Rendering pass #1 — Ray tracing in CUDA

We render our scene (solid objects only, i.e. no lines etc.) using ray tracing and store the raw pixel data in a PBO mapped as a CUDA resource.
Note that the PBO has to be unmapped from CUDA for OpenGL to be able to reliably access it. Thus, we map the PBO into CUDA only for the duration of the ray tracing.

// map PBO resource
cuGraphicsMapResources(1, cuda_pbo_resource, 0); //mapping is only temporary

// get a device pointer to the mapped resource
CUdeviceptr d_pbo; // CUDA pointer through which the mapped graphics resources may be accessed
size_t num_bytes; //size of memory which may be accessed from that pointer
cuGraphicsResourceGetMappedPointer(&d_pbo, &num_bytes, *cuda_pbo_resource); // the d_pbo will be passed to rendering kernel

// Start ray tracing kernel
start_CUDA_ray_tracing_kernel(d_pbo, num_bytes); // render to PBO. Pass the pointer to the memory space where the data is to be saved

// unmap PBO resource
cuGraphicsUnmapResources(1, cuda_pbo_resource, 0);

After the PBO is filled with data by the CUDA kernel, we copy its content to an OpenGL texture. On current hardware this step takes less than a millisecond for a 24-bits-per-pixel image at 512×512 resolution, so it’s quite fast. Alternatively, you could attempt to render from CUDA directly into OpenGL texture memory.

// copy from PBO to texture
glPixelStorei(GL_UNPACK_ALIGNMENT, 1);
glBindBufferARB(GL_PIXEL_UNPACK_BUFFER_ARB, pbo); // bind the PBO as the unpack source
glBindTexture(GL_TEXTURE_2D, tex);  // bind our texture!
glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, width, height, GL_RGBA, GL_UNSIGNED_BYTE, 0); // copy data (last argument is an offset into the PBO)
glBindBufferARB(GL_PIXEL_UNPACK_BUFFER_ARB, 0); // unbind PBO

Displaying CUDA rendered content with OpenGL

We can now render the texture onto a quad that fills the viewport (using an orthographic projection).

// select matrix mode
glMatrixMode(GL_PROJECTION); // we want to work with the projection matrix stack
glLoadIdentity(); // make sure there is nothing on the stack

glOrtho(0, 1,    // left, right
	0, 1,        // bottom, top
	0,           // zNear
	999.0f);     // zFar

glMatrixMode(GL_MODELVIEW);
glLoadIdentity();

// clear color and depth buffers
glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
glDisable(GL_DEPTH_TEST); // we don't need it right now

// draw the bound texture on a quad filling the viewport
glEnable(GL_TEXTURE_2D);
glBindTexture(GL_TEXTURE_2D, tex);
glBegin(GL_QUADS);
	glTexCoord2f(0, 0); glVertex2f(0, 0);
	glTexCoord2f(1, 0); glVertex2f(1, 0);
	glTexCoord2f(1, 1); glVertex2f(1, 1);
	glTexCoord2f(0, 1); glVertex2f(0, 1);
glEnd();

glBindTexture(GL_TEXTURE_2D, 0);  // unbind texture

Rendering pass #2 — Rasterisation with OpenGL

We render the whole 3D scene (solid objects only) once more, this time using OpenGL. You will want to switch from the orthographic projection to a perspective projection with the same parameters as used in your ray tracer (i.e. camera position, FOV, camera ‘look at’ point). I recommend setting the perspective manually using glFrustum(), as it lets you specify the camera parameters in much the same way as you probably used them in the CUDA ray tracing kernel.

Note that now we want to render the scene only to the OpenGL depth buffer: set your color buffer write masks to false and keep depth testing enabled. We also won't need any OpenGL lights, textures, fog etc., so have them disabled. As we are rasterizing only depth information, this additional rendering pass is quite fast even for complex scenes.

// set perspective to the same as in the ray tracing pass
glMatrixMode(GL_PROJECTION);
glLoadIdentity();

GLdouble fl_left = camera->plane_height;    // left frustum extent
GLdouble fl_right = -camera->plane_height;  // right frustum extent (note the MINUS)
GLdouble fl_top = camera->plane_width;      // top frustum extent
GLdouble fl_bottom = -camera->plane_width;  // bottom frustum extent (note the MINUS)
GLdouble zNear = camera->plane_distance;    // near clipping distance
GLdouble zFar = camera->plane_distance + 100; // far clipping distance

glFrustum(fl_left, fl_right, fl_bottom, fl_top, zNear, zFar);

 // set the camera
 glMatrixMode(GL_MODELVIEW);
 glLoadIdentity();
 gluLookAt(
	 // camera position
	 camera->origin.x, camera->origin.y, camera->origin.z,
	 // look_at is stored as a normal vector, thus we add it to the camera position to determine a point in space where to look
	 camera->origin.x + camera->look_at.x,
	 camera->origin.y + camera->look_at.y,
	 camera->origin.z + camera->look_at.z,
	 // camera up direction (assuming the camera struct stores one as 'up')
	 camera->up.x, camera->up.y, camera->up.z);

// render only to depth buffer
glColorMask(GL_FALSE, GL_FALSE, GL_FALSE, GL_FALSE); // we don't want to render to the color buffer
glEnable(GL_DEPTH_TEST); // it is important that depth testing is enabled now

// Render the meshes in the scene to depth buffer only (do not render lines yet)

Now we want to re-enable the color mask and, with depth testing still on, rasterize the remaining objects (lines, points etc.) to the OpenGL color and depth buffers. We can also choose to rasterize any additional solid objects at this point: basically any OpenGL scene!

glColorMask(GL_TRUE, GL_TRUE, GL_TRUE, GL_TRUE); // we now want to render colors

As the depth buffer already contains the scene's depth information, rasterized objects and ray traced objects will occlude each other just the way you would expect. :-) You can also achieve partial occlusion by the CUDA-rendered objects (a transparency effect) by first rasterizing the miscellaneous OpenGL objects partly transparent with GL_DEPTH_TEST disabled, and then rendering them fully opaque with depth testing enabled.

Oh, don’t forget to swap buffers if you were using double buffering!


The major limitation of this method is the obvious fact that lines (or any miscellaneous OpenGL rendered content) behind translucent objects (which in ray tracing refract light) won't appear refracted. The same goes for reflections. Fortunately, for transformation gizmos, bounding boxes etc. this is not really a problem. It might even be desirable.

Additional resources

You might also want to check this resource for more on depth composition of images in OpenGL.

More on PBOs can be found here and here.

Triers CUDA ray tracing tutorial — a great tutorial with source code available.

Hope this helps.
