Early stories of shading on graphics hardware

I know you can’t take everything in a PR posting at face value, but the phrase “our invention of programmable shading” in NVIDIA’s announcement of their patent suits against Samsung and Qualcomm definitely rubbed me the wrong way. Maybe it’s something about personally having more of a claim to having invented programmable shading, at least on graphics hardware, than NVIDIA. Since many of the accounts (I’m looking at you Wikipedia) of the background of programmable shading seem to have been written by people who don’t even remember a time before GPUs, this seems like a good excuse for some historical recollections.

In the beginning…

The seeds of programmable shading were planted in 1981 by Turner Whitted and David Weimer. They didn’t have a shading language, but did create the first deferred shading renderer by splitting a scan line renderer into two parts. The first part rasterized all the parameters you’d need for shading (we’d call it a G-buffer now), and the second part could compute the shading from that G-buffer. The revolutionary idea was that you could make changes and re-shade the image without needing to redo the (then expensive) rasterization. Admittedly, no shading language, so you’d better be comfortable writing your shading code in C.

The real invention of the shading language

In 1984 (yes, 30 years ago), Rob Cook published a system called “Shade Trees”, that let you write shading expressions that it parsed. I’ve seen some mis-interpretation (maybe because of the name?) that this was a graphical node/network interface for creating shaders. It wasn’t. That was Abram and Whitted’s Building Block Shaders in 1990. Shade Trees was more like writing a single expression a C-like language, without loops or branches. It also introduced the shader types of surface, light, atmosphere, etc. still present in RenderMan today.

In 1985, Ken Perlin’s Image Synthesizer expanded this into a full language, with functions, loops and branching. This is the same paper that introduced his noise function — talk about packing a lot into one paper!

Over the next few years, Pixar built a shading language based on these ideas into RenderMan. This was published in The RenderMan Companion by Steve Upstill in 1989, with more technical detail in Hanrahan and Lawson’s 1990 SIGGRAPH paper.

Shading comes to graphics hardware

In 1990, I started as a new grad student at the University of North Carolina. I was working on the Pixel-Planes 5 project, which, among other things, featured a 512×512 processor-per-pixel SIMD array. It only had 208 bits per pixel, but had a 1-bit ALU, so you could make your data any size you wanted, not just multiples of bytes (13 bit normals? No problem!). This was, of course, important to give you any chance of having everything (data and computation) fit into just 26 bytes. I was writing shading code for it inside the driver/graphics library in something that basically looked like assembly language.

By 1992, a group of others at UNC created an assembly language interface that could be directly programmed without having to change the guts of the graphics library. This is really the first example of end-user programmable shading in graphics hardware. Unfortunately, the limitations of the underlying system made it really, really hard to do anything complex, so it didn’t end up being used outside the Pixel-Planes team.

Meanwhile, we were making plans for the next machine. This ended up being PixelFlow, largely an implementation of ideas Steve Molnar had just finished in his dissertation, with shading accommodations for what I was planning for my dissertation. I had this crazy idea that you ought to be able to compile high-level shading code for graphics hardware, and if you abstracted away enough of the hardware details and relied on the compiler to manage the mapping between shading code (what I want to do) and implementation (how to make it happen), that you’d get something an actual non-hardware person would be able to use.

It took a while, and a bunch of people to make it work, but the result of actual high-level programmable shading on actual graphics hardware was published in 1998. I followed the RenderMan model of surface/light shaders, rather than the vertex/pixel division that became popular in later hardware. I still think the “what I want to do” choice is better than the “how I think you should to do it”, though the latter does have advantages when you are working near the limits of what the hardware can do.

The SIGGRAPH paper described just the language and the surface and light shaders, along with a few of the implementation/translation problems I had to solve to make it work. There’s more in my dissertation itself, including shading stages for transformation and for primitives. The latter was combination of what you can do in geometry shaders with a shading interface for rasterization (much like pixel shader/pixel discard approaches for billboards or some non-linear distortion correction rendering).

Note that both Pixel-Planes 5 and PixelFlow were SIMD engines, processing a bunch of pixels at once, so they actually had quite a bit in common with that aspect of current GPUs. Much more so than some of the intermediate steps that the actual GPUs went through before they got to the same place.

OK, but how about on commercial hardware?

PixelFlow was developed in partnership with HP, and they did demo it as a product at SIGGRAPH, but it was cancelled before any shipped. After I graduated in 1998, I went to SGI to work on adding shading to a commercial product. At first, we were working on a true RenderMan implementation on a new hardware product (that never shipped). That hardware would have done just one or two operations per pass, and relied on streaming and prefetching of data per pixel to hide the framebuffer accesses.

After it was cancelled, we switched to something that would use a similar approach on existing hardware. The language ended up being being very assembly-like, with one operation per statement, but we did actually ship that and had at least a few external customers who used it. Both shading systems were described in a 2000 SIGGRAPH paper.

In 2000, I co-taught a class at Stanford with Bill Mark. That helped spur their work in hardware shading compilers. Someone who was there would have to say whether they were doing anything before that class, though I would not be surprised if they were already looking at it, given Pat Hanrahan’s involvement in the original RenderMan. In any case in 2001 they published a paper about their RTSL language and compiler, which could compile a high level shading language to the assembly-language vertex shaders NVIDIA had introduced, to the NVIDIA register combiner insanity, and to multiple rendering passes in the way the SGI stuff worked, if necessary.

RTSL was also the origin of the Vertex/Pixel division that still exists today.

And on GPUs

And now we leave the personal reminiscing part of this post. I did organize a series of SIGGRAPH courses on programmable shading in graphics hardware from 2000 through 2006, but since I wasn’t actually at NVIDIA, ATI, 3DLabs or Microsoft, I don’t know the details of when some of these efforts started or what else might have been going on behind the scenes.

Around 1999, NVIDIA’s register combiners were probably the first step from fixed-function to true programmability. Each of the two (later eight) combiner stages could do two vector multiples and add or dot product the result. You just had to make separate API calls to set each of the four inputs, each of the three outputs, and the function. In the SGI stuff, we were using blend, color transform and multi-texture operations as ALU ops for the compiler to target. The Stanford RTSL compiler could do the same type of compilation with the register combiners as well. Without something like RTSL, it was pretty ugly, and definitely wins the award for most lines of code per ALU operation.

Better, was the assembler vertex programs in the GeForce3, around 2000. It didn’t allow branching or looping, but executed in a deep instruction-per-stage pipeline. Among other things, that meant that as long as your program fit, doing one instruction was exactly the same cost as doing the maximum your hardware supported.

Assembly-level fragment programs came in around 2001-2002, and shared originally shared that same characteristic — anywhere from 1 to 1024 instructions with no branching at the same performance.

Around 2002, there was an explosion of high-level shading languages targeting GPUs, including NVIDIA’s Cg, the closely related DirectX HLSL, and the OpenGL Shading Language.

This list of dates is pretty NVIDIA centric, and they were definitely pushing the feature envelope on many of these elements. On the other hand, most of these also were connected to DirectX versions requiring similar features, so soon everyone had some kind of programmability. NVIDIA’s Cg tutorial put’s the first generation of programmable GPUs as appearing around 2001. ATI and 3DLabs also started to introduce programmable shading in a similar time period (some of which was represented in my 2002 SIGGRAPH course).

As a particular example of multiple companies all working toward similar goals, NVIDIA’s work, especially on Cg, had a huge influence on DirectX. Meanwhile, 3DLabs was introducing their own programmable hardware that I believe was a bit more flexible, and they had a big influence on the OpenGL Shading Language. As a result, though they were very similar in many ways, especially in the early versions there was a significant difference in philosophy between exposing hardware limitations in Direct3D vs. generality (even when slow on a particular GPU) in OpenGL. In hindsight, though generality makes sense now, on that original generation of GPUs, it lead too often to unexpected performance cliffs, which certainly hurt OpenGL’s reputation among game developers.


Gregory D. Abram and Turner Whitted. 1990. Building block shaders. In Proceedings of the 17th annual conference on Computer graphics and interactive techniques (SIGGRAPH ’90). ACM, New York, NY, USA, 283-288. DOI=10.1145/97879.97910

Robert L. Cook. 1984. Shade trees. In Proceedings of the 11th annual conference on Computer graphics and interactive techniques (SIGGRAPH ’84), ACM, New York, NY, USA, 223-231. DOI=10.1145/800031.808602

Pat Hanrahan and Jim Lawson. 1990. A language for shading and lighting calculations. In Proceedings of the 17th annual conference on Computer graphics and interactive techniques (SIGGRAPH ’90). ACM, New York, NY, USA, 289-298. DOI=10.1145/97879.97911 

Steven Molnar, John Eyles, and John Poulton. 1992. PixelFlow: high-speed rendering using image composition. In Proceedings of the 19th annual conference on Computer graphics and interactive techniques (SIGGRAPH ’92), James J. Thomas (Ed.). ACM, New York, NY, USA, 231-240. DOI=10.1145/133994.134067

Marc Olano and Anselmo Lastra. 1998. A shading language on graphics hardware: the pixelflow shading system. In Proceedings of the 25th annual conference on Computer graphics and interactive techniques (SIGGRAPH ’98). ACM, New York, NY, USA, 159-168. DOI=10.1145/280814.280857

Mark S. Peercy, Marc Olano, John Airey, and P. Jeffrey Ungar. 2000. Interactive multi-pass programmable shading. In Proceedings of the 27th annual conference on Computer graphics and interactive techniques (SIGGRAPH ’00). ACM Press/Addison-Wesley Publishing Co., New York, NY, USA, 425-432. DOI=10.1145/344779.344976

Ken Perlin. 1985. An image synthesizer. In Proceedings of the 12th annual conference on Computer graphics and interactive techniques (SIGGRAPH ’85). ACM, New York, NY, USA, 287-296. DOI=10.1145/325334.325247

Kekoa Proudfoot, William R. Mark, Svetoslav Tzvetkov, and Pat Hanrahan. 2001. A real-time procedural shading system for programmable graphics hardware. In Proceedings of the 28th annual conference on Computer graphics and interactive techniques (SIGGRAPH ’01). ACM, New York, NY, USA, 159-170. DOI=10.1145/383259.383275

John Rhoades, Greg Turk, Andrew Bell, Andrei State, Ulrich Neumann, and Amitabh Varshney. 1992. Real-time procedural textures. In Proceedings of the 1992 symposium on Interactive 3D graphics (I3D ’92). ACM, New York, NY, USA, 95-100. DOI=10.1145/147156.147171 

Steve Upstill. 1989. Renderman Companion: A Programmer’s Guide to Realistic Computer Graphics. Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA.

Turner Whitted and David M. Weimer. 1981. A software test-bed for the development of 3-D raster graphics systems. In Proceedings of the 8th annual conference on Computer graphics and interactive techniques (SIGGRAPH ’81). ACM, New York, NY, USA, 271-277. DOI=10.1145/800224.806815