
The History of CPU off-loading

July 10, 2012 - Programming

Today, on most computer configurations, the CPU is basically the “main” processor, and various add-on cards off-load some of the processing from it. Let’s look at the history of these devices, how they perform their CPU off-loading, and why.

Video

Originally, a Video card was basically just something the CPU sent data to; the CPU, or rather the software, had to deal with drawing things like rectangles, circles, ellipses, and all that guff. This was fine early on. The only thing you could truly tell a Video card to do was to set a specific pixel to a given colour. Most graphics libraries provided with programming languages, or separately, handled this stuff: they did all the math and geometry needed to rasterize lines and other primitives on top of that ability to set a single pixel.

However, as graphics became more prevalent, and particularly with the release of Windows 3.0, this simply wasn’t enough oomph. What happened then was that Video card manufacturers created “Video Accelerator Boards”. What these cards did, in tandem with their software drivers, was off-load the drawing of certain graphics primitives to the on-board processor on the accelerator card. This accelerated the drawing of things like circles, rectangles, and so forth, even though the graphics accelerator chips typically ran at a far lower clock speed than CPUs of the time. The primary advantage was that the processing was localized; all the CPU had to send the Video card were the drawing instructions. For example, instead of sending the data for several hundred individual pixels, the software only had to say “hey, draw a circle here, with this radius and this aspect ratio”, and the processor on the Video card did the heavy lifting. This was a HUGE advantage at the time, because in those days everything used the ISA bus, which was quite slow- if I recall correctly, it was limited to something like 10 megabits per second- which wasn’t nearly enough for graphics-intensive programs like Windows or AutoCAD.
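To make the difference concrete, here’s a rough sketch in C++ against the Win32 GDI API- a later but familiar example of the same idea, not the actual interface of any particular accelerator board, and the pixel-by-pixel routine is my own illustration. The first function rasterizes a circle on the CPU and pushes every pixel individually; the second hands the whole shape to the driver in one call, where an accelerator can take over.

```cpp
#include <windows.h>  // Win32 GDI: SetPixel, Ellipse (link against gdi32)

// Unaccelerated approach: the CPU rasterizes the circle itself and pushes
// every pixel across the bus, one SetPixel call at a time.
void DrawCircleByPixels(HDC hdc, int cx, int cy, int r, COLORREF color)
{
    // Simple midpoint circle algorithm (illustrative, not optimized).
    int x = r, y = 0, err = 1 - r;
    while (x >= y) {
        SetPixel(hdc, cx + x, cy + y, color); SetPixel(hdc, cx - x, cy + y, color);
        SetPixel(hdc, cx + x, cy - y, color); SetPixel(hdc, cx - x, cy - y, color);
        SetPixel(hdc, cx + y, cy + x, color); SetPixel(hdc, cx - y, cy + x, color);
        SetPixel(hdc, cx + y, cy - x, color); SetPixel(hdc, cx - y, cy - x, color);
        ++y;
        if (err < 0) { err += 2 * y + 1; }
        else         { --x; err += 2 * (y - x) + 1; }
    }
}

// Accelerated approach: one call describes the whole shape, and the driver
// (and, on an accelerator board, the card's own processor) does the rasterizing.
void DrawCircleByCommand(HDC hdc, int cx, int cy, int r)
{
    Ellipse(hdc, cx - r, cy - r, cx + r, cy + r);
}
```

On a slow bus, the second version also moves far less data: one small command instead of hundreds of individual pixel writes.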

By around the Windows 95 era, Video accelerators were standard with PCs. The bus was standardizing around PCI, which also allowed a lot more data transfer- this meant Video accelerators were made even faster, so that you could send more of those instructions to the card in a given time frame and take advantage of the higher bandwidth of the bus. However, on the horizon was 3-D graphics. 3-D games had been done long before- back before Windows itself- and many performed admirably (Doom, Duke Nukem 3D, and so on) under those conditions. However, this required directly addressing the graphics card’s registers and directly controlling the hardware, which was not allowed under the Windows environment- instead, everything was done through the device driver. Even with Video accelerators helping with primitives, drawing and rasterizing 3-D objects was very software intensive, which limited 3-D games to very low poly counts, very small textures, and/or very low resolutions.

Thus, the concept of 3-D Accelerator cards was devised. A 3-D accelerator card worked on a similar principle to a 2-D accelerator card, but instead of being told “hey, draw a circle”, it was told about the polygons, the view matrix, the projection matrix, and other information about the scene it was to create, and then did the work of rasterizing that scene. This still had the problem that there wasn’t really a universal way to address these new capabilities. Some early 3-D vendors created their own APIs- for example, the Voodoo cards used an API called “Glide”. If this had continued, with each manufacturer having their own special API, games would have had to support all sorts of different APIs to work on different 3-D Accelerators.

Thankfully, unification prevailed- in this case, the two unifying factors were OpenGL and Direct3D. OpenGL came first, and basically defined a library interface that a driver vendor could implement. The DLL provided by the manufacturer worked with the hardware driver directly, since it knew about it (and usually the OpenGL driver was provided and installed in the same package), while the game or program just treated it like any other OpenGL implementation. DirectX- and in particular Direct3D- addressed this similarly, by providing a way to address various bits of hardware directly; the aim was to make Windows a viable platform for game development, which at the time most gamers were on the fence about. In those days, a “PC gamer” ran DOS. Direct3D works in much the same way as OpenGL, but is dealt with through a COM-based object model: the manufacturer provided a set of DLLs implementing the proper interfaces, and Direct3D used them to talk directly to the driver and use its acceleration features. It was weak in early versions (and it went by another name I can’t remember- something like Windows Game Library), but at this point the two are pretty much functionally equivalent.

One nice advantage of Direct3D, particularly early on, was that, unlike OpenGL, it did not require hardware acceleration at all. Direct3D came with a software-based implementation of everything. This was naturally slower, but at least it let you play the game. (And since most developer machines of the time didn’t have accelerator cards, it made games a lot easier to test as well.) Another advantage was that the software rasterizer allowed any Direct3D feature to be used. As an example, some early cards didn’t support texture mapping, or things such as Transform and Lighting. With OpenGL, you were pretty much fucked if you needed those; with Direct3D, the software implementation handled those features.
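As a sketch of that hardware-or-software fallback, here’s roughly what it looks like in Direct3D 9- a later version than the cards discussed here, so treat it as an assumed approximation rather than the era’s actual code, but the principle is the same: ask for hardware Transform and Lighting, and drop back to software vertex processing, or even the all-software reference rasterizer, if the card can’t do it. The hwnd parameter is assumed to be an already-created window.

```cpp
#include <d3d9.h>  // link against d3d9.lib

// Rough sketch of the "fall back to software" idea using Direct3D 9.
IDirect3DDevice9* CreateDeviceWithFallback(HWND hwnd)
{
    IDirect3D9* d3d = Direct3DCreate9(D3D_SDK_VERSION);
    if (!d3d) return nullptr;

    D3DPRESENT_PARAMETERS pp = {};
    pp.Windowed      = TRUE;
    pp.SwapEffect    = D3DSWAPEFFECT_DISCARD;
    pp.hDeviceWindow = hwnd;

    IDirect3DDevice9* device = nullptr;

    // First choice: real hardware Transform and Lighting on the card.
    if (FAILED(d3d->CreateDevice(D3DADAPTER_DEFAULT, D3DDEVTYPE_HAL, hwnd,
                                 D3DCREATE_HARDWARE_VERTEXPROCESSING, &pp, &device)))
    {
        // Card can't do it? Keep the hardware rasterizer, but let the CPU
        // handle transform and lighting in software.
        if (FAILED(d3d->CreateDevice(D3DADAPTER_DEFAULT, D3DDEVTYPE_HAL, hwnd,
                                     D3DCREATE_SOFTWARE_VERTEXPROCESSING, &pp, &device)))
        {
            // Last resort: the reference rasterizer, everything on the CPU.
            d3d->CreateDevice(D3DADAPTER_DEFAULT, D3DDEVTYPE_REF, hwnd,
                              D3DCREATE_SOFTWARE_VERTEXPROCESSING, &pp, &device);
        }
    }

    d3d->Release();  // a real program would normally keep this interface around
    return device;   // may still be nullptr if every attempt failed
}
```

The software paths are slower, but the game still runs- which is exactly the trade-off described above.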

Sound

Originally, Sound cards worked in a similar way to Video cards- that is, they were just something the software pumped digitized sound to for playback. The only early thing that could be called an “acceleration” feature was MIDI synthesis, which in many ways off-loaded the task of synthesizing music to the card or its driver. One Sound card with this feature was the Sound Blaster AWE32, which let you load installable SIMM memory with high-quality digital samples of instruments for use with said MIDI playback.

Off-loading the more traditional usage- playing digital samples- didn’t really come to the fore until the release of DirectSound and DirectMusic. Manufacturers of course wanted to be able to say their products were “accelerated”, to add bullet points to the product box, so they rushed to do so. At the same time, however, some of this “acceleration” isn’t actually off-loading at all, but is merely done at the driver level. Good examples of cards that fake this are the Sound Blaster Audigy SE, the Sound Blaster Live! Value, and the X-Fi Extreme Audio, all of which lack the Audigy, Live!, and X-Fi DSP processors their names suggest. Their “acceleration” is handled by the driver, which, rather than delegating the work to a chip on the card, simply performs the task on the CPU. For example, with an Audigy SE, a program will report that the system has hardware-accelerated EAX and DSP processing and all that, but it’s actually being done on the main processor. Most integrated sound solutions, incidentally, do the same thing.
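As a small illustration of what programs actually ask the driver, here’s a hedged DirectSound sketch (assuming an IDirectSound8 interface already obtained from DirectSoundCreate8). It queries how many buffers can supposedly be mixed “in hardware”; on the cards mentioned above, the driver answers as though a DSP were present and then quietly does the mixing on the CPU anyway.

```cpp
#include <dsound.h>  // DirectSound: IDirectSound8, DSCAPS (link against dsound.lib)
#include <cstdio>

// Ask the driver what the "hardware" claims it can mix.
// 'ds' is assumed to come from DirectSoundCreate8 elsewhere in the program.
void ReportMixingCaps(IDirectSound8* ds)
{
    DSCAPS caps = {};
    caps.dwSize = sizeof(caps);  // DirectSound requires the size to be filled in

    if (SUCCEEDED(ds->GetCaps(&caps)))
    {
        // How many buffers the driver says it can mix "in hardware" right now.
        printf("Hardware-mixed buffers available: %lu of %lu\n",
               caps.dwFreeHwMixingAllBuffers, caps.dwMaxHwMixingAllBuffers);
    }
}
```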

The actual effect this has on your games and programs really depends. When I replaced my generic Sound Blaster PCI card with an Audigy SE, I expected a nice boost in performance; instead, most of my games became downright unplayable! This was because my processor was only 350 MHz, and the software-based processing done by the driver- which usually has a negligible impact on a decent system- pretty much monopolized it.

Overall, off-loading CPU work to dedicated chips on add-on cards was done almost out of necessity, but it continues to be done because the gains are massive, and really there is no reason not to; sure, we can always have faster CPUs, but why not have special chips deal with some parts of the processing task? This is especially helpful in the emerging domain of parallel processing, which is the road we are inevitably heading down as the speed of a single core starts to reach an asymptote.
