Core Techniques and Algorithms in Game Programming (2003)

We have seen how a display list offers very good performance for static geometry. Storing data in the GPU saves bus time and makes rendering faster on those cards that accelerate display lists.

Vertex arrays (VAs) use a totally different approach to offer increased performance over immediate mode. Some types of VAs will be faster than display lists, others slower. Even worse, performance will vary depending on the graphics card, so display lists might outperform VAs in some hardware configurations but not be as fast as VAs in others. However, VAs have an added advantage over display lists: They are designed to work with both static and dynamic geometry, so using VAs is the best way to get good performance on dynamic meshes.

A VA, as the name implies, is an array full of vertex data. VAs can contain vertex coordinates, colors, texture coordinates, normals, and indices to indexed meshes. Data can be organized in a single array (interleaving the different types of data) or by using several arrays, one per data type. Additionally, most data types are indeed optional. You need to specify vertex coordinates, but other than that, you can choose whatever else you want to represent. For example, you can have two arrays (one for vertices, one for colors) or a single array interleaving vertices, texture coordinates, and colors.

VAs are allocated in user memory space and filled with data by the user. They can be freely modified, so they are valid for both static and dynamic data. They can then be sent to the GPU with a single OpenGL call. This is key to their efficiency. By sending the entire array at once, the graphics drivers can optimize this block transfer, thus reaching better performance than immediate mode rendering.

VAs were progressively introduced in OpenGL as extensions. Several flavors exist, and some of them are still supported only as vendor-specific extensions. Generally, there are three types of VAs: regular, compiled, and server side.

Regular VAs enable you to define arrays of geometric data in user memory. The data is then sent efficiently through the bus on a frame-by-frame basis. Performance comes from block-transferring all the data in a single (or very few) call(s).

Compiled VAs extend regular VAs by allowing, under certain circumstances, reuse of the arrays on the graphics card, much like cache memory. The VAs are sent once and can be rendered many times from local memory. This makes them faster than regular VAs, especially for static data.

Server-side VAs allow VAs to be defined not in user memory, but directly on the graphics card. This way the rendering does not need to access the bus at all, and performance reaches its maximum. Some specific functions must be defined in order to allocate this special type of memory, but once we have gained access to it, we can render primitives efficiently and still be able to dynamically change the vertex data as needed. Server-side VAs are still implemented with vendor-specific extensions to OpenGL (as of OpenGL 1.4). In the near future, they might get a broader implementation. But today they exist under two commercial names depending on the hardware's manufacturer: Vertex Array Range (VAR) on NVIDIA cards and Vertex Array Object (VAO) for ATI. Both share the same philosophy, but each one has specific advantages and disadvantages.

Using Regular Vertex Arrays

Regular VAs are easy to understand and code because they resemble the data structure you would normally use to store geometric data. As an example, let's examine the simplest VA: a noninterleaved, nonindexed VA of vertices, colors, and texture coordinates. Here is the code:

// data structure definition
int numfaces=64;
float *vtx=new float[numfaces*9];                     // 3 vertices per face, 3 floats (x,y,z) each
float *texcoords=new float[numfaces*6];               // 3 vertices per face, 2 floats (u,v) each
unsigned char *colors=new unsigned char[numfaces*9];  // 3 vertices per face, 3 bytes (r,g,b) each

// here you would fill the arrays as needed... type your own code

// rendering code: enable each array and hand OpenGL a pointer to its data
glEnableClientState(GL_VERTEX_ARRAY);
glVertexPointer(3,GL_FLOAT,0,&(vtx[0]));
glEnableClientState(GL_TEXTURE_COORD_ARRAY);
glTexCoordPointer(2,GL_FLOAT,0,&(texcoords[0]));
glEnableClientState(GL_COLOR_ARRAY);
glColorPointer(3,GL_UNSIGNED_BYTE,0,&(colors[0]));

// draw all faces: 3 vertices per triangle
glDrawArrays(GL_TRIANGLES,0,3*numfaces);

There are three key steps to using VAs. You first need to declare them. They are just user-defined arrays, so they can be filled using conventional C/C++ code. In the second step, once they are declared and filled, you must enable the required arrays and pass a pointer so OpenGL has access to the data. Here the calls always have the same syntax:

gl*Pointer(int num, GLenum type, int stride, void *data)

The first parameter indicates how many components each element consists of. For example, an RGB color has three components, a texture coordinate has two (U and V), and so on. (glNormalPointer is the one exception: it omits this parameter because a normal always has three components.) The second parameter is the type of each component. Each array accepts only a subset of the generic OpenGL types:

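  • Vertices: GL_SHORT, GL_INT, GL_FLOAT, GL_DOUBLE

  • TexCoords: GL_SHORT, GL_INT, GL_FLOAT, GL_DOUBLE

  • Normals: GL_BYTE, GL_SHORT, GL_INT, GL_FLOAT, GL_DOUBLE

  • Color: GL_BYTE, GL_UNSIGNED_BYTE, GL_SHORT, GL_UNSIGNED_SHORT, GL_INT, GL_UNSIGNED_INT, GL_FLOAT, GL_DOUBLE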

The third parameter (stride) is used to create interleaved arrays. It specifies the distance in bytes between the start of one element and the start of the next. In a noninterleaved array, elements follow each other in sequence, so 0 is used, which tells OpenGL the data is tightly packed. For interleaved arrays, the stride is normally the size in bytes of one complete interleaved vertex. The fourth and last parameter passes a pointer to the array data. Notice that you can pass a pointer to the beginning of the array as in the previous example or set the beginning of the array at some other point by using:
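To make the stride parameter concrete, here is a minimal sketch of an interleaved layout. The struct InterleavedVertex, the constants, and the helper attributeAt are hypothetical names introduced only for illustration; the sketch assumes OpenGL measures the stride in bytes from the start of one element to the start of the next, and it mimics that address arithmetic on the CPU so it can be checked without a rendering context.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical interleaved vertex: texture coordinates, color, then position.
struct InterleavedVertex {
    float u, v;      // texture coordinates (2 floats)
    float r, g, b;   // color (3 floats)
    float x, y, z;   // position (3 floats)
};

// Byte distance from the start of one vertex to the start of the next.
constexpr std::size_t kStride = sizeof(InterleavedVertex);

// Byte offset of each attribute inside one vertex.
constexpr std::size_t kTexCoordOffset = offsetof(InterleavedVertex, u);
constexpr std::size_t kColorOffset    = offsetof(InterleavedVertex, r);
constexpr std::size_t kPositionOffset = offsetof(InterleavedVertex, x);

// Returns the address of one attribute of vertex `index`, using the same
// arithmetic as an array walk: base + attribute offset + index * stride.
inline const float* attributeAt(const InterleavedVertex* base,
                                std::size_t attributeOffset,
                                std::size_t index) {
    const unsigned char* bytes = reinterpret_cast<const unsigned char*>(base);
    return reinterpret_cast<const float*>(bytes + attributeOffset + index * kStride);
}

// With a rendering context, the three arrays would be declared roughly as:
//   glTexCoordPointer(2, GL_FLOAT, kStride, &verts[0].u);
//   glColorPointer   (3, GL_FLOAT, kStride, &verts[0].r);
//   glVertexPointer  (3, GL_FLOAT, kStride, &verts[0].x);
```

All three pointer calls share the same stride and differ only in their starting address, which is what lets a single array carry every attribute.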

glTexCoordPointer(2,GL_FLOAT,0,&( texcoords[34]));

The third step in rendering a VA is actually drawing the data using the specified arrays. The syntax for glDrawArrays needs an explanation. The first parameter tells OpenGL which rendering primitive to use. The second parameter specifies the starting point within the arrays (in elements). The third parameter specifies the number of elements to render.

Notice that the final parameter generally specifies the number of vertices to render, and hence the multiplication in the preceding code, because each triangle has three vertices. A similar approach can be used to render GL_POINTS, GL_LINES, and so on. A very good option is to draw arrays of triangle strips because these are usually very tightly packed (the first triangle takes three vertices and each subsequent triangle only takes one extra vertex).
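The saving from strips is easy to quantify. The following sketch uses two hypothetical helper functions (not part of OpenGL) to compute the count that would be passed as the last argument of glDrawArrays for each primitive type, assuming a single unbroken strip.

```cpp
#include <cassert>
#include <cstddef>

// Vertices submitted when every triangle is sent independently (GL_TRIANGLES).
inline std::size_t verticesForTriangleList(std::size_t numTriangles) {
    return 3 * numTriangles;
}

// Vertices submitted for one continuous strip (GL_TRIANGLE_STRIP):
// three for the first triangle, then one more per additional triangle.
inline std::size_t verticesForTriangleStrip(std::size_t numTriangles) {
    return numTriangles == 0 ? 0 : numTriangles + 2;
}
```

For the 64 faces used in the earlier listing, independent triangles need 192 vertices, while a single strip would need 66. Real meshes rarely reduce to one strip, so the practical gain is somewhat lower.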

The combination of VAs and triangle strips yields very good performance. But we can improve it further by organizing data in the arrays in a different way. When drawing geometry, whether it's plain triangles or triangle strips, we will sometimes repeat a vertex because two faces share it. This is especially significant in arrays of GL_TRIANGLES, but can also happen with strips. Each repeat wastes precious bandwidth resending vertex coordinates (along with normals, colors, and so on, which will probably be repeated as well). To prevent this and ensure optimal geometry packing, we can add a new array (the index array), which allows us to send each vertex exactly once and refer to it by index wherever it is needed.
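As an illustration of the index array idea, the sketch below collapses repeated positions in a flat triangle list into unique vertices plus indices. IndexedMesh and buildIndexedMesh are hypothetical names, and matching vertices by exact floating-point equality on position alone is a simplifying assumption; a production version would also compare normals, colors, and texture coordinates, and would likely use a tolerance.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <tuple>
#include <vector>

struct IndexedMesh {
    std::vector<float> positions;         // unique vertices, 3 floats each
    std::vector<unsigned short> indices;  // one index per original vertex
};

// Collapses exactly repeated x,y,z positions into unique vertices and
// records, for every input vertex, the index of its unique copy.
inline IndexedMesh buildIndexedMesh(const std::vector<float>& rawPositions) {
    assert(rawPositions.size() % 3 == 0);
    IndexedMesh mesh;
    std::map<std::tuple<float, float, float>, unsigned short> seen;
    for (std::size_t i = 0; i < rawPositions.size(); i += 3) {
        const auto key = std::make_tuple(rawPositions[i],
                                         rawPositions[i + 1],
                                         rawPositions[i + 2]);
        auto it = seen.find(key);
        if (it == seen.end()) {
            assert(seen.size() < 65536);  // indices must fit in 16 bits
            const unsigned short newIndex =
                static_cast<unsigned short>(mesh.positions.size() / 3);
            it = seen.emplace(key, newIndex).first;
            mesh.positions.insert(mesh.positions.end(),
                {rawPositions[i], rawPositions[i + 1], rawPositions[i + 2]});
        }
        mesh.indices.push_back(it->second);
    }
    return mesh;
}

// With a rendering context, the indexed mesh would be drawn roughly as:
//   glEnableClientState(GL_VERTEX_ARRAY);
//   glVertexPointer(3, GL_FLOAT, 0, mesh.positions.data());
//   glDrawElements(GL_TRIANGLES, (GLsizei)mesh.indices.size(),
//                  GL_UNSIGNED_SHORT, mesh.indices.data());
```

For two triangles that form a quad, the six input vertices shrink to four unique ones, and the shared edge is expressed purely through repeated indices.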

Compiled Vertex Arrays

A compiled VA is just a regular VA that we can sometimes cache on the server side for repeated calls. Static geometry that gets rendered frequently, such as interfaces, static objects, and so on, can greatly benefit from its performance. Unfortunately, compiled VAs are available only in some newer hardware, so they must be handled with care.

Server-Side Vertex Arrays

Along with display lists, server-side VAs offer top performance in most rendering pipelines. As mentioned earlier, display lists have the shortcoming of being limited to static geometry because their initialization process is slow. Server-side VAs overcome this limitation, and thus can be used for dynamic data as well. As usual, benefits don't come completely free. Some programming details of server-side arrays are a bit unintuitive and reaching top speed using them might take some time and careful programming. Because server-side VAs are completely vendor specific, I will explore both NVIDIA's and ATI's approach, and hint at a common interface that should be available when this book hits store shelves. For now, take a look at Figure B.6 for a diagram of the difference between regular and server-side VAs.

Figure B.6. Regular VAs (left) use the system bus per-frame, whereas server-side systems, such as VAR, VAO, and vertex buffer object (VBO), store data on the GPU side.

Vertex Array Range

VARs are offered on NVIDIA cards starting with the GeForce2. The technology is based on regular VAs, with the added benefit of being able to allocate the arrays in either video memory or Accelerated Graphics Port (AGP) memory. Video memory is scarce because it is the memory available on the graphics card, which must also hold, among other structures, the double-buffered frame buffer, the Z-buffer, and the texture maps. Whatever video memory remains can be used to allocate a VA. AGP memory, on the other hand, is just regular system memory, which has the special property of being easily accessed by the video card using the AGP bus. AGP memory is more plentiful and can offer similar performance to video memory, especially in systems with fast AGP buses.

To work with VARs, you basically request a block of memory in the GPU (either AGP or video memory). Then, you can create a different set of arrays in user memory space and fill those with data. Whenever you want to update the version on the GPU, all you have to do is block-copy data to the VAR declared in GPU space. It is very important not to access VAR memory in any way other than by doing a block-copy (memcpy), because VAR memory is slow to write.

Specifically, writes to video memory are slow on all systems except those supporting Fast Writes. A good rule of thumb is therefore to use video memory for static geometry that is used very frequently. This way you can maximize VAR speed and avoid the relatively poor video memory write performance. AGP memory, by contrast, can be used for dynamic data. But remember that AGP memory is uncached, so random access is cost-prohibitive. This is why we work in regular, cached system memory and then block-copy the data to AGP space: the copy writes sequentially, taking full advantage of the write combiners.
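The staging pattern described above can be sketched as follows. StagedVertexBlock is a hypothetical helper class, and the destination pointer is only a stand-in: in a real program it would be memory obtained from the driver (with the NV_vertex_array_range extension, a block returned by wglAllocateMemoryNV and registered with glVertexArrayRangeNV), whereas here any writable block of floats works so the logic can be checked in isolation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Keeps an editable copy of the vertex data in ordinary cached system memory
// and pushes it to the destination block in one sequential copy.
class StagedVertexBlock {
public:
    // `destination` stands in for driver-allocated VAR memory (assumption);
    // it must hold at least `floatCount` floats.
    StagedVertexBlock(float* destination, std::size_t floatCount)
        : destination_(destination), staging_(floatCount, 0.0f) {
        assert(destination != nullptr);
    }

    // Random-access edits touch only the cached staging copy.
    float& at(std::size_t i) { return staging_.at(i); }

    // Single block copy: the destination is written sequentially and is
    // never read or modified element by element.
    void upload() {
        std::memcpy(destination_, staging_.data(),
                    staging_.size() * sizeof(float));
    }

private:
    float* destination_;
    std::vector<float> staging_;
};
```

Edits can happen in any order during the frame; only the final upload touches the slow memory, and it does so front to back.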

Vertex Array Object

VAOs are ATI's version of server-side VAs. Like VARs, they allow programmers to declare a VA in video memory for improved performance. Unlike VARs, a VAO cannot reside in AGP memory.

Vertex Buffer Object

VBOs are the vendor-independent variant of server-side VAs. Supported by all major card vendors, they provide a common interface to declare buffers in video memory, so we can fill them with data, modify them, and render them efficiently. At the time of this writing, the specification has been approved and an implementation of the mechanism should surface soon. Quite likely, they will supersede both VARs and VAOs.

The main difference between VBOs and previous iterations is that VBOs allow developers to specify how they are going to use the buffer, so the API can make proper optimization choices. To declare a buffer, the following call must be used:

glBufferDataARB(target, size, *data, usage)

The first parameter is the buffer target: GL_ARRAY_BUFFER_ARB for vertex data (GL_ELEMENT_ARRAY_BUFFER_ARB is used for index data). The second is the size of the buffer in bytes. The third is a pointer to the data to be copied into the server-side buffer. The fourth parameter is where the usage is specified, with nine different usage patterns. Basically, the buffers can be characterized in three categories. They can be used for streaming (specify once, render once, as in a regular VA), for static geometry (specify once, render the same data many times, much like VARs), or for dynamic geometry (specify many times, render many times). Using a second criterion, buffers can be used to draw geometry, to read it back from video to CPU memory, or to copy it repeatedly onto GPU memory (for example, for shader use). By merging these two classifications, nine usage patterns emerge, as shown in Table B.4.

Table B.4. Usage Patterns

 

           Draw                  Read                  Copy

Stream     GL_STREAM_DRAW_ARB    GL_STREAM_READ_ARB    GL_STREAM_COPY_ARB

Static     GL_STATIC_DRAW_ARB    GL_STATIC_READ_ARB    GL_STATIC_COPY_ARB

Dynamic    GL_DYNAMIC_DRAW_ARB   GL_DYNAMIC_READ_ARB   GL_DYNAMIC_COPY_ARB
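The table can be read as a two-key lookup: choose a row by how often the data is specified and a column by who consumes it. The sketch below encodes that lookup. The enums, the UsagePattern struct, and usagePattern are hypothetical names; the token names and hexadecimal values are written as I understand them from the ARB_vertex_buffer_object extension header, which is where a real program would take them from rather than redefining them.

```cpp
#include <cassert>
#include <string>

enum class Frequency { Stream = 0, Static = 1, Dynamic = 2 };  // table rows
enum class Access    { Draw = 0, Read = 1, Copy = 2 };         // table columns

struct UsagePattern {
    std::string name;  // token name as spelled in the extension header
    unsigned value;    // numeric value of that token (assumed from glext.h)
};

// Returns the usage pattern at the given row and column of the table.
inline UsagePattern usagePattern(Frequency frequency, Access access) {
    static const UsagePattern table[3][3] = {
        {{"GL_STREAM_DRAW_ARB", 0x88E0u},
         {"GL_STREAM_READ_ARB", 0x88E1u},
         {"GL_STREAM_COPY_ARB", 0x88E2u}},
        {{"GL_STATIC_DRAW_ARB", 0x88E4u},
         {"GL_STATIC_READ_ARB", 0x88E5u},
         {"GL_STATIC_COPY_ARB", 0x88E6u}},
        {{"GL_DYNAMIC_DRAW_ARB", 0x88E8u},
         {"GL_DYNAMIC_READ_ARB", 0x88E9u},
         {"GL_DYNAMIC_COPY_ARB", 0x88EAu}},
    };
    return table[static_cast<int>(frequency)][static_cast<int>(access)];
}

// With a rendering context and a bound buffer, the result would be passed as:
//   glBufferDataARB(GL_ARRAY_BUFFER_ARB, sizeInBytes, data,
//                   usagePattern(Frequency::Static, Access::Draw).value);
```

A level mesh that never changes would map to the static/draw entry, while a particle system rebuilt every frame would more likely use the stream/draw or dynamic/draw entry.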

Overall, the system provides very fine-grained control over geometry and somewhat resembles similar mechanisms found in DirectX.
