Game code analysis is essential to ensure good performance. But analyzing code is like peeking through a keyhole: you get very limited access to the interior, and it is hard to gather the relevant information. However, we need to find a way to shed some light on our code so we can visualize what goes on inside and, essentially, where our resources (CPU cycles, bus bandwidth, memory) are being spent. A number of techniques have evolved through the years, ranging from plain tricks to quite involved solutions. All of them, however, should give the application programmer a clear picture of what is going on in each game loop and what is taking away all those precious cycles. In this first section, we will explore a variety of analysis and profiling techniques in detail. We will learn how to trace memory usage, time critical sections of code, and extract important data from our programs. In the next section, we will study how specialized tools can help us carry out the same task. To begin with, let's specify the goals of our analysis. Fundamentally, we need to determine, for any given section of code:

- How long it takes to execute
- How much memory it consumes
By computing these values repeatedly through the code, we will be able to create full cost breakdowns: a complete picture of where clock cycles are spent and what is eating up all the RAM.

Timing
Computing the time it takes to run a specific routine is relatively straightforward. All operating systems have an internal real-time clock that can be queried for the current time. The syntax might differ from one system to another, but the overall philosophy is the same. In the Win32 programming environment, for example, we can use the call:

    DWORD timeGetTime(void);

This call, declared in mmsystem.h, returns the amount of time, in milliseconds, elapsed since Windows booted. We have seen it many times in the book already. You should also remember that by doing the following, we can compute the duration, in milliseconds, of any routine:

    DWORD t1=timeGetTime();
    // here goes the code we need to time
    DWORD t2=timeGetTime();
    DWORD elapsed=t2-t1;

This is the easiest way to analyze time costs and, incidentally, one of the best. I recommend that you always reach for timers first when debugging long game loops. It is a good idea to break the code down into sections and use timeGetTime() to take time samples, thus getting a clear portrait of how long each component takes to execute. For example, you could do the following:

    DWORD t1=timeGetTime();
    recompute_logic();
    DWORD t2=timeGetTime();
    compute_collisions();
    DWORD t3=timeGetTime();
    render_walls();
    DWORD t4=timeGetTime();
    render_AIs();
    DWORD t5=timeGetTime();
    render_FX();
    DWORD t6=timeGetTime();

    DWORD timeFX=t6-t5;
    DWORD timeAI=t5-t4;
    DWORD timeWalls=t4-t3;
    DWORD timeCollisions=t3-t2;
    DWORD timeLogic=t2-t1;

By printing the different values to the screen (and adding other timers as needed), you can build an onscreen performance monitor that measures each core component. Even better, you can convert the values to percentages to see how the different pieces fit together in a graphical bar chart. This data is priceless for ensuring smooth performance: it becomes trivial to detect which sections of a game level slow the engine down, which subsystem is responsible, and so on. Then, whenever the team decides one section is too slow, it is all a matter of putting extra timers inside that section until you get an exact picture of which routine is at fault and how to fix it. Rigorous use of this policy has been very helpful in countless projects, because day-to-day coding frequently adds inefficient routines to games. Using the performance monitor (and saving the numbers daily in an Excel sheet, for example) is a good way to ensure that performance stays consistent through the end of the project.

A potential issue with this method is how to take timings of really fast routines. Software timers have limited resolution, and sometimes this will make faster routines seem to take zero time to execute. Win32's timeGetTime, for example, has a resolution of one millisecond. Although that is reasonably accurate for most uses, sometimes a higher resolution is desirable. After all, a game running at a stable 50 fps takes 20 milliseconds to render one frame. With that ballpark figure, it is not surprising that some components take less than one millisecond to execute, and thus cannot be measured with a call to timeGetTime. The way to profile those components is to run them in batches, so we time not one execution but several at once. We can then divide by the number of iterations and get a better feel for the real time cost. This is really handy if you have to profile mathematical routines, simple tests, and so on, which take less than one millisecond to execute.
Here is an example:

    long iterations=10000;  // large enough that loop overhead becomes negligible
    DWORD time1=timeGetTime();
    for (long i=0;i<iterations;i++)
    {
        // really important routine goes here
    }
    DWORD time2=timeGetTime();
    double elapsed=(double)(time2-time1)/iterations;

After this sequence, elapsed holds the average duration of the routine in milliseconds. The larger the number of iterations, the less relevant the overhead of the for loop, and thus the more reliable the measurement. Loop unrolling can often buy extra accuracy, because the loop itself sometimes perturbs the measurement. You have to be careful with this approach, though, because the results will not be completely correct: performance will be a bit on the optimistic side, since repeated calls take advantage of the code and data caches found in most modern CPUs, causing the code to execute faster than it would under normal conditions. Still, the result provides a good qualitative estimate of your code's speed.
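When even batched timing is too coarse, most platforms expose a higher-resolution counter. As a minimal sketch, assuming the standard Win32 API (routine_under_test is a hypothetical stand-in for whatever we want to measure), QueryPerformanceCounter and QueryPerformanceFrequency can be combined into a sub-millisecond timer:

    #include <windows.h>

    void routine_under_test();  // hypothetical placeholder for the code being timed

    // Returns the routine's duration in milliseconds, with sub-millisecond precision.
    double time_with_performance_counter()
    {
        LARGE_INTEGER frequency, start, stop;
        QueryPerformanceFrequency(&frequency);  // counter ticks per second
        QueryPerformanceCounter(&start);
        routine_under_test();
        QueryPerformanceCounter(&stop);
        return (double)(stop.QuadPart-start.QuadPart)*1000.0
               /(double)frequency.QuadPart;
    }

The batching advice still applies here: even with a fine-grained counter, averaging over many iterations smooths out scheduler noise.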
Memory Footprints

Once you are in control of all the timing in your game loops, it is time to begin profiling memory use. Measuring how much memory is spent in certain portions of your program is extremely important. Memory leaks can cause your software to malfunction, and bad memory management can yield worse than expected performance due to memory thrashing. Optimizing for speed only is never a good policy; memory management errors frequently render a game unplayable.

We will begin by analyzing the memory balance of a given call. By testing the total available memory before and after the call, we can perform a subtraction and discover how much memory was taken away (or returned) by the call. The approach is very similar to timing a critical portion of our code, as shown in the following snippet:

    long m1=available_memory();
    // here goes the code whose memory usage we want to test
    long m2=available_memory();
    long consumed_memory=m1-m2;  // available memory shrinks as memory is consumed

Unfortunately, the implementation details of such a mechanism are a little more complex than those of simple timing routines. In multitasking environments, memory can be taken away by another program, so using the available memory as an indicator can be tricky. Should we query for the total physical memory or for the virtual memory available to this application?

A very popular technique is to go one step further and write our own memory manager, which keeps track of the memory allocated and deallocated by the application. The most common method is to override the memory allocation calls: instead of the standard new and delete, a user-provided pair is called, and memory is traced in the process. This is possible because C++ lets us supply replacements for the global operator new and operator delete, which take precedence over the default implementations. Here is an example:

    #include <cstddef>   // size_t
    #include <cstdlib>   // malloc, free
    #include <malloc.h>  // _msize (Microsoft C runtime)

    unsigned long usedmemory=0;      // bytes currently allocated
    unsigned long maxusedmemory=0;   // peak bytes allocated
    unsigned long minusedmemory=0;   // low-water mark of allocated bytes
    unsigned long numallocations=0;  // count of blocks currently allocated
    unsigned long maxallocations=0;  // peak count of blocks allocated

    // User-defined operator new.
    void *operator new(size_t blocksize)
    {
        void *pointer=malloc(blocksize);
        if (pointer==0) return 0;      // allocation failed; leave counters untouched
        numallocations++;
        usedmemory+=_msize(pointer);   // track the real block size, so new and delete agree
        if (usedmemory>maxusedmemory) maxusedmemory=usedmemory;
        if (numallocations>maxallocations) maxallocations=numallocations;
        return pointer;
    }

Notice how this operator stores some relevant information, such as the number of undeleted allocations and the total used memory. In addition, we call the C malloc function to ensure that the memory allocation is actually performed. For completeness, here is the equivalent code for the delete operator:

    // User-defined operator delete.
    void operator delete(void *pointer)
    {
        if (pointer==0) return;        // deleting a null pointer is legal and does nothing
        numallocations--;
        usedmemory-=_msize(pointer);
        if (usedmemory<minusedmemory) minusedmemory=usedmemory;
        free(pointer);
    }

Notice how we use the _msize function, which tells us the allocated size of a given pointer. _msize is provided by Microsoft's C runtime; other platforms offer equivalents, such as malloc_usable_size in glibc. Thus, we can keep track of the number of open handles (undeleted memory blocks), the total amount of memory in use, and so on.

Now all you have to do is display the memory allocation information at runtime. The number of blocks and the total used memory should remain more or less stable; this is a clear sign that all calls to new have their corresponding delete. Additionally, as objects are destroyed at application shutdown, we should see a diminishing number of allocated blocks, finally ending up with zero blocks and zero reserved memory. Any deviation from this behavior is most likely caused by a memory leak.
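To surface these counters, it helps to print a short report every few frames or at shutdown. The following helper is a hypothetical sketch (the function name and output format are ours, not a standard API), assuming the tracking counters defined above:

    #include <cstdio>

    // Hypothetical helper: dump the current state of the allocation counters.
    void report_memory_status()
    {
        printf("blocks in use: %lu (peak %lu)\n", numallocations, maxallocations);
        printf("bytes in use:  %lu (peak %lu)\n", usedmemory, maxusedmemory);
    }

Called right before the program exits, both "in use" figures should read zero; anything else points to a leak.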
As a more involved solution, we can add functionality to the overloaded new and delete calls, so that memory management is actually performed by the application. We would allocate a large memory pool at boot time and use our homegrown new and delete operators to manage it. This way, we can implement a specific memory management scheme that increases performance for our title. Knowing the memory access patterns of an application definitely gives the application programmer an advantage over the standard, multipurpose new and delete calls. In console titles, for example, it is very important to reduce internal fragmentation because memory is usually scarce. Thus, implementing the memory architecture manually provides an additional level of control.
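To make the idea concrete, here is a minimal sketch of one such scheme: a fixed pool reserved at boot time, handed out by a linear (bump) allocator through replacement new and delete operators. Everything here (the pool size, the helper names, the no-op delete) is an illustrative assumption rather than a production-ready manager; a real implementation would handle exhaustion properly and reclaim individual freed blocks:

    #include <cstddef>
    #include <cstdlib>

    const size_t POOL_SIZE=16*1024*1024;  // 16 MB, chosen arbitrarily
    static char *pool=0;                  // base address of the pool
    static size_t pool_offset=0;          // first free byte within the pool

    void init_memory_pool()
    {
        pool=(char *)malloc(POOL_SIZE);   // one big allocation at boot time
        pool_offset=0;
    }

    // Replacement operator new: carve the next block out of the pool.
    void *operator new(size_t blocksize)
    {
        size_t aligned=(blocksize+7)&~(size_t)7;  // align blocks to 8 bytes
        if (pool_offset+aligned>POOL_SIZE)
            return 0;                     // pool exhausted; real code would do better
        void *pointer=pool+pool_offset;
        pool_offset+=aligned;
        return pointer;
    }

    // Replacement operator delete: intentionally empty. A linear allocator
    // releases everything at once by resetting pool_offset (for example,
    // when a level is unloaded), which also eliminates fragmentation.
    void operator delete(void *)
    {
    }

    void shutdown_memory_pool()
    {
        free(pool);
        pool=0;
    }

This kind of allocator is a good match for per-level data: allocations during loading are nearly free, and unloading the level is a single reset of pool_offset.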