For some reason, when people discuss performance in
games, they often talk specifically about graphics performance.
They rarely speak of physics performance or audio
performance; it is almost always graphics. This is probably because many
people don’t understand graphics performance or what it means.
One of the biggest
conceptual problems people have when attempting to understand graphics
performance is that it doesn’t behave like other parts of the
system. In a normal application, you tell the computer what you want it
to do, and it generally does exactly what you ask it to do in the order
you ask it to do it.
Modern graphics
hardware doesn’t behave anything like that. It has a bunch of stuff to
do and it says, “Hey, I’m just going to do it all at once.” Top-of-the-line
graphics cards today can process thousands of operations
simultaneously.
On top of that, people can get confused because they measure performance and see that their Draw calls are fast, but then there is a slowdown in the Present
method that they didn’t even call! This is because any time a graphics
operation is called, the command isn’t actually sent to the graphics
hardware until later (normally, during the Present
operation); the runtime just remembers what you asked it to do, batches it up,
and sends it all to the graphics hardware at once.
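To see this for yourself, you can time the two halves of the frame separately. The following is a minimal sketch, assuming a bare-bones Game subclass; in XNA, the framework calls Present from EndDraw, so overriding it is a convenient place to measure the submission cost. The class and field names here are only illustrative.

using System.Diagnostics;
using Microsoft.Xna.Framework;

public class ProfilingGame : Game
{
    GraphicsDeviceManager graphics;
    Stopwatch frameTimer = new Stopwatch();

    public ProfilingGame()
    {
        graphics = new GraphicsDeviceManager(this);
    }

    protected override void Draw(GameTime gameTime)
    {
        frameTimer.Reset();
        frameTimer.Start();

        // Issue your normal rendering here. These calls usually return
        // quickly because the commands are only being queued, not executed.
        GraphicsDevice.Clear(Color.CornflowerBlue);

        Debug.WriteLine("Draw calls queued in " +
            frameTimer.Elapsed.TotalMilliseconds + "ms");

        base.Draw(gameTime);
    }

    protected override void EndDraw()
    {
        frameTimer.Reset();
        frameTimer.Start();

        // EndDraw is where the framework calls Present, handing the queued
        // commands to the graphics hardware (and possibly waiting on it).
        base.EndDraw();

        Debug.WriteLine("Present took " +
            frameTimer.Elapsed.TotalMilliseconds + "ms");
    }
}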
Because the graphics
hardware and the computer run at the same time, you might have realized
that this means the graphics hardware will render your scene while your
next frame’s Update executes, as you see in Figure 1. This is the foundation you need in order to make good decisions about graphics performance.
In a perfect world, this is how
all games would run. As soon as the graphics hardware is done drawing
the scene, the CPU gives it the next set of instructions to render.
Neither piece of hardware waits on the other, and they run perfectly in
sync. This rarely happens, though; normally one waits for the other.
Let’s look at hypothetical
numbers for a game. The work the CPU does takes 16ms, and the graphics
hardware renders the scene in 3ms. So now, when the CPU has told the
graphics hardware to start rendering and restarts its loop, the graphics
hardware is done rendering and waits for its next set of instructions
while the CPU still has 13ms of work left to do before it gives the
graphics hardware anything new to do.
This is called being CPU bound,
and it is important to know the distinction between it and “GPU bound.” If your game is CPU bound
and you spend a bunch of time optimizing your graphics performance, you
won’t see a single bit of improvement, because the graphics hardware is
already sitting around idle! In reality, you can actually add more
graphics features here for free (if they run completely on the GPU), or
move some of your code from the CPU to the GPU.
Conversely, if these
numbers are swapped and your CPU takes 3ms to do its work while the
graphics hardware takes 16ms to render the scene, you run into the
situation where the CPU is ready to give the next set of instructions to
the graphics hardware only for the graphics hardware to complain, “I’m
not done with the last section yet, hold on a second.” Now, the CPU sits
around waiting for the GPU, and this is called being GPU bound. In this
scenario, optimizing your CPU code to run faster has no impact on
performance because the CPU is already sitting around waiting as it is.
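Put in arithmetic terms, when the CPU and GPU overlap fully, whichever side takes longer sets the frame time. Here is a toy illustration using the hypothetical numbers above (not measurements from a real game):

// Hypothetical numbers from the example above.
double cpuWorkMs = 16.0;   // Update plus the cost of submitting draw calls
double gpuWorkMs = 3.0;    // time the graphics hardware spends rendering

// With full overlap, the slower side determines the frame time.
double frameTimeMs = Math.Max(cpuWorkMs, gpuWorkMs);   // 16ms here: CPU bound

// Swap the two numbers and the GPU becomes the limit instead: GPU bound.
bool cpuBound = cpuWorkMs > gpuWorkMs;

Speeding up the side that is not the maximum leaves frameTimeMs unchanged, which is exactly why optimizing the wrong side buys you nothing.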
Knowing if your game is CPU
bound or GPU bound can go a long way in determining how to address
performance issues. The obvious question here is, “How do I detect which
one I am?” Unfortunately, the answer isn’t as easy as you might think.
Sometimes you might notice that the Present call takes an awfully long time, causing your game to slow down. Then you think, “If Present
takes so long, that must mean the graphics hardware is still busy, so I
must be GPU bound!” That might be true, but it can also be
that you simply misinterpreted what happened in Present!
By default, an XNA game runs in fixed time step mode and runs with SynchronizeWithVerticalRetrace
set to true. When this property is true, it tells the graphics hardware
that it should render to the screen only when the monitor refreshes. A
typical monitor has a refresh rate of 60Hz, which means it refreshes 60
times per second (or every 16.6667ms). If your rendering code takes 2ms to
complete, the driver can still wait another 14.6667ms for the monitor
refresh before completing the draw. However, if that property is set to
false, the graphics hardware attempts to draw as fast as it can.
Note
Naturally, if your monitor runs at a different refresh rate from 60Hz, then the previous numbers would be different.
Of course, if you turn this
off, you run into the opposite problem. Now, your graphics hardware runs
as fast as it can, but because the system runs on a fixed time step
(which is set to run at the default target speed for the platform: 60
frames per second on Xbox and Windows and 30 on Windows Phone 7), if you
measured, you would appear to be CPU bound. This is because the game
sits around and does not continue the loop until your next scheduled Update call!
So, to get a true measure of your performance, you need to run with SynchronizeWithVerticalRetrace and IsFixedTimeStep set to false. We recommend you create a new configuration called Profile to do your performance measurements and testing.
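As a rough sketch of how those settings might be applied, assuming a standard Game subclass with a GraphicsDeviceManager field and a Profile build configuration that defines a PROFILE compilation symbol (both names are only examples):

using Microsoft.Xna.Framework;

public class MyGame : Game   // hypothetical game class
{
    GraphicsDeviceManager graphics;

    public MyGame()
    {
        graphics = new GraphicsDeviceManager(this);

#if PROFILE
        // Only in the Profile configuration: don't wait for the monitor's
        // vertical retrace, and don't throttle the loop to the fixed time
        // step, so measurements reflect how fast the game can actually run.
        graphics.SynchronizeWithVerticalRetrace = false;
        IsFixedTimeStep = false;
#endif
    }
}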
Note
Sometimes a single graphics
call can cause a huge spike. This happens when the internal buffer
used to store graphics commands gets full; at that point, they are all
sent to the hardware for processing. If you see this happen, you
should probably think about doing fewer operations per frame by batching
them up.
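One common way to cut the number of operations per frame in XNA is to draw many sprites inside a single SpriteBatch Begin/End pair instead of wrapping each one individually. Here is a minimal sketch from inside a Draw method; the Sprite type and the sprites collection are hypothetical stand-ins for your own data:

// Wasteful: a separate batch (and its own submission work) per sprite.
foreach (Sprite sprite in sprites)
{
    spriteBatch.Begin();
    spriteBatch.Draw(sprite.Texture, sprite.Position, Color.White);
    spriteBatch.End();
}

// Better: queue all the sprites and submit them as one batch.
spriteBatch.Begin();
foreach (Sprite sprite in sprites)
{
    spriteBatch.Draw(sprite.Texture, sprite.Position, Color.White);
}
spriteBatch.End();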
With these two items taken care of, let’s take a look at measuring performance in your game.