For some reason, when people discuss performance in
games, they often talk specifically about graphics performance.
They rarely speak of physics performance or audio
performance; it is almost always graphics. This is probably because many
people don’t understand graphics performance or what it means.
One of the biggest
conceptual problems people have when attempting to understand graphics
performance is that it doesn’t behave like other parts of the
system. In a normal application, you tell the computer what you want it
to do, and it generally does exactly what you ask it to do in the order
you ask it to do it.
Modern graphics
hardware doesn’t behave anything like that. It has a bunch of stuff to
do and it says, “Hey, I’m just going to do it all at once.” Top-of-the-line
graphics cards today can process thousands of operations
simultaneously.
On top of that, people can get confused because they measure performance and see that their Draw calls are fast, but then there is a slowdown in the Present
method that they didn’t even call! This is because any time a graphics
operation is called, the command isn’t actually sent to the graphics
hardware until later (normally, during the Present
operation); the runtime just remembers what you asked it to do, batches it up,
and sends it all to the graphics hardware at once.
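To see this for yourself, you can time the two halves of the frame separately. The following is a minimal sketch, assuming a bare-bones Game subclass; in XNA, the framework calls Present from EndDraw, so overriding it is a convenient place to measure the submission cost. The class and field names here are only illustrative.

using System.Diagnostics;
using Microsoft.Xna.Framework;

public class ProfilingGame : Game
{
    GraphicsDeviceManager graphics;
    Stopwatch frameTimer = new Stopwatch();

    public ProfilingGame()
    {
        graphics = new GraphicsDeviceManager(this);
    }

    protected override void Draw(GameTime gameTime)
    {
        frameTimer.Reset();
        frameTimer.Start();

        // Issue your normal rendering here. These calls usually return
        // quickly because the commands are only being queued, not executed.
        GraphicsDevice.Clear(Color.CornflowerBlue);

        Debug.WriteLine("Draw calls queued in " +
            frameTimer.Elapsed.TotalMilliseconds + "ms");

        base.Draw(gameTime);
    }

    protected override void EndDraw()
    {
        frameTimer.Reset();
        frameTimer.Start();

        // EndDraw is where the framework calls Present, handing the queued
        // commands to the graphics hardware (and possibly waiting on it).
        base.EndDraw();

        Debug.WriteLine("Present took " +
            frameTimer.Elapsed.TotalMilliseconds + "ms");
    }
}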
Because the graphics
hardware and the computer run at the same time, you might have realized
that this means the graphics hardware will render your scene while your
next frame’s Update executes, as you see in Figure 1. This is the foundation you need in order to make good decisions about graphics performance.
In a perfect world, this is how
all games would run. As soon as the graphics hardware is done drawing
the scene, the CPU gives it the next set of instructions to render.
Neither piece of hardware waits on the other, and they run perfectly in
sync. This rarely happens, though; normally one waits for the other.
Let’s look at hypothetical
numbers for a game. The work the CPU does takes 16ms, and the graphics
hardware renders the scene in 3ms. So now, when the CPU has told the
graphics hardware to start rendering and restarts its loop, the graphics
hardware is done rendering and waits for its next set of instructions
while the CPU still has 13ms of work left to do before it gives the
graphics hardware anything new to do.
This is called being CPU bound,
and it is important to know the distinction between it and “GPU bound.” If your game is CPU bound
and you spend a bunch of time optimizing your graphics performance, you
won’t see a single bit of improvement, because the graphics hardware is
already sitting around idle! In reality, you can actually add more
graphics features here for free (if they run completely on the GPU), or
move some of your code from the CPU to the GPU.
Conversely, if these
numbers are swapped and your CPU takes 3ms to do its work while the
graphics hardware takes 16ms to render the scene, you run into the
situation where the CPU is ready to give the next set of instructions to
the graphics hardware only for the graphics hardware to complain, “I’m
not done with the last section yet, hold on a second.” Now, the CPU sits
around waiting for the GPU, and this is called being GPU bound. In this
scenario, optimizing your CPU code to run faster has no impact on
performance because the CPU is already sitting around waiting as it is.
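Put in arithmetic terms, when the CPU and GPU overlap fully, whichever side takes longer sets the frame time. Here is a toy illustration using the hypothetical numbers above (not measurements from a real game):

// Hypothetical numbers from the example above.
double cpuWorkMs = 16.0;   // Update plus the cost of submitting draw calls
double gpuWorkMs = 3.0;    // time the graphics hardware spends rendering

// With full overlap, the slower side determines the frame time.
double frameTimeMs = Math.Max(cpuWorkMs, gpuWorkMs);   // 16ms here: CPU bound

// Swap the two numbers and the GPU becomes the limit instead: GPU bound.
bool cpuBound = cpuWorkMs > gpuWorkMs;

Speeding up the side that is not the maximum leaves frameTimeMs unchanged, which is exactly why optimizing the wrong side buys you nothing.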
Knowing if your game is CPU
bound or GPU bound can go a long way in determining how to address
performance issues. The obvious question here is, “How do I detect which
one I am?” Unfortunately, the answer isn’t as easy as you might think.
Sometimes you might notice that the Present call takes an awfully long time, causing your game to slow down. Then you think, “If Present
takes so long, that must mean the graphics hardware is still busy, so I
must be GPU bound!” That might be true, but it can also be
that you simply misinterpreted what happened in Present!
By default, an XNA game runs in fixed time step mode and runs with SynchronizeWithVerticalRetrace
set to true. When this property is true, it tells the graphics hardware
that it should render to the screen only when the monitor refreshes. A
typical monitor has a refresh rate of 60Hz, which means it refreshes 60
times per second (or every 16.6667ms). If your rendering code takes 2ms to
complete, the driver can still wait another 14.6667ms for the monitor
refresh before completing the draw. However, if that property is set to
false, the graphics hardware attempts to draw as fast as it can.
Note
Naturally, if your monitor runs at a different refresh rate from 60Hz, then the previous numbers would be different.
Of course, if you turn this
off, you run into the opposite problem. Now, your graphics hardware runs
as fast as it can, but because the system runs on a fixed time step
(which is set to run at the default target speed for the platform: 60
frames per second on Xbox and Windows and 30 on Windows Phone 7), if you
measured, you would appear to be CPU bound. This is because the game
sits around and does not continue the loop until your next scheduled Update call!
So, to get a true measure of your performance, you need to run with SynchronizeWithVerticalRetrace and IsFixedTimeStep set to false. We recommend you create a new configuration called Profile to do your performance measurements and testing.
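As a rough sketch of how those settings might be applied, assuming a standard Game subclass with a GraphicsDeviceManager field and a Profile build configuration that defines a PROFILE compilation symbol (both names are only examples):

using Microsoft.Xna.Framework;

public class MyGame : Game   // hypothetical game class
{
    GraphicsDeviceManager graphics;

    public MyGame()
    {
        graphics = new GraphicsDeviceManager(this);

#if PROFILE
        // Only in the Profile configuration: don't wait for the monitor's
        // vertical retrace, and don't throttle the loop to the fixed time
        // step, so measurements reflect how fast the game can actually run.
        graphics.SynchronizeWithVerticalRetrace = false;
        IsFixedTimeStep = false;
#endif
    }
}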
Note
Sometimes a single graphics
call can cause a huge spike. This happens when the internal buffer
used to store graphics commands gets full; at that point, they are all
sent to the hardware for processing. If you see this happen, you
should probably think about doing fewer operations per frame by batching
them up.
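One common way to cut the number of operations per frame in XNA is to draw many sprites inside a single SpriteBatch Begin/End pair instead of wrapping each one individually. Here is a minimal sketch from inside a Draw method; the Sprite type and the sprites collection are hypothetical stand-ins for your own data:

// Wasteful: a separate batch (and its own submission work) per sprite.
foreach (Sprite sprite in sprites)
{
    spriteBatch.Begin();
    spriteBatch.Draw(sprite.Texture, sprite.Position, Color.White);
    spriteBatch.End();
}

// Better: queue all the sprites and submit them as one batch.
spriteBatch.Begin();
foreach (Sprite sprite in sprites)
{
    spriteBatch.Draw(sprite.Texture, sprite.Position, Color.White);
}
spriteBatch.End();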
With these two items taken care of, let’s take a look at measuring performance in your game.