The intent is to warn people about the potential
problems that can arise from attempting to prematurely optimize your
code. Optimizing code that is already “fast enough” just makes the code
harder to read and maintain, or worse yet, it can introduce new bugs and
regressions into the code. Performance is important, but just like
anything else, you shouldn't try to fix it unless you know what you're
setting out to fix and why.
One of the important aspects
of getting great performance out of your applications (and games) is an
understanding of what various portions of your code cost. For example,
do you know what the following code snippet might cost?
MediaPlayer.Play(Song.FromUri("mysong.wma", new Uri("http://www.someplace.com")));
If
you guessed quite a lot, you'd be correct! Although it is a single line
of code, there is a whole lot going on here. Two
orphaned objects (the Uri and the Song) will eventually be collected by the garbage collector. The FromUri
method (with the code used here) attempts to connect to the
Internet to download music. What if you were on a phone with no
connectivity? What if your latency was so high it took 5 seconds to
even start the download? Does the method download the entire file before
attempting to play, or does it stream? Does the method return
instantly, or block until sometime in the future when it has a valid song
object? You should have a basic idea of the performance (or cost) of a
method before calling it.
Cost can mean a number of
different things. Perhaps a method allocates an extraordinary amount of
memory; this would certainly add to its cost, potentially in a variety
of ways. For example, it can cause the garbage collector to collect more
frequently, or it can push data out of the cache. A method can simply be
computationally expensive because it is, say, a complex physics engine
handling millions of objects. Or a method can be fast in most
cases, but sometimes need to wait for data, blocking execution
until it arrives.
One easy-to-hit
performance problem in early versions of the managed runtimes was what
is called boxing. In .NET, anything can be used as an object, which is a
reference type. Before we move on to boxing, what exactly is a
reference type?
At the most basic level, a
reference type is a type whose value you do not access directly,
but through a reference (hence, the name). This means that the object is
allocated on the heap, and the garbage collector collects it later when
it no longer has any references. Because reference types are referred
to by reference, assigning two variables to the same reference means
both variables share the same data. For example, imagine a class with a
method SetInteger that sets an integer and GetInteger that returns the value of that integer, which is shown in this code snippet:
MyClass firstRef = new MyClass();
MyClass secondRef = firstRef;
secondRef.SetInteger(5);
firstRef.SetInteger(3);
int x = secondRef.GetInteger();
This produces a value of 3 for
the variable x. This is because both variables refer to the same
object, so modifying one naturally modifies the other. If a type
isn't a reference type, then it is a value type. Most
primitives (and all structs) are value types, and they are quite different
from reference types because assigning one value type to another does a
full copy rather than making the variables share the same reference. If
the previous snippet had used integers directly
(which are value types) rather than reference types, you would get a
different result:
int firstInt = 5;
int secondInt = firstInt;
secondInt = 3;
firstInt = 7;
int x = secondInt;
int y = firstInt;
At the end of this snippet x is 3, and y
is 7. Unlike reference types, changing one doesn’t change the other. A
common misconception of value types is that they are always on the
stack. If they are declared as a local variable, then this is normally
true, but they are not always on the stack. For example, an array of
value types is itself a reference type, and each individual element is
stored on the heap. Similarly, if a value type is a member of a class, it is
stored on the heap, too.
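The MyClass type used in the earlier reference-type snippet was never defined. Here is a minimal sketch of what it might look like; only the SetInteger and GetInteger names come from the text, and the backing field name is an assumption:

```csharp
// Minimal sketch of the hypothetical MyClass from the reference-type example.
public class MyClass
{
    // Backing field; its name is an assumption, not from the original text.
    private int storedValue;

    public void SetInteger(int value) { storedValue = value; }
    public int GetInteger() { return storedValue; }
}
```

With this definition, the earlier snippet ends with GetInteger returning 3, because both variables point at the same instance on the heap.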
Value types that are
local variables can sometimes be performance wins, though, as they are
allocated on the stack, and hence, they are not going to be garbage
collected later. However, they can also be a performance hit if you
don’t know what you’re doing with them. For example, if you have a large
value type (such as the Matrix
structure), and you declare six of them as local variables so they are
on the stack, but then pass all six into a method, this may be a
performance hit even though they are on the stack! Because assigning a
value type creates a full copy, you allocate 12 of the structures in
total (six for your original local variables and six for the method
parameters), plus (and more importantly) you copy all the data for each
of those structures into the parameters (the Matrix
structure is 64 bytes, so that is copying 384 bytes just to call that
method). Again, this reinforces that you should have a rough
idea of the cost of your methods.
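One way to avoid those copies is to pass large structs by reference; XNA's math types provide ref/out overloads (such as Matrix.Multiply(ref m1, ref m2, out result)) for exactly this reason. A minimal sketch using a stand-in 64-byte struct (BigStruct and the method names here are made up for illustration):

```csharp
// BigStruct is a hypothetical stand-in for a 64-byte structure like XNA's Matrix.
public struct BigStruct
{
    public long A, B, C, D, E, F, G, H; // 8 fields x 8 bytes = 64 bytes
}

public static class CopyDemo
{
    // By value: the full 64 bytes of each argument are copied for the call.
    public static long SumByValue(BigStruct m1, BigStruct m2)
    {
        return m1.A + m2.A;
    }

    // By ref: only a reference is passed per argument; no 64-byte copy is made.
    public static long SumByRef(ref BigStruct m1, ref BigStruct m2)
    {
        return m1.A + m2.A;
    }
}
```

Both methods compute the same result; the difference is purely in how much data crosses the call boundary.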
So, now we come back to
boxing. Although this class of performance problem is much less
prominent than it used to be, you can still hit it. Boxing is
essentially taking a value type and using it as an object
(which is a reference type). Doing this allocates a new reference
type on the heap and stores the data of the value type inside it. The potential
pitfall here is that all of those new objects need to be collected, and
this causes the garbage collector to kick in sooner. With the advent of
generics, the majority of common cases where people accidentally box
something are gone, but not all of them.
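A short sketch of boxing in action; the pre-generics ArrayList boxes every value type it stores, while the generic List&lt;int&gt; does not:

```csharp
using System.Collections;
using System.Collections.Generic;

class BoxingDemo
{
    static void Main()
    {
        int number = 42;

        // Boxing: a new object is allocated on the heap to hold the int's value.
        object boxed = number;

        // Unboxing: the value is copied back out of the heap object.
        int unboxed = (int)boxed;

        // ArrayList stores objects, so adding an int boxes it (heap garbage).
        ArrayList oldList = new ArrayList();
        oldList.Add(number);

        // List<int> stores the values directly; no boxing, no garbage.
        List<int> newList = new List<int>();
        newList.Add(number);
    }
}
```

This is why switching from the old non-generic collections to their generic equivalents removed most accidental boxing.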
Who Takes Out the Garbage?
By far, the most
common cause of unexpected performance problems is related to the
garbage collector. For the most part, modern day garbage collectors are
good at what they do, so before we get into potential performance
issues, let’s first take a look at what a garbage collector actually
does.
In the .NET runtime, objects are
created on the managed heap, which is in essence a large block of
memory. The runtime keeps track of this memory and remembers where the
start of the memory block is. When you allocate new objects, it is
almost free because all the runtime needs to do is move its pointer at
the top of the heap down, as seen in Figure 1.
As you allocate more and more
objects, some go out of scope, some stay in scope, and that allocation
pointer just keeps moving down the block of memory. Eventually, the
system decides to check to see if you are still using all the memory you
allocated. It does that by starting at the beginning of the memory
block and examining all of the objects that have been created. If it
finds an object that is orphaned, it marks that object, and then it
continues on.
Note
All of the points made here
about the garbage collector are at a high level. Many resources are
available to learn more about the garbage collector if you want more in
depth detail.
An object is orphaned when it
has no outstanding references to it anywhere in the application. For
example, in this code snippet, the object o is orphaned as soon as the method returns and is available to be collected:
public void SomeMethod()
{
    object o = new object();
}
However, in this snippet, the object o is not orphaned when the method ends because the class itself still holds a reference to it:
public class SomeClass
{
    object o;
    public void SomeMethod()
    {
        o = new object();
    }
}
If
the class becomes orphaned, then the object contained in it does, too
(provided it doesn’t have other outstanding references).
After the garbage collector
has gone through the entire memory block finding orphaned objects, it
goes and compacts the memory block. This means it goes through that
memory block and moves things around until it has a single, big
contiguous memory block again, destroying objects that have no
references.
The Windows desktop runtime is
actually even a little smarter than that. It has what is called a
generational garbage collector. When an object is first created, the
runtime assumes it will be short-lived, and it is created in what is called
generation zero, the home of short-lived objects. When a certain threshold
is hit, the runtime decides to do a collection, and it looks only
through the generation zero objects, marks the ones that are orphaned,
and then collects them. However, anything that is in generation zero
that is not orphaned gets promoted to the next generation (generation
one). If another threshold is met, a generation one collection occurs,
and things not orphaned get promoted to generation two, which is the end
of the line for the objects. A generation two collection is the one
that walks the entire memory block.
As you can imagine, examining
every object you create, and then compacting the memory after a
collection is time-consuming, particularly if you have a large number of
objects. Because the generation two collection is the one that looks at
all objects, it is the worst performing collection of the group.
However, the Compact
Framework runtime that runs on the Xbox and Windows Phone 7 does not
have a generational garbage collector. Every collection on those
platforms looks at every object. If the garbage collector runs all the
time, your game sees noticeable slowdowns.
Note
The garbage collector has a set
of heuristics that determine when it should start a collection, but they
are not configurable, and you cannot predict when it will happen.
I'm sure the question then is,
"Well, how do I avoid that?" The answer is easy: don't create garbage!
Now, obviously you can’t run your entire game without creating objects.
You just need to make sure that you don’t create objects all the time
during performance-critical sections of your game (such as during game
play). For example, if you create a new state object every frame to set
the device state, you will have collections during game play.
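One common way to avoid this is to allocate once at startup and reuse the instance every frame. The FrameState and Game1 names in this sketch are hypothetical:

```csharp
// Sketch: reuse one instance across frames instead of newing one up per frame.
public class FrameState
{
    public float ElapsedSeconds;
    public int VisibleObjects;
}

public class Game1
{
    // Allocated once at startup; overwritten each frame, so no per-frame garbage.
    private readonly FrameState frameState = new FrameState();

    public FrameState State { get { return frameState; } }

    public void Update(float elapsedSeconds)
    {
        // Reset fields rather than calling "new FrameState()" here.
        frameState.ElapsedSeconds = elapsedSeconds;
        frameState.VisibleObjects = 0;
    }
}
```

Because no allocations happen inside Update, the garbage collector has no reason to run during game play on account of this code.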
Normally though, it’s the
unexpected allocations that cause the collections, more proof that you
want to have an idea of what methods cost. As an example, did you know
that using foreach on a collection can potentially create garbage? This is what foreach breaks down into:
IEnumerator e = collection.GetEnumerator();
while (e.MoveNext())
{
    var obj = e.Current;
    // Your code
}
That
first line creates an enumerator object. If the enumerator is not a value type,
it is allocated on the heap and produces garbage. Most collection
types in the runtimes do not create garbage during a foreach loop, but the fact that it can is often a surprise to people.
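If you are worried about a particular collection, a plain for loop sidesteps the enumerator entirely. (As an aside, foreach over a List&lt;T&gt; used directly does not produce garbage, because its enumerator is a struct; garbage appears when that struct enumerator gets boxed, for example by iterating through an IEnumerable&lt;T&gt; reference.)

```csharp
using System.Collections.Generic;

public static class LoopDemo
{
    public static int Sum(List<int> items)
    {
        int sum = 0;
        // A plain for loop allocates nothing, whatever the enumerator type is.
        for (int i = 0; i < items.Count; i++)
        {
            sum += items[i];
        }
        return sum;
    }
}
```

For arrays and List&lt;T&gt; this is also among the fastest ways to iterate, since it is simple indexed access.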
Multithreading
Another area where you
can see an improvement in performance (particularly on Xbox) is in
multithreading. Now, this doesn’t mean you should add multithreading
everywhere because poor use of multithreading can actually make
performance worse.
If you have portions of your
code that are computationally expensive and don’t require them to be
done at a certain point in time (for example, you aren’t waiting on the
results), using another thread can be a big win for you. Creating and
starting a thread can take a bit of time, though, so if you think you
need multiple threads, you should create them all at startup so you do not
have to pay that cost at runtime.
Another thing to keep in mind
when using multiple threads on Xbox is what core they run on. The Xbox
has three physical cores each with two hardware threads, which means it
can run six threads simultaneously. However, unlike on Windows, you must
explicitly tell it which thread you want to run on. You do this with
the SetProcessorAffinity method as you see in the following snippet:
Thread worker = new Thread(new ThreadStart(() =>
{
    Thread.CurrentThread.SetProcessorAffinity(4);
    DoComplexCalculations();
}));
worker.Start();
Note
You need to set the
processor affinity as the first line in your thread method; you cannot
set it before the thread has started nor can you call it on any thread
other than the one on which it executes.
Although the Xbox has a total
of six hardware threads you can use, two of them are reserved by the XNA
runtime (hardware threads zero and two). You should spread your
calculations across the other four hardware threads.
One thing you want
to avoid in a multithreaded game is waiting on one of your worker
threads to continue the game. If your game is paused to wait for another
thread, you potentially don’t gain any performance.
Let's imagine a scenario with characteristics such as this. Your Update method tells a worker thread to perform a task that takes 1ms, while the rest of the code in the Update
method also takes 1ms to complete. However, it needs the
data from the worker thread, so it must wait for that thread to
finish. After the overhead of signaling and waiting, the thread can take more than the 1ms
you attempted to save, and you would spend less time simply doing all the
work in your Update method.
Conversely, imagine your Update method takes 7ms to complete, and the worker thread takes 8ms to complete. Sure, your Update
method would end up waiting a millisecond or so for the data,
but that's much faster than the 15ms the frame would take if all of the
work were done in the Update call.