5. How to Use Windows Memory Diagnostics
Memory problems are among the most common hardware problems. They can prevent Windows from starting and can cause unpredictable Stop errors after Windows Vista has started. Because memory-related problems can cause intermittent failures, they can be difficult to identify.
Because of the massive number of memory chips that hardware manufacturers produce, and the high standards customers have for reliability, memory testing is a highly refined science. Different memory tests are designed to detect specific types of common failures, including:
A bit may always return 1, even if set to 0. Similarly, a bit may always return 0, even if set to 1. This is known as a Stuck-At Fault (SAF).
The wrong bit is addressed when attempting to read or write a specific bit. This is known as an Address Decoder Fault (AF).
A section of memory may not allow values to change. This is known as a Transition Fault (TF).
A section of memory changes when being read. This is known as a Read Disturb Fault (RDF).
One or more bits lose their contents after a period of time. This is known as a Retention Fault (RF), and it can be one of the more challenging types of failures to detect.
A change to one bit affects another bit. This is known as a Coupling Fault (CF) if the faulty bit changes to the same value as the modified bit, an Inversion Coupling Fault (CFin) if the faulty bit changes to the opposite value of the modified bit, or an Idempotent Coupling Fault (CFid) if the faulty bit always becomes a certain value (1 or 0) after any transition in the modified bit. This behavior can also occur because of a short between two cells, known as a Bridging Fault (BF).
Given these types of failures, it’s clear that no single test could properly diagnose all the problems. For example, a test that wrote all 1s to memory and then verified that the memory returned all 1s would properly diagnose an SAF where memory was stuck at 0. However, it would fail to diagnose an SAF where memory was stuck at 1, and it would not be complex enough to find many bridging or coupling faults. Therefore, to properly diagnose all types of memory failures, Windows Memory Diagnostics provides several different types of tests.
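The limitation of a single-pattern test can be illustrated with a small simulation. The following Python sketch is not how Windows Memory Diagnostics is implemented; the FaultyMemory class, the pattern_test function, and the chosen fault addresses are hypothetical names and values introduced only for illustration. It models a tiny memory with one bit stuck at 0 and one bit stuck at 1, and shows that an all-1s pattern finds only the first fault while an all-0s pattern finds only the second.

```python
class FaultyMemory:
    """Toy bit-addressable memory with optional stuck-at faults (hypothetical model)."""

    def __init__(self, size, stuck_at=None):
        self.bits = [0] * size
        # stuck_at maps an address to the value that cell always returns.
        self.stuck_at = dict(stuck_at or {})

    def write(self, addr, value):
        self.bits[addr] = value

    def read(self, addr):
        # A stuck cell ignores what was written and returns its stuck value.
        return self.stuck_at.get(addr, self.bits[addr])


def pattern_test(mem, size, value):
    """Write `value` to every bit, then return the addresses that read back wrong."""
    for addr in range(size):
        mem.write(addr, value)
    return [addr for addr in range(size) if mem.read(addr) != value]


SIZE = 8
# Assumed faults for the demo: bit 2 is stuck at 0, bit 5 is stuck at 1.
mem = FaultyMemory(SIZE, stuck_at={2: 0, 5: 1})

all_ones_failures = pattern_test(mem, SIZE, 1)   # detects only the stuck-at-0 bit
all_zeros_failures = pattern_test(mem, SIZE, 0)  # detects only the stuck-at-1 bit
print(all_ones_failures, all_zeros_failures)     # prints: [2] [5]
```

Neither pattern alone reports both faults, which is why a practical tester combines several patterns and access orders.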
Fortunately, Windows Vista includes Windows Memory Diagnostics, an offline diagnostic tool that automatically tests your computer’s memory. Windows Memory Diagnostics tests your memory by repeatedly writing values to memory and then reading those values from memory to verify that they have not changed. To identify the widest range of memory failures, Windows Memory Diagnostics includes three different testing levels:
Basic
Standard All basic tests, plus:
Extended All standard tests, plus:
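As an illustration of the write-then-read-back approach described above, the following Python sketch implements MATS+, a simple march test from the memory-testing literature. It is shown only as an example of the general technique; the text here does not state which algorithms each testing level runs. The AliasedMemory class, the fill_and_check and mats_plus functions, and the simulated fault (address 6 wired to cell 1) are hypothetical and exist only for this demonstration.

```python
class AliasedMemory:
    """Toy memory in which some addresses are decoded to the wrong cell."""

    def __init__(self, size, alias=None):
        self.cells = [0] * size
        # alias maps a requested address to the cell actually accessed.
        self.alias = dict(alias or {})

    def _resolve(self, addr):
        return self.alias.get(addr, addr)

    def write(self, addr, value):
        self.cells[self._resolve(addr)] = value

    def read(self, addr):
        return self.cells[self._resolve(addr)]


def fill_and_check(mem, size, value):
    """Naive test: write `value` everywhere, then read everything back."""
    for addr in range(size):
        mem.write(addr, value)
    return [addr for addr in range(size) if mem.read(addr) != value]


def mats_plus(mem, size):
    """MATS+ march test: (w0); ascending (r0, w1); descending (r1, w0)."""
    failures = set()
    for addr in range(size):               # element 1: write 0 to every cell
        mem.write(addr, 0)
    for addr in range(size):               # element 2: ascending, read 0 then write 1
        if mem.read(addr) != 0:
            failures.add(addr)
        mem.write(addr, 1)
    for addr in reversed(range(size)):     # element 3: descending, read 1 then write 0
        if mem.read(addr) != 1:
            failures.add(addr)
        mem.write(addr, 0)
    return sorted(failures)


SIZE = 8
# Assumed address decoder fault: accesses to address 6 land on cell 1.
mem = AliasedMemory(SIZE, alias={6: 1})

naive_failures = fill_and_check(mem, SIZE, 1) + fill_and_check(mem, SIZE, 0)
march_failures = mats_plus(mem, SIZE)
print(naive_failures)   # prints: []  (the uniform patterns hide the aliasing)
print(march_failures)   # prints: [1, 6]
```

Because the march test interleaves reads and writes while walking the addresses in both directions, the two addresses that share a cell disturb each other and the mismatch becomes visible.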
While the specifics of each of these tests are not important for administrators to understand, it is important to understand that memory testing is never perfect. Failures are often intermittent and may occur only once every several days or weeks in regular usage. Automated tests such as those done by Windows Memory Diagnostics increase the likelihood that a failure can be detected; however, you can still have faulty memory while Windows Memory Diagnostics indicates that no problems were detected. To minimize this risk, run Extended tests, and increase the number of repetitions. The more tests you run, the more confident you can be in the result.
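The effect of repetitions can be made concrete with a rough calculation. The sketch below assumes, purely for illustration, that each test pass has the same independent chance of catching an intermittent fault; real faults may not behave this way, and the 5 percent per-pass figure is a made-up value, not a measured one. The function name detection_probability is hypothetical.

```python
def detection_probability(p_per_pass, passes):
    """Chance that at least one of `passes` independent passes detects the fault.

    Assumes every pass catches the fault with the same probability p_per_pass,
    independently of the other passes (a simplifying assumption).
    """
    return 1.0 - (1.0 - p_per_pass) ** passes


# Hypothetical fault that a single pass catches only 5% of the time.
for passes in (1, 5, 20, 100):
    print(passes, round(detection_probability(0.05, passes), 3))
```

Under these assumptions the chance of detection rises steadily with the number of passes but never reaches certainty, which matches the caution above that a clean result does not prove the memory is good.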
After Windows Memory Diagnostics completes testing, the computer will automatically restart. Windows Vista will display a notification bubble with the test results, as shown in Figure 3, and you can view the corresponding events in the System Event Log with the source MemoryDiagnosticsResults (event ID 1201).
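If you prefer to check the results from a script rather than from Event Viewer, the following Python sketch builds a query for the built-in wevtutil command-line tool. Two things here are assumptions rather than facts taken from the text: the full provider name Microsoft-Windows-MemoryDiagnostics-Results (adjust it if your system registers the source under a different name), and the helper name build_query_command, which is hypothetical. The query is run only when the script is executed on Windows.

```python
import subprocess
import sys

# Assumption: full provider name for the MemoryDiagnosticsResults source.
PROVIDER = "Microsoft-Windows-MemoryDiagnostics-Results"
EVENT_ID = 1201  # event ID given in the text


def build_query_command(provider=PROVIDER, event_id=EVENT_ID, count=5):
    """Return a wevtutil command that lists the newest matching System events."""
    xpath = f"*[System[Provider[@Name='{provider}'] and (EventID={event_id})]]"
    return [
        "wevtutil", "qe", "System",
        f"/q:{xpath}",    # XPath filter on provider name and event ID
        f"/c:{count}",    # return at most `count` events
        "/rd:true",       # newest events first
        "/f:text",        # human-readable output instead of XML
    ]


if __name__ == "__main__" and sys.platform == "win32":
    result = subprocess.run(build_query_command(), capture_output=True, text=True)
    print(result.stdout or result.stderr)
```

An empty result means only that no matching event was found under the assumed provider name; confirm in Event Viewer before concluding that no test has been run.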
If you do identify a memory failure, it is typically not worthwhile to attempt to repair the memory. Instead, you should replace unreliable memory. If the computer has multiple memory cards and you are unsure which card is causing the problem, replace one card at a time and rerun Windows Memory Diagnostics after each replacement until the computer is reliable.
If problems persist even after replacing the memory, the problem is probably caused by something other than the memory itself. For example, high temperatures (often found in mobile PCs) can cause memory to be unreliable. While computer manufacturers typically choose memory specifically designed to withstand high temperatures, adding third-party memory that does not meet the same specifications can cause failures. Besides heat, other devices inside the computer can cause electrical interference. Finally, motherboard or processor problems may occasionally cause memory communication errors that resemble failing memory.