Maintaining Desktop Health : Understanding Windows Error Reporting (part 1) - Error Reporting Cycle, Report Data Overview

4/19/2013 6:53:32 PM

Windows Error Reporting (WER) is the client component of the Watson Feedback Platform (WFP), which allows Microsoft to collect reports about failure events that occur on a user’s system, analyze the data contained in those reports, and respond to the user in a meaningful and actionable manner.

WER is the technology that reports user-mode hangs, user-mode faults, and kernel-mode faults to the back-end servers at Microsoft and replaces Dr. Watson as the default application exception handler.

Note

WER in Windows Vista has support for any kind of problem event as defined by the developer, not just critical failures as in Windows XP.

1. Overview of Windows Error Reporting

The Watson Feedback Platform is illustrated in the high-level flow diagram in Figure 1, with Windows Error Reporting labeled as the Watson Client.

Figure 1. Watson Feedback Platform flow diagram.

In Windows Vista, the user interface for Windows Error Reporting is the Problem Reports and Solutions Control Panel applet. When installing Windows Vista, you can choose whether WER sends basic problem reports automatically. Basic problem reports include only the minimum amount of information necessary to search for a solution. Later you can choose to send additional information automatically as well. The goal of the Problem Reports and Solutions applet is to provide one location where you can simply and efficiently view the problem events that have occurred on your computer, track your reports, manage responses from Microsoft, and act on these responses to prevent future failures.

One significant improvement of Windows Error Reporting in Windows Vista is the concept of queuing. In Windows XP, WER reports could only be sent at the time the event occurred, with few exceptions. In Windows Vista, WER provides a flexible queuing architecture where users, administrators, or WER integrators can determine the queuing behavior of their WER events.

2. Error Reporting Cycle

The cycle begins when a report is generated on a user’s system and completes when a response is returned to the user. Overall, five primary steps are involved in this process:

Reporting

The first step is the creation and submission of the report. This can be triggered by a number of events, including an application crash, application hang, or stop error (blue screen). In Windows Vista, applications can also be designed to define their own custom event types, allowing them to initiate the reporting process when any type of problem occurs.

Categorization

After the back-end servers at Microsoft receive the report, it is categorized by problem type. Categorization may be possible with only the event parameters (text descriptors of the event), or it may require additional data (dumps). The end result of categorization is that the event reported by the customer is assigned a Watson Bucket ID. This allows the developers investigating the events to determine the most frequently reported problems and focus on the most common issues.
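The idea of grouping reports into buckets and ranking them by frequency can be sketched in a few lines. This is only an illustration: the text does not describe how the back end derives a Watson Bucket ID, so the hashing scheme, the function names `bucket_id` and `top_buckets`, and the sample reports are all assumptions.

```python
from collections import Counter
import hashlib


def bucket_id(params):
    """Map a tuple of event parameters to a short, stable identifier.

    Hashing is an assumption made for this sketch; the text does not say
    how the back end actually derives a Watson Bucket ID.
    """
    joined = "\x1f".join(params)  # unit separator keeps fields distinct
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()[:12]


def top_buckets(reports, n=3):
    """Return the n most frequently reported buckets as (id, count) pairs."""
    counts = Counter(bucket_id(params) for params in reports)
    return counts.most_common(n)


# Hypothetical reports: (application, app version, module, module version, offset)
reports = [
    ("app.exe", "1.0.0.0", "render.dll", "1.0.0.0", "0x1a2b"),
    ("app.exe", "1.0.0.0", "render.dll", "1.0.0.0", "0x1a2b"),
    ("app.exe", "1.0.0.0", "net.dll", "2.1.0.0", "0x0040"),
]

ranked = top_buckets(reports)
```

Reports with identical parameters land in the same bucket, so the ranking puts the most commonly reported problem first.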

Investigation

After the problem is categorized, development teams may view the report data via the Watson portal. The Watson portal provides the data necessary to understand high-level trends and aggregate data, such as the top errors reported against an application. It also provides a mechanism to investigate the low-level data that was reported to debug the root cause of the problem.

Resolution

After a developer has determined the root cause of a problem, ideally a fix, workaround, or new version will be created that can be made available to the customer.

Response

The final step is to close the loop with the customer who reported the problem by responding to the report with information the customer can use to mitigate the issue. A customer may receive a response in two ways:

If the issue is understood at the time an error report is submitted, the customer can receive a response in the form of a balloon notification immediately after the categorization step.
If the issue is not understood at the time an error report is submitted but is resolved some time afterward, the customer can query for updated knowledge of the problem at a later time. Users can also manually check for new solutions by using the Problem Reports and Solutions control panel.

3. Report Data Overview

To optimize the reporting process, the WER error data is divided into first- and second-level data. During first-level communication with the back-end servers, WER determines if more data is needed. If the server returns a request for more data, collection of the second-level data begins immediately. Simultaneously, a second-level consent dialog is displayed.
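The two-stage exchange described above can be outlined as follows. This is a simplified sketch, not the WER client's implementation: the function `submit_report` and its callback parameters are hypothetical, and the real client displays the consent dialog while collection is in progress, whereas this sequential version collects first and asks afterward.

```python
def submit_report(params, server_wants_more, collect_second_level, user_consents):
    """Simulate first- and second-level submission for a single report.

    All four arguments are stand-ins invented for this sketch:
    params is the list of first-level parameters, and the three
    callables model the server's answer, data collection, and the
    user's reply to the consent dialog.
    """
    sent = {"first_level": list(params), "second_level": None}

    # First-level communication: only the parameters go to the server.
    if not server_wants_more(params):
        return sent

    # The server asked for more, so collection starts right away.
    collected = collect_second_level()

    # Second-level data leaves the machine only with consent.
    if user_consents():
        sent["second_level"] = collected
    return sent


declined = submit_report(
    ["app.exe", "1.0.0.0"],
    server_wants_more=lambda p: True,
    collect_second_level=lambda: {"dump": "hypothetical.dmp"},
    user_consents=lambda: False,
)
```

Here the server requests more data but the user declines, so `declined["second_level"]` stays `None` and only the first-level parameters are recorded as sent.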

First-Level Data

First-level data consists of up to 10 string parameters that identify a particular class of problem. This data is stored in the report manifest file, report.wer, and is submitted first to the Watson back-end servers. The report.wer file itself is not sent; the parameters are the only data submitted to the Watson back end during first-level communication. For example, the parameters for a crash (Application Name, Application Version, Module Name, Module Version, Module Offset, AppTimeStamp, ModTimeStamp, and ExceptionCode) provide a unique way to accurately classify a crash.
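As a rough model of first-level data, the sketch below stores the parameters as ordered name/value pairs and enforces the limit of 10 string values. The class name `FirstLevelData`, its layout, and every parameter value shown are hypothetical; only the parameter names and the limit come from the text.

```python
from dataclasses import dataclass

MAX_PARAMETERS = 10  # limit stated in the text


@dataclass(frozen=True)
class FirstLevelData:
    """Illustrative container for first-level parameters (not a WER type)."""

    parameters: tuple  # ordered (name, value) pairs

    def __post_init__(self):
        if len(self.parameters) > MAX_PARAMETERS:
            raise ValueError("first-level data holds at most 10 parameters")
        for name, value in self.parameters:
            if not isinstance(value, str):
                raise TypeError(f"parameter {name!r} must be a string")

    def values(self):
        """Return only the values, in order, as they would be submitted."""
        return tuple(value for _, value in self.parameters)


# Crash parameters named in the text, with made-up values.
crash = FirstLevelData(parameters=(
    ("Application Name", "app.exe"),
    ("Application Version", "1.0.0.0"),
    ("Module Name", "render.dll"),
    ("Module Version", "1.0.0.0"),
    ("Module Offset", "0x1a2b"),
    ("AppTimeStamp", "4a5bc3d1"),
    ("ModTimeStamp", "4a5bc3d2"),
    ("ExceptionCode", "c0000005"),
))
```

Two crashes that produce the same tuple from `values()` would be treated as the same class of problem in this model.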

Report.wer File

Reports are stored in an archive as a folder structure on the system. Each report subfolder contains, at a minimum, the report manifest text file (report.wer), which describes the contents of the error report. Although the report.wer file is a simple text file, it is not meant to be human-readable or editable. Any files referenced by the report are also placed in this folder. The following major sections appear in most report.wer files:

Version
Event Information
Signature
UI
State
Files
Response

Second-Level Data

Second-level data is additional data that may be needed to diagnose and resolve a particular bucket. Because Microsoft usually needs only a small sample of this verbose data, second-level data is submitted only if the back-end server requests it and the user consents to sharing the data. Second-level data is split into two categories:

Safe data. This is information that the developer feels is unlikely to contain any personal information, such as a small section of memory, a specific registry key, or a log file.
Other data. This encompasses everything else, which may or may not contain personal information.
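The conditions under which second-level data may be submitted can be written as a small decision function. This is a sketch under assumptions: the function `may_send`, its parameter names, and the `"safe"`/`"other"` labels are invented for illustration, and `always_send_safe` models a user setting that sends safe data automatically.

```python
def may_send(category, server_requested, always_send_safe, user_consented):
    """Decide whether one piece of second-level data may be submitted.

    category is "safe" or "other" (labels chosen for this sketch).
    """
    # Nothing is sent unless the back-end server asked for it.
    if not server_requested:
        return False
    # Safe data can skip the prompt if the user opted in beforehand.
    if category == "safe" and always_send_safe:
        return True
    # Everything else depends on the user's answer to the consent dialog.
    return user_consented


auto_safe = may_send("safe", server_requested=True,
                     always_send_safe=True, user_consented=False)
other_without_consent = may_send("other", server_requested=True,
                                 always_send_safe=True, user_consented=False)
```

In this model the automatic setting covers safe data only, so data in the other category still waits for explicit consent.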

You have the option to always send safe data automatically. Second-level data is specified by the back-end Watson servers and can include, but is not limited to, the following items: