Microsoft Exchange Server 2013 : Mailbox management - Health mailboxes

9/5/2014 4:24:16 AM

Exchange 2013 includes a new Managed Availability feature that is incorporated in the system architecture so Exchange can detect and resolve common problems caused by malfunctioning components. Managed Availability runs on every Exchange 2013 server, where you see it represented as the Health Manager Service process (MSExchangeHMHost.exe) and the Health Manager Worker process (MSExchangeHMWorker.exe).

The basic idea behind Managed Availability is to deploy an extensive set of intelligent probes into the array of services that comprise Exchange. The probes measure current activity against a norm, defined as the expected behavior of a healthy service. The data fed back by the probes are assessed by a monitoring engine, which decides whether any deviation from the norm is sufficient to warrant intervention. In effect, the monitoring engine acts like a super-efficient system administrator who is constantly checking what’s going on across every Exchange service to consider whether the server is healthy. Unlike human administrators, Managed Availability functions 24 hours a day, 7 days a week and never takes a break to refresh itself with coffee or any of the other brews favored by Exchange administrators.
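To see what the monitoring engine currently thinks of a server, one option is the Get-HealthReport cmdlet, which is not mentioned in this section but ships with Exchange 2013. The following is a minimal sketch that assumes it is run in the Exchange Management Shell on the server being checked; the property names shown are the ones the cmdlet normally returns, and the local computer name is used only as a convenient default.

```powershell
# Assumption: Exchange Management Shell on the Exchange 2013 server itself.
# Lists only the health sets that are not currently reported as healthy.
Get-HealthReport -Identity $env:COMPUTERNAME |
    Where-Object { $_.AlertValue -ne 'Healthy' } |
    Format-Table HealthSet, AlertValue, LastTransitionTime -AutoSize
```

An empty result suggests that every health set on the server is in the Healthy state at that moment.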

If a condition arises that the monitoring engine considers problematic, it alerts a responder engine, which knows a range of actions that can be taken to resolve problems. Again, this is comparable to a human administrator who notices that something is not quite right on a server and then acts to restore it to the state she would expect when everything is functioning normally. For example, the IIS application pool that services inbound Outlook Web App requests might not be responding. This is difficult for a human administrator to detect because she doesn’t usually check this aspect of a server unless a user reports a problem connecting through Outlook Web App. However, it’s relatively simple for a computer probe to monitor and report. In this case, the action taken by the responder engine might be multistage, similar to the way a human might try one approach to fix a problem and, if that doesn’t work, try another. The responder engine can restart the application pool and then test whether that attempt worked by making an artificial Outlook Web App connection. If the connection succeeds, all is well, and the problem is resolved. If not, the responder engine escalates its intervention, restarts the underlying service, and tests again. If that step also fails, the responder engine has few options left. It could back off and signal a high-priority alert to make a human aware of the issue and seek help, but all available humans might be in bed or otherwise unavailable. Therefore, the responder engine might proceed to the next step, which is to force a system restart in the hope that this resolves the issue. (Many experienced system administrators will immediately recognize the value of restarting a computer in an attempt to clear an unresponsive component.)
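The escalation pattern described above can be sketched in a few lines of PowerShell. This is an illustration only, not how Managed Availability is implemented: the function name Invoke-EscalatingRecovery, the step names, and the $poolHealthy flag that stands in for a real service are all hypothetical, and the recovery actions are simulated instead of restarting anything.

```powershell
# Hypothetical sketch of an escalating responder; nothing here calls Exchange.
function Invoke-EscalatingRecovery {
    param(
        # Ordered recovery steps, mildest first. Each is @{ Name; Action }.
        [Parameter(Mandatory)] [hashtable[]] $Steps,
        # Synthetic check that returns $true when the service looks healthy.
        [Parameter(Mandatory)] [scriptblock] $Probe
    )

    foreach ($step in $Steps) {
        & $step.Action | Out-Null          # take the recovery action
        if (& $Probe) {                    # re-test after every action
            return "Resolved by: $($step.Name)"
        }
    }
    return 'Unresolved: alert a human'
}

# Simulated state standing in for the Outlook Web App application pool.
$script:poolHealthy = $false

$steps = @(
    @{ Name = 'Restart application pool'; Action = { } },  # simulated: has no effect
    @{ Name = 'Restart service';          Action = { $script:poolHealthy = $true } },
    @{ Name = 'Restart server';           Action = { $script:poolHealthy = $true } }
)

$result = Invoke-EscalatingRecovery -Steps $steps -Probe { $script:poolHealthy }
$result   # Resolved by: Restart service
```

Ordering the steps from least to most disruptive means that the server restart is reached only when the milder actions have been tried and the probe still fails.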

Managed Availability is undoubtedly in its early days, and the hope is that it will grow in sophistication and capability over time. It’s worth noting at this point why Microsoft has incorporated such a facility. Briefly, Microsoft has found that it is extraordinarily helpful to build as much automation as possible into servers that are deployed in massive online services such as Office 365. Human intervention is expensive, takes too much time, and is prone to error, whereas computers are very good at following well-defined steps to resolve well-understood problem conditions.

Synthetic transactions are a good way of verifying that everything is working properly in any transaction-based system. Even though Exchange is an email server, you can consider the messages it processes to be transactions. Therefore, it makes sense for Exchange to generate messages and use them to measure whether everything along the path of those messages handles them properly. To mimic the work human users do, the messages have to originate from somewhere and be sent to somewhere else, and that’s where health mailboxes are used. Two health mailboxes are created (with archives) in every mailbox database as soon as the first mailbox is created in the database. The Health service recreates any health mailboxes that are missing, so if you remove them, they reappear the next time the Health service restarts.

The health mailboxes are associated with user accounts created in the Users OU in Active Directory (Figure 1). You can also retrieve information about the health mailboxes with the Get-Mailbox -Monitoring command; an examination of their properties reveals that health mailboxes have their RecipientTypeDetails property set to MonitoringMailbox.

A screen shot of the Active Directory Users And Computers console showing the health mailboxes that Exchange creates to use as part of its Managed Availability framework.

Figure 1. Health mailboxes in Active Directory

A useful one-liner is the command to report on how much space is occupied by the health mailboxes:

Get-Mailbox -Monitoring | Get-MailboxStatistics | Format-Table DisplayName, TotalItemSize, ItemCount

Exchange uses the health mailboxes to establish that email connectivity exists to the various databases in the system by sending artificial messages to and from the mailboxes every five minutes or so. This results in a number of observable side effects:

- The health mailboxes are not empty and report that they store some information if you examine them with Get-MailboxStatistics. This is not an issue because the amount of data is relatively small. If you spot that a health mailbox stores more than 100 MB, you should try to determine why this is so.
- The transactions for health mailboxes contribute to a small increase in transaction logs and in replication between database copies in a DAG. Again, the overall impact is very slight. In fact, the transactions generated by the health mailboxes help keep log replication ticking over because databases are never left without a transaction for very long.
- The messages sent between health mailboxes are recorded in message-tracking logs.
- The messages sent between health mailboxes are journaled unless you exclude them from your journaling rules. One way of doing this is to mark the health mailboxes by setting a known value in one of the 15 custom attributes available for mailboxes and then excluding any messages generated by a mailbox with that value set.
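The 100 MB guideline mentioned above can be checked with a small extension of the earlier one-liner. This is a sketch, not a supported monitoring script: the $thresholdMB variable is introduced here for illustration, and the ToMB() call assumes the command runs in the Exchange Management Shell on a machine with the Exchange management tools, where TotalItemSize is a full size object. In a plain remote PowerShell session the value may arrive as text, in which case the comparison needs to be adapted.

```powershell
# Assumption: Exchange Management Shell with the Exchange management tools installed.
$thresholdMB = 100   # guideline figure from the text; adjust as needed

# Report only the health mailboxes whose contents exceed the threshold.
Get-Mailbox -Monitoring |
    Get-MailboxStatistics |
    Where-Object { $_.TotalItemSize.Value.ToMB() -gt $thresholdMB } |
    Format-Table DisplayName, TotalItemSize, ItemCount -AutoSize
```

No output means that none of the health mailboxes that returned statistics is above the threshold.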

At this point, you are still learning about the operational considerations you must take into account for both Managed Availability and health mailboxes. A review of the current knowledge on the topic as expressed in blogs and other Internet sources will be useful in understanding how to factor these elements into your deployment.
