Exchange 2013 includes a new Managed Availability feature
that is incorporated in the system architecture so Exchange can detect
and resolve common problems caused by malfunctioning components.
Managed Availability runs on every Exchange 2013 server, where you see
it represented as the Health Manager Service process
(MSExchangeHMHost.exe) and the Health Manager Worker process
(MSExchangeHMWorker.exe).
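To confirm that these components are present on a server, a quick check along the following lines might help. This is a minimal sketch, assuming it is run locally on an Exchange 2013 server and that the Health Manager service is registered under the name MSExchangeHM; the process names come from the text above.

```powershell
# Assumption: MSExchangeHM is the service name of the Health Manager service.
Get-Service -Name MSExchangeHM |
    Format-Table Name, DisplayName, Status -AutoSize

# The host and worker processes named in the text. SilentlyContinue avoids an
# error if one of them happens not to be running when the command is issued.
Get-Process -Name MSExchangeHMHost, MSExchangeHMWorker -ErrorAction SilentlyContinue |
    Format-Table Name, Id -AutoSize
```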
The basic idea behind Managed
Availability is to deploy an extensive set of intelligent probes into
the array of services that make up Exchange. The probes measure
current activity against a norm, defined as the expected behavior of a
service when it is healthy. The data fed back by the probes are
assessed by a monitoring engine that compares what is currently
happening against the norm. If a difference is detected, the monitoring
engine determines whether the difference is sufficient to warrant
intervention. In effect, the monitoring engine acts like a
super-efficient system administrator who is constantly checking what’s
going on across every Exchange service to consider whether the server
is healthy. Unlike human administrators, Managed Availability functions
24 hours a day, 7 days a week and never takes a break to refresh itself
with coffee or any of the other brews favored by Exchange
administrators.
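The monitoring engine's current view of a server can be inspected from the Exchange Management Shell. The following is a sketch rather than a definitive procedure: Get-HealthReport and Get-ServerHealth are cmdlets not mentioned in the text above, and EXSERVER1 is a placeholder server name.

```powershell
# EXSERVER1 is a placeholder; substitute the name of one of your servers.
$server = 'EXSERVER1'

# Roll-up view: one row per health set with its overall state.
Get-HealthReport -Identity $server |
    Format-Table HealthSet, AlertValue, LastTransitionTime, MonitorCount -AutoSize

# Monitor-level view: list only the monitors that are not reporting Healthy.
Get-ServerHealth -Identity $server |
    Where-Object { $_.AlertValue -ne 'Healthy' } |
    Format-Table Name, HealthSetName, AlertValue -AutoSize
```

An empty result from the second command suggests that every monitor currently considers its part of the server to be healthy.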
If a condition arises that the monitoring engine considers problematic, it
alerts a responder engine, which is equipped with knowledge of a range
of actions that can be taken to resolve problems. Again, this is
comparable to a human administrator who notices that something is not
quite right on a server and then acts to restore it to the state
expected when everything is functioning normally. For example,
the IIS application pool that services inbound Outlook Web App requests
might not be responding. This is difficult for a human administrator to
detect because administrators don't usually check this aspect of a server
unless a user reports a problem connecting through Outlook Web App. However,
it's relatively simple for a computer probe to monitor and then report.
In this case, the action taken by the responder engine might be
multistage, similar to the way a human might try one approach to fix a
problem and, if that doesn’t work, then try another. The responder
engine can restart the application pool and then test to see whether
that attempt worked by making an artificial Outlook Web App connection.
If the connection succeeds, all is well, and the problem is resolved.
If not, the responder engine escalates its intervention, restarts the
underlying service, and again tests to measure the success of the step
taken. At this point, the responder engine can do little more if its
intervention has not restored the server to full health. It could back
off and raise a high-priority alert to make a human aware of the issue,
but all available humans might be in bed or
otherwise unavailable. Therefore, the responder engine might proceed to
the next step, which is to force a system restart in the hope that this
resolves the issue. (Many experienced system administrators will
immediately recognize the value of restarting a computer in an attempt
to fix an unresponsive server.)
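The multistage behavior described above (act, retest, escalate) can be modeled in a few lines of PowerShell. This is a conceptual sketch only and does not represent how the responder engine is implemented: Invoke-RecoveryChain and the step names are hypothetical, and the fault is simulated with a hashtable rather than a real application pool.

```powershell
# Conceptual model only: nothing here is an Exchange cmdlet.
function Invoke-RecoveryChain {
    param(
        # Script block that returns $true when the synthetic check succeeds.
        [Parameter(Mandatory)] [scriptblock] $Probe,
        # Ordered recovery steps, mildest first: @{ Name = '...'; Action = { ... } }
        [Parameter(Mandatory)] [hashtable[]] $Steps
    )

    $attempted = @()
    if (& $Probe) {
        # Already healthy, so no intervention is needed.
        return [pscustomobject]@{ Healthy = $true; Attempted = $attempted }
    }
    foreach ($step in $Steps) {
        & $step.Action | Out-Null          # take the next, more drastic, action
        $attempted += $step.Name
        if (& $Probe) {                    # retest after every action
            return [pscustomobject]@{ Healthy = $true; Attempted = $attempted }
        }
    }
    # Every step was tried without success; a human has to take over.
    return [pscustomobject]@{ Healthy = $false; Attempted = $attempted }
}

# Simulated fault that only the second step clears.
$state = @{ PoolOk = $false }
$result = Invoke-RecoveryChain -Probe { $state.PoolOk } -Steps @(
    @{ Name = 'RestartAppPool'; Action = { } },
    @{ Name = 'RestartService'; Action = { $state.PoolOk = $true } },
    @{ Name = 'RestartServer';  Action = { $state.PoolOk = $true } }
)
$result.Attempted -join ' -> '
```

In this simulation the chain stops after the second step because the probe succeeds at that point, so the server restart is never attempted.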
Managed Availability is
undoubtedly in its early days, and the hope is that it will improve and
evolve in terms of sophistication and capability over time. It’s worth
noting at this point why Microsoft has incorporated such a facility.
Briefly, Microsoft has found that it is extraordinarily helpful
to build as much automation as possible into servers that are deployed
in massive online services such as Office 365. Human intervention is
expensive, takes too much time, and is prone to error, whereas
computers are very good at following well-defined steps to resolve
well-understood problem conditions.
Synthetic transactions are a
good way of measuring that everything is working properly in any
transaction-based system. Even though Exchange is an email server, you can
consider the messages it processes to be transactions. Therefore,
it makes sense for Exchange to generate messages and use them to
measure whether everything along the path of those messages handles
them properly. To mimic the work human users do, the messages have to
originate from somewhere and be sent to somewhere else, and that’s
where health mailboxes are used. Two health mailboxes are created (with
archives) in every mailbox database as soon as the first mailbox is
created in the database. The Health Manager service re-creates any health
mailboxes that are missing, so if you remove them, they
reappear the next time the service restarts.
The health mailboxes are associated with user accounts created in the Users container in Active Directory (Figure 1).
You can also retrieve information about the health mailboxes with the
Get-Mailbox -Monitoring command; an examination of their properties
reveals that health mailboxes have their RecipientTypeDetails property
set to MonitoringMailbox. A useful one-liner is the command to report
on how much space is occupied by the health mailboxes:
Get-Mailbox -Monitoring | Get-MailboxStatistics | Format-Table DisplayName, TotalItemSize, ItemCount
Exchange
uses the health mailboxes to establish that email connectivity exists
to the various databases in the system by sending artificial messages
to and from the mailboxes every five minutes or so. This results in a
number of observable side effects, including:
That
the health mailboxes are not empty and will report that they store some
information if you examine them with Get-MailboxStatistics. This is not
an issue because the amount of data is relatively small. If you spot
that a health mailbox stores more than 100 MB, you should try to
determine why this is so.
That the transactions for health mailboxes contribute to a certain increase
in transaction logs and replication between database copies in a DAG.
Again, the overall increase and impact are very slight. In fact, the
transactions generated by the health mailboxes help keep log
replication ticking over because databases are never left without a
transaction for very long.
That the messages sent between health mailboxes are recorded in message-tracking logs.
That the messages sent between health mailboxes are journaled if you do not
exclude them from your journaling rules. One way of doing this is to
mark the health mailboxes by setting a known value in one of the 15
custom attributes (CustomAttribute1 through CustomAttribute15) available
for mailboxes and then excluding any
messages generated by a mailbox with that value set.
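The 100 MB guideline from the first side effect in the list above can be turned into a simple filter. This is a sketch that assumes it is run in the Exchange Management Shell, where TotalItemSize exposes a Value with a ToMB() method; in a plain remote PowerShell session the size might arrive as text, and the filter would need to be adapted.

```powershell
# Threshold taken from the guidance in the text.
$thresholdMB = 100

# Report only the health mailboxes that exceed the threshold, largest first.
Get-Mailbox -Monitoring |
    Get-MailboxStatistics |
    Where-Object { $_.TotalItemSize.Value.ToMB() -gt $thresholdMB } |
    Sort-Object TotalItemSize -Descending |
    Format-Table DisplayName, Database, TotalItemSize, ItemCount -AutoSize
```

No output means that no health mailbox is above the threshold.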
At this point, the operational considerations that apply to both Managed
Availability and health mailboxes are still becoming clear. A review of
the current knowledge on the topic, as expressed
in blogs and other Internet sources, will be useful in understanding how
to factor these elements into your deployment.