Organizations that are planning to deploy
an archiving system in their environment soon realize that the
deployment can be a daunting task. Having been in this industry for
many years now, we've seen people do things right and do things
wrong, and doing something wrong can result in serious trouble
(in the worst case, jail time).
Email archiving is a critical application for
driving down the cost of managing email for corporate governance,
litigation support, and regulatory compliance.
1. Storage Management
One of the main reasons that many organizations want
to use archiving is storage management. Offloading old email
messages to cheaper storage makes sense. We often compare keeping old
email on your Exchange server to keeping your IRS tax records on the
kitchen table. You don't keep your IRS records on the table forever;
you file them away where you can still get to them easily. In the years
that we've been working with and deploying archiving solutions, one
thing has stood out when it comes to storage management: 99.9 percent
of email older than six months is never accessed again, which makes you
wonder why you keep it on your Exchange server at all.
Most administrators mistakenly think that the
performance of the Exchange database is related to the size of the
database or the size of the mailbox. In practice, the number of items
in a folder matters more. Knowledge Base article 905803 (http://support.microsoft.com/kb/905803)
describes how Microsoft Office Outlook 2003 and 2007 users experience
poor performance when they work with a folder that contains many items
on a server that is running Exchange Server 2007, Exchange Server 2003,
or Exchange 2000 Server. The issue occurs because Outlook must
perform several operations against the Exchange server to retrieve the
contents of a folder, and the more items there are in a folder, the
longer the server takes to respond to those requests. The underlying cause
is restricted views; see http://technet.microsoft.com/en-us/library/cc535025.aspx
to learn more about this topic. Although the article doesn't specifically
mention Exchange 2010, it applies to this release as well. In Exchange 2010,
however, the number of items per folder at which performance starts to
degrade has been raised to around 100,000 (excluding third-party
products), so these issues are much less common. You
can help avoid performance degradation in Outlook by managing the
number of items in heavily used folders, including Inbox, Sent Items,
and Calendar.
Archiving solutions reduce the storage footprint,
but administrators have traditionally archived with stubs because they
want end users to have transparent access to archived data. When
you do that, you can run into the item count limits: even
though stubbed messages are only a few kilobytes in size,
each one still counts toward the item limits.
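To make the item-count point concrete, here is a minimal Python sketch that flags folders whose combined count of regular items and stubs reaches the roughly 100,000-item threshold mentioned above. The `FolderStats` record and `folders_at_risk` function are hypothetical; in practice you would fill in the counts from your own reporting (for example, the output of the Exchange Management Shell cmdlet `Get-MailboxFolderStatistics`), and the threshold is a rule of thumb, not a hard product limit.

```python
from dataclasses import dataclass

# Approximate per-folder item count at which Exchange 2010 performance
# starts to degrade, as cited in the text. Treat it as a rule of thumb.
ITEM_THRESHOLD = 100_000


@dataclass
class FolderStats:
    """Hypothetical per-folder record filled in from your own reporting."""
    name: str
    regular_items: int
    stub_items: int

    @property
    def total_items(self) -> int:
        # Stubs are only a few KB each, but every stub is still an item.
        return self.regular_items + self.stub_items


def folders_at_risk(folders, threshold=ITEM_THRESHOLD):
    """Return the names of folders whose total item count (stubs included)
    has reached the threshold."""
    return [f.name for f in folders if f.total_items >= threshold]


folders = [
    FolderStats("Inbox", regular_items=40_000, stub_items=65_000),
    FolderStats("Sent Items", regular_items=20_000, stub_items=5_000),
]
print(folders_at_risk(folders))  # ['Inbox']
```

Note that the Inbox is flagged even though only 40,000 of its items are real messages: the stubs alone push it past the threshold.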
Several storage management options are available, and
we'd like to go over two of them that have worked well for organizations in
the past:
Time Based
With this option, you archive data pretty
much from day one, but you don't create stubs in the mailbox; instead,
you delete all data from the mailbox that's older than a specific age.
The idea behind this approach is to avoid confusing
end users with stubbed or archived messages. The age at which
you delete older data depends on how users use email in
your organization, but deleting anything older than six months or a
year is generally safe. Keep in mind that even
though you delete data from the mailbox, it still resides in the archiving
system, so end users can get to it if they need to.
In some situations, however, organizations deploy an organizational
archive and do not allow end users to access the data.
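The time-based option boils down to a simple per-item rule. The following Python sketch assumes each mailbox item exposes a received timestamp and a flag saying whether the archive has already captured it; the function name and the 180-day default are hypothetical, and a real archiving product expresses this as a policy setting rather than as code.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical mailbox age limit; the text suggests six months to a year.
MAILBOX_MAX_AGE = timedelta(days=180)


def should_delete_from_mailbox(received, archived, now, max_age=MAILBOX_MAX_AGE):
    """Time-based policy: no stubs are created. An item is removed from the
    mailbox once it is older than max_age, but only if the archive already
    holds a copy, so users can still retrieve it from the archive."""
    return archived and (now - received) > max_age


now = datetime(2010, 12, 31, tzinfo=timezone.utc)
print(should_delete_from_mailbox(datetime(2010, 1, 15, tzinfo=timezone.utc), True, now))   # True
print(should_delete_from_mailbox(datetime(2010, 11, 1, tzinfo=timezone.utc), True, now))   # False
print(should_delete_from_mailbox(datetime(2010, 1, 15, tzinfo=timezone.utc), False, now))  # False
```

The `archived` check is the safety net: an old item that has not yet reached the archive stays in the mailbox.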
Stub and Time Based
This option combines the first one (deleting data
older than six months or a year) with stubbing, that is, archiving messages
and leaving a stub behind. You can squeeze out a bit more storage savings by
replacing larger messages that are younger than six months or so with stubs.
We can't tell you exactly what will work in your
environment; however, don't create a stubbing policy that acts on data
that is younger than a few days. Not only would that frustrate
your end users, but it would also result in data ping-pong as end
users constantly restore archived data to their mailboxes.
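The stub-and-time-based option adds a size rule and a minimum-age guard to the time-based rule. Here is a minimal Python sketch for an item that is already archived. The function name and all three thresholds (one year, 14 days, 1 MB) are hypothetical examples chosen for illustration, not product defaults or recommendations.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical thresholds for illustration only; tune them for your users.
DELETE_AFTER = timedelta(days=365)   # older than this: remove from the mailbox
MIN_STUB_AGE = timedelta(days=14)    # never stub anything younger than this
MIN_STUB_SIZE = 1024 * 1024          # only stub messages of 1 MB or more


def mailbox_action(received, size_bytes, now):
    """Stub-and-time-based policy for an item that is already archived.
    Returns 'delete', 'stub', or 'keep'."""
    age = now - received
    if age > DELETE_AFTER:
        return "delete"
    if age >= MIN_STUB_AGE and size_bytes >= MIN_STUB_SIZE:
        return "stub"
    return "keep"


now = datetime(2010, 12, 31, tzinfo=timezone.utc)
print(mailbox_action(datetime(2009, 11, 1, tzinfo=timezone.utc), 10_000, now))      # delete
print(mailbox_action(datetime(2010, 10, 1, tzinfo=timezone.utc), 5_000_000, now))   # stub
print(mailbox_action(datetime(2010, 12, 29, tzinfo=timezone.utc), 5_000_000, now))  # keep
```

The `MIN_STUB_AGE` guard is what prevents the data ping-pong: a large message received two days ago stays in the mailbox even though it would otherwise qualify for stubbing.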
2. Importing PSTs
PSTs are notoriously bad for your environment. We
often compare them to those pesky blackberry bushes in your garden that take
over the entire yard if you don't keep them in check. Most
administrators know what PST files are because we've been using them
daily since we started using Exchange and Outlook. These days, archiving
PST data has become almost standard practice as part of the process of bringing
the rest of the messaging data under centralized management. It has
been nearly 15 years since the first version of Exchange Server (4.0)
was released, and we now take for granted many of the capabilities
available in Exchange.
Two PST file formats exist. The most
common and current format, known as the Unicode format, became
available with Office 2003 and replaced the "original" (ANSI) PST
format. The main difference is that the original PST file has
a hardcoded 2 GB limit, while the Unicode file has a theoretical 32 TB
file size limit. In the real world, a Unicode PST file can cause
performance degradation once it grows beyond 5 GB if you do not have
adequately performing hardware. Beyond 10 GB, according to Microsoft, you
will encounter short pauses on almost all hardware (see http://support.microsoft.com/?kbid=968009 for more details).
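When you inventory PST files, it helps to know which format each file uses and how close it is to the size thresholds above. The Python sketch below reads the first 12 bytes of a file. The header layout it relies on is an assumption based on our reading of Microsoft's [MS-PST] file format specification (magic `!BDN`, client magic `SM`, and a little-endian version field at offset 10); verify it against the specification before relying on it. The function names and risk labels are hypothetical.

```python
import os
import struct

GB = 1024 ** 3


def pst_format(header: bytes) -> str:
    """Classify a PST file from its first 12 bytes.

    Assumption (our reading of the [MS-PST] spec): bytes 0-3 hold the magic
    b"!BDN", bytes 8-9 hold b"SM", and bytes 10-11 hold the format version
    as a little-endian unsigned 16-bit integer, where 14 or 15 means ANSI
    and 23 or higher means Unicode.
    """
    if len(header) < 12 or header[:4] != b"!BDN" or header[8:10] != b"SM":
        return "not a PST"
    (version,) = struct.unpack_from("<H", header, 10)
    if version in (14, 15):
        return "ANSI"
    if version >= 23:
        return "Unicode"
    return "unknown"


def pst_risk(fmt: str, size_bytes: int) -> str:
    """Map format and size to the guidance given in the text."""
    if fmt == "ANSI":
        return "legacy format with a hard 2 GB limit"
    if fmt == "Unicode":
        if size_bytes > 10 * GB:
            return "short pauses likely on almost all hardware"
        if size_bytes > 5 * GB:
            return "may degrade without adequate hardware"
        return "ok"
    return "inspect manually"


def inspect_pst(path: str):
    """Return (format, risk) for one PST file on disk."""
    with open(path, "rb") as f:
        fmt = pst_format(f.read(12))
    return fmt, pst_risk(fmt, os.path.getsize(path))
```

Reading only the header keeps the check cheap, which matters when you run it against thousands of files over the network.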
The lack of centralized
management tools played a major role in the sprawl of PST
files. For more than a decade, users controlled the creation and
location of PST files, and, as you would expect, this has caused problems. A
company we worked with reported that they had close to 300 TB of data
in PST files spread across desktops, laptops, servers, and
backup tapes. The PST file storage far exceeded the storage allocated
and available to their messaging system, resulting in major headaches.
Bringing the PST files back into Exchange wasn't even an option, yet the company
still had to bring the data under centralized management.
In such a situation, an archiving system can make
your life easier. To comply with laws and regulations, you can't simply
ignore or delete PST files. It fascinates us that organizations often
spend a small fortune protecting their messaging infrastructure with
data leak prevention software to keep sensitive data from leaving the
organization unchecked. By forgetting about PST files, they might have
closed one door, but they have left a major security
leak wide open. One of the most common ways for end users to take their mailbox
data with them is to simply export all the contents of their mailbox to
a PST file and store it on a thumb drive or even an MP3 player. They
can then walk out the door with your company's sensitive information,
contracts, and IP, all unchecked.
Even if the PST files never leave
your infrastructure, the cost of storing data in them is
enormous. The file format itself is so bloated that it uses more
storage than the same data would use in the Exchange database.
So how do you eradicate PST files from your environment? We recommend implementing a multistep process:
1. Write a project plan.
For smaller companies, writing a project plan
might not be as important, but for larger organizations such a plan
will come in handy. A project plan allows you to prepare and think
about exceptions that you might not otherwise consider. For instance, what are you
going to do with data from employees who have left your organization?
How are you going to handle password-protected PST files? A good plan
will save you time.
2. Prevent further growth of the problem.
Microsoft has finally made some good Group Policy
settings available that allow you to restrict users from creating
PST files. You can download
them from http://go.microsoft.com/fwlink/?LinkId=78161.
Use them. We love, for instance, that you now have the option Prevent
Users From Adding New Content To PST Files. This option still allows
end users to open their PST files but prevents them from adding any new
content.
3. Discover all existing PST files.
This task will probably take up the most time, because you have to find all
the PST files on your network, and you'll find them on servers, tapes,
laptops, and workstations. If you run scripts to do this, don't do an
all-out search at once, because it will saturate your network with traffic.
Also think about how you are going to deal with people who work remotely.
4. Bring PST data into an archive.
Bringing PST data into an archive puts
the data back under your control. One reason not to
import it into Exchange directly is that there is a good
chance you won't have the required storage available. Another big
advantage is that once the data is in the archive, you can apply
retention to it and gain additional benefits when it comes to
eDiscovery, risk management, and early case assessment.
5. Give end users access to their archived PSTs.
Taking away PST files from end users without
giving them access to their own data is the quickest way to start a
user revolt. Give end users access to the archived data; they need
it for productivity reasons.
6. Avoid creating excessive stubs.
Stubs are shortcuts in the mailbox that point
to the archived item, which now resides in the archive and no longer on
Exchange. Excessive use of stubs can cause problems on Exchange with
whitespace, fragmentation, and major I/O overhead.
7. Disable PST file creation.
This final step is important because, after all,
what good would it do to bring everything under control if
your users can simply create new PST files again? Use the
policies that we referred to in step 2.
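Step 3 (discovering existing PST files) is the step most often scripted. The Python sketch below walks a list of root directories, such as mapped drives or file shares, and reports every `.pst` file with its size. It sleeps briefly after each directory as a crude throttle so the scan doesn't saturate the network; the function names, the share paths, and the 50 ms default pause are hypothetical and should be adjusted for your environment.

```python
import os
import time


def is_pst(filename: str) -> bool:
    """Case-insensitive check for the .pst extension."""
    return filename.lower().endswith(".pst")


def find_pst_files(roots, pause_seconds=0.05):
    """Yield (path, size_in_bytes) for every PST file under each root.

    pause_seconds is a crude throttle: sleeping after each directory spreads
    the scan out so it does not flood the network when roots are file shares.
    """
    for root in roots:
        # os.walk silently skips directories it cannot read.
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                if is_pst(name):
                    full_path = os.path.join(dirpath, name)
                    try:
                        yield full_path, os.path.getsize(full_path)
                    except OSError:
                        # The file disappeared or is inaccessible; skip it.
                        continue
            time.sleep(pause_seconds)


if __name__ == "__main__":
    # Hypothetical share names; replace them with your own servers.
    shares = [r"\\fileserver01\home", r"\\fileserver02\profiles"]
    for path, size in find_pst_files(shares):
        print(f"{size / 1024 ** 3:8.2f} GB  {path}")
```

A share-based scan like this only finds files on machines and shares that are reachable when it runs. Laptops of remote workers and PST files on backup tapes need a separate approach, which is why the plan should address them explicitly.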
3. Retention
Deciding on your retention categories or how long
you want to retain information within the archive will probably take up
the most planning time. This process will involve most of the
departments in your organization, from the storage team to the Exchange
team, management, legal counsel, and even HR.
Retention controls the creation, filing, storage,
and disposal of records in a way that is not only legally correct but
also administratively feasible. Retention has to serve multiple
purposes: it must fulfill operational needs and provide a way to preserve
an adequate historical record of the information. It is very important
to implement and practice proper retention management, as it allows your
organization to accomplish the following:
Reduce compliance and litigation risks by
proactively managing the retention and disposition of all potentially
discoverable information
Reduce storage costs by storing only important and relevant information in the archive
Keep only relevant information in the archive, which makes it easier and faster to find what you need
Increase
the reliability of information by managing the appropriate versions of
information assets and ensuring that they have high value as evidence
if they are needed in a court of law
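At its core, a retention schedule maps each retention category to a retention period, from which the archive derives the date an item becomes eligible for disposal. The Python sketch below shows that mapping. The category names and periods are hypothetical placeholders, not recommendations; the real values come out of the planning process with legal counsel, HR, and management.

```python
from datetime import date

# Hypothetical retention categories and periods in years. The real schedule
# is the output of the planning process with legal counsel, HR, and others.
RETENTION_YEARS = {
    "general": 3,
    "finance": 7,
    "contracts": 10,
}


def add_years(d: date, years: int) -> date:
    """Add whole years; Feb 29 maps to Feb 28 when the target year is not a leap year."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def disposition_date(category: str, received: date) -> date:
    """Earliest date on which an archived item in this category may be disposed of."""
    if category not in RETENTION_YEARS:
        raise ValueError(f"no retention category named {category!r}")
    return add_years(received, RETENTION_YEARS[category])


def eligible_for_disposal(category: str, received: date, today: date) -> bool:
    """True once the item has been kept for its category's full retention period."""
    return today >= disposition_date(category, received)


print(disposition_date("finance", date(2010, 3, 15)))                        # 2017-03-15
print(eligible_for_disposal("general", date(2006, 1, 1), date(2010, 1, 1)))  # True
```

Rejecting unknown categories, rather than silently applying a default, reflects the point above: every item should fall under a deliberate, documented retention decision.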
As we said earlier, you will most likely spend most
of your time developing your retention policies, and there are
significant benefits to developing these policies before
implementing an archiving solution to automate them:
More Effective Regulatory Compliance
You don't have a choice when it comes to email
retention for regulatory compliance; it is simply an absolute
requirement. The only choice your company will have is in how you meet
the requirements: manually or with an email archiving automation
system. Creating and automating your email retention policy lowers your
overall risk of non-compliance and ensures that all required email is
kept for the required time period.
Better Legal Risk Management
The ability to show a court an up-to-date and
regularly enforced email retention policy can demonstrate the intent
behind your retention policy and counter claims by the plaintiff's
attorney of "spoliation," the purposeful destruction of evidence.
More Consistent Corporate Governance
Organizations these days rely on actively
generating, using, and referencing data for business
processes and decisions. The data that a business generates has value
to the business only if that data can be used efficiently. An effective
retention policy ensures that this information remains
available for a defined period of time, and an email archiving system allows
for quick search and reference.