The mean gives you a measure of central
tendency by taking all the actual values in a group into account. The
median measures central tendency differently, by giving you the
midpoint of a ranked group of values. The mode takes yet another tack:
It tells you which one of several categories occurs most frequently.
You can get this information from the FREQUENCY() function.
But the MODE() function returns the most frequently occurring
observation only, and it’s a little quicker to use. Furthermore, as
you’ll see in this section, a little work can get MODE() to work with
data on a nominal scale—that’s also possible with FREQUENCY() but it’s
a lot more work.
Suppose you have a set of numbers in a range of cells, as shown in Figure 1. The following formula returns the numeric value that occurs most frequently in that range (in Figure1, the formula is entered in cell C1):
=MODE(A2:A21)
The pivot chart in Figure 1
provides the same information graphically. Notice that the mode
returned by the function in cell C1 is the same value as the most
frequently occurring value shown in the pivot chart.
The problem is that you don’t usually care
about the mode of numeric values. It’s possible that you have at hand a
list of the ages of the people who live on your block, or the weight of
each player on your favorite football team, or the height of each
student in your daughter’s fourth grade class. It’s even conceivable
that you have a good reason to know the most frequently occurring age,
weight, or height in a group of people.
Among other purposes, numeric measures are good for
recording small distinctions: Joe is 33 years old and Jane is 34; Dave
weighs 230 pounds and Don weighs 232; Jake is 47 inches tall and Judy
stands 48 inches. In a group of 18 or 20 people, it’s quite possible
that everyone is of a different age, or a different weight or a
different height. The same is true of most objects and numeric
measurements that you can think of.
In that case, it is not plausible that you would
want to know the modal age, or weight, or height. The mean, yes, or the
median, but why would you want to know that the most frequently
occurring age is 47 years, when the next most frequently occurring age
is 46 and the next is 48?
The mode is seldom a useful statistic when the
variable being studied is numeric and ungrouped. It’s when you are
interested in nominal data,
categories such as brands of cars or children’s given names or
political preferences—that the mode is of interest. It’s worth noting
that the mode is the only sensible measure of central tendency when
you’re dealing with nominal data. The modal boy’s name for newborns in
2009 was Jacob; that statistic is interesting to some people in some
way. But what’s the mean of Jacob, Michael, and Ethan? The median of
Emma, Isabella, and Emily? The mode is the only sensible measure of
central tendency for nominal data.
But Excel’s MODE() function doesn’t work with
nominal data. If you present to it, as its argument, a range that
contains exclusively text data such as names, MODE() returns the #N/A
error value. If one or more text values are included in a list of
numeric values, MODE() simply ignores the text values.
I’ll take this opportunity to complain that it
doesn’t make a lot of sense for Excel to provide analytic support for a
situation that seldom occurs (for example, caring about the modal height
of a group of fourth graders) while it fails to support situations that
occur all the time (“Which model of car did we sell most of last
week?”).
Figure 2 shows a couple of solutions to the problem with MODE().
In contrast to the pivot chart shown in Figure 1, where just one value pokes up above the others because it occurs twice instead of once, the frequency distribution in Figure 2
is more informative. You can see that Ford, the modal value, leads
Toyota by a slim margin and GM by somewhat more. (This report is
genuine and was exported to Excel by a used car dealer from a popular
small business accounting package.)
To create a pivot chart that looks like the one in Figure 2, follow these steps:
1. | Arrange
your raw data in an Excel list format: the field name in the first
column (such as A1) and the values in the cells below the field name
(such as A2:A21).
|
2. | Select
a cell that has several empty columns to its right and several empty
rows below it. This is to avoid overwriting any important data with the
pivot table.
|
3. | Click
the Ribbon’s Insert tab, and click the PivotTable drop-down in the
Tables group. Choose PivotChart from the drop-down list. The dialog box
shown in Figure 3 appears.
|
4. | Identify
the range that contains your raw data (refer to step 1) by dragging
through it with your mouse pointer, by typing its range address, or by
typing its name if it’s a named table or range. The location of the
data should now appear in the Table/Range edit box. Click OK to get the
layout shown in Figure 4.
|
5. | In
the PivotTable Field List, drag the field or fields you’re interested
in down from the list and into the appropriate area at the bottom. In
this example, you would drag Make down into the Axis Fields area and
also drag it into the Values area.
|
The pivot chart and the pivot table that the pivot
chart is based on both update as soon as you’ve dropped a field into an
area. If you started with the data shown in Figure 2, you should get a pivot chart that’s identical, or nearly so, to the pivot chart in that figure.
Note
Excel makes one of two assumptions, depending on
whether the cell that’s active when you begin to create the pivot table
contains data.
One, if you started by selecting an empty cell,
Excel assumes that’s where you want to put the pivot table’s upper-left
corner. Excel puts the active cell’s address in the Location edit box,
as shown in Figure 3.
Two, if you started by selecting a cell that
contains a value or formula, Excel assumes that cell is part of the
source data for the pivot table or pivot chart. Excel finds the
boundaries of the contiguous, filled cells and puts the resulting
address in the Table/Range edit box.
A few comments on this analysis:
The mode is quite a useful statistic when
it’s applied to categories: political parties, consumer brands, days of
the week, states in a region, and so on. Excel really should have a
built-in worksheet function that returns the mode for text values. But
it doesn’t, and the next section shows you how to write your own
worksheet formula for the mode, one that will work for both numeric and
text values.
When you have just a few
distinct categories, consider building a pivot chart to show how many
instances there are of each. A pivot chart that shows the number of
instances of each category is an appealing way to present your data to
an audience. (There is no type of chart that communicates well when
there are many categories to consider. The visual clutter obscures the
message. In that sort of situation, consider combining categories or
omitting some.)
Standard Excel charts do
not show the number of instances per category without some preliminary
work. You would have to get a count of each category before creating
the chart, and that’s the purpose of the pivot table that underlies the
pivot chart. The pivot chart, based on the pivot table, is simply a
faster way to complete the analysis than creating your own table to
count category membership and then basing a standard Excel chart on
that table.
The mode is the only
sensible measure of central tendency when you’re working with nominal
data such as category names. The median requires that you rank order
things in some way: shortest to tallest, least expensive to priciest,
or slowest to fastest.