Getting the Mode of Categories with a Formula
I
have pointed out that Excel’s MODE() function does not work when you
supply it with text values as its arguments. Here is a method for
getting the mode using a worksheet formula. It tells you which text
value occurs most often in your data set. You’ll also see how to enter
a formula that tells you how many instances of the mode exist in your
data.
If you don’t want to resort to a pivot chart to get the mode of a group of text values, you can get their mode with the formula
=INDEX(A2:A21,MODE(MATCH(A2:A21,A2:A21,0)))
assuming that the text values are in A2:A21. (The
range could occupy a single column, as in A2:A21, or a single row, as
in A2:Z2. It will not work properly with a multi-row, multicolumn range
such as A2:Z21.)
If you’re somewhat new to Excel, that formula isn’t
going to make any sense to you at all. I structured it, I’ve been using
Excel frequently since 1994, and I still have to stare at the formula
and think it through before I see why it returns the mode. So if the
formula seems baffling, don’t worry about it. It will become clear in
the fullness of time, and in the meantime you can use it to get the
modal value for any set of text values in a worksheet. Simply replace
the range address A2:A21 with the address of the range that contains
your text values.
Briefly, the components of the formula work as follows:
The MATCH() function returns the position in
the array of values where each individual value first appears. The
third argument to the MATCH() function, 0, tells Excel that in each
case an exact match is required and the array is not necessarily
sorted. So, for each instance of Ford in the array, MATCH() returns 1;
for each instance of Toyota, it returns 2; for each instance of GM, it
returns 4.
The results of the MATCH()
function are used as the argument to MODE(). In this example, there are
twenty values for MODE() to evaluate: some equal 1, some equal 2 and
some equal 4. MODE() returns the most frequently occurring of those
numbers.
The result of MODE() is used as
the second argument to INDEX(). Its first argument is the array to
examine. The second argument tells it how far into the array to look.
Here, it looks at the first value in the array, which is Ford. If, say,
GM had been the most frequently occurring text value, MODE() would have
returned 4 and INDEX() would have used that value to find GM in the
array.
Using an Array Formula to Count the Values
With the modal value (Ford, in this example) in
hand, we still want to know how many instances there are of that mode.
This section describes how to create the array formula that counts the
instances.
Figure 2 also shows, in cell C2, the count of the number of records that belong to the modal value. This formula provides that count:
=SUM(IF(A2:A21=C1,1,0))
The formula is an array formula, and must be entered
using the special keyboard sequence Ctrl+Shift+Enter. You can tell that
a formula has been entered as an array formula if you see curly
brackets around it in the formula box. If you array enter the prior
formula, it will look like this in the formula box:
{=SUM(IF(A2:A21=C1,1,0))}
But don’t supply the curly brackets yourself. If you do, Excel interprets this as text, not as a formula.
Here’s how the formula works: As shown in Figure 2,
cell C1 contains the value “Ford”. So the following fragment of the
array formula tests whether values in the range A2:A21 equal the value
“Ford”:
A2:A21=C1
Because there are 20 cells in the range A2:A21, the
fragment returns an array of TRUE and FALSE values: TRUE when a cell
contains “Ford” and FALSE otherwise. The array looks like this:
{TRUE;FALSE;TRUE;FALSE;FALSE;FALSE;TRUE;FALSE;TRUE;TRUE;
FALSE;FALSE;FALSE;TRUE;TRUE;FALSE;FALSE;TRUE;FALSE;FALSE}
Specifically, cell A2 contains “Ford” and so it
passes the test: The first value in the array is therefore TRUE. Cell
A3 does not contain “Ford” and so it fails the test: The second value
in the array is therefore FALSE—and so on for all 20 cells.
Note
The array of TRUE and FALSE values is an
intermediate result of this array formula (and of many others, of
course). As such, it is not routinely visible to the user, who normally
needs to see only the end result of the formula.
Now step outside that fragment, which, as we’ve just
seen, resolves to an array of TRUE and FALSE values. The array is used
as the first argument to the IF() function. Excel’s IF() function takes
three arguments:
The first argument is a value that can be
TRUE or FALSE. In this example, that’s each value in the array just
shown, returned by the fragment A2:A21=C1.
The
second argument is the value that you want the IF() function to return
when the first argument is TRUE. In the example, this is 1.
The
third argument is the value that you want the IF() function to return
when the first argument is FALSE. In the example, this is 0.
The
IF() function examines each of the values in the array to see if it’s a
TRUE value or a FALSE value. When a value in the array is TRUE, the
IF() function returns, in this example, a 1, and a 0 otherwise.
Therefore, the fragment
IF(A2:A21=C1,1,0)
returns an array of 1’s and 0’s that corresponds to the first array of TRUE and FALSE values. That array looks like this:
{1;0;1;0;0;0;1;0;1;1;0;0;0;1;1;0;0;1;0;0}
A 1 corresponds to a cell in A2:A21 that contains
the value “Ford” and a 0 corresponds to a cell in the same range that
does not contain “Ford”. Finally, the array of 1’s and 0’s is presented
to the SUM() function, which totals the values in the array. Here, that
total is 8.