Friday, February 25, 2011

Summarize Spreadsheet Data With Excel's Array Formulas

http://www.exceluser.com/explore/arrays1.htm

Array formulas may be Excel's most powerful feature for summarizing data. However, they also are one of Excel's
least-used features. Here's how to use this hidden power.

Subtotals can reveal very useful management information.

For example, managers might be interested to learn that sales increased by 10% last month. But they would be fascinated to learn that Pat Smith's sales of Widgets in the Northwest region doubled last month, while the sales of all other people fell sharply.

Array formulas provide a way by which Excel users can discover such useful information.
Introducing the Data

To explain the power of array formulas I'll use this database. It shows sales by Seller, Product, Region, and Customer, with Quantity and Total Sales, in dollars.

I named each column of data with the label shown in row 1. To do so, I selected the range A1:F15; chose Insert, Name, Create; chose Top Row; and then chose OK.

By assigning names in this way we anchor the names in the gray border rows, rows 2 and 15. By doing so, we can add new data to this table between the gray rows and be confident that the names will expand as needed.
Introducing Array Formulas, Example 1

Let's begin our examination of this data by summarizing the sales for Jill and Joe. Cells J3 and J4 perform this summary by using array formulas.

Here's the formula for the cell shown:

J3: {=SUM(IF(Seller=$I3,Total,0))}

Notice the braces that surround this formula. You do NOT enter those characters when you key in the formula. Instead, you type the formula shown within the braces. But when you're done typing you don't press Enter. Instead, you hold down the Ctrl and Shift keys, then press Enter. After you do so, Excel displays the formula in the formula bar with the braces, as shown above.

Here's the key to understanding how array formulas work: Each array formula creates temporary arrays in memory, and then the outside function returns the results from that array.

To illustrate, the "Seller=$I3" part of the formula creates a temporary array like this: {FALSE;TRUE;FALSE;FALSE;FALSE;TRUE;FALSE;TRUE; FALSE;TRUE;FALSE;TRUE;FALSE;FALSE} Here, the second element in the array is TRUE. That is, the second cell of the Seller range, cell A3 contains the text "Joe".

The formula says that wherever TRUE appears, return the corresponding value from the Total column. Therefore, the statement
"IF(Seller=$I3,Total,0)" returns the temporary array: {0;12600;0;0;0; 5060;0;1980;0;1540;0;4500;0;0}

Finally, the SUM function returns the sum of this temporary array: 25,680.

The second formula is similar:

J3: {=SUM(IF(Seller=$I4,Total,0))}

You can copy this formula from cell I3, or enter it for the practice. Again, if you enter it, do NOT manually enter the braces. Instead, when you enter the formula using Ctrl-Shift Enter, the braces will appear automatically.
More Examples of Array Formulas

The best way to gain an understanding of array formulas is to see several examples. All of these refer to the database at the top of this page.

Example 2

This table summarizes sales by sales person by product. The grand total, as you can see, matches the grand total of Example 1.

This array formula is slightly longer:

J9: {=SUM(IF((Seller=$I9)*(Prod=J$8),Total,0))}

Here, we multiply the first array ("Seller=$19") by a second array ("Prod=J8"). This returns an array that contains TRUE only when both corresponding cells in the array are TRUE.

That is, when you work with array formulas, you multiply when you want an AND relationship.

Also, notice that I enclosed each test in parentheses. Not only does Excel require these parentheses, they make the formula easier to understand when you read it.

When you copy the array formula to the remainder of this display, you must copy in two steps, not one. With normal formulas you could copy cell J9 to the range J9:L10. But when you try to follow the same approach with an array formula, Excel complains. That is, Excel objects when you try to copy an array formula to a multi-cell range that includes itself.

Therefore, you first copy cell J9 to cell J10, then copy the range J9:J10 to the range K9:L9.

The formulas in row 11 and in column L are merely SUM formulas for the columns and rows respectively, as shown here:

J11: =SUM(J9:J10)
L9: =SUM(J9:K9)

For convenience, in the following examples I'll call the range of calculations above and to the left of the SUMs the Working Range.

Example 3

We can dig deeper into the data by adding an additional argument. Here, we enter the label "Hats" in cell K13 to report only on sales of hats.

The array formulas for this example are slightly longer, but follow the same pattern as before:

J15: {=SUM(IF((Seller=$I15)*(Region=J$14)*(Prod=$K$13),Total,0))}

Copy this formula to the Working Range, J15:L16.

Example 4

Array formulas work with more functions than merely the SUM function. And you can perform additional calculations within the formula.

For example, this formula returns the largest price for Hats that Joe has sold:

J28: {=MAX(IF((Seller=$I28)*(Prod=J$27),Total/Qty,0))}

Copy this formula to the Working Range.

Example 5

In several of the following examples we're going to need the units sold by sales person by product. So let's calculate those numbers now.

The formula is:

P3: {=SUM(IF((Seller=$O3)*(Prod=P$2),Qty,0))}

As before, copy this formula to the Working Range.

Example 6

We can calculate the average price that Joe has charged for Hats.

Here's one way to do this:

P9: {=AVERAGE(IF((Seller=$O9)*(Prod=P$8),Total/Qty,""))}

Notice the null string in the last argument of this formula. If we had used a zero, the AVERAGE function would have included all those zeros in the calculation of its average, which we don't want. Instead, by using a null string, we cause the AVERAGE function to ignore the cells that don't satisfy our criteria. Alternatively, we could have used FALSE, like this:

P9: {=AVERAGE(IF((Seller=$O9)*(Prod=P$8),Total/Qty,FALSE))}

No matter which way we calculate this formula, it has a problem: Its average is not weighted by the number of units sold. Therefore, if you multiply each average price by each number of units sold in Example 5, you won't get a correct grand total.

Example 7

This version calculates a weighted average price by sales person by product. When we multiply each value by the equivalent number of units sold from Example 5, and total the results, we'll get the grand totals shown in Examples 1 and 2.

The key formula is:

P15: {=SUM(IF((Seller=$O15)*(Prod=P$14),Total,""))/
SUM(IF((Seller=$O15)*(Prod=P$14),Qty,""))}

(Because it's too long to fit in one row, I'm showing this formula in two rows. You would enter it in one long line, of course.)

In this specific formula, we calculate Joe's total sales for hats and then divide by the total number of hats that Joe sold. This gives us the weighted-average price for the hats that Joe sold.

As always, copy this formula to the Working Range.

Example 8

This display checks our results. The first formula is simply:

P21: =P3*P15

This formula multiplies the number of hats that Joe sold (Example 5) by the weighted average sales price (Example 7) to calculate Joe's total sales of hats, in dollars. As always, copy this formula to the Working Range. The bottom-right cell in this display shows that the total of all sales matches the grant total of the data at the beginning of this article.

Example 9

In Example 2, I said that multiplying two tests creates an AND relationship. That is, the result is TRUE only if all values that you multiply also are TRUE.

Similarly, summing two results creates an OR relationship.

To illustrate, suppose the sales manager is thinking about merging the North and the West regions. So he wants a report that treats those regions as one region.

Here's the first formula:

P29: {=SUM(IF((Seller=$O29)*
((Region=P$27)+(Region=P$28)),Total,0))}

(Because it's too long to fit in one row, I'm showing this formula in two rows. You would enter it in one long line, of course.)

Here, if the Seller is Joe and if the Region is either North or West, the array returns the Total; otherwise, it returns 0. As usual, the SUM function returns the total of the array.

When you copy the formula to cell Q29, the formula returns the total of Joe's sales for Regions that contain "South" or that are blank.
Extending Array Formulas

Array formulas can be extended in some surprising ways. Specifically, the Excel functions CHOOSE and OFFSET offer some options to keep in mind.

To illustrate, consider this formula:

D20: {=SUM(IF(Seller=$A$20,Total,0))}

Suppose you want to expand this formula. Suppose that sometimes you want to return the Total when Seller equals the label in cell A20, and at other times you want to return the Total when Region equals the label in cell B20. You want to specify the test to use by entering the value 1 (for Seller) or 2 (for Region) in cell C20.

To set up this summary, you could use the CHOOSE function in two different ways:

D20: {=SUM(IF(CHOOSE($C20,Seller=$A$20,Region=$B20),Total,0))}

D20: {=CHOOSE($C20,SUM(IF(Seller=$A$20,Total,0)),
SUM(IF(Region=$B$20,Total,0)))}

(Because it's too long to fit in one row, I'm showing the second formula in two rows. You would enter it in one long line, of course.)

Either option works the same in this specific example. But in real life, you could use these options in different ways. For example, the second approach would allow you to choose completely different formulas merely by changing the number in cell C20.

Similarly, the OFFSET function allows you to choose different rows or columns to test or sum. For example:

E20: {=SUM(IF(OFFSET(Seller,0,$A22)=$B22,Total,0))}

Here if the value in cell A22 is 1, the OFFSET function causes the formula to compare the label in cell B22 to the first column to the right of Seller, or the Prod column in the example. If the value in cell A22 is 2, the OFFSET function compares the label to the second column, which is the Region column in the example.
In Conclusion

Array formulas are the ideal tool to use whenever you need to summarize a spreadsheet database. Once you start to experiment with them, you'll see what I mean.

No comments:

Post a Comment