Wednesday, February 23, 2011

Optimizing VBA

Accessing Cells In A Range


You do not need to use the .Cells method to access specific cells in a range. For example, you can use

Range("MyRange")(1,2) rather than
Range("MyRange").Cells(1,2)

See Alan Beban's explanation of this method for more details.

Related to this is the shortcut method of refering to cells. VBA will allow you reference cells with [A1] rather than Range("A1"). While the [A1] syntax is easier to type, it is slower to execute than the Range("A1") syntax.



Calculation Mode


Normally, Excel will recalculate a cell or a range of cells when that cell's or range's precedents have changed. This may cause your workbook to recalculate too often, which will slow down performance. You can prevent Excel from recalculating the workbook by using the statement:

Application.Calculation = xlCalculationManual

At the end of your code, you can set the calculation mode back to automatic with the statement:

Application.Calculation = xlCalculationAutomatic

Remember, though, that when the calculation mode is xlCalculationManual, Excel doesn't update values in cells. If your macro relies on an updated cell value, you must force a Calculate event, with the .Calculate method, which may be applied to either a specific range (Range("MyRange").Calculate) or to the entire workbook (Calculate).



Collection Indexes


An individual item of a collection object may be accessed by either its name or by its index into the collection. For example, if you have three worksheets ("Sheet1", "Sheet2", and "Sheet3") in a workbook ("MyWorkbook"), you can reference "Sheet2" with either

Worksheets("Sheet2") or
Worksheets(2)

In general, the index number method (Worksheets(2)) is considerably faster than the index name method (Worksheets("Sheet2")).

However, the number and order of items in a collection may change, so it is usually safer and easier to refer to items in a collection by their name, rather than their index number.

Constants


Whenever you can, declare values as constants, rather than variables. Since their values never change, they are evaluated only once when your code is compiled, rather than each time they are used at run time.



Early Binding


This is closely tied with Specific Object Type Declaration. If you're going to work with another application, such as Word, declare your OLE object directly, rather than as an Object type variable. By doing so, a great deal of overhead is done at compile time ("Early Binding") rather than at run time ("Late Binding"). For example, use

Dim WordObj As Word.Application rather than
Dim WordObj As Object

Also see Using Variables (Properly) In VBA.

FOR EACH Loops


When looping through a collection it is usually faster than the FOR EACH statement rather than using the index. For example, the first code loop is faster than the second:

Dim WS as Worksheet
For Each WS In Worksheets
MsgBox WS.Name
Next WS

Dim i as Integer
For i = 1 To Worksheets.Count
MsgBox Worksheets(i).Name
Next i



Range Objects Not Selection Object


Generally, it is not necessary to select a range before working with it. For example, it is more efficient to use

Range("A1").Font.Bold = True

Rather than

Range("A1").Select
Selection.Font.Bold = True



Screen Updating


You can turn off screen updating so that Excel does not update the screen image as your code executes. This can greatly speed up your code.

Application.ScreenUpdating = FALSE

Be sure to restore the setting to True at the end of your macro. Older version of Excel would automatically restore the setting; Excel97 does not.



Simple Objects Rather Than Compound Objects


If you've got to make repeated references to an object, such a range, declare an object of that type, set it to the target object, and then use your object to refer to the target. For example,

Dim MyCell As Range
Set MyCell = Workbooks("Book2").Worksheets("Sheet3").Range("C3")
'....
MyCell.Value = 123

By referring directly to MyCell , VBA can access the object directly, rather than resolving the complete path to the object each time. This method is useful only when you are accessing an object several times during code execution.



Specific Object Type Declaration


If possible avoid using the Object or Variant data types. These data types require considerable overhead to determine their types. Instead, use explicit data types, such as

Dim MySheet As Worksheet rather than
Dim MySheet As Object
Or
Dim NumRows As Long rather than
Dim NumRows As Variant

This is especially true with index variables in For Next loops, since a Variant type has to be re-evaluated each iteration of the loop.

Also see Using Variables (Properly) In VBA.



WITH Statements


If you are using several statement in a row that apply to the same object, use a WITH statement, rather than fully qualifying the object each time. For example,

With Worksheets("Sheet1").Range("A1")
.Font.Bold = True
.Value = 123
End With



Worksheet Functions


You can use Excel's standard worksheet functions in your VBA code, rather than writing the functions in VBA. Since these are fully executable instructions in native code, rather than interpreted VBA code, they run much faster. For example, use

MySum = Application.WorksheetFunction.Sum(Range("A1:A100"))

rather than

For Each C In Range("A1:A100")
MySum = MySum + C.Value
Next C
+++++++++++++++++++++++
Turn Off Everything But the Essentials While Your Code is Running

This optimization explicitly turns off Excel functionality you don't need to happen (over and over and over) while your code runs. Note that in the code sample below we grab the current state of these properties, turn them off, and then restore them at the end of code execution.

One reason this helps is that if you're updating (via VBA) several different ranges with new values, or copy / pasting from several ranges to create a consolidated table of data, you likely do not want to have Excel taking time and resources to recalculate formulas, display paste progress, or even redraw the grid, especially after every single operation (even more so if your code uses loops). Just one recalculation and one redraw at the end of your code execution is enough to get the workbook current with all your changes.

Here's some sample code that shows how and what to shut off while your code runs. Doing this should help improve the performance of your code:

'Get current state of various Excel settings; put this at the beginning of your code

screenUpdateState = Application.ScreenUpdating

statusBarState = Application.DisplayStatusBar

calcState = Application.Calculation

eventsState = Application.EnableEvents

displayPageBreakState = ActiveSheet.DisplayPageBreaks 'note this is a sheet-level setting

'turn off some Excel functionality so your code runs faster

Application.ScreenUpdating = False

Application.DisplayStatusBar = False

Application.Calculation = xlCalculationManual

Application.EnableEvents = False

ActiveSheet.DisplayPageBreaks = False 'note this is a sheet-level setting

'>>your code goes here<<

'after your code runs, restore state; put this at the end of your code

Application.ScreenUpdating = screenUpdateState

Application.DisplayStatusBar = statusBarState

Application.Calculation = calcState

Application.EnableEvents = eventsState

ActiveSheet.DisplayPageBreaks = displayPageBreaksState 'note this is a sheet-level setting

Here's a quick description for each of these settings:

Application.ScreenUpdating: This setting tells Excel to not redraw the screen while False. The benefit here is that you probably don't need Excel using up resources trying to draw the screen since it's changing faster than the user can perceive. Since it requires lots of resources to draw the screen so frequently, just turn off drawing the screen until the end of your code execution. Be sure to turn it back on right before your code ends.

Application.DisplayStatusBar: This setting tells Excel to stop showing status while False. For example, if you use VBA to copy/paste a range, while the paste is completing Excel will show the progress of that operation on the status bar. Turning off screen updating is separate from turning off the status bar display so that you can disable screen updating but still provide feedback to the user, if desired. Again, turn it back on right before your code ends execution.

Application.Calculation: This setting allows you to programmatically set Excel's calculation mode. "Manual" (xlCalculationManual) mode means Excel waits for the user (or your code) to explicitly initiate calculation. "Automatic" is the default and means that Excel decides when to recalculate the workbook (e.g. when you enter a new formula on the sheet). Since recalculating your workbook can be time and resource intensive, you might not want Excel triggering a recalc every time you change a cell value. Turn off calculation while your code executes, then set the mode back. Note: setting the mode back to “Automatic” (xlCalculationAutomatic) will trigger a recalc.

Application.EnableEvents: This setting tells Excel to not fire events while False. While looking into Excel VBA performance issues I learned that some desktop search tools implement event listeners (probably to better track document contents as it changes). You might not want Excel firing an event for every cell you're changing via code, and turning off events will speed up your VBA code performance if there is a COM Add-In listening in on Excel events. (Thanks to Doug Jenkins for pointing this out in my earlier post).

ActiveSheet.DisplayPageBreaks: A good description of this setting already exists: http://support.microsoft.com/kb/199505 (Thanks to David McRitchie for pointing this out).

Read/Write Large Blocks of Cells in a Single Operation

This optimization explicitly reduces the number of times data is transferred between Excel and your code. Instead of looping through cells one at a time and getting or setting a value, do the same operation over the whole range in one line, using an array variable to store values as needed.

For each of the code examples below, I had put random values (not formulas) into cells A1:C10000.

Here's a slow, looping method:

Dim DataRange as Range
Dim Irow as Long
Dim Icol as Integer
Dim MyVar as Double
Set DataRange=Range("A1:C10000")

For Irow=1 to 10000
For icol=1 to 3
MyVar=DataRange(Irow,Icol) 'Read values from the Excel grid 30K times
If MyVar > 0 then
MyVar=MyVar*Myvar ' Change the value
DataRange(Irow,Icol)=MyVar 'Write values back into the Excel grid 30K times
End If
Next Icol
Next Irow

Here's the fast version of that code:

Dim DataRange As Variant
Dim Irow As Long
Dim Icol As Integer
Dim MyVar As Double
DataRange = Range("A1:C10000").Value ' read all the values at once from the Excel grid, put into an array

For Irow = 1 To 10000
For Icol = 1 To 3
MyVar = DataRange(Irow, Icol)
If MyVar > 0 Then
MyVar=MyVar*Myvar ' Change the values in the array
DataRange(Irow, Icol) = MyVar
End If
Next Icol
Next Irow
Range("A1:C10000").Value = DataRange ' writes all the results back to the range at once

Note: I first learned of this concept by reading a web page by John Walkenbach found here: http://www.dailydoseofexcel.com/archives/2006/12/04/writing-to-a-range-using-vba/

A previous Excel blog entry by Dany Hoter also compares these two methods, along with a selection / offset method as well: http://blogs.msdn.com/excel/archive/2008/10/03/what-is-the-fastest-way-to-scan-a-large-range-in-excel.aspx

...which leads me to my next point.

Avoid Selecting / Activating Objects

Notice that in the above-referenced blog post, the selection method of updating a range was the slowest. This next optimization minimizes how frequently Excel has to respond to the selection changing in the workbook by minimizing the selection changing as much as possible.

Range Example: Again, see the Excel blog post quoted above. It demonstrates that using selection is the slowest of the 3 methods discussed for reading and writing to ranges.

Shapes Example: Setup: I have 40 shapes on a sheet, and I want to write "Hello" in each of them.

Using the slower "selection" method, the code looks like this:

For i = 0 To ActiveSheet.Shapes.Count

ActiveSheet.Shapes(i).Select

Selection.Text = "Hello"

Next i

The much faster method is to avoid selection completely and directly reference the shape:

For i = 0 To ActiveSheet.Shapes.Count

ActiveSheet.Shapes(i).TextEffect.Text = "Hello"

Next i

The concepts illustrated by the examples above can also be applied to objects other than Ranges and Shapes.

Note: I first learned of this concept, in the context of shapes, by reading a web page by Ron de Bruin found here: http://www.rondebruin.nl/shape.htm

Related Performance Paper

See the "Improving Performance in Excel 2007" paper on MSDN: http://msdn.microsoft.com/en-us/library/aa730921.aspx

This is a fairly detailed and comprehensive paper that introduces the bigger grid and increased limits in Excel 2007, and primarily focuses on Excel calculation performance and debugging calculation performance bottlenecks. There's also a short section on how to write faster VBA macros.

Other Performance Optimizations

While the above optimizations are what I consider the most important, there are a few other "honorable mention" optimizations I will mention briefly for you to consider.

Consider the performance gains by implementing your code's functionality via XLL / C-API. An overview and supporting materials for the SDK can be found here: http://msdn.microsoft.com/en-us/library/bb687827.aspx .

Declare variables with explicit types to avoid the overhead of determining the data type (repetitively if used in a loop) during code execution.

For simple functions used by your code in high frequency, implement them yourself in VBA instead of using the WorksheetFunction object.

Use Range.SpecialCells() to scope down the number of cells your code needs to work with.
+++++++++++++++++++++++
Tip #1: minimise traffic between worksheets and VBA.

There is substantial overhead in communications between worksheets and VBA. It works both ways. So if you have a custom VBA function which you call from hundreds of worksheet cells, it could be s-l-o-w. Equally, if you use VBA to read in lots of cells to memory, or write lots of data back to cells, it can be real slow.

When working in VBA, the trick is to read/write data in chunks. We had a real life example, where we needed to read in 1000 records off a worksheet, each of 20 cells, do some calculations, and write back 14 cells at the end of each record. That's 34,000 hits on the worksheet. We cut it to just 15. Here's how.

We read in 125 records at once (that's 125x20 = 2500 cells) into an array, then looped through and calculated the 14 output cells for each of these records, and stored them in a memory array. Then we read the next batch of 125 records, and so on, until we hit empty cells. So far we've only made 8 (input) hits on the worksheet. Then we wrote the 14,000 results back by writing columns instead of rows - in fact, two columns at a time, each containing all the 1000 or so records. That only requires 7 writes to take care of the 14 output columns, or just 15 hits in total. The speed improvement was phenomenal.

You may have some questions about this:

How do you read in/write out lots of cells at once?

'to read in
Dim A as Variant 'MUST be variant, no brackets
A = Range("SomeRange").Resize(10,20) 'reads 10x20 array starting at range SomeRange
'(NB I've used Resize above but you can specify a range of cells any way you want)

'to write back to sheet
Range("SomeRange").Resize(10,20) = A
'A can be any data type but MUST be two dimensional even if you are only writing one
'column or row - first dimension is used for rows, and the second for columns
'this can be slow - see third question below for workaround..

Why read in 125 records at a time? Why not read in all 1000?

Excel 5&7 can't read in more than about 3000 cells at once or you get an error. The same goes for writing back to a worksheet. So we had to do it in chunks.

Reading in is quick, but writing back takes forever.

Excel 5&7 have a very slow method of writing back to the sheet, so someone came up with a neat workaround. This is 5-10 times faster than setting a range equal to a VBA array. (Don't use this in Excel 8, it is faster to say Range = Array).

Sub SuperBlastArrayToSheet(TheArray As Variant, TheRange As Range)
With TheRange.Parent.Parent 'the workbook the range is in
.Names.Add Name:="wstempdata", RefersToR1C1:=TheArray
With TheRange
.FormulaArray = "=wstempdata"
.Copy
.PasteSpecial Paste:=xlValues
End With
.Names("wstempdata").Delete
End With
End Sub

What about VBA functions used in worksheets?

It's harder to minimise traffic when your worksheet is calling VBA functions. However, something you can do is avoid reinitialising or reading in values over and over. Suppose your function needs 5 parameters, 4 of which are set at the top of the sheet, and only one of which is set by the calling cell. You can cut down the calculation load, by loading the 4 parameters the first time only, by putting a test in your function like "If A=0 Then Initialise", to call an initialisation routine. This can be done because VBA keeps variable values in memory once they are set (unless they are dimensioned within a subroutine or function, in which case they disappear when the function ends) until either you close the worksheet or make any change to the VBA code. So you can set the variables once and use them forever.

Apart from this, you will generally find that it pays to avoid VBA functions in big spreadsheets. Personally, I try to use either VBA or worksheet formulae, rather than a mixture.

Tip #2 : Declare variables for speed

This is pretty well known stuff so I'll keep it brief. Variants are powerful but slow. Use them only when you have to or when it doesn't matter. Ideally, use integers or booleans wherever possible, followed by long, single and double.

And watch this trap..

Dim A, B, C As Integer 'only defines C as integer, the others will be variants
Dim A As Integer, B As Integer, C As Integer 'defines all 3 as integer

And this one. The function Function Something() returns a variant, whereas Function Something() As Integer returns an integer. Lesson: define functions as something.

Tip #3 : avoid overuse of worksheet functions in code

Here's an example. VBA doesn't have a Max or Min function, but Excel does. So you could write..

C = Application.Max (A, B) 'uses Excel worksheet function in VBA to return max of A & B

or do it the hard way in code like this

If A>=B Then C = A Else C = B

Now you won't believe the next part if you haven't seen it already. The code method is about 150 times faster (on my PC, anyway). So if you are doing some big loops, avoid worksheet functions if you can, or at least test comparative speed.

Some built-in VBA functions are real slow too. It's worth testing the speed on any you use a lot, just to be sure.

Tip #4 : Turn off screen updating, recalculation

If you're working with multiple sheets, screen refresh can slow you down and distract your user. Put Application.ScreenUpdating = False at the top of your routine and it will freeze screen updating until all your code has stopped executing.

Similarly, you can suspend auto-recalculation with Application.Calculation = xlManual, and return it to auto at the end with Application.Calculation = xlAutomatic

Tip #5 : Don't select sheet objects to use them

The macro recorder selects objects before working with them. This is not necessary and slows down execution. So you can change this ..

Sheets("Sheet2").Select
Range("D9").Select
ActiveCell.FormulaR1C1 = "3"

to

Sheets("Sheet2").Range("D9") = 3

Remember, Excel is pretty smart, so you only need to describe an object enough to make it unambiguous. In fact, it pays (in speed) not to mention unnecessary objects, because every object you mention has to be resolved by VBA.

So Range("Hobbes") is faster than Workbooks("Comic").Sheets("Calvin").Range("Hobbes").

Tip #6 : Psychological slowness - Give the user feedback

Isn't it frustrating when something is taking forever and you have no idea when it will finish? Time seems to crawl. You can use the statusbar to let users know how things are going. Below is an example based on reading in a datafile. It includes a way of measuring the percentage read so far.

'get length of file initially, divide by 128 for reason given later
'we will show percentage completed
LenF = FileLen(MyFile) / 128
Open MyFile For Input As #1
Do While Not EOF(1) ' read in file until finished
Line Input #1,SomeText
N = N + 1 'counts lines read
'update statusbar every 100 records - not every record!
'Loc returns position in file, in multiples of 128 bytes. (That's why we divided the LenF
'variable by 128. Loc / LenF will give us the percentage read in).
'you could equally show the number of records read
If N Mod 100 = 0 Then Application.StatusBar = "Processed " & Format(Loc(1)/LenF,"0%")
Loop
Close
'give statusbar back to Excel
Application.StatusBar = False

In addition, if your spreadsheet loads slowly because of initialisation routines, you might consider showing a splash screen (say, a textbox attractively labelled with the name of your program, etc) for a few seconds. It all helps pass the time for the user.

Tip #7 : Read/write data to file quickly

A couple of quickies here.. To read a lot of data quickly, try this..

Open MyFile For Input As #1
myVar = Input$(10000,1) 'reads 10000 chars OR
myVar = Input$(Lof(1),1) 'reads whole file

To read/write delimited data instantly between a file and a memory array, try this (Excel 8 only, sorry).

'to write the data
'dimension and fill array A, then ....
Open "c:\temp\test.txt" For Binary As #1
Put #1,,A 'writes whole of A to file
Close

'to read it back
Dim A(30,10) As Single
Open "c:\temp\test.txt" For Binary As #1
Get #1,,A 'reads whole of A
Close

No comments:

Post a Comment