Friday, January 7, 2011

Tutorial: Numerical Analysis in Excel using C# with ExcelDna and AlgLib

Quite often, I need to use numerical techniques more advanced than those that ship with Excel. Sometimes this involves writing my own algorithms from scratch, but when possible, I prefer to make use of existing libraries. Tens or hundreds of thousands of open source developers have worked to make great libraries - not only for numerical analysis, but for messaging, web scraping, and a jillion other tasks. Exposing this functionality to Excel makes it far more powerful. With that in mind, I'd like to share the workflow I use to access an open source numerical library, AlgLib using C#.

C#

So why C# rather than VBA?

1. C# lets me use version control. Once you've used version control, you can't go back.
2. C# is more maintainable than VBA. Even with the best of intentions, VBA code tends to become a bit of a mess - functions grow to several pages, algorithms become enmeshed with code to manipulate the sheet. C# is a well-designed modern object-oriented language. It's the easiest language I've ever come across to just bash code out in, and be pretty sure that you will be able to understand it when you come back in 6 months.
3. A C# library is more manageable than VBA modules. When I start a new spreadsheet, I have one add-in to load, rather than 20 separate VBA modules.

You can download Visual C# 2010 Express from Microsoft at http://www.microsoft.com/express/Downloads/#2010-Visual-CS, which will be sufficient for this article. Standard or Professional editions are recommended for more serious work.

AlgLib

In this article, I'll be looking at accessing numerical routines from the AlgLib library. AlgLib is a neat project run by Sergey Bochkanov and Vladimir Bystritsky, aiming to provide robust implementations of numerical algorithms for several languages, including C#. This is great for several reasons. Having code in the language you are familiar with makes it very easy to integrate into your project or existing codebase. But more importantly, at a deeper level than that, it encourages you to look at the code, to play with it, and to adapt it to your own needs. Numerical algorithms are intrinsically hard - you need a lot of maths to come up with an efficient solver or integration routine - and I'm not anticipating that every one who looks at the code will be able to do that. Nonetheless, the most important property of numerical routines is that they be robust. There is nothing worse than running a solver for a day, just to find it has crashed, or given you, very precisely, the wrong solution. And the way numerical routines become robust is by people pounding on them, and fixing edge cases. And for this, AlgLib is a much better base for the 2010s, than JPL's excellent, but nearly dead, Fortran routines - code may not decay, but code ecosystems seem to, and there aren't so many Fortran programmers left.

I should mention the other open source numerical libraries. GSL is robust, mainly written in C, which can make it a challenge to integrate, memory-management-wise. Boost (C++) now has some mathematical libraries, although fewer, and less advanced, but its coverage of statistical distributions is quite complete. On the commercial side, IMSL and NAG are well recognized as robust numerical libraries, with hefty price tags. If there is interest, I can cover how I expose C++ libraries to Excel in another post.

You can download the latest version of AlgLib from their website at http://www.alglib.net/.

ExcelDna

I'll be using ExcelDna to expose C# functions to Excel. ExcelDna is a magical, but under-publicized library, which can be used to automatically expose all public static functions in a .NET assembly to Excel through an xll addin. Everyone I have recommended ExcelDna to has either immediately switched to using it, or lamented (hard) the fact that for one reason or another they were unable to immediately switch to it.

You can download the latest version of ExcelDna from http://exceldna.codeplex.com/.

How to put it all together

The first thing to do is to expose C# functions to Excel. To do that, follow these instructions:

* Open Visual C# 2010 Express, and select start a new project from the start page. In the new project dialog box, select the Class Library template and enter "AwesomeNumericalLibrary" as the name at the bottom of the dialog.
* Put the cursor in the name of the class, "Class1", and hit F2 to rename the class to "ExcelFunctions". You should also rename the files. This is personal choice, but my preference is to have the functions I expose to Excel to be in a single file, which only contains functions exposed to Excel, rather than intermingled with other code. As the library grows, I tend to split the file based on categories of function eg statistical distributions in one file, interpolation routines in another.
* Add a simple function, so we can test that we can expose functions to Excel. Again, as a personal choice I tend to begin all Excel functions in a single library with a common prefix. This allows them to be easily found using Excel's Insert Function dialog or intellitype.
view plaincopy to clipboardprint?
1. public static double AwesomeLifeUniverseEverything()
2. {
3. return 42;
4. }

public static double AwesomeLifeUniverseEverything()
{
return 42;
}


* From the ExcelDna zip file, add ExcelDna-0.25\Distribution\ExcelDna.xll to your project. You should add it to the project root itself, rather than a sub-folder. This can be done by dragging the ExcelDna.xll file from a Windows Explorer window onto the text AwesomeNumericalLibrary in the Visual C# Solution Explorer.
* In the solution explorer, rename ExcelDna.xll to AwesomeNumericalLibrary.xll.
* Add a new text file to the project root, and rename it to AwesomeNumericalLibrary.dna. The xll file looks for a dna file with a similar name to tell it what .NET assemblies it should load and expose functions to Excel from. In our case, there will be a single dll, so add the following text to AwesomeNumericalLibrary.dna:
view plaincopy to clipboardprint?
1.
2.
3.






* You must tell Excel to copy the .dna and .xll files to the output directory when it builds the project. To do this, select the two files in the solution explorer. Then right click and select properties from the context menu. In the properties dock, change the Copy To Output Directory setting from "Do not copy" to "Copy always".
* At the time of writing, ExcelDna does not support .NET 4.0 assemblies. To fix this, right click on the project in the solution explorer and hit properties. In the properties window, change Target Framework to ".NET Framework 3.5". There will be a warning message about the CSharp reference. Expand the References section for the project in the solution explorer, and remove the reference to CSharp.
* Now we are ready to test whether our function is accessible from Excel, so save the project (easiest way seems to be to close and re-open it!) and build the project (by hitting F6).
* Fire up Excel, and open the Add-Ins dialog box. In Excel 2003 and older, it is Tools..Addins. If you are using Excel 2007 or newer, the menus are hidden - the magic key combo to get to the Add-Ins dialog box is "Alt-T, I" (it is also available through Excel options).
* Click Browse, and then navigate to where you saved the project, and then within the "bin\Release" directory, select AwesomeNumericalRecipes.xll and hit Open.
* In a cell in the sheet, type "=AwesomeLifeUniverseEverything(10)" and verify that you get the result 42.

Well done! You have just exposed your first C# function to Excel.

Now to incorporate AlgLib into our project, and expose a function to Excel:

* Unfortunately AlgLib does not have a project file, so we have to create our own. Right click the solution in the solution explorer, and click Add..New Project. In the Add New Project dialog, select Class Library and enter the name "AlgLib".
* Delete the class file, Class1.cs, that is created by default.
* From the AlgLib zip file, add all the .cs files in the csharp\src directory to the AlgLib project. You can do this by selecting the files in Windows Explorer and dragging them to the AlgLib project in the Visual Studio Solution Explorer.
* Again, you must tell Visual Studio you want a .NET Framework 3.5 library, using the steps above on the AlgLib project.
* Add a reference to the AlgLib project in the AwesomeNumericalLibrary project. In the Solution Explorer, expand the AwesomeNumericalLibrary project. Right-click References, and click Add Reference. In the Add Reference dialog, select the Projects tab, select the AlgLib project and click OK.
* Add a function to the ExcelFunctions class in the AwesomeNumericalLibrary project. I've chosen here to expose Dawson's integral:
view plaincopy to clipboardprint?
1. public static double AwesomeDawson(double x)
2. {
3. return alglib.dawson.dawsonintegral(x);
4. }

public static double AwesomeDawson(double x)
{
return alglib.dawson.dawsonintegral(x);
}


* Build the solution (hit F6). The AlgLib project may generate many warnings. You may get an error that AwesomeNumericalLibrary.dll is open by another process and cannot be written. If this is the case, then open up the Add-Ins dialog in Excel, and remove the tick from by "AwesomeNumericalLibrary" to unload the dll from Excel. Occasionally you may also have to restart Excel.
* Go back to Excel and reload the library. It should remain in the Addins dialog, so you can just re-select it. Then in a cell, type "=AwesomeDawson(2)". It should give you the result 0.30134.

Congratulations! You have just exposed a numerical routine from AlgLib to Excel.

AlgLib has a large number of functions, so I'll let you expose the rest as you need. I've chosen a pretty simple function to expose here, but ExcelDna also makes it possible to take ranges as inputs, and give ranges as outputs using array formulae. This can be useful for linear algebra, or generating large amounts of random data. Error handling is another issue - this function takes one number and always returns a result, but not so with all functions. If you're interested, let me know, and I'll cover these issues in a future blog post.

How to Distribute

To distribute an Excel add-in you have made using ExcelDna you must distribute the (renamed) ExcelDna.dll file, your .dna file, and any .dll files that your add-in is using directly or indirectly.

AlgLib is distributed under a GPL license, so if you distribute your add-in, you must also make the source code available under the GPL license. ExcelDna is distributed under a more permissive license. Please check for yourself that you are complying with the respective projects' license terms.

Further steps

There are several directions you can take from this quick tutorial. I'd recommend looking through the ExcelDna documentation a little. ExcelDna is able to take object arrays and other types as parameters, and to return them as results, which can be powerful. You can also use attributes on your functions to better control how they work, and how they appear in Excel.

You might want to look at exposing other libraries or functionality to Excel. Some things I've done that were quite fun and useful were a web scraper, which allows me to suck out values from a live web page using XPath right in the Excel formula, and more targetted add-ins for Yahoo finance and Betfair.

No comments:

Post a Comment