The MPA Distinguished Speaker Lyceum is one of the most important traditions in the MPA program. Last Tuesday we hosted Ms. Camille Stovall, a partner at Deloitte and the Chief Operating Officer of Deloitte Financial Advisory Services (FAS). The conversational interview between Ms. Stovall, Professor Steve Limberg, and my fellow MPA students ranged from how to approach difficult restructurings to the importance of analytics. The latter prompted Prof. Limberg to ask just how analytics are used in the real world.
The current popularity of the word “analytics” leaves many accountants unimpressed: isn’t this what we’ve been doing all along? Calculating ratios, sampling, analyzing variances, etc. So we have computers to help us now, big whoop! Unfortunately, this view underestimates the importance of data mining within analytics. Data mining will eventually allow us to quickly sift through large sets of transactions and pinpoint exceptions rather than just substantively testing a sample.
During my industry internship last summer I worked with data mining tools developed by Microsoft as part of SQL Server Analysis Services (SSAS) to find actionable insights in a loan servicing database. Similar tools include SAS, R, and Oracle Data Mining.
The first step in data mining is organizing the data so that it can be fed into fickle algorithms. IT folk who work with data warehousing and business intelligence call this the Extract, Transform, and Load (ETL) process. While most parts are automated, it often also involves manually filling gaps, handling outliers, and making adjustments with custom SQL queries. It’s hard to overstate the importance of properly preparing data, as algorithms unavoidably suffer from the “garbage-in, garbage-out” problem.
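To make the cleaning step concrete, here is a minimal sketch in Python rather than SQL. It fills missing balances with the median and flags outliers using the median absolute deviation (MAD). The field names (`loan_id`, `balance`), the sample records, and the cutoff of 5 MADs are all hypothetical choices for illustration, not what I used on the job.

```python
from statistics import median


def clean_balances(rows, mad_cutoff=5):
    """Fill missing balances with the median and flag outliers (MAD rule)."""
    known = [r["balance"] for r in rows if r["balance"] is not None]
    mid = median(known)
    # Median absolute deviation: a spread measure that outliers barely move.
    mad = median(abs(b - mid) for b in known)
    cleaned = []
    for r in rows:
        balance = r["balance"]
        was_filled = balance is None
        if was_filled:
            balance = mid  # fill the gap with the median
        # Flag anything further than mad_cutoff MADs from the median.
        is_outlier = mad > 0 and abs(balance - mid) > mad_cutoff * mad
        cleaned.append({
            "loan_id": r["loan_id"],
            "balance": balance,
            "was_filled": was_filled,
            "is_outlier": is_outlier,
        })
    return cleaned


# Hypothetical loan records: one gap (A2) and one suspicious balance (A5).
loans = [
    {"loan_id": "A1", "balance": 1200.0},
    {"loan_id": "A2", "balance": None},
    {"loan_id": "A3", "balance": 1350.0},
    {"loan_id": "A4", "balance": 980.0},
    {"loan_id": "A5", "balance": 250000.0},
    {"loan_id": "A6", "balance": 1100.0},
]
cleaned_loans = clean_balances(loans)
```

Flagging rather than deleting keeps the audit trail intact: the suspicious row is still there for someone to investigate, and the `was_filled` marker records which values were made up by the cleaning step.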
Now the fun part: choosing which type of algorithm is most appropriate. There are five kinds of algorithms:
- Classification algos are used when you want to predict a category, most simply a binary yes/no outcome. For example, a forensic accountant could estimate the probability that each individual transaction is fraudulent. Decision tree learning and the naive Bayes classifier are most commonly used for this purpose.
- Regression algos are ideal for predicting trends. An auditor could use these to assess the risk of a material misstatement by comparing the results from the regressions with the reported numbers. This is especially helpful since these algorithms can take into account seasonality in revenues and expenditures. An obvious choice for this task is a simple linear regression, but time series algos like ARIMA and ARTXP are often more accurate when the data is seasonal.
- Segmentation algos create groups of datapoints with similarities. One use for this is clustering expense reports, trade confirmations, or invoices so as to identify which groups need more or less sampling according to their risk profile. Segmentation is often done with expectation-maximization (EM), k-means, and my favorite, DBSCAN clustering.
- Association algos like Apriori are most often used on websites like Amazon to suggest items based on your search history and shopping cart. Association rules can also be built to intelligently increase audit samples based on sets of noted exceptions.
- Sequencing algos identify and match patterns in ordered data using Markov chains. They were originally developed for researching DNA/RNA. I used sequence clustering combined with segmentation to predict prepayment rates and delinquency curing, which was especially useful for accurately predicting cash flows and changes in the client’s borrowing base.
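To give a feel for the Markov chain idea behind the last bullet, here is a minimal sketch that estimates first-order transition probabilities from loan status histories. This is a much simpler stand-in for SSAS sequence clustering, not a reproduction of it, and the status labels and histories are invented for illustration.

```python
from collections import Counter, defaultdict


def transition_probabilities(histories):
    """Estimate first-order Markov transition probabilities from sequences."""
    counts = defaultdict(Counter)
    for history in histories:
        # Count each consecutive (from_state, to_state) pair.
        for current, following in zip(history, history[1:]):
            counts[current][following] += 1
    # Normalize each row of counts so the probabilities sum to 1.
    return {
        state: {nxt: n / sum(nexts.values()) for nxt, n in nexts.items()}
        for state, nexts in counts.items()
    }


# Hypothetical monthly status histories, one list per loan.
histories = [
    ["current", "current", "late", "current"],
    ["current", "late", "late", "default"],
    ["current", "current", "current", "prepaid"],
    ["late", "current", "current", "current"],
]
probs = transition_probabilities(histories)
cure_rate = probs["late"]["current"]        # share of late months that cure
prepay_rate = probs["current"]["prepaid"]   # share of current months that prepay
```

On this toy data, half of the late months are followed by a cure (`cure_rate` is 0.5) and one in eight current months ends in prepayment. A real model would need far more history and would likely let the probabilities vary by loan segment.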
Choosing the appropriate combination of algorithms is easy compared to fine-tuning the model and interpreting the results. For someone starting out like me, the only path forward is trial, error, tinkering, and effective communication of (non-)results. Thankfully this is facilitated by techniques like drill-through, splitting datasets into training and testing groups, lift charts, and visualizations. The latter are the most persuasive when communicating results to management, especially when they are colorful and interactive, but that’s worthy of a post of its own.
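Two of those techniques are easy to sketch: holding out a test set, and computing the lift that a lift chart plots. The sketch below assumes you already have a model score for each held-out row. The function names, the 30% hold-out, the fixed seed, and the sample scores are all hypothetical.

```python
import random


def train_test_split(rows, test_fraction=0.3, seed=42):
    """Shuffle a copy of the rows and hold out a fraction for testing."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable
    cut = len(shuffled) - round(len(shuffled) * test_fraction)
    return shuffled[:cut], shuffled[cut:]


def lift_at(scored, top_fraction=0.2):
    """Lift = hit rate among the top-scored rows / hit rate across all rows."""
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    top_n = max(1, round(len(ranked) * top_fraction))
    top_rate = sum(label for _, label in ranked[:top_n]) / top_n
    base_rate = sum(label for _, label in ranked) / len(ranked)
    return top_rate / base_rate


# Split hypothetical transaction IDs into training and testing groups.
transaction_ids = list(range(10))
train_ids, test_ids = train_test_split(transaction_ids)

# Hypothetical (model score, actual label) pairs for held-out rows,
# where label 1 means the transaction turned out to be an exception.
holdout_scored = [
    (0.92, 1), (0.85, 1), (0.40, 0), (0.35, 1), (0.30, 0),
    (0.22, 0), (0.15, 0), (0.10, 0), (0.08, 0), (0.05, 0),
]
lift = lift_at(holdout_scored, top_fraction=0.2)
```

A lift well above 1 means that testing the highest-scored items first finds exceptions faster than picking items at random, which is the whole argument for letting a model guide the sample. Measuring it on held-out rows matters because a model almost always looks better on the data it was trained on.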
Thanks to cloud computing and improved data capture, the only bottleneck left in analytics is human expertise. What resources have you found most helpful in improving your data science skills? How do/would you use data mining techniques to find insights in accounting information?