Financial statement fraud detection using supervised learning methods

Student thesis: Doctoral Thesis


Famous frauds such as those at Enron, WorldCom and HealthSouth are potent reminders that the detection of financial statement fraud needs to improve. Studies estimate the median loss from a single financial statement fraud scheme to be at least one million US dollars. The annual cost of financial statement fraud exceeds 1.2 trillion US dollars worldwide and 377 billion US dollars in the US (ACFE, 2014).
Many business decisions rely on the accuracy of financial statements, but the resources to investigate every statement comprehensively are not available. Moreover, fraud in financial statements is difficult to detect. Consequently, there is a need for better decision aids, such as detection models developed using supervised learning methods. Standard parametric regression-based techniques, particularly logistic regression, have been studied extensively for detecting financial statement fraud. More investigation is needed into non-parametric techniques, such as decision trees, and into ensemble techniques that combine multiple models, such as bagging and boosting. Using data on companies listed on US stock exchanges, several statistical modelling techniques new to the field are compared with established techniques for detecting this type of fraud. Because the cost of failing to detect fraud relative to the cost of falsely alleging it differs between stakeholders, the comparisons are made across a range of values for this cost ratio. Newly developed ensemble models that include decision-tree-based techniques performed particularly well.
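Comparing models across cost ratios can be illustrated with a small sketch. This is not the thesis's own evaluation code: it assumes each model outputs a calibrated probability of fraud and applies the standard cost-minimising cut-off, flagging a statement when p > C_FP / (C_FP + C_FN). The function names, the toy labels and the scores of the two hypothetical models are all invented for illustration. Cost is expressed in units of one false allegation, so a missed fraud costs `cost_ratio` units.

```python
from typing import Sequence


def bayes_threshold(cost_ratio: float) -> float:
    """Cost-minimising probability cut-off (assumes calibrated probabilities).

    cost_ratio = cost of a missed fraud (FN) / cost of a false allegation (FP).
    Flag as fraud when p * C_FN > (1 - p) * C_FP, i.e. p > 1 / (1 + cost_ratio).
    """
    return 1.0 / (1.0 + cost_ratio)


def expected_cost(y_true: Sequence[int], probs: Sequence[float], cost_ratio: float) -> float:
    """Average misclassification cost per statement, in units of one false positive."""
    threshold = bayes_threshold(cost_ratio)
    total = 0.0
    for label, p in zip(y_true, probs):
        predicted_fraud = p > threshold
        if label == 1 and not predicted_fraud:
            total += cost_ratio  # missed fraud (false negative)
        elif label == 0 and predicted_fraud:
            total += 1.0  # false allegation (false positive)
    return total / len(y_true)


# Hypothetical toy data: 1 = fraudulent, 0 = legitimate.
y = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
model_a = [0.05, 0.10, 0.20, 0.30, 0.15, 0.40, 0.08, 0.12, 0.35, 0.90]
model_b = [0.02, 0.05, 0.60, 0.70, 0.05, 0.20, 0.03, 0.04, 0.55, 0.80]

for ratio in (1, 5, 10, 20):
    print(ratio, expected_cost(y, model_a, ratio), expected_cost(y, model_b, ratio))
```

With these invented scores, model A has the lower cost at a ratio of 1, the two tie at 5, and model B is cheaper at 10 and 20. This shows why a single accuracy figure can hide the fact that the preferred model depends on whose costs are being weighed.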
A large number of potential indicators (explanatory variables) of financial statement fraud are investigated to determine which are most useful to detection models. These include financial information, non-financial information and comparisons between the two. Empirical support has been found for both financial and non-financial explanatory variables, including newly proposed variables. A new framework, the Fraud Detection Triangle, is also developed to assist in selecting explanatory variables for financial statement fraud detection models, and empirical evidence is provided to support its use.
Using the models developed in this research, financial statements can be automatically classified as either fraudulent or legitimate and ranked according to their likelihood of being fraudulent. This information can be used to improve early detection, which would mitigate the costs of fraud and help deter it by increasing the probability of detection. Beneficiaries of this information include auditors, investors, financiers, employees, customers, suppliers, regulators, company directors and the financial markets as a whole, through improved integrity and allocation of resources.
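The two uses of model output, classification and ranking, can be sketched as follows. This is a hypothetical illustration, not the thesis's implementation: the statement identifiers, the fraud scores, the 0.5 threshold and the review `budget` parameter are all assumptions. Ranking is what lets an investigator with limited resources examine the most suspicious statements first.

```python
from typing import Dict, List, Tuple


def rank_by_fraud_score(scores: Dict[str, float]) -> List[Tuple[str, float]]:
    """Return (statement_id, score) pairs, most likely fraudulent first."""
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


def classify(scores: Dict[str, float], threshold: float) -> Dict[str, bool]:
    """Label each statement fraudulent (True) or legitimate (False) at a cut-off."""
    return {sid: score > threshold for sid, score in scores.items()}


def select_for_review(scores: Dict[str, float], budget: int) -> List[str]:
    """With resources for only `budget` investigations, take the top-ranked statements."""
    return [sid for sid, _ in rank_by_fraud_score(scores)[:budget]]


# Hypothetical model scores for four company-year financial statements.
scores = {"CO-A 2012": 0.08, "CO-B 2012": 0.71, "CO-C 2013": 0.33, "CO-D 2013": 0.52}
print(rank_by_fraud_score(scores))
print(classify(scores, threshold=0.5))
print(select_for_review(scores, budget=2))
```

The threshold would in practice depend on the stakeholder's cost ratio, whereas the ranking itself does not, which is one reason ranked output is useful when costs are uncertain.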
Date of Award: 10 Oct 2015
Original language: English
Supervisors: Kuldeep Kumar & Sukanto Bhattacharya
