Enron Investigation

Published on 14 Jun 2021. Written by Amitesh Kumar

This project was prompted by the mysterious collapse of the Enron Corporation, which filed for bankruptcy in December 2001. It investigates the fraud case using the huge data set that came out of it. The data set mainly comprises the millions of e-mails sent to and from the company's executives between 2000 and 2002. Many of the emails were reported to be suspicious, but with so much data it was not practical for anyone to judge their nature by hand.

Deciding the nature of the emails from patterns in the data calls for a machine learning project. The financial information also contains a huge number of numeric values, which would be tiring for anyone to classify manually. A machine learning application can classify the data itself and give the desired output.

Project Implementation

The first step is to explore the data set, which has around 21 variables and 146 observations. Outlier investigation consists of checking for odd patterns in the data; for example, some employees were recorded as earning an unusually large salary. Next, we create features for emails received from and sent to persons of interest (POIs). Finally, we select the most important features for classification, such as stock options, shared receipts, loan advances, long-term incentive, and salary.
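The outlier check and the POI email features can be sketched in plain Python. This is only an illustration under stated assumptions: the records, the field names (such as `from_poi_to_this_person`), the helper functions, and the "10 times the median" threshold are all hypothetical and are not taken from the project itself.

```python
from statistics import median

# Toy records shaped as {person: {feature: value}}.
# All names and numbers are made up for illustration.
records = {
    "EMPLOYEE_A": {"salary": 200_000, "to_messages": 1000, "from_poi_to_this_person": 50,
                   "from_messages": 200, "from_this_person_to_poi": 20},
    "EMPLOYEE_B": {"salary": 250_000, "to_messages": 500, "from_poi_to_this_person": 5,
                   "from_messages": 100, "from_this_person_to_poi": 0},
    "EMPLOYEE_C": {"salary": 300_000, "to_messages": 0, "from_poi_to_this_person": 0,
                   "from_messages": 0, "from_this_person_to_poi": 0},
    "EMPLOYEE_D": {"salary": 26_000_000, "to_messages": 300, "from_poi_to_this_person": 30,
                   "from_messages": 50, "from_this_person_to_poi": 10},
}


def salary_outliers(records, factor=10):
    """Return names whose salary exceeds `factor` times the median salary."""
    typical = median(r["salary"] for r in records.values())
    return sorted(name for name, r in records.items() if r["salary"] > factor * typical)


def fraction(part, whole):
    """Safe ratio: returns 0.0 when the person has no messages at all."""
    return part / whole if whole else 0.0


def add_poi_fractions(records):
    """Add the share of each person's mail that was exchanged with POIs."""
    for r in records.values():
        r["fraction_from_poi"] = fraction(r["from_poi_to_this_person"], r["to_messages"])
        r["fraction_to_poi"] = fraction(r["from_this_person_to_poi"], r["from_messages"])
    return records


print("Possible salary outliers:", salary_outliers(records))
add_poi_fractions(records)
print("EMPLOYEE_A fraction_to_poi:", records["EMPLOYEE_A"]["fraction_to_poi"])
```

Using fractions rather than raw counts keeps a person who simply sends a lot of mail from looking more connected to POIs than they are.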

The algorithms found suitable for studying this data are Gaussian Naïve Bayes, Support Vector Machine, and Decision Tree Classifier. The most crucial part of machine learning is to tune and implement the algorithm. The GridSearchCV tool provided by scikit-learn is used to tune the algorithms. To extract as much information as possible from the data, a validation strategy such as nested stratified shuffle-split cross-validation is used.
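As a rough sketch of how the three classifiers named above might be compared, the snippet below scores each one with scikit-learn's `StratifiedShuffleSplit`. The data is synthetic (generated with `make_classification`) and only mimics the size of the real data set; the number of splits, the test size, the feature scaling for the SVM, and the balanced-accuracy metric are assumptions made for illustration, not the project's actual settings.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedShuffleSplit, cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the real features: 146 rows with imbalanced labels.
X, y = make_classification(n_samples=146, n_features=6, n_informative=3,
                           weights=[0.85, 0.15], random_state=42)

# Repeated random splits that keep the class ratio in every train/test pair.
cv = StratifiedShuffleSplit(n_splits=20, test_size=0.3, random_state=42)

candidates = {
    "GaussianNB": GaussianNB(),
    # Scaling is an added assumption; SVMs are sensitive to feature ranges.
    "SVC": make_pipeline(StandardScaler(), SVC()),
    "DecisionTree": DecisionTreeClassifier(random_state=42),
}

# Mean balanced accuracy over all splits for each candidate.
scores = {
    name: cross_val_score(clf, X, y, cv=cv, scoring="balanced_accuracy").mean()
    for name, clf in candidates.items()
}

for name, score in scores.items():
    print(f"{name}: mean balanced accuracy = {score:.3f}")
```

Stratified splits matter here because only a small share of the observations belong to the positive class, so an unstratified split could leave a test set with almost no positives.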

This method helps us extract the essential information from all that heap of data. Hyperparameter optimization is the process of improving a machine learning model's performance by tuning its parameters. Cross-validation checks that the patterns the model finds also hold on data it was not trained on, so the results can be trusted. The decision tree classifier is evaluated with the cross-validation method defined in the tester.py script.
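One way to combine hyperparameter tuning with cross-validation is to nest a `GridSearchCV` inside an outer cross-validation loop. The sketch below does this for a decision tree on synthetic data. The parameter grid, split counts, and scoring metric are illustrative assumptions, and the code does not reproduce what tester.py does.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, StratifiedShuffleSplit, cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the real features: 146 rows with imbalanced labels.
X, y = make_classification(n_samples=146, n_features=6, n_informative=3,
                           weights=[0.85, 0.15], random_state=0)

# Hypothetical grid; the values are chosen only to keep the example small.
param_grid = {"max_depth": [2, 4, None], "min_samples_split": [2, 10]}

# Inner splits pick the hyperparameters; outer splits estimate performance.
inner_cv = StratifiedShuffleSplit(n_splits=5, test_size=0.3, random_state=1)
outer_cv = StratifiedShuffleSplit(n_splits=5, test_size=0.3, random_state=2)

search = GridSearchCV(DecisionTreeClassifier(random_state=0), param_grid,
                      cv=inner_cv, scoring="balanced_accuracy")

# Each outer split reruns the whole grid search on its own training portion.
nested_scores = cross_val_score(search, X, y, cv=outer_cv, scoring="balanced_accuracy")
print(f"Nested CV balanced accuracy: {nested_scores.mean():.3f}")

# Fit once on all the data to see which settings the search would pick.
search.fit(X, y)
print("Best parameters:", search.best_params_)
```

Keeping the tuning inside the outer loop means the reported score comes from data that played no part in choosing the hyperparameters.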

Results and Conclusion

The application will therefore be able to classify this huge data set, which contains almost 1.67 million emails. The data is processed through the algorithms and methods described above to detect where the real problem lies. The application flags the odd data, which can be treated as possible fraud elements, and it can play an important role in the investigation of Enron.

Technologies you will learn by working on Enron Investigation:

Machine Learning
