- Anomaly Detection using Unsupervised Machine Learning
- Using PCA
1. Goal:
In this challenge, we use Isolation Forest, a multivariate, unsupervised machine learning algorithm, to detect outliers in our unlabelled data. We also combine Isolation Forest with PCA, both to reduce the dimensionality of the data while retaining as much information as possible and to visualize the outliers in 2D and 3D.
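The combination described above can be sketched with scikit-learn as follows. This is a minimal illustration, not the project's actual pipeline: the synthetic data, the `contamination` value, and the choice to fit the forest on the full-dimensional data before projecting are all assumptions made for the example.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import IsolationForest
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the unlabelled data (assumption for illustration):
# 500 points around the origin plus 10 points far away, in 6 dimensions.
rng = np.random.default_rng(0)
inliers = rng.normal(loc=0.0, scale=1.0, size=(500, 6))
far_points = rng.normal(loc=8.0, scale=1.0, size=(10, 6))
X = np.vstack([inliers, far_points])

# Standardize so that no single feature dominates the PCA projection.
X_scaled = StandardScaler().fit_transform(X)

# Fit Isolation Forest on all features; the contamination value is a guess
# at the expected share of outliers, not a value taken from the project.
forest = IsolationForest(n_estimators=100, contamination=0.02, random_state=0)
labels = forest.fit_predict(X_scaled)  # -1 = outlier, 1 = inlier

# Project to 2 and 3 principal components for plotting.
pca_2d = PCA(n_components=2).fit(X_scaled)
coords_2d = pca_2d.transform(X_scaled)
coords_3d = PCA(n_components=3).fit_transform(X_scaled)

n_flagged = int((labels == -1).sum())
print("observations flagged as outliers:", n_flagged)
print("share of variance kept in 2D:", pca_2d.explained_variance_ratio_.sum())
```

`coords_2d` and `coords_3d` can then be passed to any plotting library, colouring each point by its entry in `labels`. Fitting the forest on the PCA-reduced data instead is an equally plausible reading of "combine"; it trades some information for speed and may change which points are flagged.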
It is often said that most outlier detection systems are effective in about 99% of cases. In the remaining cases the model makes one of two mistakes: it labels an observation as an outlier when it is not one (False Positive, FP), or it labels an observation as not an outlier when it actually is one (False Negative, FN). Which of these two types of mistake is more dangerous depends on the use case and on the consequences of each. For example, in fraud detection a high FP rate can significantly hurt customer satisfaction, so one might turn to more complex techniques such as deep learning (RNNs with LSTM units) to reduce the number of FPs.
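To make the FP/FN trade-off concrete, the sketch below counts both kinds of mistake and weights them by a cost. It assumes a small hand-labelled validation sample is available (the main data is unlabelled), and the helper `count_errors`, the toy labels, and the cost values are hypothetical choices for illustration only.

```python
def count_errors(y_true, y_pred):
    """Return (false positives, false negatives); 1 = outlier, 0 = normal."""
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return fp, fn

# Toy validation labels and model predictions (made up for the example).
y_true = [0, 0, 0, 1, 1, 0, 0, 1]
y_pred = [0, 1, 0, 1, 0, 0, 1, 1]

fp, fn = count_errors(y_true, y_pred)

# Hypothetical costs: here a false alarm is treated as 5x worse than a miss,
# as might be the case when false alarms annoy legitimate customers.
cost_fp, cost_fn = 5, 1
total_cost = fp * cost_fp + fn * cost_fn

print("FP:", fp, "FN:", fn, "weighted cost:", total_cost)
```

Swapping the two cost values models the opposite situation, where missing a true outlier is the more expensive mistake.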