Comparative Analysis of Outlier Elimination Algorithms on Pulsar Stars Dataset

Tejaswini Katale, Waidehi Gautam, Aakash Chotrani, Aishwarya Katale, Parikshit N. Mahalle, Gitanjali R. Shinde

2024 International Conference on Emerging Smart Computing and Informatics (ESCI) (2024)

Abstract
Outliers are uncommon observations that lie far from most other data points. These points, which can be abnormally high or low, can significantly affect analysis and forecasting. This paper examines the application of outlier elimination algorithms to the Pulsar Stars dataset. The dataset, known as HTRU2, originates from the High Time Resolution Universe Survey, which searches for new pulsar candidates using data from the Parkes radio telescope in Australia. Six outlier elimination algorithms are implemented on this dataset: Z-Score, DBSCAN (Density-based spatial clustering of applications with noise), Mahalanobis Distance, Median Absolute Deviation (MAD), Multivariate Normal Distribution for N Dimension, and Angle-based Outlier Detection (ABOD). After the identified outliers are removed, four machine learning techniques (Random Forest, Decision Tree, Naive Bayes, and XGBoost) are trained to determine which outlier technique is most effective, and their accuracy, recall, precision, and F1-score values are compared. For the comparison to be fair, every outlier algorithm must remove a consistent number of outliers, so roughly 2600–2800 outliers are eliminated by each technique. The Multivariate Normal Distribution for N Dimension algorithm with Random Forest yields the highest accuracy, 98.2%. ABOD with Naive Bayes gives the highest recall, 0.93; Z-Score with Random Forest gives the highest precision, 0.96; and ABOD with Random Forest gives the highest F1-score, 0.92. These results also show that Random Forest produces the highest accuracy, precision, and F1-score among the machine learning algorithms deployed. Therefore, Random Forest is recommended for use after outlier removal.
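The idea of removing a consistent number of outliers per technique can be illustrated with a minimal sketch for two of the six methods, Z-Score and Median Absolute Deviation. The abstract does not state how rows were scored or how the counts were equalized, so everything here is an assumption: each row is scored by its largest per-feature deviation, and the k highest-scoring rows are dropped. The function names (`z_scores`, `mad_scores`, `row_scores`, `drop_top_k`) are hypothetical, and the toy data is invented for illustration and is not taken from HTRU2.

```python
import statistics


def z_scores(values):
    """Absolute Z-score of each value: |x - mean| / standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return [0.0 for _ in values]
    return [abs(v - mean) / std for v in values]


def mad_scores(values):
    """Absolute modified Z-score built on the median absolute deviation."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        return [0.0 for _ in values]
    # 0.6745 rescales the MAD to be comparable with a standard
    # deviation when the data are normally distributed.
    return [0.6745 * abs(v - med) / mad for v in values]


def row_scores(rows, score_fn):
    """Assumed aggregation: a row's score is its largest per-feature score."""
    columns = [list(col) for col in zip(*rows)]
    per_feature = [score_fn(col) for col in columns]
    return [max(scores) for scores in zip(*per_feature)]


def drop_top_k(rows, labels, scores, k):
    """Remove the k highest-scoring rows so each method discards the same count."""
    ranked = sorted(range(len(rows)), key=lambda i: scores[i], reverse=True)
    removed = set(ranked[:k])
    kept_rows = [r for i, r in enumerate(rows) if i not in removed]
    kept_labels = [y for i, y in enumerate(labels) if i not in removed]
    return kept_rows, kept_labels


# Toy data (hypothetical): two features, with one extreme value in each.
rows = [[1.0, 10.0], [1.2, 11.0], [0.9, 9.5], [1.1, 10.5], [8.0, 10.2], [1.0, 40.0]]
labels = [0, 0, 0, 0, 1, 0]
k = 2  # stands in for the roughly 2600-2800 rows removed per technique

z_rows, z_labels = drop_top_k(rows, labels, row_scores(rows, z_scores), k)
mad_rows, mad_labels = drop_top_k(rows, labels, row_scores(rows, mad_scores), k)
```

The cleaned rows and labels from each method would then be passed to the same classifiers, so that differences in accuracy, recall, precision, and F1-score can be attributed to the outlier technique rather than to the amount of data removed.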
Key words
Outliers, Z-Score, DBSCAN (Density-based spatial clustering of applications with noise), Mahalanobis Distance, Median Absolute Deviation (MAD), Multivariate Normal Distribution for N Dimension, Angle-based Outlier Detection (ABOD)