Abstract
The identification of outliers in data, commonly referred to as outlier detection, is a fundamental step in data
preprocessing. An outlier is an observation that deviates significantly from the majority of the observations. Through
careful and comprehensive analysis, outliers can offer new perspectives on the information derived from data. Outliers in
a dataset can also substantially affect the findings of data analysis. Several methods for identifying outliers have been
proposed in recent years. This paper presents multivariate outlier identification using a statistic based on the
Mahalanobis distance for cross-section data and the mean algorithm for spatial data. The statistic derived from the
Mahalanobis distance follows a chi-square (χ²) distribution with degrees of freedom equal to the number of variables
employed; a value exceeding the threshold (critical value) of this χ² distribution indicates an outlier. The mean
algorithm for spatial outlier detection, in turn, identifies outliers by comparing the Mahalanobis distance between a
spatial location and the mean value of its nearest neighbors with a specified threshold. Both methods effectively
identify multivariate outliers. We present empirical examples of the methods using earthquake data from around Bengkulu
Province, Indonesia.
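To make the cross-section procedure concrete, here is a minimal Python sketch. It assumes the statistic is the squared Mahalanobis distance of each observation from the sample mean, computed with the sample covariance matrix, and compared with the upper χ² quantile with p degrees of freedom (p = number of variables). Under multivariate normality this χ² reference is approximate when the mean and covariance are estimated from the data. The function names (`solve`, `mahalanobis_sq`, `chi2_sf`, `chi2_ppf`, `detect_outliers`), the 0.975 cut-off, and the synthetic data are illustrative assumptions, not taken from the paper.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def solve(a: Matrix, b: Sequence[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-12:
            raise ValueError("covariance matrix is (numerically) singular")
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def mahalanobis_sq(data: Sequence[Sequence[float]]) -> List[float]:
    """Squared Mahalanobis distance of each row from the sample mean."""
    n, p = len(data), len(data[0])
    mean = [sum(row[j] for row in data) / n for j in range(p)]
    cov = [[sum((row[i] - mean[i]) * (row[j] - mean[j]) for row in data) / (n - 1)
            for j in range(p)] for i in range(p)]
    out = []
    for row in data:
        d = [row[j] - mean[j] for j in range(p)]
        z = solve(cov, d)  # z = S^{-1} d, so d . z = d^T S^{-1} d
        out.append(sum(di * zi for di, zi in zip(d, z)))
    return out


def chi2_sf(x: float, k: int) -> float:
    """Upper-tail probability P(X > x) for X ~ chi-square with integer k d.f."""
    if x <= 0:
        return 1.0
    h = x / 2.0
    if k % 2:
        q, j = math.erfc(math.sqrt(h)), 1  # k = 1 base case
    else:
        q, j = math.exp(-h), 2             # k = 2 base case
    while j < k:  # recurrence: Q(x; j+2) = Q(x; j) + h^(j/2) e^(-h) / Gamma(j/2 + 1)
        q += math.exp((j / 2.0) * math.log(h) - h - math.lgamma(j / 2.0 + 1))
        j += 2
    return q


def chi2_ppf(prob: float, k: int) -> float:
    """Chi-square(k) quantile at lower-tail probability prob, found by bisection."""
    alpha = 1.0 - prob
    lo, hi = 0.0, max(1.0, float(k))
    while chi2_sf(hi, k) > alpha:  # widen the bracket until it contains the quantile
        hi *= 2.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if chi2_sf(mid, k) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def detect_outliers(data: Sequence[Sequence[float]],
                    prob: float = 0.975) -> Tuple[List[int], List[float], float]:
    """Flag rows whose squared Mahalanobis distance exceeds the chi-square(p) quantile."""
    d2 = mahalanobis_sq(data)
    threshold = chi2_ppf(prob, len(data[0]))
    flagged = [i for i, v in enumerate(d2) if v > threshold]
    return flagged, d2, threshold


# Illustrative synthetic data (not the paper's): a 5x5 grid plus one distant point.
sample = [[float(x), float(y)] for x in range(-2, 3) for y in range(-2, 3)] + [[10.0, 10.0]]
flagged, d2, threshold = detect_outliers(sample)
print(flagged, round(threshold, 3))  # -> [25] 7.378
```

Only the standard library is used, so the χ² quantile is obtained by bisection on a closed-form tail probability for integer degrees of freedom; with NumPy and SciPy available, `scipy.stats.chi2.ppf` would normally replace `chi2_ppf`. Note that with small samples the largest attainable squared distance is bounded by (n − 1)²/n, so very small datasets may be unable to exceed the threshold at all.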
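The spatial mean algorithm can likewise be sketched in Python. This sketch assumes one common formulation: for each location, take the difference between its attribute vector and the mean attribute vector of its k nearest neighbors (Euclidean distance on the coordinates), then score each difference vector by its squared Mahalanobis distance from the mean of all difference vectors and compare the score with a user-supplied threshold. The helper names (`solve`, `knn_indices`, `mean_algorithm`), the choice k = 2, the threshold 7.378 (about the 0.975 quantile of χ² with 2 degrees of freedom, since −2 ln 0.025 ≈ 7.378), and the toy data are illustrative assumptions rather than the paper's settings.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def solve(a: Matrix, b: Sequence[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-12:
            raise ValueError("covariance matrix is (numerically) singular")
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def knn_indices(coords: Sequence[Sequence[float]], i: int, k: int) -> List[int]:
    """Indices of the k locations nearest to location i (Euclidean), excluding i."""
    others = [j for j in range(len(coords)) if j != i]
    others.sort(key=lambda j: math.dist(coords[i], coords[j]))
    return others[:k]


def mean_algorithm(coords: Sequence[Sequence[float]], attrs: Sequence[Sequence[float]],
                   k: int, threshold: float) -> Tuple[List[int], List[float]]:
    """Hypothetical sketch of a mean-algorithm style spatial outlier test."""
    n, p = len(attrs), len(attrs[0])
    # h_i = attribute vector of i minus the mean attribute vector of its k nearest neighbors
    h = []
    for i in range(n):
        nbrs = knn_indices(coords, i, k)
        g = [sum(attrs[j][c] for j in nbrs) / k for c in range(p)]
        h.append([attrs[i][c] - g[c] for c in range(p)])
    # Sample mean and covariance of the difference vectors
    mu = [sum(row[c] for row in h) / n for c in range(p)]
    cov = [[sum((row[r] - mu[r]) * (row[c] - mu[c]) for row in h) / (n - 1)
            for c in range(p)] for r in range(p)]
    # Squared Mahalanobis distance of each h_i from mu, compared with the threshold
    scores = []
    for row in h:
        d = [row[c] - mu[c] for c in range(p)]
        z = solve(cov, d)
        scores.append(sum(di * zi for di, zi in zip(d, z)))
    flagged = [i for i, s in enumerate(scores) if s > threshold]
    return flagged, scores


# Illustrative synthetic data (not the paper's): 21 sites on a line; attribute 1 is 0
# except for a spike at site 10, attribute 2 alternates between +1 and -1.
coords = [[float(i), 0.0] for i in range(21)]
attrs = [[5.0 if i == 10 else 0.0, 1.0 if i % 2 == 0 else -1.0] for i in range(21)]
flagged, scores = mean_algorithm(coords, attrs, k=2, threshold=7.378)
print(flagged)  # -> [10]
```

Because the neighbors of a spike also absorb part of it into their neighborhood means, their scores rise too (here sites 9 and 11 score about 3.47 versus about 13.47 for site 10); the threshold is what separates the genuine spatial outlier from these side effects.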