| Abstract |
Improving the accuracy of classification models is a
main research objective in various fields. Selecting
important features and removing outliers from a dataset are
two effective ways to improve model accuracy.
Information Gain is one of the feature selection methods
that can be considered as a solution for selecting important
features of a dataset. Information Gain selects the variable
that maximizes the information gain, which in turn
minimizes the entropy and best splits the dataset into
groups for effective classification. Aside from selecting
important features, removing outliers is also necessary for
improving the accuracy of the classification model.
Density-Based Spatial Clustering of Applications with
Noise (DBSCAN) is a powerful outlier removal method that
can identify, with significant accuracy, clusters of
arbitrary shape and size in large databases corrupted
with noise. Therefore, in this study, we propose
improving the accuracy of a heart disease classification
model using Information Gain and DBSCAN, applied to
various machine learning algorithms. One publicly
available heart disease dataset (Cleveland) is used in
this study to build the classification model. The results
showed that after implementing Information Gain, the
accuracy of the model applied to Gaussian Naïve Bayes,
Logistic Regression, Multi-Layer Perceptron, Support
Vector Machine, Decision Tree, Random Forest, and
Extreme Gradient Boosting algorithms increased by
1.31% on average. The accuracy also increased when
DBSCAN was applied to the model after Information
Gain, with an improvement of around 0.62%. |
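
The two preprocessing ideas named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`entropy`, `information_gain`, `dbscan_noise`), the toy values, and the `eps` and `min_samples` settings are all assumptions chosen for illustration, and no Cleveland data is used. The sketch computes information gain for a categorical feature and flags the points that DBSCAN would treat as noise, counting a point itself among its neighbours.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    return sum((n / total) * math.log2(total / n) for n in Counter(labels).values())


def information_gain(feature, labels):
    """Label entropy minus the weighted entropy after splitting on a categorical feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def dbscan_noise(points, eps, min_samples):
    """Return the indices of the points that DBSCAN would label as noise.

    A core point has at least min_samples points (itself included) within eps.
    A noise point is neither a core point nor within eps of a core point.
    """
    neighbours = [
        [j for j, q in enumerate(points) if math.dist(p, q) <= eps]
        for p in points
    ]
    core = {i for i, nb in enumerate(neighbours) if len(nb) >= min_samples}
    return [
        i for i, nb in enumerate(neighbours)
        if i not in core and not core.intersection(nb)
    ]


# Hypothetical toy data, invented for illustration only.
labels = [1, 1, 0, 0]
informative = ["x", "x", "y", "y"]    # splits the classes perfectly
uninformative = ["a", "b", "a", "b"]  # tells us nothing about the class

print(information_gain(informative, labels))    # 1.0
print(information_gain(uninformative, labels))  # 0.0

# Four points form a dense square; the fifth lies far away.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
print(dbscan_noise(points, eps=1.5, min_samples=3))  # [4]
```

In a workflow like the one described, features with low information gain would be dropped first, and rows flagged as noise would then be removed before training each classifier. The thresholds for both steps are tuning choices that this sketch does not address.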