| Penulis/Author |
Frendy Jaya Kusuma (1); Eri Widianto (2); Wahyono, Ph.D. (3); Dr. Iman Santoso, S.Si., M.Sc. (4); Prof. Sholihun, S.Si., M.Sc., Ph.D.Sc. (5); Moh. Adhib Ulil Absor, S.Si., M.Sc., Ph.D. (6); Setyawan Purnomo Sakti (7); Prof. Dr. Eng. Kuwat Triyana, M.Si. (8) |
| Abstrak/Abstract |
We developed machine learning (ML) models to classify the nature of the band gap in single and double perovskite materials, specifically whether it is direct or indirect. A key challenge for ML algorithms in this task is the imbalanced data distribution: indirect band gap perovskites outnumber direct band gap ones by a 75:25 ratio. This study investigates how to improve ML performance in predicting band gap nature through a comprehensive evaluation of feature extraction methods, imbalanced data handling techniques, feature selection, and model selection. Combining multiple feature extraction techniques (Meredig, Magpie, and MEGNet) boosted model performance more than any single method. Cost-sensitive learning also gave more favorable outcomes than resampling approaches under the tested conditions. Conversely, feature selection methods such as Recursive Feature Elimination, Least Absolute Shrinkage and Selection Operator, and Genetic Algorithm reduced performance. The cost-sensitive Extreme Gradient Boosting model delivered the best performance: for the direct band gap class it achieved a precision of 0.846, a recall of 0.763, and an F1-score of 0.802, with an overall accuracy of 0.908. The model was finally used for high-throughput screening, identifying 2,027 direct band gap perovskites among 21,021 formable candidates. |
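
The reported metrics and the class imbalance can be checked with a few lines of arithmetic. The sketch below is illustrative only: it recomputes the F1-score from the precision and recall quoted in the abstract, and derives a positive-class weight from the 25:75 direct-to-indirect split using the negative-to-positive ratio heuristic. That heuristic is a common starting point for cost-sensitive boosting (for example, as a value for XGBoost's `scale_pos_weight` parameter), but the abstract does not state which weighting the authors used, so the helper names and the counts of 25 and 75 are assumptions.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def positive_class_weight(n_negative: int, n_positive: int) -> float:
    """Negative-to-positive ratio, a common heuristic weight for the minority class."""
    return n_negative / n_positive


# Precision and recall reported in the abstract for the direct band gap class.
reported_f1 = round(f1_score(0.846, 0.763), 3)

# Assumed illustrative counts matching the 25:75 direct:indirect split;
# direct band gap is treated as the positive (minority) class.
weight = positive_class_weight(n_negative=75, n_positive=25)

print(reported_f1, weight)
```

The recomputed F1-score agrees with the value given in the abstract, and the ratio heuristic suggests weighting direct band gap samples about three times as heavily as indirect ones.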