Chemical Feature Engineering and Defect-Aware Structural Fingerprint Representations for Complex Defects in 2D Materials
Issued Date
2026-02-23
ISSN
15499596
eISSN
1549960X
Scopus ID
2-s2.0-105030656731
Journal Title
Journal of Chemical Information and Modeling
Volume
66
Issue
4
Start Page
2017
End Page
2029
Rights Holder(s)
SCOPUS
Bibliographic Citation
Journal of Chemical Information and Modeling Vol. 66 No. 4 (2026), 2017-2029
Suggested Citation
Na Talang C., Kesorn A., Cholsuk C., Vogl T., Hunkao R., Sinsarp A., Suwanna S., Yuma S. Chemical Feature Engineering and Defect-Aware Structural Fingerprint Representations for Complex Defects in 2D Materials. Journal of Chemical Information and Modeling Vol. 66 No. 4 (2026), 2017-2029. doi:10.1021/acs.jcim.5c02100 Retrieved from: https://repository.li.mahidol.ac.th/handle/123456789/115455
Abstract
Designing descriptors for multiple defects in two-dimensional materials is challenging due to the diverse local atomic environments created by different defect types and arrangements. Existing physics-informed descriptors struggle to distinguish distinct defect configurations with identical composition, while deep learning models, though powerful, require large data sets and are less interpretable. In this work, we address this limitation by engineering chemical descriptors and constructing structural features from nearest-neighbor distributions provided by the classical force-field-inspired descriptors (CFID). We show that our engineering method, combined with defect-aware structural features derived from the Hellinger distance, even excluding the full distribution features, improves data point discrimination in high-dimensional feature space while reducing the number of features by 50%. In predicting formation energy per defect site, this extended feature set balances reliance on a few dominant features, enhancing model interpretation and generalization at the cost of a marginal 10% increase in prediction error compared to baseline descriptors. This generalization capability is empirically validated on an external out-of-distribution data set of bulk hBN defects, where our model exhibits lower uncertainty and superior stability within the applicable physical domain (−1 < E_f < 5 eV). However, predicting a highly complex and nonlinear target, such as the HOMO–LUMO gap, remains challenging, as none of our extensions outperform the baseline. This physics-informed approach offers an interpretable and computationally efficient alternative to deep learning models, providing new insights into defect representations in 2D materials and serving as a tool for the high-throughput prescreening of stable defect candidates prior to expensive first-principles calculations.
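As an illustration of the Hellinger distance mentioned in the abstract, the following is a minimal sketch of how two nearest-neighbor distance histograms could be compared. It uses the standard definition for discrete distributions, H(P, Q) = sqrt(0.5 * sum_i (sqrt(p_i) - sqrt(q_i))^2), which lies between 0 (identical) and 1 (no overlap). The function name, the binning, and the example counts are hypothetical and are not taken from the paper; how the authors turn such distances into features is not described in this record.

```python
import math


def hellinger_distance(p, q):
    """Hellinger distance between two non-negative histograms on the same bins.

    Both inputs are normalized to sum to 1 before comparison, so raw counts
    can be passed directly.
    """
    if len(p) != len(q):
        raise ValueError("histograms must be defined on the same bins")
    if any(v < 0 for v in p) or any(v < 0 for v in q):
        raise ValueError("histogram entries must be non-negative")
    p_total, q_total = sum(p), sum(q)
    if p_total <= 0 or q_total <= 0:
        raise ValueError("each histogram needs a positive total weight")
    squared = sum(
        (math.sqrt(a / p_total) - math.sqrt(b / q_total)) ** 2
        for a, b in zip(p, q)
    )
    return math.sqrt(0.5 * squared)


# Hypothetical counts per distance bin (illustrative values only).
pristine = [0, 12, 30, 12, 0]
defective = [2, 9, 24, 13, 6]

distance = hellinger_distance(pristine, defective)
print(f"Hellinger distance: {distance:.3f}")
```

A single bounded scalar of this kind summarizes how far a defective structure's neighbor distribution departs from a reference one. That is one plausible reading of how such features could replace the full distribution features and shrink the feature set.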
