Na Talang C.; Kesorn A.; Cholsuk C.; Vogl T.; Hunkao R.; Sinsarp A.; Suwanna S.; Yuma S.
Mahidol University
2026-02-28
2026-02-28
2026-02-23
Journal of Chemical Information and Modeling Vol.66 No.4 (2026), 2017-2029
15499596
https://repository.li.mahidol.ac.th/handle/123456789/115455

Designing descriptors for multiple defects in two-dimensional materials is challenging due to the diverse local atomic environments created by different defect types and arrangements. Existing physics-informed descriptors struggle to distinguish distinct defect configurations with identical composition, while deep learning models, though powerful, require large data sets and are less interpretable. In this work, we address this limitation by engineering chemical descriptors and constructing structural features from nearest-neighbor distributions provided by the classical force-field-inspired descriptors (CFID). We show that our engineering method, combined with defect-aware structural features derived from the Hellinger distance, even excluding the full distribution features, improves data point discrimination in high-dimensional feature space while reducing the number of features by 50%. In predicting formation energy per defect site, this extended feature set balances reliance on a few dominant features, enhancing model interpretation and generalization at the cost of a marginal 10% increase in prediction error compared to baseline descriptors. This generalization capability is empirically validated on an external out-of-distribution data set of bulk hBN defects, where our model exhibits lower uncertainty and superior stability within the applicable physical domain (−1 < E_f < 5 eV). However, predicting a highly complex and nonlinear target, such as the HOMO–LUMO gap, remains challenging, as none of our extensions outperform the baseline.
This physics-informed approach offers an interpretable and computationally efficient alternative to deep learning models, providing new insights into defect representations in 2D materials and serving as a tool for the high-throughput prescreening of stable defect candidates prior to expensive first-principles calculations.

Chemical Engineering; Chemistry; Computer Science; Social Sciences
Chemical Feature Engineering and Defect-Aware Structural Fingerprint Representations for Complex Defects in 2D Materials
Article
SCOPUS
10.1021/acs.jcim.5c02100
2-s2.0-105030656731
1549960X
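
The abstract refers to structural features derived from the Hellinger distance between nearest-neighbor distributions. The record does not say how the distributions are binned or which reference structure is used, so the following is only a minimal sketch of the standard Hellinger distance between two discrete histograms. The function name `hellinger_distance` and the example histograms are hypothetical and are not taken from the article.

```python
import math


def hellinger_distance(p, q):
    """Hellinger distance between two non-negative histograms.

    Both inputs are normalized to sum to 1 before comparison, so raw
    bin counts are accepted. The result lies in [0, 1]: 0 for identical
    distributions, 1 for distributions with no overlapping bins.
    """
    if len(p) != len(q):
        raise ValueError("histograms must have the same number of bins")
    if any(x < 0 for x in p) or any(x < 0 for x in q):
        raise ValueError("histogram entries must be non-negative")
    sum_p, sum_q = sum(p), sum(q)
    if sum_p <= 0 or sum_q <= 0:
        raise ValueError("each histogram needs at least one non-zero bin")
    # H(p, q) = sqrt( 1/2 * sum_i (sqrt(p_i) - sqrt(q_i))^2 )
    squared = sum(
        (math.sqrt(a / sum_p) - math.sqrt(b / sum_q)) ** 2
        for a, b in zip(p, q)
    )
    return math.sqrt(0.5 * squared)


# Hypothetical nearest-neighbor count histograms (illustrative values only):
# a reference structure versus a structure containing defects.
reference_hist = [0, 0, 12, 0]
defective_hist = [1, 2, 8, 1]
distance = hellinger_distance(reference_hist, defective_hist)
```

A single scalar such as `distance` summarizes how far a defective structure's distribution lies from a reference, which is one way a compact feature could stand in for the full distribution. Whether the article builds its features this way is an assumption.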