Chemical Feature Engineering and Defect-Aware Structural Fingerprint Representations for Complex Defects in 2D Materials
| dc.contributor.author | Na Talang C. | |
| dc.contributor.author | Kesorn A. | |
| dc.contributor.author | Cholsuk C. | |
| dc.contributor.author | Vogl T. | |
| dc.contributor.author | Hunkao R. | |
| dc.contributor.author | Sinsarp A. | |
| dc.contributor.author | Suwanna S. | |
| dc.contributor.author | Yuma S. | |
| dc.contributor.correspondence | Na Talang C. | |
| dc.contributor.other | Mahidol University | |
| dc.date.accessioned | 2026-02-28T18:25:37Z | |
| dc.date.available | 2026-02-28T18:25:37Z | |
| dc.date.issued | 2026-02-23 | |
| dc.description.abstract | Designing descriptors for multiple defects in two-dimensional materials is challenging due to the diverse local atomic environments created by different defect types and arrangements. Existing physics-informed descriptors struggle to distinguish distinct defect configurations with identical composition, while deep learning models, though powerful, require large data sets and are less interpretable. In this work, we address these limitations by engineering chemical descriptors and constructing structural features from the nearest-neighbor distributions provided by the classical force-field-inspired descriptors (CFID). We show that our engineering method, combined with defect-aware structural features derived from the Hellinger distance, improves data point discrimination in high-dimensional feature space even when the full distribution features are excluded, while reducing the number of features by 50%. In predicting formation energy per defect site, this extended feature set distributes importance more evenly instead of relying on a few dominant features, enhancing model interpretation and generalization at the cost of a marginal 10% increase in prediction error compared to baseline descriptors. This generalization capability is empirically validated on an external out-of-distribution data set of bulk hBN defects, where our model exhibits lower uncertainty and superior stability within the applicable physical domain (−1 < E_f < 5 eV). However, predicting a highly complex and nonlinear target, such as the HOMO–LUMO gap, remains challenging, as none of our extensions outperform the baseline. This physics-informed approach offers an interpretable and computationally efficient alternative to deep learning models, providing new insights into defect representations in 2D materials and serving as a tool for the high-throughput prescreening of stable defect candidates prior to expensive first-principles calculations. | |
| dc.identifier.citation | Journal of Chemical Information and Modeling Vol.66 No.4 (2026) , 2017-2029 | |
| dc.identifier.doi | 10.1021/acs.jcim.5c02100 | |
| dc.identifier.eissn | 1549960X | |
| dc.identifier.issn | 15499596 | |
| dc.identifier.scopus | 2-s2.0-105030656731 | |
| dc.identifier.uri | https://repository.li.mahidol.ac.th/handle/123456789/115455 | |
| dc.rights.holder | SCOPUS | |
| dc.subject | Chemical Engineering | |
| dc.subject | Chemistry | |
| dc.subject | Computer Science | |
| dc.subject | Social Sciences | |
| dc.title | Chemical Feature Engineering and Defect-Aware Structural Fingerprint Representations for Complex Defects in 2D Materials | |
| dc.type | Article | |
| mu.datasource.scopus | https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=105030656731&origin=inward | |
| oaire.citation.endPage | 2029 | |
| oaire.citation.issue | 4 | |
| oaire.citation.startPage | 2017 | |
| oaire.citation.title | Journal of Chemical Information and Modeling | |
| oaire.citation.volume | 66 | |
| oairecerif.author.affiliation | Technische Universität München | |
| oairecerif.author.affiliation | Trinity College Dublin | |
| oairecerif.author.affiliation | Friedrich-Schiller-Universität Jena | |
| oairecerif.author.affiliation | Faculty of Science, Mahidol University | |
| oairecerif.author.affiliation | Munich Center for Quantum Science and Technology (MCQST) | |
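
The abstract states that the defect-aware structural features are derived from the Hellinger distance applied to CFID nearest-neighbor distributions, without giving the formula. As a minimal sketch, assuming each distribution is available as a histogram of counts over a shared set of distance bins, the standard Hellinger distance is H(P, Q) = (1/√2) · sqrt(Σ_i (√p_i − √q_i)²). The function names, binning, and example counts below are hypothetical illustrations, not the authors' implementation or part of any CFID library.

```python
import math
from typing import Sequence


def normalize(hist: Sequence[float]) -> list[float]:
    """Turn a histogram of non-negative counts into a probability vector."""
    if any(h < 0 for h in hist):
        raise ValueError("histogram entries must be non-negative")
    total = float(sum(hist))
    if total <= 0:
        raise ValueError("histogram must have positive total mass")
    return [h / total for h in hist]


def hellinger_distance(p: Sequence[float], q: Sequence[float]) -> float:
    """Hellinger distance between two histograms defined on the same bins.

    Returns a value in [0, 1]: 0 for identical distributions,
    1 for distributions with disjoint support.
    """
    if len(p) != len(q):
        raise ValueError("histograms must share the same bins")
    p_n, q_n = normalize(p), normalize(q)
    # Sum of squared differences of square-rooted probabilities.
    s = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p_n, q_n))
    return math.sqrt(s) / math.sqrt(2.0)


# Hypothetical binned nearest-neighbor distance counts around a site.
pristine = [0, 10, 0, 0]
defective = [0, 7, 3, 0]
print(hellinger_distance(pristine, defective))
```

Because the distance is symmetric and bounded in [0, 1], it gives a single scalar per pair of distributions that can be compared across defects. That is one plausible reason such a measure can stand in for the full distribution features, although the paper's exact construction is not described in this record.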
