Chemical Feature Engineering and Defect-Aware Structural Fingerprint Representations for Complex Defects in 2D Materials

dc.contributor.author: Na Talang C.
dc.contributor.author: Kesorn A.
dc.contributor.author: Cholsuk C.
dc.contributor.author: Vogl T.
dc.contributor.author: Hunkao R.
dc.contributor.author: Sinsarp A.
dc.contributor.author: Suwanna S.
dc.contributor.author: Yuma S.
dc.contributor.correspondence: Na Talang C.
dc.contributor.other: Mahidol University
dc.date.accessioned: 2026-02-28T18:25:37Z
dc.date.available: 2026-02-28T18:25:37Z
dc.date.issued: 2026-02-23
dc.description.abstract: Designing descriptors for multiple defects in two-dimensional materials is challenging due to the diverse local atomic environments created by different defect types and arrangements. Existing physics-informed descriptors struggle to distinguish distinct defect configurations with identical composition, while deep learning models, though powerful, require large data sets and are less interpretable. In this work, we address this limitation by engineering chemical descriptors and constructing structural features from nearest-neighbor distributions provided by the classical force-field-inspired descriptors (CFID). We show that our engineering method, combined with defect-aware structural features derived from the Hellinger distance, even excluding the full distribution features, improves data point discrimination in high-dimensional feature space while reducing the number of features by 50%. In predicting formation energy per defect site, this extended feature set balances reliance on a few dominant features, enhancing model interpretation and generalization at the cost of a marginal 10% increase in prediction error compared to baseline descriptors. This generalization capability is empirically validated on an external out-of-distribution data set of bulk hBN defects, where our model exhibits lower uncertainty and superior stability within the applicable physical domain (−1 < E_f < 5 eV). However, predicting a highly complex and nonlinear target, such as the HOMO–LUMO gap, remains challenging, as none of our extensions outperform the baseline. This physics-informed approach offers an interpretable and computationally efficient alternative to deep learning models, providing new insights into defect representations in 2D materials and serving as a tool for the high-throughput prescreening of stable defect candidates prior to expensive first-principles calculations.
dc.identifier.citation: Journal of Chemical Information and Modeling Vol.66 No.4 (2026), 2017-2029
dc.identifier.doi: 10.1021/acs.jcim.5c02100
dc.identifier.eissn: 1549960X
dc.identifier.issn: 15499596
dc.identifier.scopus: 2-s2.0-105030656731
dc.identifier.uri: https://repository.li.mahidol.ac.th/handle/123456789/115455
dc.rights.holder: SCOPUS
dc.subject: Chemical Engineering
dc.subject: Chemistry
dc.subject: Computer Science
dc.subject: Social Sciences
dc.title: Chemical Feature Engineering and Defect-Aware Structural Fingerprint Representations for Complex Defects in 2D Materials
dc.type: Article
mu.datasource.scopus: https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=105030656731&origin=inward
oaire.citation.endPage: 2029
oaire.citation.issue: 4
oaire.citation.startPage: 2017
oaire.citation.title: Journal of Chemical Information and Modeling
oaire.citation.volume: 66
oairecerif.author.affiliation: Technische Universität München
oairecerif.author.affiliation: Trinity College Dublin
oairecerif.author.affiliation: Friedrich-Schiller-Universität Jena
oairecerif.author.affiliation: Faculty of Science, Mahidol University
oairecerif.author.affiliation: Munich Center for Quantum Science and Technology (MCQST)
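
Illustrative note on the abstract: the structural features are described as being derived from the Hellinger distance between nearest-neighbor distributions. The record does not give the exact construction, so the following is only a minimal sketch of the standard Hellinger distance between two binned distributions. The function name `hellinger_distance` and the example histograms are hypothetical and are not taken from the paper or from the CFID implementation.

```python
import math


def hellinger_distance(p, q):
    """Hellinger distance between two histograms with the same binning.

    Inputs are non-negative bin counts (or weights); each is normalized
    to sum to 1 before comparison. The result lies in [0, 1]:
    0 for identical distributions, 1 for distributions with no overlap.
    """
    if len(p) != len(q):
        raise ValueError("histograms must share the same bins")
    total_p, total_q = sum(p), sum(q)
    if total_p <= 0 or total_q <= 0:
        raise ValueError("each histogram needs positive total weight")
    # Sum of squared differences between square roots of the probabilities.
    squared = sum(
        (math.sqrt(a / total_p) - math.sqrt(b / total_q)) ** 2
        for a, b in zip(p, q)
    )
    return math.sqrt(squared / 2.0)


# Hypothetical nearest-neighbor distance histograms (made-up counts):
# one for a pristine lattice, one for a structure containing a defect.
pristine = [0, 12, 30, 12, 0]
defective = [2, 10, 24, 14, 4]
distance = hellinger_distance(pristine, defective)
```

Because the distance is bounded and symmetric, a single number of this kind could stand in for a full distribution when comparing a defective structure with a reference, which is consistent with the abstract's remark about excluding the full distribution features. How the authors actually choose the reference and the binning is not stated in this record.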
