A systematic evaluation of grayscale conversion methods for mitigating color variation in deep learning-based histopathological image analysis

dc.contributor.author: Srisermphoak N.
dc.contributor.author: Amornphimoltham P.
dc.contributor.author: Chaisuparat R.
dc.contributor.author: Achararit P.
dc.contributor.author: Fuangrod T.
dc.contributor.correspondence: Srisermphoak N.
dc.contributor.other: Mahidol University
dc.date.accessioned: 2026-03-31T18:25:41Z
dc.date.available: 2026-03-31T18:25:41Z
dc.date.issued: 2026-04-01
dc.description.abstract: The clinical adoption of deep learning (DL) for histopathological image analysis is hindered by performance degradation caused by color variation arising from disparate staining protocols and scanning technologies. Because morphological features may effectively convey the diagnostic information in hematoxylin and eosin slides, this study investigated grayscale conversion as an approach to standardizing input for DL. We evaluated six grayscale algorithms against RGB across four scenarios: (1) a single-center baseline, (2) mixed multicenter training, (3) a cross-scanner generalization test, and (4) a cross-center generalization test. We also investigated a novel attention-based grayscale conversion method (ACSRM), which uses the transformer's attention mechanism to preserve critical color information through long-range pixel dependencies. In homogeneous settings, the best-performing grayscale methods achieved performance comparable to RGB (all-class F1 differences of −0.01 to 0.04 and no difference in intersection over union). In mixed-center training, at least one grayscale algorithm outperformed RGB for every model, with 23 of 30 model combinations exhibiting statistically distinct decision behaviors (Wilcoxon signed-rank test: p < 0.05). Under the distribution-shift scenarios, grayscale methods generalized better: ACSRM with Swin-Transformer-base (Swin-B) outperformed the RGB baseline by 0.31 (0.14 vs. 0.45) on one specific class in the cross-scanner test, while performing comparably on the remaining classes. Similarly, Luster with Swin-B improved the F1-score from 0.50 to 0.78 in the cross-center evaluation. Statistical analysis confirmed significant differences in predictive behavior for these combinations in scenarios (3) and (4) (McNemar's test: p < 0.05). Overall, ACSRM and Luster emerged as the most effective strategies for enhancing DL generalization, facilitating reliable clinical deployment.
dc.identifier.citation: Journal of Pathology Informatics Vol.21 (2026)
dc.identifier.doi: 10.1016/j.jpi.2026.100647
dc.identifier.eissn: 21533539
dc.identifier.issn: 22295089
dc.identifier.scopus: 2-s2.0-105033246608
dc.identifier.uri: https://repository.li.mahidol.ac.th/handle/123456789/115924
dc.rights.holder: SCOPUS
dc.subject: Computer Science
dc.subject: Medicine
dc.title: A systematic evaluation of grayscale conversion methods for mitigating color variation in deep learning-based histopathological image analysis
dc.type: Article
mu.datasource.scopus: https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=105033246608&origin=inward
oaire.citation.title: Journal of Pathology Informatics
oaire.citation.volume: 21
oairecerif.author.affiliation: Chulalongkorn University
oairecerif.author.affiliation: Mahidol University, Faculty of Dentistry
oairecerif.author.affiliation: Chulabhorn Royal Academy
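
The abstract describes replacing RGB input with a single grayscale channel but does not list the formulas for the six algorithms evaluated. As a rough illustration only, the sketch below shows two commonly used color-to-grayscale mappings: a weighted luminance sum (ITU-R BT.601 coefficients) and the mean of the largest and smallest channel, which the color-to-grayscale literature often calls "Luster". Whether the article defines Luster this way is an assumption, the function names are hypothetical, and ACSRM is not reproduced here.

```python
from typing import Callable, List, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each channel in 0..255


def luminance(p: Pixel) -> float:
    """Weighted channel sum using ITU-R BT.601 luma coefficients."""
    r, g, b = p
    return 0.299 * r + 0.587 * g + 0.114 * b


def luster(p: Pixel) -> float:
    """Mean of the largest and smallest channel.

    Assumption: this is the usual literature definition of "Luster";
    the abstract itself does not give a formula.
    """
    return (max(p) + min(p)) / 2


def to_gray(image: List[List[Pixel]],
            method: Callable[[Pixel], float]) -> List[List[float]]:
    """Apply a per-pixel conversion to an image stored as rows of pixels."""
    return [[method(p) for p in row] for row in image]


if __name__ == "__main__":
    # Two made-up pixels, loosely eosin-pink and hematoxylin-purple.
    patch = [[(230, 150, 190), (90, 60, 140)]]
    print(to_gray(patch, luminance))
    print(to_gray(patch, luster))
```

Both mappings discard hue and keep only an intensity-like value, which is the property the study relies on to reduce sensitivity to stain and scanner color differences.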
