A systematic evaluation of grayscale conversion methods for mitigating color variation in deep learning-based histopathological image analysis
Issued Date
2026-04-01
ISSN
22295089
eISSN
21533539
Scopus ID
2-s2.0-105033246608
Journal Title
Journal of Pathology Informatics
Volume
21
Rights Holder(s)
SCOPUS
Bibliographic Citation
Journal of Pathology Informatics Vol.21 (2026)
Suggested Citation
Srisermphoak N., Amornphimoltham P., Chaisuparat R., Achararit P., Fuangrod T. A systematic evaluation of grayscale conversion methods for mitigating color variation in deep learning-based histopathological image analysis. Journal of Pathology Informatics Vol.21 (2026). doi:10.1016/j.jpi.2026.100647 Retrieved from: https://repository.li.mahidol.ac.th/handle/123456789/115924
Abstract
The clinical adoption of deep learning (DL) for histopathological image analysis is hindered by performance degradation caused by color variations arising from disparate staining protocols and scanning technologies. Because morphological features may effectively convey the diagnostic information in hematoxylin and eosin slides, this study investigated grayscale conversion as an approach to standardizing DL input. We evaluated six grayscale algorithms against RGB in four settings: (1) a single-center baseline, (2) mixed multicenter training, (3) a cross-scanner generalization test, and (4) a cross-center generalization test. We also investigated a novel attention-based grayscale conversion method (ACSRM), which uses the transformer's attention mechanism to preserve critical color information through long-range pixel dependencies. In homogeneous settings, the best-performing grayscale methods achieved performance comparable to RGB (all-class F1 differences: −0.01 to 0.04, with no differences in intersection over union). In mixed-center training, at least one grayscale algorithm outperformed RGB for every model, and 23 of 30 model combinations exhibited statistically distinct decision behaviors (Wilcoxon signed-rank test: p < 0.05). Under distribution shift, grayscale methods generalized better: ACSRM with Swin-Transformer-base (Swin-B) outperformed the RGB baseline by 0.31 (0.14 vs. 0.45) on a specific class in cross-scanner tests, while performing comparably on the remaining classes. Similarly, Luster with Swin-B improved the F1-score from 0.50 to 0.78 in cross-center evaluation. Statistical analysis confirmed significant differences in predictive behavior for these combinations in settings (3) and (4) (McNemar's test: p < 0.05). Overall, ACSRM and Luster emerged as the most effective strategies for enhancing DL generalization, facilitating reliable clinical deployment.
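To make the idea of grayscale conversion concrete, the sketch below shows three per-pixel conversions in plain Python. It is illustrative only, not the authors' implementation: apart from Luster and ACSRM, the abstract does not name the six algorithms it evaluated. The formulas here are the definitions commonly used in the color-to-grayscale literature (channel average, BT.601-weighted luminance, and Luster taken as the midpoint of the largest and smallest channel), and assuming they match the paper's variants is a guess. The function names, the `Pixel` type, and the sample pixel values are hypothetical. ACSRM is not sketched because the abstract gives no algorithmic detail.

```python
from typing import Callable, List, Tuple

Pixel = Tuple[int, int, int]  # 8-bit (R, G, B); hypothetical helper type


def gray_average(p: Pixel) -> int:
    """Unweighted mean of the three channels."""
    r, g, b = p
    return round((r + g + b) / 3)


def gray_luminance(p: Pixel) -> int:
    """Weighted channel sum using the ITU-R BT.601 luma coefficients."""
    r, g, b = p
    return round(0.299 * r + 0.587 * g + 0.114 * b)


def gray_luster(p: Pixel) -> int:
    """Midpoint of the largest and smallest channel (HLS lightness).

    Assumption: this is the usual literature definition of "Luster";
    the abstract itself does not give a formula.
    """
    return round((max(p) + min(p)) / 2)


def to_grayscale(
    image: List[List[Pixel]], method: Callable[[Pixel], int]
) -> List[List[int]]:
    """Apply one conversion to every pixel of a row-major RGB image."""
    return [[method(p) for p in row] for row in image]


if __name__ == "__main__":
    # Made-up pixel values loosely resembling hematoxylin (purple)
    # and eosin (pink) staining; not taken from the study's data.
    patch = [[(120, 60, 160), (230, 150, 190)]]
    for fn in (gray_average, gray_luminance, gray_luster):
        print(fn.__name__, to_grayscale(patch, fn))
```

Each method maps the same stained pixel to a different intensity, which is why the choice of conversion can affect what a downstream model sees.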
