Integrating machine learning models with cross-validation and bootstrapping for evaluating groundwater quality in Kanchanaburi province, Thailand
Issued Date
2024-07-01
ISSN
00139351
eISSN
10960953
Scopus ID
2-s2.0-85190939580
Pubmed ID
38636644
Journal Title
Environmental Research
Volume
252
Rights Holder(s)
SCOPUS
Bibliographic Citation
Environmental Research Vol.252 (2024)
Suggested Citation
Thanh N.N., Chotpantarat S., Ngu N.H., Thunyawatcharakul P., Kaewdum N. Integrating machine learning models with cross-validation and bootstrapping for evaluating groundwater quality in Kanchanaburi province, Thailand. Environmental Research Vol.252 (2024). doi:10.1016/j.envres.2024.118952 Retrieved from: https://repository.li.mahidol.ac.th/handle/20.500.14594/98184
Abstract
Exploring the potential of new models for mapping groundwater quality is a major challenge in water resource management, particularly in Kanchanaburi Province, Thailand, where groundwater faces contamination risks. This study explored the applicability of random forest (RF) and artificial neural network (ANN) models for predicting groundwater quality. Specifically, the two models were combined with cross-validation (CV) and bootstrapping (B) techniques to build four predictive models: RF-CV, RF-B, ANN-CV, and ANN-B. The entropy groundwater quality index (EWQI) was converted to a normalized EWQI, which was then classified into five levels from very poor to very good. Twelve physicochemical parameters from 180 groundwater wells (potassium, sodium, calcium, magnesium, chloride, sulfate, bicarbonate, nitrate, pH, electrical conductivity, total dissolved solids, and total hardness) were investigated to characterize groundwater quality in the eastern part of Kanchanaburi Province, Thailand. Our results indicated that groundwater quality in the study area was primarily polluted by calcium, magnesium, and bicarbonate, and that the RF-CV model (RMSE = 0.06, R² = 0.87, MAE = 0.04) outperformed the RF-B (RMSE = 0.07, R² = 0.80, MAE = 0.04), ANN-CV (RMSE = 0.09, R² = 0.70, MAE = 0.06), and ANN-B (RMSE = 0.10, R² = 0.67, MAE = 0.06) models. Our findings highlight the superiority of the RF models over the ANN models under both the CV and B techniques. In addition, the contribution of the groundwater parameters to the normalized EWQI in the various machine learning models was identified. The groundwater quality map created by the RF-CV model can be used to guide groundwater use.
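
The two resampling schemes named in the abstract, and the three reported metrics (RMSE, R², MAE), can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the data are synthetic, a one-variable least-squares line stands in for the RF and ANN models, and the fold count, number of bootstrap rounds, and all function names are assumptions chosen for illustration only.

```python
import math
import random


def make_synthetic(n=180, seed=42):
    """Synthetic stand-in data: one scaled predictor and a noisy index in [0, 1]."""
    rng = random.Random(seed)
    xs = [rng.random() for _ in range(n)]
    ys = [0.2 + 0.6 * x + rng.gauss(0.0, 0.05) for x in xs]
    return xs, ys


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x (placeholder for RF or ANN)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    a = my - b * mx
    return lambda x: a + b * x


def metrics(y_true, y_pred):
    """Return RMSE, MAE and R2 for paired observations and predictions."""
    n = len(y_true)
    errs = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errs)
    mean_t = sum(y_true) / n
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return {
        "RMSE": math.sqrt(ss_res / n),
        "MAE": sum(abs(e) for e in errs) / n,
        "R2": 1.0 - ss_res / ss_tot if ss_tot else float("nan"),
    }


def k_fold_cv(xs, ys, k=5, seed=0):
    """k-fold cross-validation: each sample is predicted once, by a model
    trained on the other k-1 folds; metrics are computed on the pooled predictions."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    y_true, y_pred = [], []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        model = fit_line([xs[i] for i in train], [ys[i] for i in train])
        y_true += [ys[i] for i in fold]
        y_pred += [model(xs[i]) for i in fold]
    return metrics(y_true, y_pred)


def bootstrap_oob(xs, ys, n_rounds=200, seed=0):
    """Bootstrap evaluation: train on a sample drawn with replacement and
    test on the out-of-bag samples that were not drawn in that round."""
    rng = random.Random(seed)
    n = len(xs)
    y_true, y_pred = [], []
    for _ in range(n_rounds):
        sample = [rng.randrange(n) for _ in range(n)]
        in_bag = set(sample)
        oob = [i for i in range(n) if i not in in_bag]
        if not oob:
            continue  # extremely unlikely, but nothing to test on in this round
        model = fit_line([xs[i] for i in sample], [ys[i] for i in sample])
        y_true += [ys[i] for i in oob]
        y_pred += [model(xs[i]) for i in oob]
    return metrics(y_true, y_pred)


xs, ys = make_synthetic()
cv_scores = k_fold_cv(xs, ys)
boot_scores = bootstrap_oob(xs, ys)
print("CV:       ", cv_scores)
print("Bootstrap:", boot_scores)
```

Swapping `fit_line` for a random forest or neural network regressor would give the four model and resampling combinations compared in the abstract; the values printed here come from the synthetic data and have no relation to the published results.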