Publication: Predicting formation of haloacetic acids by chlorination of organic compounds using machine-learning-assisted quantitative structure-activity relationships
Issued Date
2020-01-01
ISSN
18733336
03043894
Other identifier(s)
2-s2.0-85096010084
Rights
Mahidol University
Rights Holder(s)
SCOPUS
Bibliographic Citation
Journal of Hazardous Materials. (2020)
Suggested Citation
José Andrés Cordero, Kai He, Kanjira Janya, Shinya Echigo, Sadahiko Itoh. Predicting formation of haloacetic acids by chlorination of organic compounds using machine-learning-assisted quantitative structure-activity relationships. Journal of Hazardous Materials. (2020). doi:10.1016/j.jhazmat.2020.124466 Retrieved from: https://repository.li.mahidol.ac.th/handle/20.500.14594/60481
Abstract
© 2020 Elsevier B.V. The presence of disinfection byproducts (DBPs) in drinking water is a major public health concern, and an effective strategy to limit the formation of these DBPs is to prevent their precursors. In silico prediction from chemical structure would allow rapid identification of precursors and could be used as a prescreening tool to prioritize testing. We present models using machine learning algorithms (i.e., support vector regressor, random forest regressor, and multilayer perceptron regressor) and chemical descriptors as features to predict the formation of haloacetic acids (HAAs). A robust model with good predictivity (i.e., leave-one-out cross-validated Q² > 0.5) to predict the formation of trichloroacetic acid (TCAA) was developed using a random forest regressor. The number of aromatic bonds, hydrophilicity, and electrotopological descriptors related to electrostatic interactions and the atomic distribution of electronegativity were identified as important predictors of TCAA formation potentials (FPs). However, the prediction of dichloroacetic acid was less accurate, which is congruent with the presence of different types of precursors exhibiting distinct mechanisms. This study demonstrates that nonlinear combinations of general chemical descriptors can adequately estimate HAA FPs, and we hope that our study can be used to predict precursors of other disinfection byproducts based on chemical structures using a similar workflow.
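The abstract's acceptance criterion, a leave-one-out cross-validated Q² above 0.5, can be illustrated with a minimal sketch. This is not the authors' code. It assumes the common convention Q² = 1 - PRESS / SS_tot, with SS_tot taken around the mean of all observed values, and it swaps the paper's random forest regressor for a one-descriptor least-squares line so that the example needs only the standard library. The names `fit_line` and `loo_q2` are hypothetical.

```python
from statistics import mean
from typing import Callable, Sequence

# A "fitter" takes training descriptors and responses and returns a predictor.
Predictor = Callable[[float], float]
Fitter = Callable[[Sequence[float], Sequence[float]], Predictor]


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Predictor:
    """Least-squares line on one descriptor (a stand-in for a random forest)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    intercept = my - slope * mx
    return lambda x: slope * x + intercept


def loo_q2(xs: Sequence[float], ys: Sequence[float], fit: Fitter = fit_line) -> float:
    """Leave-one-out Q2, assuming Q2 = 1 - PRESS / SS_tot."""
    xs, ys = list(xs), list(ys)
    press = 0.0
    for i in range(len(xs)):
        # Refit on every compound except the i-th, then predict the held-out one.
        model = fit(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        press += (ys[i] - model(xs[i])) ** 2
    y_bar = mean(ys)
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    return 1.0 - press / ss_tot
```

Any model that can be wrapped as a fitter, including a random forest, could be passed as `fit`. Under this convention, a model that predicts no better than the training-set mean gives a negative Q², which is why a threshold such as 0.5 is a meaningful bar for predictivity.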