Developing high-dimensional machine learning models to improve generalization ability and overcome data insufficiency for mixed sugar fermentation simulation
Issued Date
2023-10-01
ISSN
09608524
eISSN
18732976
Scopus ID
2-s2.0-85164486808
Pubmed ID
37352987
Journal Title
Bioresource Technology
Volume
385
Rights Holder(s)
SCOPUS
Bibliographic Citation
Bioresource Technology Vol.385 (2023)
Suggested Citation
Huang X.Y., Ao T.J., Zhang X., Li K., Zhao X.Q., Champreda V., Runguphan W., Sakdaronnarong C., Liu C.G., Bai F.W. Developing high-dimensional machine learning models to improve generalization ability and overcome data insufficiency for mixed sugar fermentation simulation. Bioresource Technology Vol.385 (2023). doi:10.1016/j.biortech.2023.129375 Retrieved from: https://repository.li.mahidol.ac.th/handle/20.500.14594/88014
Abstract
Accurate machine learning models can help advance biorefinery development. This work proposes a strategy to improve model generalization ability and overcome data insufficiency in mixed sugar fermentation simulation. Multiple-input single-output models, which take initial glucose, initial xylose, and time together as inputs, generalized better than single-input single-output models, which use time as the sole input, when predicting glucose, xylose, ethanol, or biomass separately. Multiple-input multiple-output models, which integrate these outputs, further improved accuracy and reached an average R² of 0.99. To address data insufficiency, a consensus yeast (CY) model built by consolidating data from 4 yeasts achieved an R² of 0.90. Adjusting the pretrained CY model cut the data requirement by more than 50% while reaching R² of 0.95 and 0.93 for yeast and bacterial fermentation simulation, respectively. The strategy can broaden the application range of artificial neural network (ANN) models and reduce their data curation costs.
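The three model layouts named in the abstract differ only in which columns are treated as inputs and which as outputs. The following is a minimal sketch of that difference, not the authors' implementation: the record field names, the helper functions, and all numeric values are hypothetical and chosen purely for illustration.

```python
# Hypothetical fermentation records (illustrative values, NOT data from the paper).
# Each record holds the initial sugar concentrations (g/L), the sampling time (h),
# and the measured glucose, xylose, ethanol, and biomass at that time (g/L).
records = [
    {"glucose0": 40.0, "xylose0": 20.0, "time": 0.0,
     "glucose": 40.0, "xylose": 20.0, "ethanol": 0.0, "biomass": 0.5},
    {"glucose0": 40.0, "xylose0": 20.0, "time": 12.0,
     "glucose": 10.0, "xylose": 18.0, "ethanol": 14.0, "biomass": 2.0},
    {"glucose0": 40.0, "xylose0": 20.0, "time": 24.0,
     "glucose": 0.0, "xylose": 9.0, "ethanol": 22.0, "biomass": 3.1},
]

# Assumed names for the four predicted quantities listed in the abstract.
OUTPUTS = ("glucose", "xylose", "ethanol", "biomass")


def siso_dataset(records, target):
    """Single input (time only), single output (one target variable)."""
    X = [[r["time"]] for r in records]
    y = [r[target] for r in records]
    return X, y


def miso_dataset(records, target):
    """Multiple inputs (initial glucose, initial xylose, time), single output."""
    X = [[r["glucose0"], r["xylose0"], r["time"]] for r in records]
    y = [r[target] for r in records]
    return X, y


def mimo_dataset(records):
    """Multiple inputs, multiple outputs: all four targets predicted together."""
    X = [[r["glucose0"], r["xylose0"], r["time"]] for r in records]
    Y = [[r[name] for name in OUTPUTS] for r in records]
    return X, Y
```

Under this layout, a single-input model sees only time and so cannot distinguish runs with different starting sugar concentrations, whereas the multiple-input forms carry the initial glucose and xylose in every row. Any regressor that accepts a feature matrix and a target vector or matrix could be fitted to the resulting pairs.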