A decision-theoretic approach to the evaluation of machine learning algorithms in computational drug discovery

Oliver P. Watson; Isidro Cortes-Ciriano; Aimee R. Taylor; James A. Watson

dc.contributor.author	Oliver P. Watson	en_US
dc.contributor.author	Isidro Cortes-Ciriano	en_US
dc.contributor.author	Aimee R. Taylor	en_US
dc.contributor.author	James A. Watson	en_US
dc.contributor.other	University of Cambridge	en_US
dc.contributor.other	Boston University School of Public Health	en_US
dc.contributor.other	Mahidol University	en_US
dc.contributor.other	Nuffield Department of Clinical Medicine	en_US
dc.contributor.other	Broad Institute	en_US
dc.contributor.other	Evariste Technologies Ltd.	en_US
dc.date.accessioned	2020-01-27T07:37:15Z
dc.date.available	2020-01-27T07:37:15Z
dc.date.issued	2019-11-01	en_US
dc.description.abstract	© The Author(s) 2019. Published by Oxford University Press. Motivation: Artificial intelligence, trained via machine learning (e.g. neural nets, random forests) or computational statistical algorithms (e.g. support vector machines, ridge regression), holds much promise for the improvement of small-molecule drug discovery. However, small-molecule structure-activity data are high dimensional with low signal-to-noise ratios, and proper validation of predictive methods is difficult. It is poorly understood which, if any, of the currently available machine learning algorithms will best predict new candidate drugs. Results: The quantile-activity bootstrap is proposed as a new model validation framework using quantile splits on the activity distribution function to construct training and testing sets. In addition, we propose two novel rank-based loss functions which penalize only the out-of-sample predicted ranks of high-activity molecules. The combination of these methods was used to assess the performance of neural nets, random forests, support vector machines (regression) and ridge regression applied to 25 diverse high-quality structure-activity datasets publicly available on ChEMBL. Model validation based on random partitioning of available data favours models that overfit and 'memorize' the training set, namely random forests and deep neural nets. Partitioning based on quantiles of the activity distribution correctly penalizes extrapolation of models onto structurally different molecules outside of the training data. Simpler, traditional statistical methods such as ridge regression can outperform state-of-the-art machine learning methods in this setting. In addition, our new rank-based loss functions give considerably different results from mean squared error, highlighting the necessity to define model optimality with respect to the decision task at hand.	en_US
dc.identifier.citation	Bioinformatics. Vol.35, No.22 (2019), 4656-4663	en_US
dc.identifier.doi	10.1093/bioinformatics/btz293	en_US
dc.identifier.issn	14602059	en_US
dc.identifier.issn	13674803	en_US
dc.identifier.other	2-s2.0-85074962955	en_US
dc.identifier.uri	https://repository.li.mahidol.ac.th/handle/123456789/50051
dc.rights	Mahidol University	en_US
dc.rights.holder	SCOPUS	en_US
dc.source.uri	https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=85074962955&origin=inward	en_US
dc.subject	Biochemistry, Genetics and Molecular Biology	en_US
dc.subject	Computer Science	en_US
dc.subject	Mathematics	en_US
dc.title	A decision-theoretic approach to the evaluation of machine learning algorithms in computational drug discovery	en_US
dc.type	Article	en_US
dspace.entity.type	Publication
mu.datasource.scopus	https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=85074962955&origin=inward	en_US
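
Illustrative sketch (not taken from the article): the abstract describes splitting data by quantiles of the activity distribution and scoring models only on the predicted ranks of high-activity molecules. The Python below is one possible reading of those two ideas, under stated assumptions. The function names quantile_activity_split and top_k_rank_loss, the parameters q and k, and the choice to train on molecules at or below the quantile are all hypothetical; the bootstrap resampling step and the exact loss definitions used by the authors are not given in this record and are not reproduced here.

```python
import numpy as np


def quantile_activity_split(activity, q=0.75):
    """Split indices by an activity quantile (assumed rule, not the authors').

    Molecules with activity at or below the q-th quantile go to the training
    set; the remaining, more active molecules go to the test set.
    """
    activity = np.asarray(activity, dtype=float)
    threshold = np.quantile(activity, q)
    train_idx = np.flatnonzero(activity <= threshold)
    test_idx = np.flatnonzero(activity > threshold)
    return train_idx, test_idx


def top_k_rank_loss(y_true, y_pred, k=3):
    """Mean predicted rank of the k truly most active molecules.

    Rank 0 is the molecule predicted to be most active, so lower is better.
    A perfect ordering of the top k gives (k - 1) / 2. This is a hypothetical
    rank-based loss, used only to illustrate the idea in the abstract.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    # Predicted rank of every molecule: 0 for the highest prediction.
    order = np.argsort(-y_pred, kind="stable")
    pred_rank = np.empty(len(y_pred), dtype=int)
    pred_rank[order] = np.arange(len(y_pred))
    # Indices of the k molecules with the highest measured activity.
    top_true = np.argsort(-y_true, kind="stable")[:k]
    return float(pred_rank[top_true].mean())
```

Under these assumptions, a model would be fitted on the rows selected by train_idx and scored with top_k_rank_loss on the rows selected by test_idx, so that only the placement of the most active held-out molecules affects the score.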

Collections

Scopus 2019
