Publication:
A semi-supervised learning framework for quantitative structure-activity regression modelling

dc.contributor.author: Oliver Watson [en_US]
dc.contributor.author: Isidro Cortes-Ciriano [en_US]
dc.contributor.author: James A. Watson [en_US]
dc.contributor.other: Faculty of Tropical Medicine, Mahidol University [en_US]
dc.contributor.other: University of Cambridge [en_US]
dc.contributor.other: Nuffield Department of Medicine [en_US]
dc.contributor.other: Evariste Technologies Ltd [en_US]
dc.date.accessioned: 2022-08-04T08:10:09Z
dc.date.available: 2022-08-04T08:10:09Z
dc.date.issued: 2021-04-20 [en_US]
dc.description.abstract: MOTIVATION: Quantitative structure-activity relationship (QSAR) methods are increasingly used in assisting the process of preclinical, small molecule drug discovery. Regression models are trained on data consisting of a finite-dimensional representation of molecular structures and their corresponding target-specific activities. These supervised learning models can then be used to predict the activity of previously unmeasured novel compounds. RESULTS: This work provides methods that solve three problems in QSAR modelling: (i) a method for comparing the information content between finite-dimensional representations of molecular structures (fingerprints) with respect to the target of interest, (ii) a method that quantifies how the accuracy of the model prediction degrades as a function of the distance between the testing and training data and (iii) a method to adjust for screening-dependent selection bias inherent in many training datasets. For example, in the most extreme cases, only compounds which pass an activity-dependent screening threshold are reported. A semi-supervised learning framework combines (ii) and (iii) and can make predictions that take into account the similarity of the testing compounds to those in the training data and adjust for the reporting selection bias. We illustrate the three methods using publicly available structure-activity data for a large set of compounds reported by GlaxoSmithKline (the Tres Cantos AntiMalarial Set, TCAMS) to inhibit asexual in vitro Plasmodium falciparum growth. AVAILABILITY AND IMPLEMENTATION: https://github.com/owatson/PenalizedPrediction. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online. [en_US]
dc.identifier.citation: Bioinformatics (Oxford, England). Vol.37, No.3 (2021), 342-350 [en_US]
dc.identifier.doi: 10.1093/bioinformatics/btaa711 [en_US]
dc.identifier.issn: 13674811 [en_US]
dc.identifier.other: 2-s2.0-85105698719 [en_US]
dc.identifier.uri: https://repository.li.mahidol.ac.th/handle/123456789/76207
dc.rights: Mahidol University [en_US]
dc.rights.holder: SCOPUS [en_US]
dc.source.uri: https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=85105698719&origin=inward [en_US]
dc.subject: Biochemistry, Genetics and Molecular Biology [en_US]
dc.subject: Computer Science [en_US]
dc.subject: Mathematics [en_US]
dc.title: A semi-supervised learning framework for quantitative structure-activity regression modelling [en_US]
dc.type: Article [en_US]
dspace.entity.type: Publication
mu.datasource.scopus: https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=85105698719&origin=inward [en_US]
