Publication:
Unraveling the origin of splice switching activity of hemoglobin β-globin gene modulators via QSAR modeling

dc.contributor.author: Saw Simeon [en_US]
dc.contributor.author: Rickard Möller [en_US]
dc.contributor.author: Daniel Almgren [en_US]
dc.contributor.author: Hao Li [en_US]
dc.contributor.author: Chuleeporn Phanus-umporn [en_US]
dc.contributor.author: Virapong Prachayasittikul [en_US]
dc.contributor.author: Leif Bülow [en_US]
dc.contributor.author: Chanin Nantasenamat [en_US]
dc.contributor.other: Mahidol University [en_US]
dc.contributor.other: Lunds Universitet [en_US]
dc.date.accessioned: 2018-12-11T02:30:22Z
dc.date.accessioned: 2019-03-14T08:04:23Z
dc.date.available: 2018-12-11T02:30:22Z
dc.date.available: 2019-03-14T08:04:23Z
dc.date.issued: 2016-02-15 [en_US]
dc.description.abstract: © 2015 Elsevier B.V. β-Thalassemia is a blood disease caused by a mutation in the second intron of the β-globin gene of hemoglobin that leads to abnormal hemoglobin production. Low molecular weight compounds have been proposed to modulate defective splicing by binding unwanted splicing sites, thereby restoring correct splicing. This study investigates the origin of this splice switching activity in a set of 39 active and 61,000 inactive compounds. The K-means algorithm was applied to the inactive compound points with 39 clusters, in which a point from each cluster was selected to create a balanced data set of 39 active and inactive compounds. To avoid random bias, predictive models (i.e., decision tree (DT), random forest (RF), artificial neural network (ANN), partial least squares discriminant analysis (PLS-DA) and support vector machine (SVM)) were constructed 50 times. The performances of the predictive models were statistically assessed in terms of accuracy, sensitivity, specificity and Matthews correlation coefficient (MCC). RF provided an accuracy of 89.50 ± 13.45, sensitivity of 94.97 ± 13.49, specificity of 84.29 ± 22.27, and MCC of 0.80 ± 0.25 for 10-fold CV, and it provided an accuracy of 88.00 ± 8.55, sensitivity of 87.89 ± 13.93, specificity of 87.51 ± 13.75, and MCC of 0.75 ± 0.18 for external testing. Taking advantage of the built-in feature selector of RF, a thorough analysis of feature importance was conducted. Newly identified fingerprint substructures, namely, three carbon-hetero bonds (i.e., secondary amide, tertiary amide, carboxyl derivative, carboxylic acid derivative and nitrile), carbon-carbon bonds (i.e., primary carbon, secondary carbon and alkene), aromatics (hetero N nonbasic) and carbon-hetero bond (alkyl aryl ether), may provide a better understanding of the structural variations governing the splice switching activity of the hemoglobin β-globin gene. [en_US]
dc.identifier.citation: Chemometrics and Intelligent Laboratory Systems. Vol.151, (2016), 51-60 [en_US]
dc.identifier.doi: 10.1016/j.chemolab.2015.12.002 [en_US]
dc.identifier.issn: 18733239 [en_US]
dc.identifier.issn: 01697439 [en_US]
dc.identifier.other: 2-s2.0-84951781197 [en_US]
dc.identifier.uri: https://repository.li.mahidol.ac.th/handle/20.500.14594/43333
dc.rights: Mahidol University [en_US]
dc.rights.holder: SCOPUS [en_US]
dc.source.uri: https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=84951781197&origin=inward [en_US]
dc.subject: Chemical Engineering [en_US]
dc.subject: Chemistry [en_US]
dc.subject: Computer Science [en_US]
dc.title: Unraveling the origin of splice switching activity of hemoglobin β-globin gene modulators via QSAR modeling [en_US]
dc.type: Article [en_US]
dspace.entity.type: Publication
mu.datasource.scopus: https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=84951781197&origin=inward [en_US]
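
Note on the evaluation metrics: the abstract assesses each model by accuracy, sensitivity, specificity and Matthews correlation coefficient (MCC). As a minimal sketch of how these four quantities follow from a binary confusion matrix, the Python below uses a hypothetical helper `classification_metrics` and made-up counts; neither the function name nor the numbers come from the publication, and the values are returned as fractions rather than the percentages quoted in the abstract.

```python
import math


def classification_metrics(tp, tn, fp, fn):
    """Return accuracy, sensitivity, specificity and MCC for a binary classifier.

    tp, tn, fp, fn are confusion-matrix counts (hypothetical inputs here).
    Ratios with a zero denominator are reported as 0.0 by convention.
    """
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0  # true positive rate
    specificity = tn / (tn + fp) if (tn + fp) else 0.0  # true negative rate
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "mcc": mcc,
    }


# Illustrative counts only; these are NOT results from the study.
metrics = classification_metrics(tp=7, tn=6, fp=2, fn=1)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

MCC uses all four cells of the confusion matrix, so it falls toward 0 when a model does well on one class and poorly on the other, which accuracy alone can hide.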