Direct inference and control of genetic population structure from RNA sequencing data
Issued Date
2023-08-02
eISSN
23993642
Scopus ID
2-s2.0-85166425755
Pubmed ID
37532769
Journal Title
Communications biology
Volume
6
Issue
1
Rights Holder(s)
SCOPUS
Bibliographic Citation
Communications biology Vol.6 No.1 (2023), 804
Suggested Citation
Fachrul M., Karkey A., Shakya M., Judd L.M., Harshegyi T., Sim K.S., Tonks S., Dongol S., Shrestha R., Salim A., Adhikari A., Banda H.C., Blohmke C., Darton T.C., Farooq Y., Ghimire M., Hill J., Hoang N.T., Jere T.M., Kamzati M., Kao Y.H., Masesa C., Mbewe M., Msuku H., Munthali P., Nga T.V.T., Nkhata R., Saad N.J., Van Tan T., Thindwa D., Khanam F., Meiring J., Clemens J.D., Dougan G., Pitzer V.E., Qadri F., Heyderman R.S., Gordon M.A., Voysey M., Baker S., Pollard A.J., Khor C.C., Dolecek C., Basnyat B., Dunstan S.J., Holt K.E., Inouye M. Direct inference and control of genetic population structure from RNA sequencing data. Communications biology Vol.6 No.1 (2023), 804. doi:10.1038/s42003-023-05171-9 Retrieved from: https://repository.li.mahidol.ac.th/handle/20.500.14594/88276
Title
Direct inference and control of genetic population structure from RNA sequencing data
Author(s)
Fachrul M.
Karkey A.
Shakya M.
Judd L.M.
Harshegyi T.
Sim K.S.
Tonks S.
Dongol S.
Shrestha R.
Salim A.
Adhikari A.
Banda H.C.
Blohmke C.
Darton T.C.
Farooq Y.
Ghimire M.
Hill J.
Hoang N.T.
Jere T.M.
Kamzati M.
Kao Y.H.
Masesa C.
Mbewe M.
Msuku H.
Munthali P.
Nga T.V.T.
Nkhata R.
Saad N.J.
Van Tan T.
Thindwa D.
Khanam F.
Meiring J.
Clemens J.D.
Dougan G.
Pitzer V.E.
Qadri F.
Heyderman R.S.
Gordon M.A.
Voysey M.
Baker S.
Pollard A.J.
Khor C.C.
Dolecek C.
Basnyat B.
Dunstan S.J.
Holt K.E.
Inouye M.
Author's Affiliation
Mahidol Oxford Tropical Medicine Research Unit
Oxford University Clinical Research Unit
Department of Medicine
Department of Public Health and Primary Care
School of Mathematics and Statistics
School of Biosciences
Melbourne School of Population and Global Health
The Peter Doherty Institute for Infection and Immunity
Friends of Patan Hospital Nepal
Baker Heart and Diabetes Institute
London School of Hygiene & Tropical Medicine
A-Star, Genome Institute of Singapore
University of Cambridge
University of Melbourne
Faculty of Medicine, Nursing and Health Sciences
Nuffield Department of Medicine
University of Oxford Medical Sciences Division
Abstract
RNAseq data can be used to infer genetic variants, yet its use for estimating genetic population structure remains underexplored. Here, we construct a freely available computational tool (RGStraP) to estimate RNAseq-based genetic principal components (RG-PCs) and assess whether RG-PCs can be used to control for population structure in gene expression analyses. Using whole blood samples from understudied Nepalese populations and the Geuvadis study, we show that RG-PCs had comparable results to paired array-based genotypes, with high genotype concordance and high correlations of genetic principal components, capturing subpopulations within the dataset. In differential gene expression analysis, we found that inclusion of RG-PCs as covariates reduced test statistic inflation. Our paper demonstrates that genetic population structure can be directly inferred and controlled for using RNAseq data, thus facilitating improved retrospective and future analyses of transcriptomic data.
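To illustrate the idea in the abstract, here is a minimal sketch of deriving genetic principal components from a genotype matrix and using them as covariates. This is not the RGStraP implementation: it assumes variants have already been called from RNAseq reads into a samples-by-variants dosage matrix (0, 1 or 2 alternate alleles), it uses simulated data for two hypothetical subpopulations, and the function names `genetic_pcs` and `residualise` are invented for this example.

```python
import numpy as np


def genetic_pcs(genotypes, n_pcs=2):
    """Top principal components of a samples x variants dosage matrix."""
    g = np.asarray(genotypes, dtype=float)
    # Drop monomorphic variants: they carry no structure signal
    # and cannot be standardised.
    g = g[:, g.std(axis=0) > 0]
    # Standardise each variant to mean 0 and unit variance.
    z = (g - g.mean(axis=0)) / g.std(axis=0)
    # Left singular vectors scaled by singular values give per-sample PC scores.
    u, s, _ = np.linalg.svd(z, full_matrices=False)
    return u[:, :n_pcs] * s[:n_pcs]


def residualise(expression, covariates):
    """Remove the linear effect of covariates (plus an intercept) by least squares."""
    x = np.column_stack([np.ones(len(covariates)), covariates])
    beta, *_ = np.linalg.lstsq(x, expression, rcond=None)
    return expression - x @ beta


# Simulated example (assumption): two subpopulations whose allele
# frequencies have drifted apart.
rng = np.random.default_rng(0)
n_per_pop, n_variants = 50, 500
labels = np.repeat([0, 1], n_per_pop)
freq_a = rng.uniform(0.1, 0.9, n_variants)
freq_b = np.clip(freq_a + rng.normal(0, 0.15, n_variants), 0.05, 0.95)
freqs = np.where(labels[:, None] == 0, freq_a, freq_b)
genotypes = rng.binomial(2, freqs)

pcs = genetic_pcs(genotypes)

# A gene whose expression differs by subpopulation; adjusting for the PCs
# removes the part of its variance explained by population structure.
expression = 2.0 * labels + rng.normal(0, 1, labels.size)
adjusted = residualise(expression, pcs)
```

In a real differential expression analysis the PCs would normally enter the model as covariates alongside the variable of interest rather than being regressed out beforehand; the residualisation here only shows the effect of the adjustment in isolation.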