Publication: GenomegaMap: Within-Species Genome-Wide dN/dS Estimation from over 10,000 Genomes
Issued Date
2021-01-01
Resource Type
ISSN
15371719
07374038
Other identifier(s)
2-s2.0-85089203677
Rights
Mahidol University
Rights Holder(s)
SCOPUS
Bibliographic Citation
Molecular Biology and Evolution. Vol.37, No.8 (2021), 2450-2460
Suggested Citation
Daniel J. Wilson, Derrick W. Crook, Timothy E.A. Peto, A. Sarah Walker, Sarah J. Hoosdally, Ana L. Gibertoni Cruz, Joshua Carter, Clara Grazian, Sarah G. Earle, Samaneh Kouchaki, Alexander Lachapelle, Yang Yang, David A. Clifton, Philip W. Fowler, Zamin Iqbal, Martin Hunt, Jeffrey Knaggs, E. Grace Smith, Priti Rathod, Lisa Jarrett, Daniela Matias, Daniela M. Cirillo, Emanuele Borroni, Simone Battaglia, Arash Ghodousi, Andrea Spitaleri, Andrea Cabibbe, Sabira Tahseen, Kayzad Nilgiriwala, Sanchi Shah, Camilla Rodrigues, Priti Kambli, Utkarsha Surve, Rukhsar Khot, Stefan Niemann, Thomas A. Kohl, Matthias Merker, Harald Hoffmann, Katharina Todt, Sara Plesnik, Nazir Ismail, Shaheed Vally Omar, Lavania Joseph, Guy Thwaites, Thuong Nguyen Thuy Thuong, Nhung Hoang Ngoc, Vijay Srinivasan, Timothy M. Walker, David Moore, Jorge Coronel, Walter Solano, George F. Gao, Guangxue He, Yanlin Zhao, Chunfa Liu, Aijing Ma, Baoli Zhu, Ian Laurenson, Pauline Claxton, Anastasia Koch, Robert Wilkinson, Ajit Lalvani, James Posey, Jennifer Gardy, Jim Werngren, Nicholas Paton, Ruwen Jou, Mei Hua Wu, Wan Hsuan Lin, Lucilaine Ferrazoli, Rosangela Siqueira de Oliveira, Irena Arandjelovic, Angkana Chaiprasert, Inaki Comas, Francis A. Drobniewski, Maha R. Farhat, Qian Gao, Rick Ong Twee Hee, Vitali Sintchenko, Philip Supply, Dick van Soolingen. GenomegaMap: Within-Species Genome-Wide dN/dS Estimation from over 10,000 Genomes. Molecular Biology and Evolution. Vol.37, No.8 (2021), 2450-2460. doi:10.1093/MOLBEV/MSAA069 Retrieved from: https://repository.li.mahidol.ac.th/handle/20.500.14594/75819
Title
GenomegaMap: Within-Species Genome-Wide dN/dS Estimation from over 10,000 Genomes
Author(s)
Daniel J. Wilson
Derrick W. Crook
Timothy E.A. Peto
A. Sarah Walker
Sarah J. Hoosdally
Ana L. Gibertoni Cruz
Joshua Carter
Clara Grazian
Sarah G. Earle
Samaneh Kouchaki
Alexander Lachapelle
Yang Yang
David A. Clifton
Philip W. Fowler
Zamin Iqbal
Martin Hunt
Jeffrey Knaggs
E. Grace Smith
Priti Rathod
Lisa Jarrett
Daniela Matias
Daniela M. Cirillo
Emanuele Borroni
Simone Battaglia
Arash Ghodousi
Andrea Spitaleri
Andrea Cabibbe
Sabira Tahseen
Kayzad Nilgiriwala
Sanchi Shah
Camilla Rodrigues
Priti Kambli
Utkarsha Surve
Rukhsar Khot
Stefan Niemann
Thomas A. Kohl
Matthias Merker
Harald Hoffmann
Katharina Todt
Sara Plesnik
Nazir Ismail
Shaheed Vally Omar
Lavania Joseph
Guy Thwaites
Thuong Nguyen Thuy Thuong
Nhung Hoang Ngoc
Vijay Srinivasan
Timothy M. Walker
David Moore
Jorge Coronel
Walter Solano
George F. Gao
Guangxue He
Yanlin Zhao
Chunfa Liu
Aijing Ma
Baoli Zhu
Ian Laurenson
Pauline Claxton
Anastasia Koch
Robert Wilkinson
Ajit Lalvani
James Posey
Jennifer Gardy
Jim Werngren
Nicholas Paton
Ruwen Jou
Mei Hua Wu
Wan Hsuan Lin
Lucilaine Ferrazoli
Rosangela Siqueira de Oliveira
Irena Arandjelovic
Angkana Chaiprasert
Inaki Comas
Francis A. Drobniewski
Maha R. Farhat
Qian Gao
Rick Ong Twee Hee
Vitali Sintchenko
Philip Supply
Dick van Soolingen
Other Contributor(s)
Siriraj Hospital
Public Health Agency of Sweden
Oxford University Clinical Research Unit
Second Affiliated Hospital of Southern University of Science and Technology
Public Health England
Université de Lille
Centro de Investigación Biomédica en Red de Epidemiología y Salud Pública
Forschungszentrum Borstel - Zentrum für Medizin und Biowissenschaften
Belgrade University School of Medicine
The Foundation for Medical Research India
CSIC - Instituto de Biomedicina de Valencia (IBV)
National Institute for Communicable Diseases
London School of Hygiene & Tropical Medicine
Instituto Adolfo Lutz
University of Oxford
National Institute for Public Health and the Environment
European Bioinformatics Institute
The University of Sydney
IRCCS Ospedale San Raffaele
Asklepios Fachkliniken München-Gauting
P.D. Hinduja National Hospital and Medical Research Centre
Centers for Disease Control and Prevention
Institute of Microbiology Chinese Academy of Sciences
National University of Singapore
Imperial College London
The University of British Columbia
Fudan University
Nuffield Department of Medicine
Harvard Medical School
University of Cape Town
Universidad Peruana Cayetano Heredia
CDC Taiwan
FISABIO-Public Health
National Tuberculosis Control Program Pakistan
Scottish Mycobacteria Reference Laboratory
China CDC
Abstract
The dN/dS ratio provides evidence of adaptation or functional constraint in protein-coding genes by quantifying the relative excess or deficit of amino acid-replacing versus silent nucleotide variation. Inexpensive sequencing promises a better understanding of parameters, such as dN/dS, but analyzing very large data sets poses a major statistical challenge. Here, I introduce genomegaMap for estimating within-species genome-wide variation in dN/dS, and I apply it to 3,979 genes across 10,209 tuberculosis genomes to characterize the selection pressures shaping this global pathogen. GenomegaMap is a phylogeny-free method that addresses two major problems with existing approaches: 1) It is fast no matter how large the sample size and 2) it is robust to recombination, which causes phylogenetic methods to report artefactual signals of adaptation. GenomegaMap uses population genetics theory to approximate the distribution of allele frequencies under general, parent-dependent mutation models. Coalescent simulations show that substitution parameters are well estimated even when genomegaMap's simplifying assumption of independence among sites is violated. I demonstrate the ability of genomegaMap to detect genuine signatures of selection at antimicrobial resistance-conferring substitutions in Mycobacterium tuberculosis and describe a novel signature of selection in the cold-shock DEAD-box protein A gene deaD/csdA. The genomegaMap approach helps accelerate the exploitation of big data for gaining new insights into evolution within species.