Publication: An optimized genomic VCF workflow for precise identification of Mycobacterium tuberculosis cluster from cross-platform whole genome sequencing data
dc.contributor.author | Areeya Disratthakit | en_US |
dc.contributor.author | Licht Toyo-oka | en_US |
dc.contributor.author | Penpitcha Thawong | en_US |
dc.contributor.author | Pundharika Paiboonsiri | en_US |
dc.contributor.author | Nuanjun Wichukjinda | en_US |
dc.contributor.author | Pravech Ajawatanawong | en_US |
dc.contributor.author | Natthakan Thipkrua | en_US |
dc.contributor.author | Krairerk Suthum | en_US |
dc.contributor.author | Prasit Palittapongarnpim | en_US |
dc.contributor.author | Katsushi Tokunaga | en_US |
dc.contributor.author | Surakameth Mahasirimongkol | en_US |
dc.contributor.other | University of Tokyo | en_US |
dc.contributor.other | National Center for Global Health and Medicine | en_US |
dc.contributor.other | Thailand Ministry of Public Health | en_US |
dc.contributor.other | Mahidol University | en_US |
dc.contributor.other | Thailand National Center for Genetic Engineering and Biotechnology | en_US |
dc.date.accessioned | 2020-01-27T03:28:11Z | |
dc.date.available | 2020-01-27T03:28:11Z | |
dc.date.issued | 2020-04-01 | en_US |
dc.description.abstract | © 2019 Elsevier B.V. Whole-genome sequencing (WGS) data allow Mycobacterium tuberculosis (Mtb) clusters to be inferred by using a pairwise genetic distance of ≤12 single nucleotide polymorphisms (SNPs) as a threshold. However, discrepancies in SNP counts and genetic distance measurements are a major concern when combining WGS data from different next-generation sequencing (NGS) platforms. We performed SNP variant calling on WGS data of 9 multidrug-resistant tuberculosis (MDR-TB) isolates, 3 extensively drug-resistant tuberculosis (XDR-TB) isolates and the standard M. tuberculosis strain H37Rv, sequenced on an Illumina NextSeq500 and an Ion Torrent PGM. Variant calls were obtained using four common variant calling workflows: Genome Analysis Toolkit (GATK) HaplotypeCaller (GATK-VCF workflow), GATK HaplotypeCaller with GenotypeGVCFs (GATK-GVCF workflow), SAMtools, and VarScan 2. Cross-platform pairwise SNP differences, minimum spanning networks and average nucleotide identity (ANI) were analysed to measure the performance of the variant calling tools. Minimum pairwise SNP differences ranged from 2 to 14 SNPs when using the GVCF workflow, while maximum pairwise SNP differences ranged from 7 to 158 SNPs when using VarScan 2. ANI comparison between SNP data from NextSeq500 and PGM for MDR-TB and XDR-TB showed maximum ANI of 99.7% and 99.0%, respectively, with the GVCF workflow, while the other SNP calling results showed lower ANI, ranging from 95.1% to 98.6%. These results suggest that the GVCF workflow was the best-performing variant calling approach for avoiding cross-platform pairwise SNP differences. | en_US |
dc.identifier.citation | Infection, Genetics and Evolution. Vol.79, (2020) | en_US |
dc.identifier.doi | 10.1016/j.meegid.2019.104152 | en_US |
dc.identifier.issn | 15677257 | en_US |
dc.identifier.issn | 15671348 | en_US |
dc.identifier.other | 2-s2.0-85077319732 | en_US |
dc.identifier.uri | https://repository.li.mahidol.ac.th/handle/20.500.14594/49503 | |
dc.rights | Mahidol University | en_US |
dc.rights.holder | SCOPUS | en_US |
dc.source.uri | https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=85077319732&origin=inward | en_US |
dc.subject | Agricultural and Biological Sciences | en_US |
dc.subject | Biochemistry, Genetics and Molecular Biology | en_US |
dc.subject | Immunology and Microbiology | en_US |
dc.subject | Medicine | en_US |
dc.title | An optimized genomic VCF workflow for precise identification of Mycobacterium tuberculosis cluster from cross-platform whole genome sequencing data | en_US |
dc.type | Article | en_US |
dspace.entity.type | Publication | |
mu.datasource.scopus | https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=85077319732&origin=inward | en_US |
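
The abstract describes inferring Mtb clusters from a pairwise genetic distance of ≤12 SNPs. The sketch below illustrates that idea only; it is not the authors' pipeline. It assumes each isolate is represented as a hypothetical dict mapping a genome position to its called alternate base (a missing position means the reference allele), and it assumes single-linkage grouping, which the record does not specify. The isolate names and SNP positions are invented for illustration.

```python
from itertools import combinations

SNP_THRESHOLD = 12  # pairwise cutoff quoted in the abstract


def pairwise_snp_distance(a, b):
    """Count positions where the two isolates carry different alleles.

    `a` and `b` map position -> alternate base; a position absent from
    a dict is treated as the reference allele (an assumption of this sketch).
    """
    return sum(1 for pos in a.keys() | b.keys() if a.get(pos) != b.get(pos))


def cluster_isolates(snps, threshold=SNP_THRESHOLD):
    """Group isolates by single linkage: link pairs with distance <= threshold."""
    parent = {name: name for name in snps}

    def find(x):
        # Walk up to the root representative, compressing the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for s1, s2 in combinations(snps, 2):
        if pairwise_snp_distance(snps[s1], snps[s2]) <= threshold:
            parent[find(s1)] = find(s2)

    clusters = {}
    for name in snps:
        clusters.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical toy data: 20 shared SNPs plus isolate-specific ones.
shared = {i * 1000: "A" for i in range(1, 21)}
isolates = {
    "iso_A": dict(shared),
    "iso_B": {**shared, **{50000 + i: "T" for i in range(5)}},
    "iso_C": {**shared, **{90000 + i: "G" for i in range(30)}},
}

if __name__ == "__main__":
    print(cluster_isolates(isolates))
```

With this toy input, iso_A and iso_B differ at 5 positions and fall into one cluster, while iso_C differs from both by at least 30 positions and stays separate. The abstract's point is that such distances shift with the variant caller and sequencing platform, so the same threshold can yield different clusters depending on how the SNPs were called.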