Publication: An optimized genomic VCF workflow for precise identification of Mycobacterium tuberculosis cluster from cross-platform whole genome sequencing data
Issued Date
2020-04-01
ISSN
15677257
15671348
Other identifier(s)
2-s2.0-85077319732
Rights
Mahidol University
Rights Holder(s)
SCOPUS
Bibliographic Citation
Infection, Genetics and Evolution. Vol.79, (2020)
Suggested Citation
Areeya Disratthakit, Licht Toyo-oka, Penpitcha Thawong, Pundharika Paiboonsiri, Nuanjun Wichukjinda, Pravech Ajawatanawong, Natthakan Thipkrua, Krairerk Suthum, Prasit Palittapongarnpim, Katsushi Tokunaga, Surakameth Mahasirimongkol. An optimized genomic VCF workflow for precise identification of Mycobacterium tuberculosis cluster from cross-platform whole genome sequencing data. Infection, Genetics and Evolution. Vol.79, (2020). doi:10.1016/j.meegid.2019.104152. Retrieved from: https://repository.li.mahidol.ac.th/handle/20.500.14594/49503
Title
An optimized genomic VCF workflow for precise identification of Mycobacterium tuberculosis cluster from cross-platform whole genome sequencing data
Abstract
© 2019 Elsevier B.V. Whole-genome sequencing (WGS) data allow Mycobacterium tuberculosis (Mtb) clusters to be inferred by using a pairwise genetic distance of ≤12 single nucleotide polymorphisms (SNPs) as a threshold. However, discrepancies in SNP counts and genetic distance measurements are a major concern when combining WGS data from different next-generation sequencing (NGS) platforms. We performed SNP variant calling on WGS data of 9 multidrug-resistant (MDR-TB) and 3 extensively drug-resistant (XDR-TB) tuberculosis strains and the standard M. tuberculosis strain H37Rv, generated on an Illumina NextSeq500 and an Ion Torrent PGM. Variant calls were obtained using four common variant calling approaches: Genome Analysis Toolkit (GATK) HaplotypeCaller (GATK-VCF workflow), GATK HaplotypeCaller with GenotypeGVCFs (GATK-GVCF workflow), SAMtools, and VarScan 2. Cross-platform pairwise SNP differences, minimum spanning networks and average nucleotide identity (ANI) were analysed to measure the performance of the variant calling tools. Minimum pairwise SNP differences ranged from 2 to 14 SNPs when using the GVCF workflow, while maximum pairwise SNP differences ranged from 7 to 158 SNPs when using VarScan 2. ANI comparison between SNP data from the NextSeq500 and the PGM for MDR-TB and XDR-TB showed maximum ANI of 99.7% and 99.0%, respectively, with the GVCF workflow, while the other SNP calling results showed lower ANI, ranging from 95.1% to 98.6%. We suggest that the GVCF workflow is the best-performing variant calling approach for avoiding cross-platform pairwise SNP differences.
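The clustering criterion in the abstract (a pairwise distance of ≤12 SNPs) can be illustrated with a minimal sketch. This is not the authors' pipeline: it assumes each isolate is already reduced to a string of bases at the same SNP positions, that ambiguous calls (`N`, `-`) are skipped, and that clusters are formed by single linkage (two isolates share a cluster if a chain of ≤12-SNP links connects them). The names `snp_distance` and `cluster_isolates` are hypothetical.

```python
from itertools import combinations

SNP_THRESHOLD = 12  # pairwise SNP threshold stated in the abstract


def snp_distance(a, b):
    """Count positions where two aligned SNP strings differ.

    Positions with a missing or ambiguous call ('N' or '-') in either
    string are skipped (an assumption of this sketch).
    """
    if len(a) != len(b):
        raise ValueError("SNP profiles must cover the same positions")
    return sum(
        1
        for x, y in zip(a, b)
        if x != y and x not in "N-" and y not in "N-"
    )


def cluster_isolates(profiles, threshold=SNP_THRESHOLD):
    """Group isolates by single linkage on pairwise SNP distance.

    profiles maps an isolate name to its SNP string. Returns a list of
    clusters, each a sorted list of names, ordered by first member.
    """
    parent = {name: name for name in profiles}

    def find(name):
        # Union-find lookup with path halving.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    # Link every pair whose distance is within the threshold.
    for a, b in combinations(profiles, 2):
        if snp_distance(profiles[a], profiles[b]) <= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for name in profiles:
        clusters.setdefault(find(name), []).append(name)
    return sorted((sorted(members) for members in clusters.values()),
                  key=lambda members: members[0])
```

Under single linkage, two isolates more than 12 SNPs apart can still end up in one cluster if a third isolate lies within 12 SNPs of both. Inflated cross-platform SNP counts can therefore merge or split clusters, which is the concern the abstract raises.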