Publication:
An optimized genomic VCF workflow for precise identification of Mycobacterium tuberculosis cluster from cross-platform whole genome sequencing data

dc.contributor.author: Areeya Disratthakit [en_US]
dc.contributor.author: Licht Toyo-oka [en_US]
dc.contributor.author: Penpitcha Thawong [en_US]
dc.contributor.author: Pundharika Paiboonsiri [en_US]
dc.contributor.author: Nuanjun Wichukjinda [en_US]
dc.contributor.author: Pravech Ajawatanawong [en_US]
dc.contributor.author: Natthakan Thipkrua [en_US]
dc.contributor.author: Krairerk Suthum [en_US]
dc.contributor.author: Prasit Palittapongarnpim [en_US]
dc.contributor.author: Katsushi Tokunaga [en_US]
dc.contributor.author: Surakameth Mahasirimongkol [en_US]
dc.contributor.other: University of Tokyo [en_US]
dc.contributor.other: National Center for Global Health and Medicine [en_US]
dc.contributor.other: Thailand Ministry of Public Health [en_US]
dc.contributor.other: Mahidol University [en_US]
dc.contributor.other: Thailand National Center for Genetic Engineering and Biotechnology [en_US]
dc.date.accessioned: 2020-01-27T03:28:11Z
dc.date.available: 2020-01-27T03:28:11Z
dc.date.issued: 2020-04-01 [en_US]
dc.description.abstract: © 2019 Elsevier B.V. Whole-genome sequencing (WGS) data allow Mycobacterium tuberculosis (Mtb) clusters to be inferred by using a pairwise genetic distance of ≤12 single nucleotide polymorphisms (SNPs) as a threshold. However, discrepancies in SNP counts and genetic distance measurements are a major concern when combining WGS data from different next generation sequencing (NGS) platforms. We performed SNP variant calling on WGS data of 9 multidrug-resistant tuberculosis (MDR-TB) strains, 3 extensively drug-resistant tuberculosis (XDR-TB) strains and the standard M. tuberculosis strain H37Rv, sequenced on an Illumina NextSeq500 and an Ion Torrent PGM. Variant calls were obtained using four common variant calling approaches: Genome Analysis Toolkit (GATK) HaplotypeCaller (GATK-VCF workflow), GATK HaplotypeCaller with GenotypeGVCFs (GATK-GVCF workflow), SAMtools, and VarScan 2. Cross-platform pairwise SNP differences, minimum spanning networks and average nucleotide identity (ANI) were analysed to measure the performance of the variant calling tools. The smallest pairwise SNP differences, ranging from 2 to 14 SNPs, were obtained with the GATK-GVCF workflow, while the largest, ranging from 7 to 158 SNPs, were obtained with VarScan 2. Comparing SNP data from the NextSeq500 and the PGM, the GATK-GVCF workflow gave the highest ANI, 99.7% for MDR-TB and 99.0% for XDR-TB, whereas the other variant calling approaches gave lower ANI values, ranging from 95.1% to 98.6%. We therefore suggest that the GATK-GVCF workflow is the best-performing variant calling approach for minimizing cross-platform pairwise SNP differences. [en_US]
dc.identifier.citation: Infection, Genetics and Evolution. Vol.79, (2020) [en_US]
dc.identifier.doi: 10.1016/j.meegid.2019.104152 [en_US]
dc.identifier.issn: 15677257 [en_US]
dc.identifier.issn: 15671348 [en_US]
dc.identifier.other: 2-s2.0-85077319732 [en_US]
dc.identifier.uri: https://repository.li.mahidol.ac.th/handle/20.500.14594/49503
dc.rights: Mahidol University [en_US]
dc.rights.holder: SCOPUS [en_US]
dc.source.uri: https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=85077319732&origin=inward [en_US]
dc.subject: Agricultural and Biological Sciences [en_US]
dc.subject: Biochemistry, Genetics and Molecular Biology [en_US]
dc.subject: Immunology and Microbiology [en_US]
dc.subject: Medicine [en_US]
dc.title: An optimized genomic VCF workflow for precise identification of Mycobacterium tuberculosis cluster from cross-platform whole genome sequencing data [en_US]
dc.type: Article [en_US]
dspace.entity.type: Publication
mu.datasource.scopus: https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=85077319732&origin=inward [en_US]
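
The abstract describes inferring Mtb clusters from pairwise SNP distances with a threshold of ≤12 SNPs. The following is a minimal sketch of that idea, not the authors' pipeline: it assumes each sample has already been reduced to an equal-length string of alleles at the same variable sites, ignores missing or ambiguous calls, and uses single-linkage grouping, which is one possible reading of "cluster". The names `snp_distance`, `cluster_by_threshold` and the toy samples are hypothetical.

```python
from itertools import combinations

SNP_THRESHOLD = 12  # pairwise SNP threshold quoted in the abstract


def snp_distance(a, b):
    """Count the sites at which two equal-length allele strings differ."""
    if len(a) != len(b):
        raise ValueError("allele strings must have equal length")
    return sum(1 for x, y in zip(a, b) if x != y)


def cluster_by_threshold(alleles, threshold=SNP_THRESHOLD):
    """Group samples by single linkage: two samples share a cluster if they
    are connected by a chain of pairs whose SNP distance is <= threshold."""
    parent = {name: name for name in alleles}

    def find(x):
        # Follow parent links to the cluster representative (union-find).
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for s, t in combinations(alleles, 2):
        if snp_distance(alleles[s], alleles[t]) <= threshold:
            parent[find(s)] = find(t)

    clusters = {}
    for name in alleles:
        clusters.setdefault(find(name), set()).add(name)
    return sorted(sorted(members) for members in clusters.values())


# Toy data (hypothetical): 40 variable sites per sample.
toy = {
    "sample_A": "A" * 40,
    "sample_B": "A" * 35 + "C" * 5,   # 5 SNPs from sample_A
    "sample_C": "G" * 20 + "A" * 20,  # 20 SNPs from sample_A
}

if __name__ == "__main__":
    print(cluster_by_threshold(toy))
```

With the toy data, sample_A and sample_B fall within the threshold of each other, while sample_C exceeds it against both and forms its own group. In a cross-platform setting such as the one the abstract describes, the same isolate sequenced on two platforms would ideally show a distance close to zero; inflated distances from the variant caller could split a true cluster.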
