Entourage: all-in-one sequence analysis software for genome assembly, virus detection, virus discovery, and intrasample variation profiling

dc.contributor.author: Phumiphanjarphak W.
dc.contributor.author: Aiewsakun P.
dc.contributor.correspondence: Phumiphanjarphak W.
dc.contributor.other: Mahidol University
dc.date.accessioned: 2024-06-29T18:25:07Z
dc.date.available: 2024-06-29T18:25:07Z
dc.date.issued: 2024-12-01
dc.description.abstract:
Background: Pan-virus detection, and virome investigation in general, can be challenging, mainly due to the lack of universally conserved genetic elements in viruses. Metagenomic next-generation sequencing can offer a promising solution to this problem by providing an unbiased overview of the microbial community, enabling detection of any viruses without prior target selection. However, a major challenge in utilising metagenomic next-generation sequencing for virome investigation is that data analysis can be highly complex, involving numerous data processing steps.
Results: Here, we present Entourage to address this challenge. Entourage enables short-read sequence assembly, viral sequence search with or without reference virus targets using contig-based approaches, and intrasample sequence variation quantification. Several workflows are implemented in Entourage to facilitate end-to-end virus sequence detection analysis through a single command line, from read cleaning, sequence assembly, to virus sequence searching. The results generated are comprehensive, allowing for thorough quality control, reliability assessment, and interpretation. We illustrate Entourage's utility as a streamlined workflow for virus detection by employing it to comprehensively search for target virus sequences and beyond in raw sequence read data generated from HeLa cell culture samples spiked with viruses. Furthermore, we showcase its flexibility and performance on a real-world dataset by analysing a preassembled Tara Oceans dataset. Overall, our results show that Entourage performs well even with low virus sequencing depth in single digits, and it can be used to discover novel viruses effectively. Additionally, by using sequence data generated from a patient with chronic SARS-CoV-2 infection, we demonstrate Entourage's capability to quantify virus intrasample genetic variations, and generate publication-quality figures illustrating the results.
Conclusions: Entourage is an all-in-one, versatile, and streamlined bioinformatics software for virome investigation, developed with a focus on ease of use. Entourage is available at https://codeberg.org/CENMIG/Entourage under the MIT license.
dc.identifier.citation: BMC Bioinformatics Vol.25 No.1 (2024)
dc.identifier.doi: 10.1186/s12859-024-05846-y
dc.identifier.eissn: 14712105
dc.identifier.scopus: 2-s2.0-85196714304
dc.identifier.uri: https://repository.li.mahidol.ac.th/handle/20.500.14594/99238
dc.rights.holder: SCOPUS
dc.subject: Mathematics
dc.subject: Biochemistry, Genetics and Molecular Biology
dc.subject: Computer Science
dc.title: Entourage: all-in-one sequence analysis software for genome assembly, virus detection, virus discovery, and intrasample variation profiling
dc.type: Article
mu.datasource.scopus: https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=85196714304&origin=inward
oaire.citation.issue: 1
oaire.citation.title: BMC Bioinformatics
oaire.citation.volume: 25
oairecerif.author.affiliation: Mahidol University
