Discovery of diverse anellovirus sequences in Thai human sequencing data
Issued Date
2025-10-07
Resource Type
eISSN
21650497
Scopus ID
2-s2.0-105018023560
Pubmed ID
40905703
Journal Title
Microbiology Spectrum
Volume
13
Issue
10
Rights Holder(s)
SCOPUS
Bibliographic Citation
Microbiology Spectrum Vol.13 No.10 (2025), e0086625
Suggested Citation
Phumiphanjarphak W., Parkbhorn J., Ngamphiw C., Tongsima S., Aiewsakun P. Discovery of diverse anellovirus sequences in Thai human sequencing data. Microbiology Spectrum Vol.13 No.10 (2025), e0086625. doi:10.1128/spectrum.00866-25. Retrieved from: https://repository.li.mahidol.ac.th/handle/123456789/112564
Abstract
Anelloviruses are part of the normal human viral flora. Although their diversity in humans has been investigated in many countries, and although they were first detected in Thailand in 1999, knowledge of Thai anelloviruses remains very limited. We mined 1,175 whole-genome sequencing data sets from Thai individuals for anellovirus sequences. Our analyses detected anellovirus sequences in 149 data sets (12.68%), uncovering 434 partial anellovirus sequences and 77 complete genome sequences, the latter characterized by the presence of terminal redundancy, a complete orf1, and the conserved untranslated region upstream of the orf1 gene. Sequence analyses indicated that these viruses belong to seven genera, namely Alphatorquevirus, Betatorquevirus, Gammatorquevirus, Hetorquevirus, Lamedtorquevirus, Samektorquevirus, and Yodtorquevirus. Notably, Hetorquevirus, Lamedtorquevirus, Samektorquevirus, and Yodtorquevirus had not previously been reported in Thailand. Phylogenetic analysis of ORF1 protein sequences showed that Thai anelloviruses form multiple phylogenetic clusters with non-Thai anelloviruses, indicating frequent cross-country transmission and multiple origins of the virus in Thailand. Furthermore, sequence similarity network analysis identified 33 potentially novel anellovirus species in our data set. Our findings greatly expand the knowledge of anellovirus diversity in Thailand and demonstrate the potential of human whole-genome sequencing data as a valuable resource for viral discovery. Lastly, we highlight and discuss some challenges with the current pairwise sequence similarity-based classification scheme, in particular how gaps can influence the similarity calculation and potentially lead to inconsistencies with a phylogeny-based classification scheme.
IMPORTANCE: Anelloviruses are widespread in humans, yet their diversity remains poorly characterized in many regions, including Thailand. Here, we demonstrate that human sequencing data sets, originally generated without virome research in mind, can be effectively mined for anellovirus sequences, including complete genomes. Our findings reveal a substantial number of previously unreported anelloviruses in Thailand, significantly expanding the known diversity of the virus. We also highlight potential limitations of the current anellovirus species classification scheme, which is based on pairwise orf1 sequence similarity analysis with a hard cutoff of 69%. Our results reveal that the current scheme can sometimes yield taxonomic groupings that are inconsistent with phylogenetic relationships, particularly when significant alignment gaps are present. Overall, our results show that existing human sequencing data can be effectively repurposed for virus discovery research and suggest the need for more robust and phylogenetically informed classification frameworks as viral sequence databases continue to expand.
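The sensitivity of a hard similarity cutoff to gap handling can be illustrated with a toy calculation. The sketch below is not the authors' pipeline: the function name `pairwise_identity`, the `count_gaps` flag, and the two short pre-aligned sequences are hypothetical and chosen only for illustration. Only the 69% cutoff comes from the text above.

```python
# Minimal sketch (assumed, not the published method): percent identity of two
# pre-aligned sequences under two gap conventions.

SPECIES_CUTOFF = 69.0  # percent; the cutoff quoted in the abstract


def pairwise_identity(a: str, b: str, count_gaps: bool) -> float:
    """Return percent identity between two aligned sequences of equal length.

    count_gaps=False: columns with a gap in either sequence are skipped.
    count_gaps=True:  columns with a gap in exactly one sequence are counted
                      in the denominator, i.e. treated as mismatches.
    Columns that are gaps in both sequences are always skipped.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    matches = 0
    compared = 0
    for x, y in zip(a, b):
        if x == "-" and y == "-":
            continue
        if x == "-" or y == "-":
            if count_gaps:
                compared += 1
            continue
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


# Hypothetical toy alignment: one 6-column gap and one substitution.
seq_a = "ATGGCTTACGGATTCAAGCC"
seq_b = "ATGGCTTACG------AGCT"

for count_gaps in (False, True):
    identity = pairwise_identity(seq_a, seq_b, count_gaps)
    verdict = "same species" if identity >= SPECIES_CUTOFF else "different species"
    print(f"count_gaps={count_gaps}: {identity:.2f}% -> {verdict}")
```

With this toy pair, skipping gap columns gives 13 matches over 14 compared columns (about 92.86%), while counting them gives 13 over 20 (65.00%), so the same alignment falls on opposite sides of the cutoff depending on the convention used.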