Phumiphanjarphak W.
Parkbhorn J.
Ngamphiw C.
Tongsima S.
Aiewsakun P.
Mahidol University
2025-10-13
2025-10-13
2025-10-07
Microbiology Spectrum Vol.13 No.10 (2025), e0086625
https://repository.li.mahidol.ac.th/handle/123456789/112564
Anelloviruses are part of the normal human viral flora. Although their diversity in humans has been investigated in many countries, and despite their initial detection in Thailand in 1999, knowledge of Thai anelloviruses remains very limited. This study analyzed 1,175 whole-genome sequencing data sets from Thai individuals to mine for potential anellovirus sequences. Our analyses detected anellovirus sequences in 149 data sets (12.68%), uncovering 434 partial anellovirus sequences and 77 complete genome sequences, characterized by the presence of terminal redundancy, complete orf1, and the conserved untranslated region upstream of the orf1 gene. Sequence analyses indicated that these viruses belong to seven genera, including Alphatorquevirus, Betatorquevirus, Gammatorquevirus, Hetorquevirus, Lamedtorquevirus, Samektorquevirus, and Yodtorquevirus. Notably, Hetorquevirus, Lamedtorquevirus, Samektorquevirus, and Yodtorquevirus had not previously been reported in Thailand. Phylogenetic analysis of ORF1 protein sequences showed that Thai anelloviruses form multiple phylogenetic clusters with non-Thai anelloviruses, indicating frequent cross-country transmission and multiple origins of the virus in Thailand. Furthermore, sequence similarity network analysis identified 33 potentially novel anellovirus species in our data set. Our findings greatly expand the knowledge of anellovirus diversity in Thailand and demonstrate the potential of human whole-genome sequencing data as a valuable resource for viral discovery.
Lastly, we highlight and discuss some challenges with the use of the current pairwise sequence similarity-based classification scheme, in particular, how gaps can influence similarity calculation and potentially lead to inconsistencies with a phylogenetic-based classification scheme.
IMPORTANCE: Anelloviruses are widespread in humans, yet their diversity remains poorly characterized in many regions, including Thailand. Here, we demonstrate that human sequencing data sets, originally generated without the intention for virome research, can be effectively mined for anellovirus sequences, including complete genomes. Our findings reveal a substantial number of previously unreported anelloviruses in Thailand, significantly expanding the known diversity of the virus. We also highlight potential limitations of the current anellovirus species classification scheme, which is based on pairwise orf1 sequence similarity analysis with a hard threshold cutoff at 69%. Our results reveal that the current scheme can sometimes yield taxonomic groupings that are inconsistent with phylogenetic relationships, particularly when significant alignment gaps are present. Overall, our results show that existing human sequencing data can be effectively repurposed for virus discovery research and suggest the need for more robust and phylogenetically informed classification frameworks as viral sequence databases continue to expand.
Environmental Science
Biochemistry, Genetics and Molecular Biology
Medicine
Immunology and Microbiology
Discovery of diverse anellovirus sequences in Thai human sequencing data
Article
SCOPUS
10.1128/spectrum.00866-25
2-s2.0-105018023560
21650497
40905703