Systematic bias in malaria parasite relatedness estimation
Issued Date
2025-05-01
Resource Type
eISSN
21601836
Scopus ID
2-s2.0-105004663160
Pubmed ID
39883524
Journal Title
G3: Genes, Genomes, Genetics
Volume
15
Issue
5
Rights Holder(s)
SCOPUS
Bibliographic Citation
G3: Genes, Genomes, Genetics Vol.15 No.5 (2025)
Suggested Citation
Mehra S., Neafsey D.E., White M., Taylor A.R. Systematic bias in malaria parasite relatedness estimation. G3: Genes, Genomes, Genetics Vol.15 No.5 (2025). doi:10.1093/g3journal/jkaf018 Retrieved from: https://repository.li.mahidol.ac.th/handle/123456789/110184
Author(s)
Mehra S., Neafsey D.E., White M., Taylor A.R.
Corresponding Author(s)
Other Contributor(s)
Abstract
Genetic studies of Plasmodium parasites increasingly feature relatedness estimates. However, various aspects of malaria parasite relatedness estimation are not fully understood. For example, relatedness estimates based on whole-genome-sequence (WGS) data often exceed those based on sparser data types. Systematic bias in relatedness estimation is well documented in the literature geared towards diploid organisms, but largely unknown within the malaria community. We characterize systematic bias in malaria parasite relatedness estimation using three complementary approaches: theoretically, under a non-ancestral statistical model of pairwise relatedness; numerically, under a simulation model of ancestry; and empirically, using data on parasites sampled from Guyana and Colombia. We show that allele frequency estimates encode, locus-by-locus, relatedness averaged over the set of sampled parasites used to compute them. Plugging sample allele frequencies into models of pairwise relatedness can lead to systematic underestimation. However, systematic underestimation can be viewed as population-relatedness calibration, i.e., a way of generating measures of relative relatedness. Systematic underestimation is unavoidable when relatedness is estimated assuming independence between genetic markers. It is mitigated when relatedness is estimated using WGS data under a hidden Markov model (HMM) that exploits linkage between proximal markers. The extent of mitigation is unknowable when an HMM is fit to sparser data, but downstream analyses that use high relatedness thresholds are relatively robust regardless. In summary, practitioners can either resolve to use relative relatedness estimated under independence, or try to estimate absolute relatedness under an HMM. We propose various tools to help practitioners evaluate their situation on a case-by-case basis.
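The underestimation described in the abstract can be illustrated with a toy simulation. The sketch below is not the authors' simulation model or software. It assumes biallelic markers, haploid genotypes, independent loci, and a hypothetical "shared founder" scheme in which every parasite copies a locus-specific founder allele with probability COPY_PROB, so that every pair is identical by descent at a locus with probability COPY_PROB squared. All function names, parameter values, and the grid-search estimator are illustrative assumptions. The same pair is scored twice: once with the ancestral allele frequencies used to generate the data, and once with frequencies computed from the sampled parasites themselves.

```python
import math
import random


def simulate(n_parasites, n_loci, copy_prob, rng):
    """Simulate haploid biallelic genotypes (hypothetical toy model).

    At each locus a founder allele is drawn from the ancestral frequency.
    Each parasite copies it with probability copy_prob, otherwise it draws
    its own allele from the same ancestral frequency.
    """
    ancestral = [rng.uniform(0.1, 0.9) for _ in range(n_loci)]
    genotypes = [[0] * n_loci for _ in range(n_parasites)]
    for t, p in enumerate(ancestral):
        founder = 1 if rng.random() < p else 0
        for g in genotypes:
            if rng.random() < copy_prob:
                g[t] = founder
            else:
                g[t] = 1 if rng.random() < p else 0
    return ancestral, genotypes


def estimate_relatedness(x, y, freqs, grid_size=200):
    """Grid-search maximum-likelihood estimate of r under independence.

    Per-locus model: P(alleles differ) = (1 - r) * 2 * f * (1 - f).
    """
    # Keep only loci that are polymorphic according to the supplied frequencies.
    loci = [(a != b, 2.0 * f * (1.0 - f))
            for a, b, f in zip(x, y, freqs) if 0.0 < f < 1.0]
    best_r, best_ll = 0.0, -math.inf
    for i in range(grid_size):
        r = i / grid_size  # candidate values 0, 0.005, ..., 0.995
        ll = 0.0
        for differ, het in loci:
            p_diff = (1.0 - r) * het
            ll += math.log(p_diff if differ else 1.0 - p_diff)
        if ll > best_ll:
            best_r, best_ll = r, ll
    return best_r


COPY_PROB = 0.7
TRUE_R = COPY_PROB ** 2  # pairwise relatedness relative to ancestral frequencies

rng = random.Random(1)
ancestral_freqs, genotypes = simulate(30, 2000, COPY_PROB, rng)

# Sample allele frequencies, computed from the sampled parasites themselves.
n = len(genotypes)
sample_freqs = [sum(g[t] for g in genotypes) / n
                for t in range(len(ancestral_freqs))]

x, y = genotypes[0], genotypes[1]
r_ancestral = estimate_relatedness(x, y, ancestral_freqs)
r_sample = estimate_relatedness(x, y, sample_freqs)

print("relatedness under the toy model:", round(TRUE_R, 2))
print("estimate with ancestral frequencies:", r_ancestral)
print("estimate with sample frequencies:", r_sample)
```

In this toy setting all sampled parasites are equally related, so the sample allele frequencies absorb the shared relatedness and the plug-in estimate falls well below the value defined relative to ancestral frequencies. That is one way to read the abstract's point that estimates based on sample allele frequencies measure relatedness relative to the sampled population rather than absolute relatedness.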
