Publication:
Analysis of erroneous data entries in paper based and electronic data collection

dc.contributor.author: Benedikt Ley (en_US)
dc.contributor.author: Komal Raj Rijal (en_US)
dc.contributor.author: Jutta Marfurt (en_US)
dc.contributor.author: Naba Raj Adhikari (en_US)
dc.contributor.author: Megha Raj Banjara (en_US)
dc.contributor.author: Upendra Thapa Shrestha (en_US)
dc.contributor.author: Kamala Thriemer (en_US)
dc.contributor.author: Ric N. Price (en_US)
dc.contributor.author: Prakash Ghimire (en_US)
dc.contributor.other: Tribhuvan University (en_US)
dc.contributor.other: Menzies School of Health Research (en_US)
dc.contributor.other: Mahidol University (en_US)
dc.contributor.other: Nuffield Department of Clinical Medicine (en_US)
dc.date.accessioned: 2020-01-27T07:40:18Z
dc.date.available: 2020-01-27T07:40:18Z
dc.date.issued: 2019-08-22 (en_US)
dc.description.abstract: © 2019 The Author(s). Objective: Electronic data collection (EDC) has become a suitable alternative to paper based data collection (PBDC) in biomedical research even in resource poor settings. During a survey in Nepal, data were collected using both systems and data entry errors compared between both methods. Collected data were checked for completeness, values outside of realistic ranges, internal logic and date variables for reasonable time frames. Variables were grouped into 5 categories and the number of discordant entries were compared between both systems, overall and per variable category. Results: Data from 52 variables collected from 358 participants were available. Discrepancies between both data sets were found in 12.6% of all entries (2352/18,616). Differences between data points were identified in 18.0% (643/3580) of continuous variables, 15.8% of time variables (113/716), 13.0% of date variables (140/1074), 12.0% of text variables (86/716), and 10.9% of categorical variables (1370/12,530). Overall 64% (1499/2352) of all discrepancies were due to data omissions, 76.6% (1148/1499) of missing entries were among categorical data. Omissions in PBDC (n = 1002) were twice as frequent as in EDC (n = 497, p < 0.001). Data omissions, specifically among categorical variables were identified as the greatest source of error. If designed accordingly, EDC can address this short fall effectively. (en_US)
dc.identifier.citation: BMC Research Notes. Vol.12, No.1 (2019) (en_US)
dc.identifier.doi: 10.1186/s13104-019-4574-8 (en_US)
dc.identifier.issn: 17560500 (en_US)
dc.identifier.other: 2-s2.0-85071230377 (en_US)
dc.identifier.uri: https://repository.li.mahidol.ac.th/handle/20.500.14594/50103
dc.rights: Mahidol University (en_US)
dc.rights.holder: SCOPUS (en_US)
dc.source.uri: https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=85071230377&origin=inward (en_US)
dc.subject: Biochemistry, Genetics and Molecular Biology (en_US)
dc.title: Analysis of erroneous data entries in paper based and electronic data collection (en_US)
dc.type: Article (en_US)
dspace.entity.type: Publication
mu.datasource.scopus: https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=85071230377&origin=inward (en_US)
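
The percentages quoted in the abstract can be recomputed from the counts it reports. The following is a minimal sketch for checking that arithmetic, not the authors' analysis code: the variable names and the `pct` helper are illustrative assumptions, and only the counts themselves come from the abstract.

```python
# Counts copied from the abstract: (discrepant entries, total entries)
# per variable category. All names here are illustrative, not from the paper.
counts = {
    "continuous": (643, 3580),
    "time": (113, 716),
    "date": (140, 1074),
    "text": (86, 716),
    "categorical": (1370, 12530),
}


def pct(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


# Discrepancy rate per variable category.
rates = {name: pct(d, n) for name, (d, n) in counts.items()}

# Overall totals, which should match 52 variables x 358 participants.
total_discrepant = sum(d for d, _ in counts.values())
total_entries = sum(n for _, n in counts.values())
overall_rate = pct(total_discrepant, total_entries)

# Omissions reported for paper based (PBDC) and electronic (EDC) collection.
omissions_pbdc, omissions_edc = 1002, 497
omissions_total = omissions_pbdc + omissions_edc
omission_share = pct(omissions_total, total_discrepant)
categorical_omission_share = pct(1148, omissions_total)
pbdc_to_edc_ratio = omissions_pbdc / omissions_edc

for name, rate in rates.items():
    print(f"{name}: {rate}%")
print(f"overall: {overall_rate}% ({total_discrepant}/{total_entries})")
print(f"omissions: {omission_share}% of discrepancies")
print(f"PBDC/EDC omission ratio: {pbdc_to_edc_ratio:.2f}")
```

The per-category counts sum to the reported totals (2352 discrepancies in 18,616 entries), and the omission share computes to 63.7%, which the abstract reports rounded as 64%.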
