An automated classification pipeline for tables in pharmacokinetic literature
| dc.contributor.author | Smith V.C. | |
| dc.contributor.author | Gonzalez Hernandez F. | |
| dc.contributor.author | Wattanakul T. | |
| dc.contributor.author | Chotsiri P. | |
| dc.contributor.author | Cordero J.A. | |
| dc.contributor.author | Ballester M.R. | |
| dc.contributor.author | Duran M. | |
| dc.contributor.author | Fanlo Escudero O. | |
| dc.contributor.author | Lilaonitkul W. | |
| dc.contributor.author | Standing J.F. | |
| dc.contributor.author | Kloprogge F. | |
| dc.contributor.correspondence | Smith V.C. | |
| dc.contributor.other | Mahidol University | |
| dc.date.accessioned | 2025-04-01T18:11:17Z | |
| dc.date.available | 2025-04-01T18:11:17Z | |
| dc.date.issued | 2025-12-01 | |
| dc.description.abstract | Pharmacokinetic (PK) models are essential for optimising drug candidate selection and dosing regimens in drug development. Preclinical and population PK models benefit from integrating prior knowledge from existing compounds. While tables in scientific literature contain comprehensive prior PK data and critical contextual information, the lack of automated extraction tools forces researchers to manually curate datasets, limiting efficiency and scalability. This study addresses this gap by focusing on the crucial first step of PK table mining: automatically identifying tables containing in vivo PK parameters and study population characteristics. To this end, an expert-annotated corpus of 2640 tables from PK literature was developed and used to train a supervised classification pipeline. The pipeline integrates diverse table features and representations, with GPT-4 refining predictions in uncertain cases. The resulting model achieved F1 scores exceeding 96% across all classes. The pipeline was applied to PK papers from PubMed Central Open-Access, with results integrated into the PK paper search tool at www.pkpdai.com. This work establishes a foundational step towards automating PK table data extraction and streamlining dataset curation. The corpus and code are openly available. | |
| dc.identifier.citation | Scientific Reports Vol.15 No.1 (2025) | |
| dc.identifier.doi | 10.1038/s41598-025-94778-5 | |
| dc.identifier.eissn | 20452322 | |
| dc.identifier.scopus | 2-s2.0-105000727332 | |
| dc.identifier.uri | https://repository.li.mahidol.ac.th/handle/123456789/108526 | |
| dc.rights.holder | SCOPUS | |
| dc.subject | Multidisciplinary | |
| dc.title | An automated classification pipeline for tables in pharmacokinetic literature | |
| dc.type | Article | |
| mu.datasource.scopus | https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=105000727332&origin=inward | |
| oaire.citation.issue | 1 | |
| oaire.citation.title | Scientific Reports | |
| oaire.citation.volume | 15 | |
| oairecerif.author.affiliation | UCL Engineering | |
| oairecerif.author.affiliation | Mahidol Oxford Tropical Medicine Research Unit | |
| oairecerif.author.affiliation | Institut de Recerca Sant Pau (IR SANT PAU) | |
| oairecerif.author.affiliation | PAREXEL International | |
| oairecerif.author.affiliation | Universitat Ramon Llull | |
| oairecerif.author.affiliation | University College London | |
| oairecerif.author.affiliation | Great Ormond Street Hospital for Children NHS Foundation Trust | |
