Evaluating the consistency of automated CEFR analyzers: a study of English language text classification
Issued Date
2025-08-01
Resource Type
ISSN
2252-8822
eISSN
2620-5440
Scopus ID
2-s2.0-105011988557
Journal Title
International Journal of Evaluation and Research in Education
Volume
14
Issue
4
Start Page
3283
End Page
3294
Rights Holder(s)
SCOPUS
Bibliographic Citation
International Journal of Evaluation and Research in Education Vol.14 No.4 (2025), 3283-3294
Suggested Citation
Siripol P., Rhee S., Thirakunkovit S., Liang-Itsara A. Evaluating the consistency of automated CEFR analyzers: a study of English language text classification. International Journal of Evaluation and Research in Education Vol.14 No.4 (2025), 3283-3294. doi:10.11591/ijere.v14i4.33528. Retrieved from: https://repository.li.mahidol.ac.th/handle/123456789/111533
Title
Evaluating the consistency of automated CEFR analyzers: a study of English language text classification
Author(s)
Siripol P., Rhee S., Thirakunkovit S., Liang-Itsara A.
Author's Affiliation
Corresponding Author(s)
Other Contributor(s)
Abstract
With the increasing use of web-based tools for text analysis, there is a growing reliance on automated systems to assess text difficulty and classify texts according to the Common European Framework of Reference for Languages (CEFR). However, inconsistencies in these tools’ outputs could undermine their usefulness for language learners and researchers. This study investigates the consistency of five widely used automated CEFR analyzer tools, including ChatGPT, by analyzing 20 English descriptive texts at CEFR levels B1 and B2. A quantitative approach was employed to compare the CEFR classifications generated by these tools. The results reveal significant inconsistencies across the tools, raising concerns about the reliability of automated CEFR alignment. Additionally, the content and genre of the texts appeared to influence CEFR classification, suggesting that factors beyond the tools’ algorithms may affect their accuracy. These findings have important implications for language educators, curriculum designers, and researchers who rely on automated CEFR tools for text selection, grading, and analysis. The study highlights the limitations of automated CEFR classification systems and calls for a more qualitative approach to text difficulty alignment analysis. Recommendations for future research are discussed, calling for greater focus on refining these tools and exploring additional factors that may affect their effectiveness in text difficulty measurement and CEFR alignment.
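As a rough illustration of the quantitative comparison described in the abstract, the sketch below shows one way to quantify agreement among several automated CEFR analyzers. It is not taken from the paper: the sample labels, the number of texts, and the choice of mean pairwise agreement and Fleiss' kappa as agreement statistics are assumptions made here for demonstration only.

# Illustrative sketch only: the ratings data and the agreement statistics are
# assumptions, not the paper's actual method or results.
from collections import Counter
from itertools import combinations

# Hypothetical CEFR labels assigned by five tools to the same texts.
ratings = {
    "text_01": ["B1", "B1", "B2", "B1", "B2"],
    "text_02": ["B2", "B2", "B2", "B1", "B2"],
    "text_03": ["B1", "B2", "B1", "B1", "B1"],
}

def percent_agreement(ratings):
    """Mean proportion of agreeing tool pairs per text."""
    totals = []
    for labels in ratings.values():
        pairs = list(combinations(labels, 2))
        agree = sum(1 for a, b in pairs if a == b)
        totals.append(agree / len(pairs))
    return sum(totals) / len(totals)

def fleiss_kappa(ratings):
    """Fleiss' kappa for a fixed number of raters (tools) per text."""
    categories = sorted({lab for labels in ratings.values() for lab in labels})
    n = len(next(iter(ratings.values())))   # raters (tools) per text
    N = len(ratings)                         # number of texts
    p_j = Counter()                          # overall label counts
    P_i = []                                 # per-text observed agreement
    for labels in ratings.values():
        counts = Counter(labels)
        p_j.update(counts)
        P_i.append((sum(c * c for c in counts.values()) - n) / (n * (n - 1)))
    P_bar = sum(P_i) / N
    P_e = sum((p_j[c] / (N * n)) ** 2 for c in categories)
    return (P_bar - P_e) / (1 - P_e)

print(f"Mean pairwise agreement: {percent_agreement(ratings):.2f}")
print(f"Fleiss' kappa:           {fleiss_kappa(ratings):.2f}")

Under this sketch, a kappa value near 1 would indicate that the tools classify texts consistently, while a value near 0 would indicate agreement no better than chance, which is the kind of inconsistency the abstract reports.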
