The Use of Artificial Intelligence in Assessing the IELTS Academic Writing Task Essays

Aliza Kamalatuzzahroh; Joko Priyana

doi:10.58421/gehu.v5i1.1189

Authors

Aliza Kamalatuzzahroh, Universitas Negeri Yogyakarta
Joko Priyana, Universitas Negeri Yogyakarta

Keywords:

Artificial Intelligence (AI), IELTS Academic Writing, Computer-Based Prediction Test, Writing Assessment

Abstract

This study investigates the accuracy of Artificial Intelligence (AI) in assessing IELTS Academic Writing Task essays by comparing AI-generated and human examiner scores and feedback. Despite the increasing adoption of AI-based assessment tools, empirical evidence on their validity and reliability in high-stakes IELTS writing evaluation remains limited. This study therefore aims to determine whether AI and human scores differ significantly and to examine the qualitative characteristics of the feedback each provides. The research employed an explanatory mixed-methods design involving ten participants who completed a computer-based IELTS prediction test. Their essays were evaluated independently by an AI scoring system and by a human rater using the IELTS band descriptors. A paired-sample t-test measured the difference between the two sets of scores, while qualitative content analysis examined the patterns, depth, and focus of the feedback. The findings indicate a statistically significant difference between AI-generated and human-assigned scores (p = 0.022), with a mean difference of 0.4 points and AI tending to assign the higher scores. The feedback analysis shows that AI focuses primarily on technical aspects such as grammar, vocabulary, and sentence structure and offers general suggestions for improvement, whereas human feedback shows greater depth and personalization. These results suggest that although AI improves scoring efficiency, it cannot fully replace human evaluative judgment in the assessment of complex academic writing.
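The paired-sample t-test mentioned in the abstract can be illustrated with a minimal sketch. The band scores below are hypothetical placeholders chosen only to show the arithmetic; they are not the study's data, and the function name `paired_t` is an assumption introduced for this example. The sketch computes the mean of the paired differences, the t statistic, and the degrees of freedom, then compares |t| with the two-tailed 5% critical value for nine degrees of freedom (2.262).

```python
import math
import statistics


def paired_t(scores_a, scores_b):
    """Return (mean difference, t statistic, degrees of freedom) for paired samples."""
    if len(scores_a) != len(scores_b):
        raise ValueError("paired samples must have the same length")
    # Work on the per-essay differences, since both raters scored the same essays.
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    # Sample standard deviation of the differences (n - 1 in the denominator).
    sd_diff = statistics.stdev(diffs)
    t_stat = mean_diff / (sd_diff / math.sqrt(n))
    return mean_diff, t_stat, n - 1


# Hypothetical band scores for ten essays (illustrative only, NOT the study's data).
ai_scores = [6.5, 7.0, 6.0, 6.5, 7.5, 6.0, 6.5, 7.0, 5.5, 6.5]
human_scores = [6.0, 6.5, 6.0, 6.0, 7.0, 5.5, 6.5, 6.5, 5.5, 6.0]

mean_diff, t_stat, df = paired_t(ai_scores, human_scores)

# Two-tailed critical value of Student's t for df = 9 at alpha = 0.05.
T_CRITICAL_DF9 = 2.262
significant = abs(t_stat) > T_CRITICAL_DF9

print(f"mean difference = {mean_diff:.2f}, t({df}) = {t_stat:.3f}, significant = {significant}")
```

With only ten pairs, the test assumes the score differences are roughly normally distributed, so a result of this kind is best read alongside the qualitative feedback analysis rather than on its own.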
