Article

Agreement rates between automated essay scoring systems and human raters: Meta-analysis

In-Soo Shin1,
1Assistant Professor, Jeon-Ju University
Corresponding Author: In-Soo Shin, E-mail: s9065031@jj.ac.kr

ⓒ Copyright 2014, Korea Institute for Curriculum and Evaluation. This is an Open-Access article distributed under the terms of the Creative Commons Attribution NonCommercial-ShareAlike License (http://creativecommons.org/licenses/by-nc-sa/4.0) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

Received: Sep 01, 2014 ; Revised: Sep 30, 2014 ; Accepted: Oct 13, 2014

Published Online: Nov 30, 2014

Abstract

Automated essay scoring (AES) is defined as the scoring of written prose using computer technology. The objective of this meta-analysis is to evaluate the claim that machine scoring of writing test responses agrees with human raters as closely as human raters agree with one another. The effect size is the agreement rate between AES and human scoring, estimated using a random effects model. The exact agreement rate between AES and human scoring is 52%, compared with an exact agreement rate of 54% between human raters; the adjacent agreement rate between AES and human scoring is 93%, compared with 94% between human raters. This meta-analysis therefore shows that the agreement rate between AES and human raters is comparable to the agreement rate between human raters. The study also presents subgroup analyses of agreement rates by study characteristics such as publication status, AES type, essay type, exam type, human rater expertise, country, and school level. Implications of the results and directions for future research are discussed in the conclusion.
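To make the two agreement measures concrete, the sketch below illustrates one common way of computing them for a single study: exact agreement counts essays where the AES score and the human score are identical, while adjacent agreement also accepts scores that differ by one point. The score scale and the aes_scores/human_scores values are hypothetical illustrations, not data from the studies included in this meta-analysis.

```python
def agreement_rates(scores_a, scores_b):
    """Return (exact, adjacent) agreement rates between two raters.

    Exact agreement: the two scores are identical.
    Adjacent agreement: the two scores differ by at most one point.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("Score lists must be the same length")
    n = len(scores_a)
    exact = sum(a == b for a, b in zip(scores_a, scores_b)) / n
    adjacent = sum(abs(a - b) <= 1 for a, b in zip(scores_a, scores_b)) / n
    return exact, adjacent


# Hypothetical essay scores on a 1-6 scale, for illustration only.
aes_scores = [4, 3, 5, 2, 4, 6, 3, 4]
human_scores = [4, 4, 5, 2, 3, 6, 3, 5]

exact, adjacent = agreement_rates(aes_scores, human_scores)
print(f"Exact agreement: {exact:.0%}, adjacent agreement: {adjacent:.0%}")
```

The sketch covers only the per-study computation; in the meta-analysis itself, such study-level rates are subsequently pooled under a random effects model.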

Keywords: automated essay scoring; agreement rate; meta-analysis; effect size