The Application of Automated Scoring Program to Supply-Type Items of Basic Competency Test

Rim, Myoung-Hwa; Noh, Eun-Hee; Sim, Jae-Ho

doi:10.29221/jce.2013.16.1.137

J. Curric. Eval. 2013; 16(1):137-160

pISSN: 1229-1544

Article

Exploring the Feasibility of Automated Scoring for Supply-Type Items in the Basic Competency Diagnostic Assessment

Myoung-Hwa Rim¹, Eun-Hee Noh¹,†, Jae-Ho Sim¹

¹Research Fellow, Korea Institute for Curriculum and Evaluation

†Corresponding Author: Eun-Hee Noh, E-mail: noro@kice.re.kr

ⓒ Copyright 2013, Korea Institute for Curriculum and Evaluation. This is an Open-Access article distributed under the terms of the Creative Commons Attribution NonCommercial-ShareAlike License (http://creativecommons.org/licenses/by-nc-sa/4.0) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

Received: Jan 1, 2013; Revised: Feb 4, 2013; Accepted: Feb 22, 2013

Published Online: Mar 31, 2013

Summary

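The purpose of this study is to explore the feasibility of automated scoring for the supply-type items of the Grade 3 National Diagnostic Assessment of Basic Competency by applying the Korean word- and phrase-level automated scoring program for supply-type items that the Korea Institute for Curriculum and Evaluation (KICE) developed in 2012. To this end, 11 reading and writing supply-type items were selected, and the program was applied to the responses of 506 to 929 students per item. For each stage of automated scoring, we calculated the number of correct answers, the number of incorrect answers, the number of unscored responses, and the scoring rate. In addition, Kappa coefficients and correlation coefficients were calculated and reported for each item.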
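The per-item statistics described in the paragraph above can be sketched as follows. This is a minimal illustration under stated assumptions, not the KICE program: it assumes dichotomous scoring (1 = correct, 0 = incorrect), represents a response the program leaves unscored as `None`, defines the scoring rate as the share of responses the program scored, and computes Cohen's kappa and the Pearson correlation only over responses that both the program and the human rater scored. The function names and data are hypothetical.

```python
import math
from collections import Counter

# Hypothetical coding (an assumption, not from the paper):
# 1 = correct, 0 = incorrect, None = left unscored by the program.

def tally(auto):
    """Return (correct, incorrect, unscored) counts for one item."""
    correct = sum(1 for a in auto if a == 1)
    incorrect = sum(1 for a in auto if a == 0)
    unscored = sum(1 for a in auto if a is None)
    return correct, incorrect, unscored

def scoring_rate(auto):
    """Share of all responses that the program scored (assumed definition)."""
    correct, incorrect, _ = tally(auto)
    return (correct + incorrect) / len(auto)

def paired(human, auto):
    """Keep only responses the program scored (assumed handling of None)."""
    pairs = [(h, a) for h, a in zip(human, auto) if a is not None]
    return [h for h, _ in pairs], [a for _, a in pairs]

def cohen_kappa(x, y):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(x)
    p_o = sum(1 for a, b in zip(x, y) if a == b) / n
    cx, cy = Counter(x), Counter(y)
    p_e = sum(cx[k] * cy[k] for k in set(cx) | set(cy)) / (n * n)
    if p_e == 1:  # both raters used a single category throughout
        return 1.0
    return (p_o - p_e) / (1 - p_e)

def pearson_r(x, y):
    """Pearson correlation (undefined if either list has zero variance)."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    num = n * sxy - sx * sy
    den = math.sqrt((n * sxx - sx * sx) * (n * syy - sy * sy))
    return num / den

# Hypothetical responses to one item: human scores vs. program output.
human = [1, 1, 0, 1, 0, 1, 1, 0]
auto = [1, 1, 0, None, 0, 1, 0, 0]
h, a = paired(human, auto)
print(tally(auto))                  # (3, 4, 1)
print(scoring_rate(auto))           # 0.875
print(round(cohen_kappa(h, a), 2))  # 0.72
print(round(pearson_r(h, a), 2))    # 0.75
```

On 0/1 scores the Pearson correlation is the phi coefficient; kappa additionally discounts the agreement expected by chance from the two raters' marginal proportions, which is why the two indices can differ on the same data.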

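Scoring with the Korean supply-type automated scoring program showed that most of the word- and phrase-level short-answer items of the Grade 3 National Diagnostic Assessment of Basic Competency can be scored automatically. That is, for these word- and phrase-level Korean supply-type items, the program's scoring rate and its agreement with human raters were at an appropriate level, and although some scoring errors occurred, their proportion was small. The most frequent scoring errors were spelling errors; most of the remaining errors occurred when the program failed to recognize synonyms or when a response contained additional symbols or terms.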

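Because homeroom teachers score the Grade 3 National Diagnostic Assessment of Basic Competency themselves, we suggested that, if the automated scoring program is applied to this assessment, a function be added that feeds back each student's correct and incorrect results to support individualized teaching and learning, and that a user-friendly interface be developed so that teachers can use the program easily.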

ABSTRACT

The purpose of this study is to explore the feasibility of automatically scoring the supply-type items of the Grade 3 National Diagnostic Assessment of Basic Competency (NDABC) in order to reduce the scoring burden and improve scoring efficiency and reliability. To examine scoring reliability, this study presented scoring rates, scoring errors, and Kappa and correlation coefficients between human scores and automatic scores. We also analyzed the sources of scoring errors, that is, the cases in which the automatic scoring program failed. We used the automatic scoring program developed by the Korea Institute for Curriculum and Evaluation (KICE). The results showed that the scoring rate was very high (91.5 to 100%) and that the Kappa coefficients varied across items. The number of scoring errors ranged from 1 to 42. Scoring errors were caused by spelling errors and by the program's failure to recognize analogous terms and symbols.

This study presented two suggestions. First, the automatic scoring program for the NDABC should be supplemented with a function that gives teachers and students feedback and information about wrong answers. Second, the program should provide a convenient interface for teachers.

Keywords: basic competency test; supply-type item; automatic scoring; large-scale assessment; Grade 3 National Diagnostic Assessment of Basic Competency; reading; writing