Comparing the correctness of classical test theory and item response theory in evaluating the consistency and accuracy of student proficiency classifications

Institution

http://id.loc.gov/authorities/names/n79058482

Degree Level

Doctoral

Degree

Doctor of Philosophy

Department

Department of Educational Psychology

Specialization

Measurement, Evaluation and Cognition

Abstract

The purposes of this study were 1) to compare the values of decision consistency (DC) and decision accuracy (DA) yielded by three commonly used estimation procedures, namely the Livingston-Lewis (LL) and compound multinomial (CM) procedures, both of which are based on classical test theory, and Lee’s IRT procedure, which is based on item response theory, and 2) to determine how accurate and precise these procedures are. Two population data sources were used: the Junior Reading (N = 128,103) and Mathematics (N = 127,639) assessments administered by the Education Quality and Accountability Office (EQAO), and the three entrance examinations administered by the University of Malawi (U of M; N = 6,191). To determine the degree of bias and the level of precision for both DC and DA, 100 replicated random samples were selected, corresponding to four sample sizes (n = 1,500, 3,000, 4,500, 6,000) for the EQAO populations and two sample sizes (n = 1,500, 3,000) for the U of M population. At the population level, there was an interaction between the three procedures and the four cut-scores. The differences among the three procedures in the values of DC and in the values of DA tended to be small at one or both extreme cut-scores and larger when the cut-score was closer to the population mean. The IRT procedure tended to provide the highest values for both DC and DA, followed in turn by the CM and LL procedures.
At the sample level, the estimates of DC and DA yielded by the three estimation procedures were unbiased and precise. Consequently, the findings at the population level are applicable at the sample level. Therefore, based on the findings of the present study, the compound multinomial procedure should be used to determine DC and DA when classical test theory is used to analyze a test and its items, and the IRT procedure should be used to determine DC and DA when item response theory is used to analyze a test and its items.
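The replicated-sampling check of bias and precision described above can be illustrated with a small sketch. It is not the thesis's code: it assumes bias is the mean of the replicate estimates minus the population value and precision is the standard deviation of the estimates across replicates, and it uses a made-up toy population of 0/1 indicators in place of real DC or DA values. The function names `replicate_estimates` and `bias_and_precision` are hypothetical.

```python
import random
import statistics


def replicate_estimates(population, sample_size, n_replicates, seed=0):
    """Draw repeated random samples (without replacement) and return each sample's mean."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.sample(population, sample_size))
        for _ in range(n_replicates)
    ]


def bias_and_precision(estimates, population_value):
    """Assumed definitions: bias = mean estimate - population value;
    precision = standard deviation of the estimates across replicates."""
    bias = statistics.fmean(estimates) - population_value
    spread = statistics.stdev(estimates)
    return bias, spread


# Toy population (an assumption for illustration only): 1 means an examinee
# is classified consistently, 0 means not, so the population "DC" is the mean.
population = [1] * 8500 + [0] * 1500
population_dc = statistics.fmean(population)

# 100 replicated samples of n = 1,500, mirroring the smallest design cell.
estimates = replicate_estimates(population, sample_size=1500, n_replicates=100)
bias, spread = bias_and_precision(estimates, population_dc)
print(f"population DC = {population_dc:.3f}, bias = {bias:+.4f}, SD = {spread:.4f}")
```

A bias near zero and a small standard deviation across replicates would correspond to what the abstract calls unbiased and precise estimates. In the study itself, each replicate estimate would come from the LL, CM, or IRT procedure rather than from a simple sample proportion.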

Item Type

http://purl.org/coar/resource_type/c_46ec

Other License Text / Link

This thesis is made available by the University of Alberta Libraries with permission of the copyright owner solely for non-commercial purposes. This thesis, or any portion thereof, may not otherwise be copied or reproduced without the written consent of the copyright owner, except to the extent permitted by Canadian copyright law.

Language

en
