Abstract
<jats:p>This study evaluated the reliability, consistency, discrimination power, and difficulty level of test items administered to students from Delgermörön and Ikh-Uul soums of Khuvsgul province, and the 48th school in Ulaanbaatar. The test consisted of five items, each scored on a scale from 0 to 4 points. Based on the measures of central tendency and variability, the average test score was at a moderate level, and the score distribution was non-normal. The reliability coefficient, Cronbach’s alpha, was 0.74, indicating an acceptable to high level of internal consistency. Items 1 and 4 showed low discrimination and were relatively difficult, while the remaining items were of moderate difficulty. To construct a more valid and reliable test, items should be designed so that the mean, median, mode, and standard deviation are approximately aligned, and the reliability coefficient (α) should be raised above 0.8. Additionally, balancing the cognitive process levels across items and administering the revised test at least twice will help produce a consistent and dependable assessment instrument.</jats:p>
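<jats:p>The statistics named in the abstract can be illustrated with a minimal sketch. It assumes a score matrix with one row per student and one column per item, each item scored from 0 to 4. The function names, the use of mean score divided by maximum points as the difficulty index, and the upper/lower 27% group method for discrimination are assumptions made for illustration; the abstract does not state which formulas the study used for difficulty and discrimination.</jats:p>

```python
from statistics import mean, variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of rows (one row of item scores per student)."""
    k = len(scores[0])  # number of items
    # Sample variance of each item across students
    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    # Sample variance of the students' total scores
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def item_difficulty(scores, max_points=4):
    """Assumed difficulty index: mean item score / max points (higher = easier)."""
    k = len(scores[0])
    return [mean(row[i] for row in scores) / max_points for i in range(k)]


def item_discrimination(scores, max_points=4, fraction=0.27):
    """Assumed discrimination index: (upper-group mean - lower-group mean) / max points.

    Groups are the top and bottom `fraction` of students ranked by total score.
    """
    ranked = sorted(scores, key=sum, reverse=True)
    n = max(1, round(len(ranked) * fraction))  # size of each comparison group
    upper, lower = ranked[:n], ranked[-n:]
    k = len(scores[0])
    return [
        (mean(row[i] for row in upper) - mean(row[i] for row in lower)) / max_points
        for i in range(k)
    ]
```

<jats:p>Under these assumptions, calling cronbach_alpha on the five-item score matrix would yield the α reported above, while item_difficulty and item_discrimination would flag items such as 1 and 4 by low values on both indices.</jats:p>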