Abstract
<jats:p>The purpose of this study was to evaluate the quality of recurring multiple-choice items used in formative assessments across three semesters of General Chemistry II (CHEM202), using item difficulty, item discrimination, and distractor functioning as the key indicators, and to identify patterns of change across semesters. The study drew on data from three semesters (Fall 2024–2025, Spring 2024–2025, and Fall 2025–2026) and analyzed a total of 41 recurring items. Exploratory follow-up analyses were conducted for the 13 items that were modified between semesters. The data were analyzed using the Friedman test, the Mann–Whitney U test, independent-samples and paired-samples t tests, the Wilcoxon signed-rank test, Pearson’s χ² test, and binomial generalized linear models. Item difficulty changed significantly across the three semesters, whereas item discrimination remained relatively stable overall. However, between Spring 2024–2025 and Fall 2025–2026, the discrimination of modified items improved more than that of unchanged items. Item-level analyses further indicated that changes in quality were not uniform across items but were item-specific. These findings suggest that multiple-choice item quality should not be judged solely from one-time performance indicators; rather, tracking recurring items across semesters should be combined with distractor analysis.</jats:p>
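The abstract names three item-quality indicators without giving their formulas. The sketch below shows one common classical-test-theory way to compute them for a single item. The specific definitions are assumptions and not necessarily the study's own. Difficulty is taken as the proportion answering correctly (p). Discrimination is taken as the upper-lower index D = p_upper − p_lower using the top and bottom 27% of examinees by total score. A distractor is treated as "functional" if at least 5% of examinees choose it and the lower group chooses it more often than the upper group. The names `item_statistics` and `ItemStats` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class ItemStats:
    difficulty: float                   # p: proportion answering correctly
    discrimination: float               # D = p_upper - p_lower
    distractor_share: Dict[str, float]  # distractor -> proportion of examinees choosing it
    functional_distractors: List[str]   # distractors meeting the assumed criteria


def item_statistics(
    choices: Sequence[str],
    key: str,
    total_scores: Sequence[float],
    options: Sequence[str] = ("A", "B", "C", "D"),
    group_frac: float = 0.27,   # assumed upper/lower group size (27% convention)
    min_share: float = 0.05,    # assumed minimum selection rate for a functional distractor
) -> ItemStats:
    """Hypothetical sketch of classical test theory indices for one item."""
    n = len(choices)
    if n == 0 or n != len(total_scores):
        raise ValueError("choices and total_scores must be non-empty and the same length")

    # Item difficulty: proportion of examinees who chose the keyed option.
    correct = [1 if c == key else 0 for c in choices]
    difficulty = sum(correct) / n

    # Upper-lower discrimination: rank examinees by total score and compare
    # the proportion correct in the top and bottom groups.
    order = sorted(range(n), key=lambda i: total_scores[i])
    g = max(1, round(group_frac * n))
    lower, upper = order[:g], order[-g:]
    p_upper = sum(correct[i] for i in upper) / g
    p_lower = sum(correct[i] for i in lower) / g
    discrimination = p_upper - p_lower

    # Distractor functioning (assumed rule): chosen by at least min_share of
    # examinees AND chosen more often by the lower group than the upper group.
    distractor_share: Dict[str, float] = {}
    functional: List[str] = []
    for opt in options:
        if opt == key:
            continue
        share = sum(1 for c in choices if c == opt) / n
        distractor_share[opt] = share
        low = sum(1 for i in lower if choices[i] == opt)
        high = sum(1 for i in upper if choices[i] == opt)
        if share >= min_share and low > high:
            functional.append(opt)

    return ItemStats(difficulty, discrimination, distractor_share, functional)


# Hypothetical responses from 10 examinees to one item keyed "A".
stats = item_statistics(
    choices=["A", "A", "A", "B", "A", "C", "B", "C", "B", "B"],
    key="A",
    total_scores=[10, 9, 8, 7, 6, 5, 4, 3, 2, 1],
)
print(stats)
```

For this contrived data the item has p = 0.4 and D = 1.0. Distractors B and C count as functional, while D is never chosen and so does not function. Running the same function on each administration of a recurring item would give the per-semester values that the across-semester comparisons operate on.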