Ratings by the two assessors were identical in 70% or more of the 30 pairs for 14 of the 20 APP items. Figure 1 shows the percent exact agreement and the percent close agreement, i.e., within 1 point on the 5-point scale, for each of the 20 items. There was complete agreement between 24 pairs of raters (80%) for the overall global rating of student performance. The remaining six pairs of raters all scored within one point of each other on the 4-point Global Rating Scale. A scatterplot was visually assessed for violation of the assumptions of linearity and homoscedasticity. Figure 2 shows the positive, strong (Cohen 1988), linear, significant relationship between Rater 1 and Rater 2 total APP scores [r = 0.92 (95% CI 0.87 to 0.95), p < 0.0005]. The coefficient of determination (r² = 0.85) indicates that 85% (95% CI 75% to 90%) of the variance in a rater’s scores was explained
by variance in the other rater’s scores. The ICC(2,1) (two-way random effects model) for total APP scores for the two raters was 0.92 (95% CI 0.84 to 0.96). The ICC(2,1) for the Global Rating Scale scores was 0.72 (95% CI 0.50 to 0.86). Table 2 presents the ICC(2,1) results for the total score, each of the 20 APP items, and the Global Rating Scale. The SEM for the total score was 3.2 APP points (scale width 0–80), indicating that a student’s true score will typically fall between an obtained score plus or minus 3.2 (at 68% confidence). The 95% confidence band around a single score was 6.5 APP points (given t(0.05, df = 29) = 2.045). This implies that in 95% of cases a student’s true APP total score will fall between the obtained score plus or minus 6.5 points. Minimal detectable change scores were calculated for the total and individual item score data at the 90% confidence level. The MDC90 for the APP total scores was 7.86 (given t(0.1, df = 29) = 1.699). This implies that a change in score
of around 8 APP total score units is required to be confident that, for 90% of students demonstrating changes of this magnitude, real change in professional competence has occurred. As the APP scale width is 0–80, the MDC90 represents 9% of the scale. For individual items, the MDC90 ranges from 0.60 to 0.85. Therefore, on the 5-point rating scale used to score each item, a change in rating of around 1 point (the minimal observable change) indicates that real change in performance on that item has occurred beyond random variability. A Bland and Altman plot was constructed to display errors in estimates of total APP scores (Figure 3). In this plot, differences between raters’ marks were plotted against the mean of the two raters’ marks, and the 95% limits of agreement were defined. The Bland-Altman plot shows that the disagreement between raters was not greater among high scores than among low scores, or vice versa.
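The error estimates reported above can be illustrated with a minimal Python sketch. It is not the authors' analysis code. It assumes the conventional formulas, which the text does not state explicitly: confidence band = t × SEM, MDC = t × √2 × SEM, and 95% limits of agreement = mean difference ± 1.96 × SD of the differences. The function names and the four pairs of rater scores are hypothetical and serve only to show the calculation.

```python
import math
from statistics import mean, stdev


def confidence_band(sem, t_crit):
    """Half-width of the confidence band around a single obtained score."""
    return t_crit * sem


def minimal_detectable_change(sem, t_crit):
    """MDC = t * sqrt(2) * SEM (assumed conventional formula).

    The sqrt(2) term reflects measurement error in both the first
    and the second score of a change comparison.
    """
    return t_crit * math.sqrt(2) * sem


def limits_of_agreement(rater1, rater2):
    """Return (bias, lower, upper) for Bland-Altman 95% limits of agreement.

    Assumes the usual definition: mean difference +/- 1.96 * SD of differences.
    """
    diffs = [a - b for a, b in zip(rater1, rater2)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample standard deviation (n - 1 denominator)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# SEM and critical t values as reported in the text (df = 29).
sem = 3.2
band95 = confidence_band(sem, 2.045)
mdc90 = minimal_detectable_change(sem, 1.699)

# Hypothetical total APP scores for four students, for illustration only.
rater1 = [60, 55, 70, 48]
rater2 = [58, 57, 69, 50]
bias, lower, upper = limits_of_agreement(rater1, rater2)

print(f"95% band: +/-{band95:.2f}  MDC90: {mdc90:.2f}")
print(f"bias: {bias:.2f}  limits of agreement: {lower:.2f} to {upper:.2f}")
```

With the rounded SEM of 3.2, the band comes out close to the reported 6.5 points, while the MDC90 comes out somewhat below the reported 7.86. The difference is presumably due to the reported values being computed from an unrounded SEM, although the text does not confirm this.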