significant at p ≤ 0.008. When significant overall DIF was identified, we explored which items contributed to it. We first identified anchor items, whose thresholds or factor loadings did not differ significantly by group, in order to set a common metric across groups and allow the remaining items to be tested individually for DIF. We then tested each remaining item in turn by constraining that item's parameters to be equal across groups. Using the chi-square statistic, we compared each constrained model with the unconstrained baseline model.
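The item-level comparison described above can be illustrated with a minimal sketch of a chi-square difference test between nested models. It is an illustration under stated assumptions, not the analysis code used in the study: the function names `chi2_sf` and `dif_test` and all fit values are hypothetical, the threshold of 0.008 is taken from the text, and the plain difference of chi-square values assumes an estimator for which that difference is itself chi-square distributed (robust or mean- and variance-adjusted estimators require a scaled difference test instead).

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable, integer df >= 1."""
    if df < 1 or int(df) != df:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-y) * sum_{k=0}^{df/2 - 1} y^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= y / k
            total += term
        return math.exp(-y) * total
    # Odd df: erfc(sqrt(y)) + exp(-y) * sum_{k=1}^{(df-1)/2} y^(k - 1/2) / Gamma(k + 1/2)
    total = math.erfc(math.sqrt(y))
    term = math.sqrt(y) / math.gamma(1.5)  # k = 1 term
    for k in range(1, (df - 1) // 2 + 1):
        total += math.exp(-y) * term
        term *= y / (k + 0.5)  # advance to the k + 1 term
    return total


def dif_test(chi2_constrained, df_constrained, chi2_baseline, df_baseline, alpha=0.008):
    """Compare a model with one item constrained equal across groups to the baseline.

    Returns (delta_chi2, delta_df, p_value, flagged), where flagged is True
    when the constraint significantly worsens fit, i.e. the item shows DIF.
    """
    delta_chi2 = chi2_constrained - chi2_baseline
    delta_df = df_constrained - df_baseline
    if delta_df < 1:
        raise ValueError("the constrained model must have more degrees of freedom")
    p_value = chi2_sf(max(delta_chi2, 0.0), delta_df)
    return delta_chi2, delta_df, p_value, p_value <= alpha


# Hypothetical fit statistics: a shared baseline and two single-item constrained models.
baseline = {"chi2": 210.4, "df": 100}
constrained = {
    "item_A": {"chi2": 225.9, "df": 104},
    "item_B": {"chi2": 213.4, "df": 104},
}

for item, fit in constrained.items():
    d_chi2, d_df, p, flagged = dif_test(fit["chi2"], fit["df"], baseline["chi2"], baseline["df"])
    print(f"{item}: delta chi2 = {d_chi2:.2f}, delta df = {d_df}, p = {p:.4f}, DIF = {flagged}")
```

Each item is compared with the same baseline, so an item is flagged only when constraining its own parameters, and no others, produces a significant loss of fit.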