There is, however, a question of how much DIF is too much. If all items displayed DIF, this would imply no commonality of measurement between subpopulations and a lack of comparability of scores. DIF among some items is often expected, and just how much DIF is tolerable is a matter of debate (Byrne et al., 1989; Cheung & Rensvold, 1998; Reise et al., 1993; Steenkamp & Baumgartner, 1998). Strictly speaking, only one invariant (non-DIF) item is required to put the measures on an equivalent scale across subpopulations, but the odds of correctly identifying which items are invariant decrease when many items display DIF (Yoon & Millsap, 2007). The less DIF, the more confidence one can have that the scale of measurement is truly invariant across persons. Although a majority of our depression items displayed some DIF, fewer than half of the items displayed DIF due to any single covariate, suggesting that we could interpret our measure as being commensurate in scale across subpopulations defined by study, age, gender, and parental history of alcoholism.