bias, protocol violations, and other error sources can interfere with a study’s validity (4, 30). In two studies of antidepressants, only about half of all trained raters were able to detect drug effects on patients’ clinical status (3, 4). Even with two highly reliable rating scales for Alzheimer’s disease (AD), random measurement errors were large enough to seriously interfere with CT power and the clinical management of patients (5). These problems arise because tests of statistical reliability primarily evaluate whether repeated ratings preserve the original rank order of the subjects’ responses, not how much the repeated ratings of each individual subject vary. Although the rating scales in use are highly reliable by these tests, imprecision and inaccuracy have been implicated as sources of failures in AD, depression, stroke, and other diseases (3–5, 30–32).
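
The distinction between preserved rank order and within-subject variation can be illustrated with a minimal numerical sketch. The scores below are hypothetical values chosen only for illustration and are not data from the cited studies. The function names `pearson_r` and `within_subject_sd` are likewise assumptions of this sketch. It uses the Pearson correlation as one common reliability index and a simple two-rating estimate of the within-subject standard deviation.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two sets of ratings of the same subjects."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def within_subject_sd(x, y):
    """Within-subject SD for two ratings per subject: sqrt(mean(d^2) / 2)."""
    n = len(x)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / (2 * n))

# Hypothetical scores (illustration only): six subjects, each rated twice.
first_rating = [10, 20, 30, 40, 50, 60]
second_rating = [14, 16, 34, 36, 54, 56]  # each subject shifts by 4 points

r = pearson_r(first_rating, second_rating)
sd_within = within_subject_sd(first_rating, second_rating)

# The rank order of subjects is identical in both ratings.
rank_first = sorted(range(len(first_rating)), key=lambda i: first_rating[i])
rank_second = sorted(range(len(second_rating)), key=lambda i: second_rating[i])
same_order = rank_first == rank_second

print(f"Pearson r = {r:.3f}")
print(f"Within-subject SD = {sd_within:.2f} points")
print(f"Rank order preserved: {same_order}")
```

In this constructed case the correlation is roughly 0.97 and the subjects keep the same rank order, yet every individual score moves by 4 points between ratings (a within-subject SD of about 2.8 points). A high reliability coefficient therefore does not by itself show that each subject's repeated ratings agree closely.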