A second consideration regarding samples in epigenetic studies is how well they represent the population in question, a key concern for both statistical and external validity. With respect to statistical validity, studies that oversample particular populations may produce a high degree of clustering, that is, shared characteristics among the individuals sampled. These shared characteristics may violate the assumption of independence of observations, a fundamental assumption of most regression modeling techniques, which holds that each individual's observation is unrelated to those of the other individuals in the sample.[42,43] Analyses that ignore a high degree of clustering may therefore yield biased findings, most often by understating the uncertainty around estimates. Clustering also compounds the sample size problems discussed above because it reduces the effective sample size in regression models,[44] forcing investigators to recruit even larger samples to ensure statistical validity. Moreover, samples that are not representative with respect to geography, age distribution, race and/or ethnicity, or baseline health may limit the external validity of findings. Poor generalizability, in turn, limits our capacity to translate epigenetic findings into meaningful, population-level interventions.
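
The loss of effective sample size under clustering can be made concrete with the standard Kish design effect, DEFF = 1 + (m - 1) × ICC, where m is the cluster size and ICC is the intraclass correlation. The sketch below is illustrative only: it assumes equal-sized clusters, and the function names and the example values (1,000 participants, clusters of 20, ICC of 0.05) are hypothetical and not drawn from any study discussed here.

```python
def design_effect(cluster_size: float, icc: float) -> float:
    """Kish design effect for equal-sized clusters: 1 + (m - 1) * ICC."""
    if cluster_size < 1:
        raise ValueError("cluster_size must be at least 1")
    if not 0.0 <= icc <= 1.0:
        raise ValueError("icc must lie between 0 and 1")
    return 1.0 + (cluster_size - 1.0) * icc


def effective_sample_size(n_total: float, cluster_size: float, icc: float) -> float:
    """Number of independent observations carrying the same information
    as n_total clustered observations."""
    return n_total / design_effect(cluster_size, icc)


if __name__ == "__main__":
    # Hypothetical values chosen only for illustration.
    n_total, cluster_size, icc = 1000, 20, 0.05
    deff = design_effect(cluster_size, icc)
    n_eff = effective_sample_size(n_total, cluster_size, icc)
    print(f"design effect: {deff:.2f}")
    print(f"effective sample size: {n_eff:.0f} of {n_total} recruited")
```

Under these assumed values, even a modest intraclass correlation removes close to half of the nominal sample, which is why clustered designs require larger recruitment targets to reach the same statistical power.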