Understanding how genetic and environmental factors affect physical and mental health outcomes is critical to developing effective preventive and treatment interventions. Large-scale cross-sectional and cohort studies are an invaluable resource for this work, particularly for studying genetic influences: common genetic variants typically have small effects, so very large samples are needed to achieve adequate statistical power. A study can be used to draw conclusions about the population it represents (the “intended study population”), but generalizing its findings to other populations requires knowing precisely who the actual study population is. However, people who volunteer for studies may not be representative of the intended study population, in which case the actual study population is unknown.1