From approximately 10,000 BioMe study samples present in TOPMed freeze 8, we randomly selected 1,000 samples for which whole-exome sequencing (WES) data were available. These samples were whole-exome sequenced using Illumina v4 HiSeq 2500 at an average depth of 36.4×. Genetic variants were jointly called using the GATK v.3.5.0 pipeline across all 31,250 BioMe samples with WES data. A series of quality control filters, collectively known as the Goldilocks filter, was applied before data delivery to the Charles Bronfman Institute for Personalized Medicine (IPM). First, a series of filters was applied to individual cells, where a cell is one combination of site and sample, that is, the genotypic information for one individual at one locus. Quality scores were normalized by depth of coverage and used, together with depth of coverage itself, to filter sites, with different thresholds for SNVs and short indels. For SNVs, cells with a depth-normalized quality score below 3 or a depth of coverage below 7 were set to missing. For indels, cells with a depth-normalized quality score below 5 or a depth of coverage below 10 were set to missing. Then, variant sites were filtered,