Sequence data were processed periodically to produce genotype data ‘freezes’ that included all samples available at the time. All sequences were remapped using BWA-MEM⁷⁶ to the hs38DH 1000 Genomes build 38 human genome reference including decoy sequences, following a previously published protocol⁷⁷. Variant discovery and genotype calling were performed jointly across TOPMed studies for all samples in a given freeze using the GotCloud⁷⁸,⁷⁹ pipeline, yielding a single, multi-study genotype call set. A support vector machine quality filter for variant sites was trained on a large set of site-specific quality metrics, using known variants from arrays and the 1000 Genomes Project as positive controls and variants with Mendelian inconsistencies in multiple families as negative controls (see online documentation⁸⁰ for more details). After removing all sites with a minor allele count of less than 2, genotypes with a minimal depth of more than 10× were phased using Eagle 2.4⁸¹. Sample-level quality control included checks for pedigree errors, discrepancies between self-reported and genetic sex, and concordance with previous genotyping array data. Any errors detected were addressed before dbGaP submission.
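
The pre-phasing filter described above can be illustrated with a minimal sketch. It is not the pipeline's actual code: the function names, the in-memory data layout (biallelic diploid calls as allele pairs, with `None` for missing), and the reading of the depth criterion as a per-genotype threshold are all assumptions made for illustration.

```python
from typing import List, Optional, Tuple

# Hypothetical representation: a biallelic diploid call as (allele, allele),
# with alleles coded 0 (reference) or 1 (alternate); None marks a missing call.
Genotype = Optional[Tuple[int, int]]


def minor_allele_count(genotypes: List[Genotype]) -> int:
    """Count copies of the less frequent allele among non-missing calls."""
    called = [g for g in genotypes if g is not None]
    alt_copies = sum(allele for g in called for allele in g)
    total_copies = 2 * len(called)
    return min(alt_copies, total_copies - alt_copies)


def prepare_site_for_phasing(
    genotypes: List[Genotype],
    depths: List[int],
    min_mac: int = 2,
    min_depth: int = 10,
) -> Optional[List[Genotype]]:
    """Return the genotypes to phase at one site, or None if the site is dropped.

    Assumed interpretation: the site is removed when its minor allele count is
    below min_mac; otherwise only genotypes with depth strictly greater than
    min_depth are kept, and the rest are set to missing.
    """
    if minor_allele_count(genotypes) < min_mac:
        return None
    return [
        g if g is not None and depth > min_depth else None
        for g, depth in zip(genotypes, depths)
    ]


if __name__ == "__main__":
    site = [(0, 1), (0, 0), (1, 1), (0, 0)]
    site_depths = [12, 30, 10, 25]
    print(prepare_site_for_phasing(site, site_depths))
```

In this sketch the minor allele count is computed before any depth masking, mirroring the order in which the two steps are stated in the text; whether the production pipeline applies them in exactly this way is not specified here.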