WGS of the TOPMed samples was performed over multiple studies, years and sequencing centres. To minimize batch effects, we standardized laboratory methods, mapped and processed sequence data centrally using a single pipeline, and performed variant calling and genotyping jointly across all samples (see Methods). We annotated each variant site with multiple sequence quality metrics and trained machine-learning filters to identify and exclude sites showing inconsistencies revealed when the same individual was sequenced repeatedly. Available WGS data were processed periodically to produce genotype data ‘freezes’. The 53,831 samples described here are drawn from TOPMed freeze 5.
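
As a rough illustration of how repeat sequencing of the same individual can expose unreliable variant sites, the Python sketch below computes a per-site discordance rate across duplicate pairs and turns it into provisional labels of the kind a filter could be trained on. It is not the TOPMed pipeline: the function names, the sample identifiers, the genotype encoding (alternate-allele count 0, 1 or 2, with `None` for a missing call) and the 0.05 threshold are all assumptions made for this example.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical encoding: alternate-allele count (0, 1, 2); None = missing call.
Genotype = Optional[int]


def duplicate_discordance(
    genotypes: Dict[str, List[Genotype]],
    duplicate_pairs: List[Tuple[str, str]],
) -> List[Optional[float]]:
    """Return, for each site, the fraction of duplicate pairs that disagree.

    Pairs with a missing call at a site are skipped; a site with no
    comparable pair gets None.
    """
    n_sites = len(next(iter(genotypes.values()), []))
    rates: List[Optional[float]] = []
    for site in range(n_sites):
        compared = 0
        discordant = 0
        for first, second in duplicate_pairs:
            g1 = genotypes[first][site]
            g2 = genotypes[second][site]
            if g1 is None or g2 is None:
                continue
            compared += 1
            if g1 != g2:
                discordant += 1
        rates.append(discordant / compared if compared else None)
    return rates


def label_sites(
    rates: List[Optional[float]], threshold: float = 0.05
) -> List[str]:
    """Assign provisional labels; the threshold is an illustrative assumption."""
    labels = []
    for rate in rates:
        if rate is None:
            labels.append("unlabelled")
        elif rate > threshold:
            labels.append("inconsistent")
        else:
            labels.append("consistent")
    return labels


# Toy data: two individuals, each sequenced twice, genotyped at four sites.
toy_genotypes = {
    "S1_run1": [0, 1, 2, None],
    "S1_run2": [0, 1, 1, 0],
    "S2_run1": [1, 0, 2, 1],
    "S2_run2": [1, 0, 0, None],
}
toy_pairs = [("S1_run1", "S1_run2"), ("S2_run1", "S2_run2")]

toy_rates = duplicate_discordance(toy_genotypes, toy_pairs)
toy_labels = label_sites(toy_rates)
print(toy_rates)
print(toy_labels)
```

In a setting like the one described above, labels of this kind would be combined with the per-site quality metrics as features, so that a trained filter can flag sites that resemble the inconsistent ones even where no duplicate pair covers them.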