Chunk #27 — Methods — TOPMed 5b sequencing and phasing

Source
Use of >100,000 NHLBI Trans-Omics for Precision Medicine (TOPMed) Consortium whole genome sequences improves imputation quality and detection of rare variant associations in admixed African and Hispanic/Latino populations.
Embedded
yes

Text

Harvard, the University of Washington Northwest Genomics Center, Illumina Genomic Services, Macrogen Corp., and Baylor Human Genome Sequencing Center). Sequence data files were transferred from the sequencing centers to the TOPMed Informatics Research Center (IRC), where reads were aligned to human genome build GRCh38 using a common pipeline and joint genotype calling was performed. Variants were filtered with a support vector machine (SVM) classifier, using variants present on genotyping arrays as positive controls and variants with many Mendelian inconsistencies as negative controls. After potentially problematic variant sites were filtered out, freeze 5b contained ~438 million single nucleotide polymorphisms and ~33 million short insertion-deletion variants. For our imputation analyses, we excluded from the reference panel all variants with an overall allele count of 5 or fewer, leaving 88,062,238 variants in the reference panel (Table 1). Additional sample-level quality control (such as detection of sex mismatches, pedigree discrepancies, and sample swaps) was undertaken by the TOPMed Data Coordinating Center (DCC).
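The allele-count exclusion described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the `Variant` record, its `allele_count` field, the `filter_reference_panel` helper, and the toy variants are all hypothetical. The only detail taken from the text is the threshold, under which variants with an overall allele count of 5 or fewer are dropped.

```python
from dataclasses import dataclass
from typing import Iterable, List

# From the text: variants with an overall allele count of 5 or fewer are excluded,
# so the smallest allele count that is kept is 6.
MIN_ALLELE_COUNT = 6


@dataclass(frozen=True)
class Variant:
    """Hypothetical minimal variant record (not a TOPMed data structure)."""
    chrom: str
    pos: int
    ref: str
    alt: str
    allele_count: int  # assumed: overall allele count across the reference panel


def filter_reference_panel(
    variants: Iterable[Variant], min_ac: int = MIN_ALLELE_COUNT
) -> List[Variant]:
    """Keep only variants whose allele count is at least min_ac."""
    return [v for v in variants if v.allele_count >= min_ac]


# Toy data, invented for illustration only.
example = [
    Variant("chr1", 10_000, "A", "G", 1),      # singleton: excluded
    Variant("chr1", 10_500, "C", "T", 5),      # allele count 5: excluded
    Variant("chr1", 11_000, "G", "GA", 6),     # allele count 6: kept
    Variant("chr2", 20_000, "T", "C", 1200),   # common variant: kept
]

kept = filter_reference_panel(example)
print([(v.chrom, v.pos) for v in kept])  # [('chr1', 11000), ('chr2', 20000)]
```

In practice a filter like this would presumably be applied to VCF/BCF files with a dedicated tool rather than to in-memory records; the sketch only makes the inclusive boundary of the threshold explicit.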