Chunk #13 — Online methods — Evaluation of genotype calling process

Source: A reference panel of 64,976 haplotypes for genotype imputation.
Embedded: yes

Text

We tested the genotype calling process on data from chromosome 20 with different combinations of site lists and sample sets to assess both the effects of site filtering and the benefits of increasing sample size. We evaluated 3 different site lists: the 1000 Genomes Phase 3 set of sites (775,927), our HRC MAC5 site list (1,128,114) and our HRC MAC5 site list with additional site filtering (1,006,559). We ran the genotype calling method on 3 different sets of samples: the 2,525 original 1000 Genomes Phase 3 samples; a subset of 13,309 HRC samples that we used at an early stage of HRC testing (HRC Pilot), drawn from studies 1000GP3, AMD, GoNL, GoT2D, ORCADES, SardinIA, FINLAND and UK10K; and the near-final full set of 32,905 HRC samples. We called genotypes using GLPhase on each of the resulting 9 datasets (3 site lists × 3 sample sets) and examined genotype discordance relative to the Illumina OMNI2.5M genotypes produced by the 1000 Genomes Project. For this comparison, we considered only genotypes from the 365 samples shared across the 3 sample sets, at 42,244 SNP sites. We calculated percentage discordance for the