Chunk #11 — Results

Source: Integrating sequence and array data to create an improved 1000 Genomes Project haplotype reference panel.
Embedded: yes

Text

We also used the CG samples not included in Phase 1 to assess the quality of the estimated haplotypes when used as a reference panel for GWAS imputation [5, 10]. We divided the CG1 sites into those on the Illumina 1M SNP array and those not on it, and then used the array genotypes together with the different haplotype sets to impute the CG1 genotypes not on the array. We then measured the imputation accuracy against the CG1 genotypes. As in previous evaluations [1], we stratified SNPs and Indels by their non-reference allele frequency in the 1000GP haplotypes, so that each site is always assigned to the same frequency bin in the results. For each SNP or Indel we measured the R2 between the imputed dosage estimates and the validation genotypes. Figure 1b plots the non-reference allele frequency against R2 and shows that the use of a haplotype scaffold leads to an increase in R2, especially at lower frequencies. For example, at 0.5% frequency the SHAPEIT2 haplotypes made with a 2.5M scaffold increase R2 by 0.1 compared to the 1000GP Phase 1
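The per-site accuracy measure described above can be illustrated with a minimal sketch. It assumes that R2 is the squared Pearson correlation between imputed dosages and validation genotypes, and that per-site values are averaged within each frequency bin; neither detail is stated in the text. The function names, the `sites` record layout, and the bin edges are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from statistics import mean


def pearson_r2(dosages, genotypes):
    """Squared Pearson correlation; None if either vector has no variance."""
    n = len(dosages)
    mean_d = sum(dosages) / n
    mean_g = sum(genotypes) / n
    var_d = sum((d - mean_d) ** 2 for d in dosages)
    var_g = sum((g - mean_g) ** 2 for g in genotypes)
    if var_d == 0 or var_g == 0:
        return None  # correlation is undefined for a constant vector
    cov = sum((d - mean_d) * (g - mean_g) for d, g in zip(dosages, genotypes))
    return cov * cov / (var_d * var_g)


def frequency_bin(freq, upper_edges):
    """Index of the first bin whose upper edge is >= freq."""
    for index, upper in enumerate(upper_edges):
        if freq <= upper:
            return index
    return len(upper_edges)  # overflow bin above the last edge


def mean_r2_by_bin(sites, upper_edges):
    """Average per-site R2 within bins defined by reference-panel frequency.

    Binning uses the panel frequency, not the frequency in the imputed data,
    so a site lands in the same bin whichever haplotype set is evaluated.
    """
    per_bin = defaultdict(list)
    for site in sites:
        r2 = pearson_r2(site["dosages"], site["truth"])
        if r2 is not None:
            per_bin[frequency_bin(site["panel_freq"], upper_edges)].append(r2)
    return {b: mean(values) for b, values in sorted(per_bin.items())}


# Toy data (invented for illustration, not taken from the study).
sites = [
    {"panel_freq": 0.004, "dosages": [0.0, 0.1, 0.9, 0.0], "truth": [0, 0, 1, 0]},
    {"panel_freq": 0.30, "dosages": [0.0, 1.0, 2.0, 1.0], "truth": [0, 1, 2, 1]},
]
edges = [0.005, 0.05, 0.5]  # hypothetical upper bin edges
result = mean_r2_by_bin(sites, edges)
```

Running the same function once per haplotype set, with the bin edges and panel frequencies held fixed, would give curves that can be compared bin by bin in the manner of Figure 1b.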