Chunk #8 — Results

Source: Integrating sequence and array data to create an improved 1000 Genomes Project haplotype reference panel.
Embedded: yes

Text

To demonstrate the benefits of this new method, we applied it to the 1000GP Phase 1 sequence data to produce new haplotypes. We then compared these haplotypes to the existing set of 1000GP Phase 1 haplotypes and to a set of haplotypes produced by Beagle. In all experiments, we used the set of GLs available on the FTP website for the 1,092 Phase 1 samples. These consist of GLs at 36,820,992 SNPs, 1,384,273 biallelic indels and 14,017 structural variants. To create the haplotype scaffold (Omni2.5M), we used Illumina Omni2.5 genotypes available for 2,141 samples at 2,368,234 SNPs. We phased this dataset using the existing version of SHAPEIT2 (r644). Supplementary Table 1 shows the number of trios, duos and unrelated samples in each of the 14 populations. To mimic the use of a sparser haplotype scaffold, we also created a second scaffold by thinning the Omni scaffold down to 1,000,000 SNPs (1M). We then phased the GL dataset on chromosome 20 in three different ways: using (a) the Omni2.5M scaffold, (b) the 1M scaffold and (c) no scaffold.
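
As an illustration of the thinning step, the following minimal sketch subsamples a list of scaffold SNP positions down to a target count. The text does not state how the 1M subset was chosen, so uniform random sampling without replacement is an assumption here, and the function name `thin_scaffold`, the `seed` parameter and the toy positions are hypothetical. It is not the procedure used to build the actual 1M scaffold.

```python
import random


def thin_scaffold(positions, target, seed=0):
    """Return a sorted subset of `target` positions.

    Assumption: sites are drawn uniformly at random without replacement.
    The source text only says the scaffold was thinned to 1,000,000 SNPs.
    """
    if target > len(positions):
        raise ValueError("target exceeds the number of available sites")
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    return sorted(rng.sample(positions, target))


# Toy stand-in for the Omni2.5M scaffold: 2,368 evenly spaced positions
# (the real scaffold has 2,368,234 SNPs genome-wide).
omni_positions = list(range(1000, 1000 + 2368 * 50, 50))

# Thin to 1,000 sites, mirroring the 2,368,234 -> 1,000,000 reduction.
thinned = thin_scaffold(omni_positions, 1000)

# The three phasing configurations described in the text:
# (a) full scaffold, (b) thinned scaffold, (c) no scaffold.
scaffolds = {"Omni2.5M": omni_positions, "1M": thinned, "none": []}
for name, sites in scaffolds.items():
    print(name, len(sites))
```

A fixed seed keeps the thinned subset identical across runs, which matters when the same sparser scaffold must be reused in every comparison.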