
Chunk #36 — Methods — 1000GP Illumina Omni2.5 SNP array data

Source
Integrating sequence and array data to create an improved 1000 Genomes Project haplotype reference panel.
Embedded
yes

Text

For the haplotype scaffold, we used a set of 2,141 samples genotyped on the Illumina Omni2.5M array. This set includes all the 1000GP Phase 1 samples. The data set contains some parent-child duos and mother-father-child trios, and in some cases only a subset of each family has been sequenced. Supplementary Table 1 gives details of sequenced and non-sequenced samples. We found that 380 and 30 sequenced 1000GP Phase 1 samples are part of trios and duos, respectively, in this data set. SNPs with a missing data rate above 10% and a Mendel error rate above 5% were removed, leaving a total of 2,368,234 SNPs for phasing. We phased these data with SHAPEIT2 (r644) using default settings (W = 2 Mb, K = 100 haplotypes, iterations = 45) and all available family information. We used the resulting haplotypes as a scaffold to call the variant sites in 1000GP. The genome-wide overlap between the two data sets contains 2,183,314 SNPs.
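
The SNP filtering step described above can be illustrated with a minimal Python sketch. It is not the pipeline used in the paper. The `SnpStats` structure, its field names, and the helper functions are hypothetical, and the Mendel error rate is assumed here to be the number of Mendelian inconsistencies divided by the number of informative duos/trios at the SNP. The sketch also assumes that the two thresholds act as independent filters, so a SNP is dropped if it exceeds either one; the text can be read differently.

```python
from dataclasses import dataclass

# Thresholds quoted in the text.
MAX_MISSING_RATE = 0.10
MAX_MENDEL_ERROR_RATE = 0.05


@dataclass
class SnpStats:
    """Per-SNP QC summary (hypothetical structure, for illustration only)."""
    snp_id: str
    n_missing: int        # genotype calls missing across samples
    n_samples: int        # samples genotyped on the array
    n_mendel_errors: int  # Mendelian inconsistencies across duos/trios
    n_families: int       # duos/trios informative at this SNP

    @property
    def missing_rate(self) -> float:
        return self.n_missing / self.n_samples

    @property
    def mendel_error_rate(self) -> float:
        # Assumption: with no informative family, no error can be observed.
        if self.n_families == 0:
            return 0.0
        return self.n_mendel_errors / self.n_families


def passes_qc(snp: SnpStats) -> bool:
    """Keep a SNP only if neither rate exceeds its threshold (assumed reading)."""
    return (snp.missing_rate <= MAX_MISSING_RATE
            and snp.mendel_error_rate <= MAX_MENDEL_ERROR_RATE)


def filter_snps(snps):
    """Return the SNPs that survive both filters, preserving input order."""
    return [s for s in snps if passes_qc(s)]
```

For example, with made-up counts, `filter_snps([SnpStats("snp_a", 10, 2141, 0, 400), SnpStats("snp_b", 300, 2141, 0, 400)])` keeps only `snp_a`, because `snp_b` has a missing data rate of about 14%.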