Chunk #39 — Methods — Complete Genomics (CG) validation data

Source: Integrating sequence and array data to create an improved 1000 Genomes Project haplotype reference panel.
Embedded: yes

Text

We also used CG samples that are neither in 1000GP nor related to any 1000GP sample to assess the performance of the various call sets when used as reference panels for imputation. We found 20 such samples in CG1 and 51 in CG2. To mimic a standard GWAS, we extracted genotypes at subsets of SNPs in both data sets: for CG1, at all SNPs on chromosome 20 also included on the Illumina 1M chip (set A), and for CG2, at all SNPs on chromosome 10 also included on the Illumina 1M chip (set B) and on the Illumina Omni2.5M chip (set C). We then imputed all remaining available CG SNP genotypes using Impute2 (default parameters) with the various call sets as reference panels. We imputed 315,326 SNPs from set A, 823,570 SNPs and 27,511 indels from set B, and 775,818 SNPs and 27,511 indels from set C. We defined an indel as isolated if no other indel lies within its 50 bp flanking regions. We found 23,641 (85.9%) isolated indels and 3,870 (14.1%) non-isolated indels. All these variants were then classified
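
The isolated-indel definition above can be illustrated with a minimal Python sketch. It is not the authors' code. It assumes each indel is represented by a single start coordinate on one chromosome, that indel length is ignored, and that the 50 bp window is inclusive; the text specifies none of these details. The function name flag_isolated and the example coordinates are hypothetical.

```python
FLANK_BP = 50  # flanking window size stated in the text


def flag_isolated(positions, flank=FLANK_BP):
    """Return (position, is_isolated) pairs sorted by position.

    Assumption: an indel is isolated when no other indel starts within
    `flank` bp on either side (inclusive); indel length is ignored.
    """
    pos = sorted(positions)
    flags = []
    for i, p in enumerate(pos):
        # In a sorted list only the two adjacent indels need checking.
        near_left = i > 0 and p - pos[i - 1] <= flank
        near_right = i + 1 < len(pos) and pos[i + 1] - p <= flank
        flags.append((p, not (near_left or near_right)))
    return flags


# Hypothetical coordinates, for illustration only.
for position, isolated in flag_isolated([100, 120, 300, 351, 500]):
    print(position, "isolated" if isolated else "non-isolated")

# Arithmetic check of the proportions reported in the text.
n_isolated, n_non_isolated = 23_641, 3_870
n_total = n_isolated + n_non_isolated  # 27,511 indels
print(f"{100 * n_isolated / n_total:.1f}% isolated")
print(f"{100 * n_non_isolated / n_total:.1f}% non-isolated")
```

Because the positions are sorted first, each indel is compared only with its immediate neighbours, so the classification runs in O(n log n) time for n indels.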