Results

Source: Extremely low-coverage sequencing and imputation increases power for genome-wide association studies.

To explore the effectiveness of GWAS based on low-coverage sequencing, we simulated sequencing data at various coverage levels, accounting for sequencing errors as well as for variation in average coverage across samples and loci. We used the 762 haplotypes inferred from the 381 European samples of the 1000 Genomes Project (phase 1, June 2011 release) and restricted the analysis to 10 distinct 5 Mb regions (50 Mb in total, containing 150,261 SNPs) that were randomly chosen to be representative of the genome-wide average recombination rate and SNP density (Supplementary Note, Supplementary Table 1). Half of the haplotypes were used to generate the simulated data, and the other half served as the imputation reference panel. Genotype dosages at known SNPs were then inferred from the simulated data with Beagle¹², an imputation engine suited to the analysis of sequencing data. To assess imputation accuracy, we used the squared correlation (r²) between imputed dosages and true genotypes, which quantifies the reduction in effective sample size in GWAS caused by imperfect imputation¹³ (Online Methods).
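The simulation-and-scoring idea can be illustrated with a minimal standard-library Python sketch at a single SNP. It is not the authors' pipeline. The Poisson read depth, the gamma-distributed coverage multipliers (mean 1) for sample- and locus-level variation, the symmetric per-read error rate, and all function names (`simulate_counts`, `posterior_dosage`, `site_r2`) are assumptions made for illustration. The dosage step is also a deliberate simplification: it uses a single-site Hardy-Weinberg posterior in place of Beagle, so it ignores the linkage disequilibrium with a reference panel that real imputation exploits. What it does show faithfully is the accuracy metric, r² between dosages and true genotypes.

```python
import math
import random


def poisson(lam, rng):
    """Draw from Poisson(lam) with Knuth's method (adequate for small lam)."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def simulate_counts(hap_a, hap_b, depth_mean, err, rng):
    """Return (ref, alt) read counts for one diploid sample at one SNP.

    Assumed model: depth ~ Poisson(depth_mean); each read comes from either
    haplotype with probability 1/2; its allele is flipped with probability err.
    """
    ref = alt = 0
    for _ in range(poisson(depth_mean, rng)):
        allele = hap_a if rng.random() < 0.5 else hap_b
        if rng.random() < err:
            allele = 1 - allele
        if allele:
            alt += 1
        else:
            ref += 1
    return ref, alt


def posterior_dosage(ref, alt, freq, err):
    """Expected alternate-allele count given reads and an HWE prior at freq.

    Stand-in for imputation only: no reference panel, no LD information.
    """
    priors = [(1 - freq) ** 2, 2 * freq * (1 - freq), freq ** 2]
    weights = []
    for g, prior in enumerate(priors):
        p_alt = (g / 2) * (1 - err) + (1 - g / 2) * err  # P(read shows alt | g)
        weights.append(prior * p_alt ** alt * (1 - p_alt) ** ref)
    total = sum(weights)
    return sum(g * w for g, w in enumerate(weights)) / total


def r_squared(x, y):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


def site_r2(coverage, n_samples=2000, freq=0.3, err=0.01, seed=1):
    """Simulate one SNP across samples and return r2(dosage, true genotype)."""
    rng = random.Random(seed)
    # Hypothetical locus-level depth multiplier, shared by all samples at this SNP.
    locus_scale = rng.gammavariate(4.0, 0.25)
    dosages, truth = [], []
    for _ in range(n_samples):
        hap_a = int(rng.random() < freq)
        hap_b = int(rng.random() < freq)
        # Hypothetical sample-level depth multiplier (mean 1).
        sample_scale = rng.gammavariate(4.0, 0.25)
        depth_mean = coverage * locus_scale * sample_scale
        ref, alt = simulate_counts(hap_a, hap_b, depth_mean, err, rng)
        dosages.append(posterior_dosage(ref, alt, freq, err))
        truth.append(hap_a + hap_b)
    return r_squared(dosages, truth)
```

For example, `site_r2(0.5)` and `site_r2(4.0)` compare accuracy at 0.5x and 4x average coverage; the deeper setting yields a higher r². Following the interpretation given in the text, a GWAS of N samples at a SNP with dosage accuracy r² behaves roughly like a study of N·r² perfectly genotyped samples, which is why r² is a convenient single-number summary when comparing coverage levels.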