Chunk #23 — Online Methods — Simulation of sequencing data based on 1000 Genomes Project dataset — Combined whole-exome dataset

Source: Extremely low-coverage sequencing and imputation increases power for genome-wide association studies.


departure from Hardy-Weinberg equilibrium. Genotype likelihoods obtained with GATK software (ref. 28) were passed to Beagle in 1-Mb windows with 250-kb overlap to impute all SNPs identified as polymorphic in the European haplotypes of the 1000 Genomes Project phase 1 data. A total of 103,977 genome-wide SNPs that were both genotyped and imputed from sequencing across all 909 samples were used in all experiments on the combined data (Supplementary Note). To remove the effects of high coverage at or near exons, we discarded data at all SNPs covered at more than 4×.
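The two mechanical steps above, tiling a chromosome into overlapping imputation windows and masking high-coverage sites, can be sketched as follows. This is a minimal illustration, not the pipeline used in the study. It makes several assumptions: the 250-kb overlap is read as the span shared by consecutive 1-Mb windows (so each window starts 750 kb after the previous one), coordinates are 0-based and half-open, and the 4× threshold is applied to a per-site read depth within a single sample. The function names `tile_windows` and `mask_high_coverage` are hypothetical. Neither is part of GATK or Beagle.

```python
from typing import Dict, List, Tuple

WINDOW_BP = 1_000_000   # 1-Mb window length (from the text)
OVERLAP_BP = 250_000    # 250-kb overlap between consecutive windows (assumed meaning)
MAX_DEPTH = 4           # sites covered at more than 4x are discarded


def tile_windows(chrom_length: int,
                 window: int = WINDOW_BP,
                 overlap: int = OVERLAP_BP) -> List[Tuple[int, int]]:
    """Hypothetical helper: return half-open [start, end) windows that cover
    0..chrom_length, with consecutive windows sharing `overlap` bases."""
    if not 0 <= overlap < window:
        raise ValueError("overlap must be in [0, window)")
    step = window - overlap  # 750 kb with the default settings
    windows = []
    start = 0
    while True:
        end = min(start + window, chrom_length)  # last window is truncated
        windows.append((start, end))
        if end >= chrom_length:
            break
        start += step
    return windows


def mask_high_coverage(depth_by_site: Dict[int, int],
                       max_depth: int = MAX_DEPTH) -> Dict[int, int]:
    """Hypothetical helper: keep only sites whose read depth is at most
    max_depth, dropping sites covered at more than 4x (assumed per-sample depth)."""
    return {pos: d for pos, d in depth_by_site.items() if d <= max_depth}


if __name__ == "__main__":
    print(tile_windows(2_500_000))
    print(mask_high_coverage({100: 2, 200: 5, 300: 4}))
```

For a 2.5-Mb region, this yields three windows starting at 0, 750 kb and 1.5 Mb, and the masking step keeps sites at depth 2 and 4 while dropping the site at depth 5. The source does not say how imputed calls in the overlapping stretches were reconciled between windows, so the sketch leaves that step out.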