Chunk #15 — Online Methods — Simulation of sequencing data based on 1000 Genomes Project dataset

Source: Extremely low-coverage sequencing and imputation increases power for genome-wide association studies.
Embedded: yes

Text

regions (total of 50 Mb) across the genome, randomly chosen to represent the average genome-wide recombination rate and SNP density (Supplementary Note). Reads spanning polymorphic sites identified in the 1000 Genomes Project were simulated assuming a fixed error rate of 1%. Per-locus coverage multipliers were drawn from a Gamma distribution Γ(α,β) with parameters α = 4 and β = 1/α and mean 1 (ref. 25), and per-sample coverage multipliers were drawn from a normal distribution N(1,0.2) (matching the empirical IHCS sequencing data), with negative values set to 0. The number of reads at each site was drawn from a Poisson distribution with mean equal to the average coverage times the per-locus multiplier times the per-sample multiplier. Results were generally insensitive to the choice of simulation parameters, with the exception of average coverage per sample (Supplementary Note).
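The sampling scheme described above can be illustrated with a minimal Python sketch. It is not the authors' code, and it rests on several assumptions that the text does not settle: β is treated as the scale of the Gamma distribution (so the per-locus multiplier has mean αβ = 1), the 0.2 in N(1,0.2) is read as a standard deviation, each read is assumed to carry one of the sample's two alleles with equal probability, and a sequencing error is modeled as flipping the observed allele. The function names and the genotype input format are hypothetical.

```python
import math
import random

ERROR_RATE = 0.01                 # fixed per-base error rate stated in the text
GAMMA_SHAPE = 4.0                 # alpha
GAMMA_SCALE = 1.0 / GAMMA_SHAPE   # ASSUMPTION: beta = 1/alpha is a scale, so the mean is 1
SAMPLE_SD = 0.2                   # ASSUMPTION: 0.2 is a standard deviation, not a variance


def poisson(lam, rng):
    """Draw a Poisson variate with Knuth's multiplication method.

    Suitable for the small means typical of low-coverage data; exp(-lam)
    underflows for very large lam, so this is not a general-purpose sampler.
    """
    if lam <= 0.0:
        return 0
    threshold = math.exp(-lam)
    k, p = 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k


def simulate_read_counts(genotypes, mean_coverage, rng):
    """Simulate (ref_reads, alt_reads) per sample and locus.

    genotypes[s][l] is the number of alternate alleles (0, 1 or 2)
    carried by sample s at locus l (hypothetical input format).
    """
    n_samples = len(genotypes)
    n_loci = len(genotypes[0]) if genotypes else 0
    # One coverage multiplier per locus and one per sample.
    locus_mult = [rng.gammavariate(GAMMA_SHAPE, GAMMA_SCALE) for _ in range(n_loci)]
    # Negative per-sample multipliers are set to 0, as in the text.
    sample_mult = [max(0.0, rng.gauss(1.0, SAMPLE_SD)) for _ in range(n_samples)]

    counts = []
    for s in range(n_samples):
        row = []
        for l in range(n_loci):
            depth = poisson(mean_coverage * locus_mult[l] * sample_mult[s], rng)
            alt = 0
            for _ in range(depth):
                # ASSUMPTION: a read samples either chromosome with probability 1/2.
                is_alt = rng.random() < genotypes[s][l] / 2.0
                # ASSUMPTION: an error flips the observed allele.
                if rng.random() < ERROR_RATE:
                    is_alt = not is_alt
                alt += is_alt
            row.append((depth - alt, alt))
        counts.append(row)
    return counts


if __name__ == "__main__":
    rng = random.Random(1)
    toy_genotypes = [[0, 1, 2, 1], [2, 2, 0, 1]]  # 2 samples x 4 loci, made-up values
    print(simulate_read_counts(toy_genotypes, mean_coverage=0.5, rng=rng))
```

At an average coverage of 0.5x most sites receive zero or one read, which is why the choice of average coverage dominates the other simulation parameters in a sketch of this kind.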