Chunk #23 — Materials and Methods — Computational benchmarking

Source: Genotype imputation with thousands of genomes.

Text

To produce computational benchmarks in a realistic imputation scenario, we simulated data that model the large, ancestrally diverse reference panel that is being generated by the 1000 Genomes Project. Our simulations were based on the sfs_code program (Hernandez 2008), which uses a pre-specified demographic model (typically obtained from unbiased site frequency spectra) and DNA sequence annotations to drive a forward simulation that models the effects of genetic drift and natural selection on a population of chromosomes. Ryan Hernandez kindly provided us with the output of an sfs_code run that used a joint demographic model of three HapMap panels (CEU, CHB, and YRI) on chromosome 17p12 (a 4.7-Mb region). At the end of the forward simulation, the program sampled 10,000 haplotypes from each of the three populations. These haplotypes do not capture the full demographic complexity of the 1000 Genomes sample set, but the simulation does provide realistic DNA sequence data for three major sources of human genetic variation.
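The forward-simulation idea described above can be illustrated with a toy Wright-Fisher model. The sketch below is not sfs_code and does not reproduce its joint demographic model, recombination, or use of sequence annotations. The function names, parameter values, and the choice to simulate each population independently are assumptions made purely for illustration. It shows how selection (fitness-weighted choice of parents), drift (resampling with replacement), and mutation act each generation, followed by sampling haplotypes at the end of the run.

```python
import random


def simulate_population(n_haplotypes, n_sites, n_generations,
                        mutation_rate, selection_coeffs, rng):
    """Toy Wright-Fisher forward simulation (hypothetical; NOT sfs_code).

    Haplotypes are lists of 0/1 alleles (0 = ancestral, 1 = derived).
    selection_coeffs[i] is the assumed fitness effect of the derived
    allele at site i and must be greater than -1.
    """
    # Start from an all-ancestral population.
    population = [[0] * n_sites for _ in range(n_haplotypes)]
    for _ in range(n_generations):
        # Selection: multiplicative fitness over derived alleles.
        weights = []
        for hap in population:
            fitness = 1.0
            for allele, s in zip(hap, selection_coeffs):
                if allele:
                    fitness *= 1.0 + s
            weights.append(fitness)
        # Drift: draw parents with replacement, proportional to fitness.
        parents = rng.choices(population, weights=weights, k=n_haplotypes)
        # Mutation: ancestral alleles become derived with small probability.
        population = []
        for parent in parents:
            child = list(parent)
            for i in range(n_sites):
                if child[i] == 0 and rng.random() < mutation_rate:
                    child[i] = 1
            population.append(child)
    return population


def sample_haplotypes(population, n_samples, rng):
    """Sample haplotypes without replacement from the final generation."""
    return rng.sample(population, n_samples)


if __name__ == "__main__":
    rng = random.Random(2024)  # fixed seed so the run is repeatable
    n_sites = 20
    # Assumption: weakly deleterious derived alleles at every site.
    coeffs = [-0.01] * n_sites
    panels = {}
    # Labels follow the text; the populations are simulated independently
    # here, unlike the joint demographic model used in the source.
    for label in ("CEU", "CHB", "YRI"):
        pop = simulate_population(200, n_sites, 50, 1e-3, coeffs, rng)
        panels[label] = sample_haplotypes(pop, 100, rng)
    for label, haps in panels.items():
        derived = sum(sum(h) for h in haps)
        print(label, "sampled", len(haps), "haplotypes;",
              derived, "derived alleles in total")
```

The sizes used here (200 haplotypes, 20 sites, 50 generations) are deliberately tiny so the sketch runs in well under a second. The simulation described in the text covers a 4.7-Mb region and samples 10,000 haplotypes per population, which calls for a dedicated simulator.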