Chunk #24 — Materials and Methods — Computational benchmarking

Source: Genotype imputation with thousands of genomes.
Embedded: yes

Text

Given these simulated sequences, we sought to create imputation reference panels that would capture features of the anticipated 1000 Genomes panels. We mirrored the overall size of the 1000 Genomes reference set by sampling a panel of 1600 chromosomes from each of the three populations, which yielded a total of 4800 chromosomes worldwide, just under the 1000 Genomes target of 5000. The genome-wide sequencing module of the 1000 Genomes Project is based on a low-coverage design, so some low-frequency variants will be missed in the real data. To mimic this ascertainment process, we used power calculations from the 1000 Genomes pilot paper (The 1000 Genomes Project Consortium 2010) to determine the probability of discovering a SNP as a function of its number of variant allele copies. The discovery probabilities are shown in Table S2; we applied them separately in each set of 1600 reference chromosomes, under the assumption that true SNPs are discovered (or not) independently of each other. Conditional on a SNP being discovered in any panel, we assumed it was genotyped perfectly in all three panels. This is a reasonable assumption for a benchmarking experiment because sporadic genotyping errors are unlikely to have a noticeable effect on a program’s computational burden.
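
The ascertainment step described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code used in the study: the function names are hypothetical, the probabilities in `PLACEHOLDER_DISCOVERY_PROB` are made-up placeholders rather than the values in Table S2, and treating copy counts beyond the table as always discovered is an assumption of the sketch. Each SNP gets one independent discovery draw per panel, based on its variant allele count in that panel, and is kept in all panels if it is discovered in at least one.

```python
import random

# Placeholder discovery probabilities keyed by the number of variant allele
# copies in one panel. These are NOT the Table S2 values; they are invented
# for illustration only.
PLACEHOLDER_DISCOVERY_PROB = {1: 0.25, 2: 0.5, 3: 0.75}

# Assumption of this sketch: counts above the table's range are always found.
DEFAULT_PROB = 1.0


def discovery_probability(copies, table=PLACEHOLDER_DISCOVERY_PROB):
    """Chance that a SNP with `copies` variant alleles in one panel is found."""
    if copies <= 0:
        return 0.0  # no variant copies in this panel, so nothing to discover
    return table.get(copies, DEFAULT_PROB)


def ascertain(allele_counts, rng, table=PLACEHOLDER_DISCOVERY_PROB):
    """Return the indices of SNPs retained in the reference panels.

    allele_counts: one tuple per SNP, holding the variant allele count in
                   each panel (e.g. three counts for three panels).
    rng:           a random.Random instance, passed in for reproducibility.
    """
    kept = []
    for index, counts in enumerate(allele_counts):
        # One independent discovery draw per panel.
        found = [rng.random() < discovery_probability(c, table) for c in counts]
        # A SNP discovered in any panel is kept (and typed) in all panels.
        if any(found):
            kept.append(index)
    return kept


if __name__ == "__main__":
    rng = random.Random(2024)
    # Variant allele counts per SNP in three panels of 1600 chromosomes each.
    example_counts = [(1, 0, 0), (0, 0, 0), (2, 1, 0), (40, 12, 0)]
    print(ascertain(example_counts, rng))
```

Because the draws are made per panel, a variant that is rare in one panel but common in another is almost always retained, which is the practical consequence of conditioning on discovery in any panel.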