Chunk #63 — Methods — HAPGEN

Source: Designing genome-wide association studies: sample size, power, imputation, and the choice of genotyping chip.
Embedded: yes

Text

We adopt the model introduced by [23] (denoted LS from now on), who described a new model for linkage disequilibrium that enjoys many of the advantages of coalescent-based methods (e.g. it directly relates LD patterns to the underlying recombination rate) while remaining computationally tractable for huge genomic regions, up to entire chromosomes. Their model relates the distribution of sampled haplotypes to the underlying recombination rate by exploiting the identity

$$\Pr(h_1,\ldots,h_n \mid \rho) = \Pr(h_1 \mid \rho)\,\Pr(h_2 \mid h_1,\rho)\cdots\Pr(h_n \mid h_1,\ldots,h_{n-1},\rho) \tag{2}$$

where $h_1,\ldots,h_n$ denote the $n$ sampled haplotypes, and $\rho$ denotes the recombination parameter (which may be a vector of parameters if the recombination rate is allowed to vary along the region). This identity expresses the unknown probability distribution on the left as a product of conditional distributions on the right. LS substitute an approximation, $\hat{\pi}$, for these conditional distributions into the right-hand side of (2), to obtain an approximation to the distribution of the haplotypes $h$ given $\rho$:

$$\Pr(h_1,\ldots,h_n \mid \rho) \approx \hat{\pi}(h_1 \mid \rho)\,\hat{\pi}(h_2 \mid h_1,\rho)\cdots\hat{\pi}(h_n \mid h_1,\ldots,h_{n-1},\rho) \tag{3}$$

If $h_1,\ldots,h_n$ are $n$ sampled haplotypes typed at $S$ bi-allelic loci (SNPs), LS modelled the distribution of the first haplotype as independent of $\rho$, i.e. all $2^S$ possible haplotypes
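
The product-of-conditionals structure described above can be illustrated with a minimal Python sketch. It is not the LS model: the function `toy_log_conditional` is a hypothetical, site-independent placeholder that ignores ρ and stands in only to show how a joint probability is assembled from a uniform first-haplotype term and a sequence of conditional terms. All function names here are assumptions made for illustration.

```python
import math
from itertools import product


def log_prob_first(haplotype):
    """Log-probability of the first haplotype: uniform over all 2^S
    haplotypes at S bi-allelic loci, independent of rho."""
    return -len(haplotype) * math.log(2.0)


def toy_log_conditional(haplotype, previous):
    """HYPOTHETICAL placeholder for a conditional distribution of one
    haplotype given the k earlier ones. It is NOT the LS approximation.
    Each site is treated independently, and the probability of an allele is
    (number of earlier haplotypes carrying it + 0.5) / (k + 1)."""
    k = len(previous)
    total = 0.0
    for site, allele in enumerate(haplotype):
        matches = sum(1 for p in previous if p[site] == allele)
        total += math.log((matches + 0.5) / (k + 1.0))
    return total


def log_joint(haplotypes, log_conditional=toy_log_conditional):
    """Assemble the joint log-probability as a sum of log-conditionals:
    the first haplotype term, then each haplotype given all earlier ones."""
    total = log_prob_first(haplotypes[0])
    for k in range(1, len(haplotypes)):
        total += log_conditional(haplotypes[k], haplotypes[:k])
    return total


def total_probability(num_sites, num_haplotypes):
    """Sum the joint over every ordered sample of haplotypes. For any valid
    set of conditionals this should equal 1 up to rounding error."""
    all_haps = list(product((0, 1), repeat=num_sites))
    return sum(
        math.exp(log_joint(list(sample)))
        for sample in product(all_haps, repeat=num_haplotypes)
    )
```

Working in log space avoids numerical underflow when the number of loci or haplotypes is large. Any other conditional, including one that depends on ρ, could be passed to `log_joint` through the `log_conditional` argument without changing the factorization itself.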