Chunk #31 — Methods — Initialization and MCMC iterations

Source: Integrating sequence and array data to create an improved 1000 Genomes Project haplotype reference panel.
Embedded: yes

Text

Finally, to complete the model, we use only a subset of all available haplotypes when updating each individual, as is done in SHAPEIT2. We used a carefully chosen subset of K1 = 400 haplotypes that most closely match the haplotypes of the individual being updated [10]. The haplotype matching is carried out on overlapping windows of size W = 0.1 Mb. We also found it useful to add a set of K2 = 200 randomly chosen haplotypes to help the mixing of the MCMC, so in total we used K = K1 + K2 = 600 conditioning haplotypes. Using such a large number of conditioning haplotypes is feasible because the computational complexity of SHAPEIT2 is linear in K.
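
The selection of conditioning haplotypes within a single window can be illustrated with a minimal sketch. It is not the SHAPEIT2 implementation. It assumes that "most closely match" means the smallest Hamming distance to either of the individual's two current haplotype estimates, that the window is given as a range of marker indices rather than in Mb, and that the K2 random haplotypes are drawn from those not already selected. The function name, arguments and toy panel are hypothetical.

```python
import random


def hamming(a, b):
    """Number of positions at which two equal-length allele lists differ."""
    return sum(x != y for x, y in zip(a, b))


def select_conditioning_haplotypes(current, panel, start, end, k1, k2, rng):
    """Pick k1 closest plus k2 random panel haplotypes for one window.

    current : the individual's two current haplotype estimates (lists of 0/1)
    panel   : haplotypes of the other individuals (lists of 0/1)
    start, end : marker index range [start, end) defining the window
    Returns two lists of panel indices: (closest, randoms).
    """
    # Assumption: distance is the Hamming distance to the nearer of the
    # individual's two haplotypes, restricted to the window.
    scored = []
    for idx, hap in enumerate(panel):
        d = min(hamming(h[start:end], hap[start:end]) for h in current)
        scored.append((d, idx))
    scored.sort()  # ties are broken by panel index
    closest = [idx for _, idx in scored[:k1]]

    # Assumption: the random set is sampled without replacement from the
    # haplotypes that were not already chosen as closest matches.
    chosen = set(closest)
    remaining = [i for i in range(len(panel)) if i not in chosen]
    randoms = rng.sample(remaining, min(k2, len(remaining)))
    return closest, randoms


# Toy panel of six haplotypes over six markers (hypothetical data).
panel = [
    [0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1],
    [0, 0, 0, 1, 1, 1],
    [1, 1, 1, 0, 0, 0],
    [0, 1, 0, 1, 0, 1],
    [1, 0, 1, 0, 1, 0],
]
current = ([0, 0, 0, 0, 0, 1], [1, 1, 1, 1, 1, 0])

closest, randoms = select_conditioning_haplotypes(
    current, panel, start=0, end=6, k1=2, k2=2, rng=random.Random(0)
)
print("closest:", closest, "random:", randoms)
```

With the values in the text, the call would use k1=400 and k2=200 and be repeated for each overlapping 0.1 Mb window, giving K = 600 conditioning haplotypes per window.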