Chunk #41 — Findings — Improvements in PLINK 1.9 — Other noteworthy algorithms — Haplotype block estimation

Source: Second-generation PLINK: rising to the challenge of larger and richer datasets.
Embedded: yes

Text

later point, we just update a small number of “strong LD pairs within last k variants” and “recombination pairs within last k variants” counts while processing the data sequentially, saving only final haploblock candidates. This reduces the amount of time spent looking up out-of-cache memory, and also allows much larger datasets to be processed.

Since “strong LD” pairs must outnumber “recombination” pairs by 19 to 1, it does not take many “recombination” pairs in a window before one can prove no haploblock can contain that window. When this bound is crossed, we take the opportunity to entirely skip classification of many pairs of variants.
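The two ideas above can be illustrated with a minimal Python sketch. It is not PLINK's implementation: the function name `haploblock_candidates`, the `classify` callback, the pair labels, and the use of a maximum block span of `k` variants are all assumptions made for illustration, and the cutoff shown is a deliberately simple bound rather than the one PLINK actually uses. The sketch keeps one pair of running counters per live start position instead of a table of pair classifications. It stops scanning backwards once the recombination pairs inside a window already exceed 1/20 of the pairs any block of at most `k` variants could hold, since the 19-to-1 ratio is then unreachable.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical pair labels; a real classifier would derive them from
# confidence bounds on D', which is outside the scope of this sketch.
STRONG, RECOMB, OTHER = "S", "R", "N"


def haploblock_candidates(
    n: int,
    k: int,
    classify: Callable[[int, int], str],
    ratio: int = 19,
) -> Tuple[List[Tuple[int, int]], int]:
    """Return (candidate windows, number of classify() calls).

    A window [start, end] is reported when it contains at least one
    strong-LD pair and strong >= ratio * recombination pairs.
    """
    max_pairs = k * (k - 1) // 2      # most pairs a block of <= k variants can hold
    strong: Dict[int, int] = {}       # start -> strong pairs inside [start, end]
    recomb: Dict[int, int] = {}       # start -> recombination pairs inside [start, end]
    floor = 0                         # no surviving block may start before this index
    calls = 0
    candidates: List[Tuple[int, int]] = []

    for end in range(n):
        lo = max(floor, end - k + 1)
        # Forget counters for starts that can no longer begin a block.
        for dead in [s for s in strong if s < lo]:
            del strong[dead]
            del recomb[dead]

        s_run = r_run = 0             # pairs (x, end) seen so far in this backward scan
        for start in range(end - 1, lo - 1, -1):
            label = classify(start, end)   # classified once, never stored
            calls += 1
            if label == STRONG:
                s_run += 1
            elif label == RECOMB:
                r_run += 1
            strong[start] = strong.get(start, 0) + s_run
            recomb[start] = recomb.get(start, 0) + r_run

            # Any block containing [start, end] has at least recomb[start]
            # recombination pairs and at most max_pairs pairs in total, so
            # the ratio is out of reach once (ratio + 1) * recomb > max_pairs.
            if (ratio + 1) * recomb[start] > max_pairs:
                floor = start + 1     # skip every pair that reaches further back
                break
            if strong[start] > 0 and strong[start] >= ratio * recomb[start]:
                candidates.append((start, end))

    return candidates, calls
```

As a toy check, take 6 variants, `k = 4`, and a classifier that labels every pair straddling the gap between variants 2 and 3 as recombination and every other pair as strong LD. The sketch then classifies 7 pairs, whereas visiting every pair within the window would classify 12. Choosing final blocks from the overlapping candidates is a separate step that is not shown here.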