Chunk #4 — Findings — Improvements in PLINK 1.9 — Bit-level parallelism

Source: Second-generation PLINK: rising to the challenge of larger and richer datasets.
Embedded: yes

Text

For example, when comparing two DNA segments, it is frequently useful to start by computing their Hamming distance. Formally, define two sequences $\{a_1, a_2, \ldots, a_m\}$ and $\{b_1, b_2, \ldots, b_m\}$ where each $a_i$ and $b_i$ has a value in $\{0, 1, 2, \phi\}$, representing either the number of copies of the major allele or ($\phi$) the absence of genotype data. Also define an intersection set $I_{a,b} := \{i : a_i \neq \phi \text{ and } b_i \neq \phi\}$. The “identity-by-state” measure computed by PLINK can then be expressed as

$$1 - \frac{\sum_{i\in I_{a,b}}|a_{i} - b_{i}|}{2|I_{a,b}|},$$

where $|I_{a,b}|$ denotes the size of set $I_{a,b}$, while $|a_i - b_i|$ is the absolute value of $a_i$ minus $b_i$. The old calculation proceeded roughly as follows: