
Chunk #4 — Findings — Improvements in PLINK 1.9 — Bit-level parallelism

Source
Second-generation PLINK: rising to the challenge of larger and richer datasets.
Embedded
yes

Text

For example, when comparing two DNA segments, it is frequently useful to start by computing their Hamming distance. Formally, define two sequences $\{a_1, a_2, \ldots, a_m\}$ and $\{b_1, b_2, \ldots, b_m\}$ where each $a_i$ and $b_i$ has a value in $\{0, 1, 2, \phi\}$, representing either the number of copies of the major allele or ($\phi$) the absence of genotype data. Also define an intersection set $I_{a,b} := \{i : a_i \neq \phi \text{ and } b_i \neq \phi\}$. The “identity-by-state” measure computed by PLINK can then be expressed as

$$1 - \frac{\sum_{i \in I_{a,b}} |a_i - b_i|}{2|I_{a,b}|},$$

where $|I_{a,b}|$ denotes the size of set $I_{a,b}$, while $|a_i - b_i|$ is the absolute value of $a_i$ minus $b_i$. The old calculation proceeded roughly as follows: