Chunk #14 — Findings — Improvements in PLINK 1.9 — Bit population count

Source
Second-generation PLINK: rising to the challenge of larger and richer datasets.

Text

Given PLINK 1 binary data, $|I_{x,y}|$, $\overline{v}$, $\overline{w}$, $\overline{v^{2}}$, and $\overline{w^{2}}$ can easily be expressed in terms of bit population counts. The dot product $\sum_{i=1}^{n} v_{i}w_{i}$ is trickier; to evaluate it, we preprocess the data so that the genotype bit vectors X and Y encode homozygote minor calls as $00_2$, heterozygote and missing calls as $01_2$, and homozygote major calls as $10_2$, and then proceed as follows:

1. Set Z := (X OR Y) AND $01010101\ldots_2$.
2. Evaluate popcount2(((X XOR Y) AND ($10101010\ldots_2$ - Z)) OR Z), where popcount2() sums 2-bit quantities instead of counting set bits. (This is actually cheaper than PLINK’s regular population count; the first step of software popcount() is reduction to a popcount2() problem.)
3. Subtract the latter quantity from n.
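
The three steps above can be sketched in a few lines of Python. This is an illustrative sketch rather than PLINK's own code: the 64-bit word size, the helper names (`pack`, `popcount2`, `dot_product`), and the mapping of the 2-bit codes to the values -1, 0, and +1 are all assumptions made for the example.

```python
# Illustrative sketch only: names, word size, and value mapping are assumptions.

WORD_BITS = 64
MASK_01 = 0x5555555555555555    # 0101...01 in binary
MASK_10 = 0xAAAAAAAAAAAAAAAA    # 1010...10 in binary
MASK_0011 = 0x3333333333333333  # selects every other 2-bit field
MASK_4 = 0x0F0F0F0F0F0F0F0F     # selects every other 4-bit field

# Assumed mapping from a value in {-1, 0, +1} to the 2-bit code in the text:
# 00 = homozygote minor, 01 = heterozygote or missing, 10 = homozygote major.
CODE = {-1: 0b00, 0: 0b01, 1: 0b10}


def pack(values):
    """Pack -1/0/+1 values into 64-bit words, 32 genotypes per word.

    Unused trailing fields stay 00, which adds nothing to the popcount2 sum.
    """
    per_word = WORD_BITS // 2
    words = []
    for start in range(0, len(values), per_word):
        word = 0
        for offset, v in enumerate(values[start:start + per_word]):
            word |= CODE[v] << (2 * offset)
        words.append(word)
    return words


def popcount2(word):
    """Sum the 32 two-bit fields of a 64-bit word."""
    word = (word & MASK_0011) + ((word >> 2) & MASK_0011)  # 4-bit partial sums
    word = (word & MASK_4) + ((word >> 4) & MASK_4)        # 8-bit partial sums
    # Multiplying by 0x0101...01 accumulates all byte sums into the top byte.
    return ((word * 0x0101010101010101) & 0xFFFFFFFFFFFFFFFF) >> 56


def dot_product(x_words, y_words, n):
    """Dot product of two packed genotype vectors holding n genotypes each."""
    total = 0
    for x, y in zip(x_words, y_words):
        z = (x | y) & MASK_01                               # step 1
        total += popcount2(((x ^ y) & (MASK_10 - z)) | z)   # step 2
    return n - total                                        # step 3


v = [1, 1, -1, 0, -1]
w = [1, -1, -1, 1, -1]
print(dot_product(pack(v), pack(w), len(v)))  # prints 2
```

Under the assumed mapping, each 2-bit field produced in step 2 equals $1 - v_{i}w_{i}$: it is 0 when both calls are the same homozygote, 1 when either call is heterozygous or missing, and 2 when the calls are opposite homozygotes. Summing those fields and subtracting from n therefore recovers the dot product.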