Chunk #61 — Findings — Performance comparisons — Linkage disequilibrium-based variant pruning

Source
Second-generation PLINK: rising to the challenge of larger and richer datasets.
Embedded
yes

Text

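The PLINK 1.0 --indep-pairwise command is frequently used in preparation for analyses that assume approximate linkage equilibrium. In Table 4, we compare PLINK 1.07 and PLINK 1.9 execution times for some reasonable parameter choices. The r² threshold for "synth2" was chosen so that the "synth1p" and "synth2p" pruned datasets contain similar numbers of SNPs, allowing Tables 2 and 3 to clearly demonstrate scaling with respect to sample size.

Table 4: --indep-pairwise runtimes (sec; a "k" suffix denotes thousands)

| Parameters | Dataset | Machine | PLINK 1.07 | PLINK 1.90 | Ratio |
|---|---|---|---|---|---|
| 50 5 0.5 | synth1 | Mac-2 | 701.3 | 0.63 | 1.1 k |
| 50 5 0.5 | synth1 | Mac-12 | 569.4 | 0.55 | 1.0 k |
| 50 5 0.5 | synth1 | Linux32-8 | 572.7 | 0.95 | 600 |
| 50 5 0.5 | synth1 | Linux64-512 | 462 | 0.60 | 770 |
| 50 5 0.5 | synth1 | Win32-2 | 1163.9 | 3.2 | 360 |
| 50 5 0.5 | synth1 | Win64-2 | 1091.9 | 1.0 | 1.1 k |
| 700 70 0.7 | synth2 | Mac-2 | ~120 k | 31.9 | 3.8 k |
| 700 70 0.7 | synth2 | Mac-12 | 63.0 k | 20.6 | 3.06 k |
| 700 70 0.7 | synth2 | Linux32-8 | 57.4 k | 66.0 | 870 |
| 700 70 0.7 | synth2 | Linux64-512 | ~120 k | 26.4 | 4.5 k |
| 700 70 0.7 | synth2 | Win32-2 | 139.3 k | 127.3 | 1.09 k |
| 700 70 0.7 | synth2 | Win64-2 | ~200 k | 22.9 | 8.7 k |
| 20000 2000 0.5 | chr1 | Mac-2 | nomem | 1520.1 | |
| 20000 2000 0.5 | chr1 | Mac-12 | nomem | 1121.7 | |
| 20000 2000 0.5 | chr1 | Linux32-8 | nomem | 4273.9 | |
| 20000 2000 0.5 | chr1 | Linux64-512 | ~950 k | 1553.3 | 610 |
| 20000 2000 0.5 | chr1 | Win32-2 | nomem | 4912.7 | |
| 20000 2000 0.5 | chr1 | Win64-2 | nomem | 1205.1 | |
| 20000 2000 0.5 | 1000g | Mac-2 | nomem | 20.5 k | |
| 20000 2000 0.5 | 1000g | Mac-12 | nomem | 14.5 k | |
| 20000 2000 0.5 | 1000g | Linux32-8 | nomem | 54.5 k | |
| 20000 2000 0.5 | 1000g | Linux64-512 | ~13000 k | 20.2 k | 640 |
| 20000 2000 0.5 | 1000g | Win32-2 | nomem | 64.5 k | |
| 20000 2000 0.5 | 1000g | Win64-2 | nomem | 14.7 k | |

This command is used to select a set of genetic markers that are not too highly correlated with one another. The PLINK 1.9 implementation benefits from laziness (i.e., the correlation coefficient between a pair of variants is no longer computed when it is not needed by the main pruning algorithm) and bitwise operations.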
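The laziness described above can be illustrated with a minimal Python sketch of windowed pairwise pruning. This is not PLINK's implementation: the function names `r_squared` and `prune_pairwise` are hypothetical, genotypes are assumed to be plain lists of 0/1/2 allele counts with no missing data, and the rule for which member of a correlated pair to drop (here, always the later variant) is a simplification chosen for brevity. The bitwise optimizations are not shown. The sketch only demonstrates that pairs involving an already-removed variant never have their r² computed.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two equal-length genotype vectors."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # monomorphic variant: treat as uncorrelated
    return cov * cov / (var_x * var_y)


def prune_pairwise(genotypes, window, step, threshold):
    """Greedy windowed pruning; returns (kept indices, number of r^2 evaluations).

    genotypes: list of per-variant genotype vectors (assumed layout).
    window:    window size in variants.
    step:      number of variants to shift the window each iteration.
    threshold: pairs with r^2 above this value lose one member.
    """
    removed = [False] * len(genotypes)
    evaluations = 0
    start = 0
    while start < len(genotypes):
        end = min(start + window, len(genotypes))
        for i in range(start, end):
            if removed[i]:
                continue  # lazy: no pair involving a removed variant is evaluated
            for j in range(i + 1, end):
                if removed[j]:
                    continue  # lazy: same reasoning for the second member
                evaluations += 1
                if r_squared(genotypes[i], genotypes[j]) > threshold:
                    removed[j] = True  # simplification: drop the later variant
        start += step
    kept = [i for i, gone in enumerate(removed) if not gone]
    return kept, evaluations


if __name__ == "__main__":
    # Three identical variants plus one uncorrelated variant, six samples each.
    demo = [
        [0, 1, 2, 0, 1, 2],
        [0, 1, 2, 0, 1, 2],
        [0, 1, 2, 0, 1, 2],
        [0, 2, 0, 0, 2, 0],
    ]
    print(prune_pairwise(demo, window=4, step=4, threshold=0.5))
```

With the four demo variants in a single window, the sketch keeps variants 0 and 3 and evaluates r² for 3 pairs, whereas an eager implementation would evaluate all 6. This simple version may re-evaluate a pair that falls inside two overlapping windows; a fuller implementation would avoid that.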