Chunk #70 — Findings — PLINK 2.0 design — Data compression

Source
Second-generation PLINK: rising to the challenge of larger and richer datasets.

Text

To do our part to make “strong” sub-linear compressive genomics a reality, the PLINK 2 file format will introduce support for “deviations from most common value” storage of low-MAF variants. For datasets containing many samples, this captures much of the storage efficiency benefit of having real reference genomes available, without the drawback of forcing all programs operating on the data to have access to a library of references. Thanks to PLINK 2.0’s translation layer and file conversion facilities, programmers will be able to ignore this feature during initial development of a tool, and then work to exploit it after basic functionality is in place.
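
The idea of "deviations from most common value" storage can be illustrated with a minimal sketch. This is not PLINK 2's actual on-disk encoding. The function names `encode_deviations` and `decode_deviations` are hypothetical, and coding genotypes as integer alternate-allele counts (0, 1, 2) is an assumption made only for illustration. For a low-MAF variant, nearly every sample carries the same genotype, so recording that value once plus the few samples that differ takes far less space than one entry per sample.

```python
from collections import Counter
from typing import List, Tuple


def encode_deviations(genotypes: List[int]) -> Tuple[int, List[int], List[int]]:
    """Encode one variant as (most common value, deviating indices, deviating values).

    Hypothetical helper for illustration; genotypes are assumed to be
    small integer codes such as alternate-allele counts 0, 1, 2.
    """
    if not genotypes:
        raise ValueError("need at least one sample")
    # The single most frequent genotype across all samples.
    most_common, _ = Counter(genotypes).most_common(1)[0]
    # Only the samples that differ from it are stored explicitly.
    indices = [i for i, g in enumerate(genotypes) if g != most_common]
    values = [genotypes[i] for i in indices]
    return most_common, indices, values


def decode_deviations(
    n_samples: int, most_common: int, indices: List[int], values: List[int]
) -> List[int]:
    """Rebuild the dense per-sample genotype list from the sparse form."""
    genotypes = [most_common] * n_samples
    for i, g in zip(indices, values):
        genotypes[i] = g
    return genotypes


# A rare variant: 1000 samples, only two carry a non-reference genotype.
variant = [0] * 1000
variant[17] = 1
variant[512] = 2

common, idx, vals = encode_deviations(variant)
restored = decode_deviations(len(variant), common, idx, vals)
```

Here the sparse form holds one common value and two (index, value) pairs instead of 1000 entries. The decode step plays the role the passage assigns to a translation layer: a tool that only understands dense genotype arrays can consume `restored` without knowing how the variant was stored.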