Chunk #19 — ONLINE METHODS — Optimized file structure for large reference panels

Source
Next-generation genotype imputation service and methods.

Text

We estimated the disk space saved by using m3vcf files instead of standard VCF files, in both unzipped and zipped formats. For the 1000 Genomes Project Phase 1 panel, with ~1,000 reference samples, zipped m3vcf files save 60% of disk space relative to zipped VCF files, and unzipped m3vcf files save 93% relative to unzipped VCF files. The saving is even greater for larger panels. For example, for the HRC reference panel, with ~33,000 samples, zipped and unzipped m3vcf files save ~84% and 98% of disk space, respectively (Supplementary Table 4).
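
To make the reported percentages easier to compare, the short sketch below converts a percentage of disk space saved into the implied size ratio between the VCF file and the m3vcf file. It is only an arithmetic illustration: the function name `size_ratio_from_saving` and the dictionary labels are hypothetical, and the only inputs are the four percentages quoted above, not actual file sizes from Supplementary Table 4.

```python
def size_ratio_from_saving(saving_pct: float) -> float:
    """Return how many times smaller the compact file is than the baseline,
    given the percentage of disk space saved (hypothetical helper)."""
    if not 0 <= saving_pct < 100:
        raise ValueError("saving_pct must be in the range [0, 100)")
    # A saving of s% means the compact file occupies (100 - s)% of the baseline.
    return 100.0 / (100.0 - saving_pct)


# Percent savings quoted in the text (m3vcf relative to VCF).
reported_savings = {
    "1000G Phase 1, zipped": 60.0,
    "1000G Phase 1, unzipped": 93.0,
    "HRC, zipped": 84.0,
    "HRC, unzipped": 98.0,
}

for label, saving in reported_savings.items():
    ratio = size_ratio_from_saving(saving)
    print(f"{label}: {saving:.0f}% saved, VCF is about {ratio:.2f}x the m3vcf size")
```

Under this reading, the quoted savings correspond to VCF files that are roughly 2.5, 14, 6 and 50 times the size of their m3vcf counterparts, which shows how the advantage grows with panel size.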