
Chunk #6 — GWAS DATA FORMAT

Source
Quality control procedures for genome-wide association studies.

Text

Regardless of the underlying study design (such as family-based or population-based), the most commonly used format for genetic data is the linkage, or pedigree, file format (pedfile). This file contains one individual per row: the first six columns hold identifying information (family ID, individual ID, father ID, mother ID, sex, phenotype), and the remaining columns hold genotypes (two columns per genotype, one for each allele). The genotype column pairs correspond to an ordered set of SNP markers listed in an associated file (.map or .bim). Additional phenotypes can be stored in separate files consisting of family ID, individual ID, and then extra columns, one per additional phenotype. There are several variations on the pedfile format, including transposed (long) formats (tped) and compressed (binary) formats. Descriptions of these file formats can be found on the PLINK homepage (Table 1). PLINK is a freely available, open-source, cross-platform application for QC and analysis of GWAS data [17]. We used PLINK to implement most of the eMERGE network’s QC pipeline.
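
The row layout described above can be illustrated with a short Python sketch. It is not part of PLINK or of the eMERGE pipeline. The names PedRecord, parse_ped_line, and pair_with_markers are hypothetical, the sample row and marker IDs are invented for illustration, and the sketch assumes whitespace-delimited fields with the marker order supplied separately (for example, read from the associated .map or .bim file).

```python
from typing import Dict, List, NamedTuple, Tuple


class PedRecord(NamedTuple):
    """One pedfile row: six identifying columns plus allele pairs (hypothetical type)."""
    family_id: str
    individual_id: str
    father_id: str
    mother_id: str
    sex: str
    phenotype: str
    genotypes: List[Tuple[str, str]]


def parse_ped_line(line: str) -> PedRecord:
    """Split one whitespace-delimited pedfile row into its fields."""
    fields = line.split()
    if len(fields) < 6:
        raise ValueError("expected at least six identifying columns")
    alleles = fields[6:]
    if len(alleles) % 2 != 0:
        raise ValueError("genotype columns must come in pairs (one per allele)")
    # Group consecutive allele columns into (allele1, allele2) pairs.
    genotypes = [(alleles[i], alleles[i + 1]) for i in range(0, len(alleles), 2)]
    return PedRecord(fields[0], fields[1], fields[2], fields[3],
                     fields[4], fields[5], genotypes)


def pair_with_markers(record: PedRecord,
                      marker_ids: List[str]) -> Dict[str, Tuple[str, str]]:
    """Attach marker names to genotype pairs, relying on the shared ordering."""
    if len(marker_ids) != len(record.genotypes):
        raise ValueError("marker count does not match genotype count")
    return dict(zip(marker_ids, record.genotypes))


# Invented example row: family FAM1, individual IND1, two SNP genotypes.
sample_line = "FAM1 IND1 0 0 1 2 A A G T"
sample_markers = ["rs0000001", "rs0000002"]  # placeholder IDs, assumed order

record = parse_ped_line(sample_line)
by_marker = pair_with_markers(record, sample_markers)
```

Keeping the marker names outside the record mirrors the format itself: the pedfile carries only allele columns, so the pairing is correct only if the marker list is in the same order as the genotype column pairs.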