Chunk #8 — Methods — Genotype data quality control

Source: Polygenic dissection of diagnosis and clinical dimensions of bipolar disorder and schizophrenia.
Embedded: yes

Text

Raw individual genotype data from all samples were uploaded to the Genetic Cluster Computer hosted by the Dutch National Computing and Networking Services. Quality control was performed on each of the 31 sample collections separately. SNPs shared between platforms and pruned for LD were used to identify relatedness. SNPs were removed if they had: 1) minor allele frequency < 1%, 2) call rate < 98%, 3) deviation from Hardy-Weinberg equilibrium (p < 1 × 10⁻⁶), 4) differential levels of missing data between cases and controls (> 2%), or 5) differential frequency when compared to HapMap CEU (> 15%). Individuals were removed who had genotyping rates < 98%, high relatedness to any other individual (π̂ > 0.9), low relatedness to many other individuals (π̂ > 0.2), or substantially increased or decreased autosomal heterozygosity (|F| > 0.15). We tested 20 MDS components against phenotype status using logistic regression with sample as a covariate. We selected the first four components and any others with a nominally significant correlation (p-value < 0.05) between the component and phenotype. We included these components in our GWAS. This
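
The SNP-level thresholds and the component-selection rule described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: the names `SnpStats`, `snp_passes`, and `select_components` are hypothetical, the per-SNP statistics are assumed to have been computed beforehand by whatever genotype tool is in use, and the text does not say which samples the Hardy-Weinberg test was run on. The relatedness filters are left out because they need pairwise π̂ estimates and the text does not define how many relatives count as "many".

```python
from dataclasses import dataclass

# Thresholds taken from the text above.
MAF_MIN = 0.01               # minor allele frequency < 1% is removed
CALL_RATE_MIN = 0.98         # SNP call rate < 98% is removed
HWE_P_MIN = 1e-6             # Hardy-Weinberg p < 1e-6 is removed
MISSING_DIFF_MAX = 0.02      # case/control missingness difference > 2% is removed
HAPMAP_DIFF_MAX = 0.15       # frequency difference vs. HapMap CEU > 15% is removed


@dataclass
class SnpStats:
    """Precomputed per-SNP summary statistics (hypothetical container)."""
    maf: float               # minor allele frequency
    call_rate: float         # fraction of samples with a called genotype
    hwe_p: float             # Hardy-Weinberg equilibrium test p-value
    missing_diff: float      # |missingness in cases - missingness in controls|
    hapmap_freq_diff: float  # |allele frequency - HapMap CEU frequency|


def snp_passes(s: SnpStats) -> bool:
    """Return True if the SNP survives all five SNP-level filters."""
    return (
        s.maf >= MAF_MIN
        and s.call_rate >= CALL_RATE_MIN
        and s.hwe_p >= HWE_P_MIN
        and s.missing_diff <= MISSING_DIFF_MAX
        and s.hapmap_freq_diff <= HAPMAP_DIFF_MAX
    )


def select_components(p_values, always_keep=4, alpha=0.05):
    """Pick MDS components (1-based indices) to use as covariates.

    p_values[i] is the p-value for component i+1 from the regression of
    phenotype on that component. The first `always_keep` components are
    kept regardless; later ones are kept only if p < alpha.
    """
    return [
        i + 1
        for i, p in enumerate(p_values)
        if i < always_keep or p < alpha
    ]
```

For example, `select_components([0.5, 0.5, 0.5, 0.5, 0.01, 0.3])` keeps components 1 to 4 unconditionally and adds component 5 because its p-value is below 0.05.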