Chunk #47 — STAR* METHODS — EXPERIMENTAL MODEL AND SUBJECT DETAILS — Genotype quality control, imputation, and association analysis

Source: Genomic Relationships, Novel Loci, and Pleiotropic Mechanisms across Eight Psychiatric Disorders.
Embedded: yes

Text

All primary studies used the standardized PGC ricopili pipeline for quality control, imputation, and association testing. Briefly, for each dataset, poor-quality SNPs and samples missing >5% of SNPs were removed. Next, pre-phasing and imputation were performed using IMPUTE2 (Howie et al., 2011) and the 1000 Genomes reference panel. High-quality SNPs (INFO > 0.8) with low missingness (<1%) were retained. A subset of these markers (MAF > 0.05; pruned for linkage disequilibrium at r² > 0.02) was used to assess relatedness and population stratification. Only one individual from each pair of related individuals was retained. Each imputed dataset was tested for association with the disease outcome of interest using an additive logistic regression model in PLINK (Purcell et al., 2007), with age, sex, and 10 principal components included as covariates. Finally, a meta-analysis within each disease category was performed using an inverse-variance-weighted fixed-effects model.

After extracting the SNPs present in all eight disorder studies, we removed 3,591 SNPs whose alleles were incompatible. For palindromic SNPs, we compared allele frequencies across the eight studies to check for strand ambiguity. Fifty SNPs whose allele frequency differed by more than 15% from the 1000 Genomes (1KG) reference were excluded. As a result, 6,786,993 autosomal SNPs remained for further analysis.
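The fixed-effects meta-analysis and the cross-study allele checks described above can be illustrated with a minimal Python sketch. This is not the ricopili implementation. All function names are hypothetical, the meta-analysis weights are assumed to be inverse variances (1/SE²), and the 15% cutoff is assumed to mean an absolute allele-frequency difference of 0.15.

```python
import math

# Watson-Crick complements, used to detect strand flips.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def is_palindromic(a1, a2):
    """True for A/T and C/G SNPs, whose strand cannot be read off the alleles."""
    return COMPLEMENT[a1] == a2


def alleles_compatible(study, ref):
    """Check whether a study allele pair matches a reference pair,
    allowing for swapped allele order and/or a strand flip.
    Both arguments are (allele1, allele2) tuples."""
    flipped = tuple(COMPLEMENT[a] for a in study)
    candidates = {tuple(study), tuple(study)[::-1], flipped, flipped[::-1]}
    return tuple(ref) in candidates


def exceeds_frequency_threshold(study_freq, ref_freq, max_diff=0.15):
    """Flag a SNP whose allele frequency differs from the reference
    by more than max_diff (absolute difference; an assumption here)."""
    return abs(study_freq - ref_freq) > max_diff


def fixed_effects_meta(betas, ses):
    """Inverse-variance-weighted fixed-effects estimate and standard error.
    betas are per-study log odds ratios; ses are their standard errors."""
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    return beta, math.sqrt(1.0 / total)
```

As a worked case, two studies with effect estimates 0.10 and 0.20 and standard errors 0.05 and 0.10 receive weights of 400 and 100, so the pooled estimate is (400 × 0.10 + 100 × 0.20) / 500 = 0.12. The more precise study dominates, which is the intended behavior of a fixed-effects model.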