
Chunk #51 — ONLINE METHODS — DNA genotyping, QC, ancestral evaluation and polygenic scoring

Source
Gene expression elucidates functional impact of polygenic risk for schizophrenia.

Text

To infer ancestry from genetic data, we identified a set of high-quality autosomal SNPs from the pre-imputed data with the following properties: an rs identifier in the dbSNP database, a known physical location in the hg19 reference genome, alleles coded as A, C, G, or T, call rate ≥ 99.5%, and minor allele frequency (MAF) > 0.05. These criteria yielded 552,351 SNPs. Next, using PLINK (ref. 57), we performed LD pruning with sliding windows of 50 SNPs, steps of 5 SNPs, and a pairwise r² < 0.04, which retained 28,663 SNPs. Ancestry was determined using clusterGem in GemTools (arXiv:1104.1162; refs. 60, 61; http://www.wpic.pitt.edu/wpiccompgen/GemTools/GemTools.htm). GemTools found that 5 dimensions and 7 clusters were sufficient to describe the ancestry space. Because one sample was missing key phenotypic information, 667 subjects were assigned ancestry based on DNA genotypes. Supplementary Fig. 1B, C show the distribution of nominal ancestry and diagnosis and plot several informative dimensions of genetically inferred ancestry.
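The SNP selection criteria and the LD pruning step described above can be illustrated with a minimal sketch. This is not the authors' pipeline: the `Snp` record, the `passes_ancestry_qc` and `plink_ld_prune_args` helpers, and the file prefixes are hypothetical names introduced for illustration. The sketch also assumes the pruning was run with PLINK's `--indep-pairwise` option, which takes a window size, a step size, and an r² threshold; the text does not state which PLINK command was used.

```python
import re
from dataclasses import dataclass

VALID_ALLELES = {"A", "C", "G", "T"}
AUTOSOMES = {str(c) for c in range(1, 23)}  # chromosomes 1-22


@dataclass(frozen=True)
class Snp:
    """Hypothetical per-SNP summary record (not a PLINK data structure)."""
    snp_id: str
    chrom: str
    pos: int          # hg19 base-pair position; 0 is treated as unknown
    allele1: str
    allele2: str
    call_rate: float  # fraction of samples with a genotype call
    maf: float        # minor allele frequency


def passes_ancestry_qc(snp: Snp) -> bool:
    """Apply the criteria listed in the text to a single SNP."""
    return (
        re.fullmatch(r"rs\d+", snp.snp_id) is not None  # rs identifier
        and snp.chrom in AUTOSOMES                      # autosomal
        and snp.pos > 0                                 # known location
        and snp.allele1 in VALID_ALLELES
        and snp.allele2 in VALID_ALLELES
        and snp.call_rate >= 0.995                      # call rate >= 99.5%
        and snp.maf > 0.05                              # MAF > 0.05
    )


def plink_ld_prune_args(bfile: str, out: str,
                        window: int = 50, step: int = 5,
                        r2: float = 0.04) -> list[str]:
    """Build a PLINK command line for pairwise LD pruning.

    Assumption: --indep-pairwise <window> <step> <r2> is the option used;
    the defaults mirror the window, step, and threshold given in the text.
    """
    return ["plink", "--bfile", bfile,
            "--indep-pairwise", str(window), str(step), str(r2),
            "--out", out]
```

For example, `plink_ld_prune_args("qc_snps", "pruned")` returns an argument list that could be passed to `subprocess.run`, where `qc_snps` and `pruned` are placeholder file prefixes.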