Chunk #17 — Materials and Methods — Simulation data sets: WTCCC2 and SIGMA

Source: Multiethnic polygenic risk scores improve risk prediction in diverse populations.
Embedded: yes

Text

Our simulations used real genotypes from the WTCCC2 and SIGMA data sets (rows 1-2 of Table 1). The WTCCC2 data set consists of 15,622 unrelated European samples from a multiple sclerosis study genotyped at 360,557 SNPs after QC (Sawcer et al., 2011; Yang, Zaitlen, Goddard, Visscher, & Price, 2014) (see Web Resources). The SIGMA data set consists of 8,214 unrelated Latino samples genotyped at 2,440,134 SNPs after QC (SIGMA Type 2 Diabetes Consortium et al., 2014) (see Web Resources). We restricted our simulations to 232,629 SNPs present in both data sets (with matched reference and variant alleles) after removing A/T and C/G SNPs to eliminate potential strand ambiguity.
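
The SNP restriction described above can be illustrated with a minimal sketch. This is not the authors' pipeline. It assumes each data set is reduced to a hypothetical mapping from SNP identifier to a (reference allele, variant allele) pair. The names `is_strand_ambiguous`, `shared_unambiguous_snps`, and the toy panels are introduced here for illustration only.

```python
# Complementary base pairs. A SNP whose two alleles are complements of each
# other (A/T or C/G) reads the same on either strand, so its strand cannot be
# determined from the alleles alone.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def is_strand_ambiguous(ref, alt):
    """Return True for A/T, T/A, C/G, and G/C SNPs."""
    return COMPLEMENT.get(ref.upper()) == alt.upper()


def shared_unambiguous_snps(panel_a, panel_b):
    """Keep SNP ids present in both panels with identical (ref, alt) alleles,
    excluding strand-ambiguous SNPs. Order follows panel_a.

    Assumption: each panel is a dict of snp_id -> (ref, alt). Real data would
    typically be read from genotype files (e.g. PLINK .bim) instead.
    """
    kept = []
    for snp_id, alleles in panel_a.items():
        # Drop SNPs missing from panel_b or with non-matching alleles.
        if panel_b.get(snp_id) != alleles:
            continue
        # Drop A/T and C/G SNPs.
        if is_strand_ambiguous(*alleles):
            continue
        kept.append(snp_id)
    return kept


# Toy panels (hypothetical identifiers and alleles, not taken from either study).
panel_eur = {
    "rs1": ("A", "G"),
    "rs2": ("A", "T"),  # strand-ambiguous
    "rs3": ("C", "T"),
    "rs4": ("C", "G"),  # strand-ambiguous
    "rs5": ("G", "T"),  # absent from the second panel
}
panel_lat = {
    "rs1": ("A", "G"),
    "rs2": ("A", "T"),
    "rs3": ("T", "C"),  # reference and variant alleles swapped
    "rs4": ("C", "G"),
    "rs6": ("A", "C"),
}

print(shared_unambiguous_snps(panel_eur, panel_lat))  # ['rs1']
```

In this sketch a SNP with swapped reference and variant alleles (rs3) is dropped instead of flipped. That is one strict reading of "matched reference and variant alleles", and the original analysis may have handled such SNPs differently.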