Chunk #36 — Data access — GWAS summary statistics and validation data

Source: Leveraging functional annotations in genetic risk prediction for human complex diseases.
Embedded: yes

Text

For Crohn’s disease, we trained the model using summary statistics from the International Inflammatory Bowel Disease Genetics Consortium (IIBDGC; Ncase = 6,333 and Ncontrol = 15,056) [25]. Samples from the Wellcome Trust Case Control Consortium (WTCCC) were removed from the meta-analysis and used as the validation dataset (Ncase = 1,689 and Ncontrol = 2,891) [26]. For breast cancer, we trained the model using summary statistics from the Genetic Associations and Mechanisms in Oncology (GAME-ON) study (Ncase = 16,003 and Ncontrol = 41,335) [27], and tested the performance using samples from the Cancer Genetic Markers of Susceptibility (CGEMS) study (Ncase = 966 and Ncontrol = 70) [28]. Samples shared between CGEMS and GAME-ON were removed. We used samples from the CIDR-GWAS of breast cancer for the trans-ethnic analysis (Ncase = 1,666 and Ncontrol = 2,038) [29]. For rheumatoid arthritis, we trained the model using summary statistics from a meta-analysis with 5,539 cases and 20,169 controls [30]. WTCCC samples were removed from the meta-analysis and used for validation (Ncase = 1,829 and Ncontrol = 2,892) [26]. For type-II diabetes, the training dataset is Diabetes