
Chunk #4 — MATERIALS AND METHODS — GWAS summary statistics curation and integration

Source
CAUSALdb: a database for disease/trait causal variants identified using summary statistics of genome-wide association studies.

Text

We collected publicly available GWAS summary statistics and grouped them by the cohort investigated: UKBB or non-UKBB. The latter group comprises samples from other specific projects, including meta-analyses that incorporate the UKBB cohort. GWAS summary statistics of the UKBB cohort were collected from three resources: Neale Lab UKBB v3 (http://www.nealelab.is/uk-biobank), Gene ATLAS (15) and GWAS ATLAS (16). Although all three are derived from the UKBB cohort, they differ in the samples incorporated, the quality control (QC) processes and the association models. Consequently, the summary statistics can differ among these datasets (Supplementary Table S1). The Neale Lab release contains over 10 000 tests; to exclude low-power results, we included only ICD10 binary traits with a total sample size >50 000 and >1000 cases, and continuous traits tested by PHESANT (21) with a total sample size >50 000. In addition, we integrated GWAS summary statistics of non-UKBB cohorts from several public databases, including GWAS Catalog (8), LD Hub (12), GRASP (10), PhenoScanner (13) and dbGaP (22). We also curated hundreds of summary statistics from websites of consortia such as
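
The inclusion thresholds stated above for the Neale Lab release can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the `TraitRecord` class, its field names, the trait-type labels and the example rows are all hypothetical, and only the numeric cut-offs (total sample size >50 000, cases >1000) come from the text.

```python
from dataclasses import dataclass
from typing import Optional

# Cut-offs taken from the text; both comparisons are strict (">").
MIN_TOTAL_SAMPLES = 50_000
MIN_CASES = 1_000


@dataclass
class TraitRecord:
    """Hypothetical per-trait metadata row (field names are assumptions)."""
    trait_id: str
    trait_type: str                 # "icd10_binary" or "phesant_continuous"
    n_total: int                    # total sample size of the test
    n_cases: Optional[int] = None   # only meaningful for binary traits


def passes_power_filter(trait: TraitRecord) -> bool:
    """Return True if the trait meets the stated inclusion thresholds."""
    if trait.n_total <= MIN_TOTAL_SAMPLES:
        return False
    if trait.trait_type == "icd10_binary":
        # Binary traits additionally need more than 1000 cases.
        return trait.n_cases is not None and trait.n_cases > MIN_CASES
    # Continuous traits only need the total sample size threshold.
    return trait.trait_type == "phesant_continuous"


# Made-up example rows, for illustration only (not real UKBB counts).
traits = [
    TraitRecord("binary_ok", "icd10_binary", 300_000, 5_000),
    TraitRecord("binary_few_cases", "icd10_binary", 300_000, 800),
    TraitRecord("continuous_ok", "phesant_continuous", 120_000),
    TraitRecord("continuous_small", "phesant_continuous", 40_000),
]

kept = [t.trait_id for t in traits if passes_power_filter(t)]
print(kept)
```

Keeping the thresholds as named constants makes the strict inequalities explicit: a trait with exactly 50 000 samples or exactly 1000 cases would be excluded under this reading of the text.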