
Chunk #7 — MATERIALS AND METHODS — Pre-fine-mapping QC and summary statistics standardization

Source
CAUSALdb: a database for disease/trait causal variants identified using summary statistics of genome-wide association studies.

Text

GWAS fine-mapping methods based on summary information usually require complete association statistics [such as variant coordinate, minor allele frequency (MAF), effect/non-effect allele, P value, effect size (beta-coefficient; BETA), and standard error (SE)] and LD information on variants. To ensure that the curated GWAS summary statistics fit the input requirements of fine-mapping tools, we performed a series of QC steps on the raw downloaded data. First, we inspected the coordinates and dbSNP ID (rsID) for each variant and converted non-GRCh37 coordinates to GRCh37 (hg19) coordinates. When either the coordinate or rsID was missing, we extracted it from dbSNP 151. The statistics were excluded when both the coordinates and rsID were missing. Second, CAUSALdb curated only summary statistics with an explicitly defined effect allele. When only the effect allele was available, the non-effect allele was inferred from 1KGP biallelic sites, and we excluded variants if the non-effect allele could not be clearly determined. Third, MAF is required by certain fine-mapping tools, but it was sometimes unavailable in the raw data. In such cases, we converted other allele frequencies (such as reference allele
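
The first two QC steps described above can be sketched in Python as follows. This is a minimal illustration, not the CAUSALdb pipeline: the lookup tables `RSID_TO_COORD`, `COORD_TO_RSID`, and `BIALLELIC_SITES`, the record field names, and the functions `fill_identifiers`, `infer_non_effect_allele`, and `qc` are all hypothetical stand-ins for dbSNP 151 and the 1KGP biallelic sites. Coordinate conversion to GRCh37 and strand handling are omitted, and leaving an identifier empty when the lookup fails is an assumption, since the text does not say how that case was handled.

```python
from typing import Optional

# Hypothetical toy lookups standing in for dbSNP 151 and 1KGP biallelic sites.
RSID_TO_COORD = {"rs123": ("1", 1000)}
COORD_TO_RSID = {coord: rsid for rsid, coord in RSID_TO_COORD.items()}
BIALLELIC_SITES = {("1", 1000): ("A", "G")}


def fill_identifiers(variant: dict) -> Optional[dict]:
    """Fill a missing coordinate or rsID; return None when both are missing."""
    rsid = variant.get("rsid")
    coord = variant.get("coord")
    if rsid is None and coord is None:
        return None  # excluded: no way to identify the variant
    filled = dict(variant)
    if coord is None:
        filled["coord"] = RSID_TO_COORD.get(rsid)
    if rsid is None:
        filled["rsid"] = COORD_TO_RSID.get(coord)
    return filled


def infer_non_effect_allele(variant: dict) -> Optional[dict]:
    """Infer the non-effect allele from a biallelic site; None if unclear."""
    effect = variant.get("effect_allele")
    if effect is None:
        return None  # an explicitly defined effect allele is required
    if variant.get("non_effect_allele") is not None:
        return variant
    alleles = BIALLELIC_SITES.get(variant.get("coord"))
    if alleles is None or effect not in alleles:
        return None  # excluded: non-effect allele cannot be determined
    other = alleles[0] if alleles[1] == effect else alleles[1]
    return {**variant, "non_effect_allele": other}


def qc(variants: list) -> list:
    """Apply both steps in order and keep only the variants that pass."""
    kept = []
    for variant in variants:
        variant = fill_identifiers(variant)
        if variant is None:
            continue
        variant = infer_non_effect_allele(variant)
        if variant is None:
            continue
        kept.append(variant)
    return kept
```

Returning `None` from each step keeps the exclusion rules explicit and lets `qc` chain the steps in the order the text gives them.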