Chunk #31 — Methods — Simulation settings

Source: Leveraging functional annotations in genetic risk prediction for human complex diseases.

Text

We simulated traits using WTCCC genotype data comprising 15,918 individuals genotyped at 393,273 SNPs, obtained after removing variants with a missing rate above 1% and individuals with genetic relatedness above 0.05. We first generated two annotations, denoted $A_1$ and $A_2$, each created by randomly selecting 10% of the genome; both are assumed to be known when applying AnnoPred. Let $h_g^2$ denote the heritability of the trait (25% or 50%) and $m$ the number of causal variants (300 or 3,000). Causal variants were chosen as follows: $m/3$ from $A_1$, $m/3$ from $A_2$, and the remainder from $(A_1 \cup A_2)^C$. Because each annotation covers only 10% of the genome but contains one third of the causal variants, signals are highly enriched in $A_1$ and $A_2$ (roughly 3.3-fold). Effect sizes of causal variants were sampled from $N(0, h_g^2/m)$. For each simulation, we used 70% of the individuals to calculate the training summary statistics and randomly divided the remaining 30% into two parts for parameter tuning. To study the effect of sample size on prediction accuracy, we also calculated summary statistics from a randomly selected half of the training data.
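The simulation design above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the AnnoPred authors' code: genotypes are synthetic toy data (WTCCC data are not used), genotype columns are standardized, the residual is assumed Gaussian with variance 1 - h_g^2, the two tuning parts are assumed to be equal in size, and a SNP already drawn from A1 is not redrawn from A2 (the two random annotations can overlap). Summary statistics are computed as per-SNP simple linear regression t statistics, one common GWAS form; the source does not state exactly how its summary statistics were computed. All function names are hypothetical.

```python
import numpy as np


def simulate_annotations(n_snps, frac, rng):
    """Return a boolean mask marking a random `frac` of SNPs as annotated."""
    mask = np.zeros(n_snps, dtype=bool)
    mask[rng.choice(n_snps, size=int(round(frac * n_snps)), replace=False)] = True
    return mask


def choose_causal(a1, a2, m, rng):
    """Pick m causal SNPs: m//3 from A1, m//3 from A2, the rest outside A1 | A2.

    Assumption: SNPs already drawn from A1 are excluded when drawing from A2.
    """
    k = m // 3
    from_a1 = rng.choice(np.flatnonzero(a1), size=k, replace=False)
    a2_left = np.setdiff1d(np.flatnonzero(a2), from_a1)
    from_a2 = rng.choice(a2_left, size=k, replace=False)
    outside = np.flatnonzero(~(a1 | a2))
    from_rest = rng.choice(outside, size=m - 2 * k, replace=False)
    return np.concatenate([from_a1, from_a2, from_rest])


def simulate_trait(X, causal, h2, rng):
    """y = X beta + eps, with beta_j ~ N(0, h2/m) on causal SNPs, 0 elsewhere.

    Assumption: X is column-standardized and eps ~ N(0, 1 - h2), so the
    expected genetic variance is h2 and the total variance is about 1.
    """
    n, p = X.shape
    m = len(causal)
    beta = np.zeros(p)
    beta[causal] = rng.normal(0.0, np.sqrt(h2 / m), size=m)
    y = X @ beta + rng.normal(0.0, np.sqrt(1.0 - h2), size=n)
    return y, beta


def split_samples(n, rng, train_frac=0.7):
    """70% training; the remaining 30% split into two (assumed equal) parts."""
    idx = rng.permutation(n)
    n_train = int(round(train_frac * n))
    train, rest = idx[:n_train], idx[n_train:]
    half = len(rest) // 2
    return train, rest[:half], rest[half:]


def marginal_summary_stats(X, y):
    """Per-SNP t statistics from simple linear regression of y on each SNP."""
    n = X.shape[0]
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    r = (Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc))
    return r * np.sqrt((n - 2) / (1.0 - r ** 2))


# Toy run: sizes are far smaller than the 15,918 x 393,273 WTCCC data.
rng = np.random.default_rng(0)
n, p = 1000, 3000
maf = rng.uniform(0.05, 0.5, size=p)  # hypothetical allele frequencies
G = rng.binomial(2, maf, size=(n, p)).astype(float)
X = (G - G.mean(axis=0)) / G.std(axis=0)

a1 = simulate_annotations(p, 0.10, rng)
a2 = simulate_annotations(p, 0.10, rng)
causal = choose_causal(a1, a2, m=300, rng=rng)
y, beta = simulate_trait(X, causal, h2=0.5, rng=rng)

train, tune1, tune2 = split_samples(n, rng)
half_train = rng.choice(train, size=len(train) // 2, replace=False)
z_full = marginal_summary_stats(X[train], y[train])
z_half = marginal_summary_stats(X[half_train], y[half_train])
```

Changing `h2=0.5` to 0.25 gives the other heritability setting. The m = 3,000 setting would need a larger toy SNP count, since a 300-SNP annotation cannot supply 1,000 causal variants.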