Chunk #2 — METHODS AND MATERIALS — Simulation and Recovery of SNP Heritability for Binary Traits — Data Generating Model.

Source: Pervasive Downward Bias in Estimates of Liability-Scale Heritability in Genome-wide Association Study Meta-analysis: A Simple Solution.
Embedded: yes

Text

For all simulations, data were generated using European-population LD scores provided by the original LDSC developers (10) for 1,184,461 HapMap3 SNPs, excluding the major histocompatibility complex region and sex chromosomes, according to simulation procedures first described in de la Fuente et al. (11). More specifically, summary statistics were simulated following the multivariate LDSC equation:

$$[Z_{1j}, Z_{2j}, \ldots, Z_{10j}] \sim N\left([0, 0, \ldots, 0],\ \operatorname{cov}(Z_{1j}, Z_{2j}, \ldots, Z_{10j})\right) \tag{8}$$

where

$$\operatorname{cov}(Z_{1j}, Z_{2j}, \ldots, Z_{10j}) =
\begin{bmatrix}
\dfrac{N_1 h_1^2}{M}\,\ell(j) + 1 + a_1 & & & \\
\dfrac{\sqrt{N_1 N_2}\,\sigma_{g1,2}}{M}\,\ell(j) + \dfrac{\rho_{1,2} N_{s1,2}}{\sqrt{N_1 N_2}} & \dfrac{N_2 h_2^2}{M}\,\ell(j) + 1 + a_2 & & \\
\vdots & \vdots & \ddots & \\
\dfrac{\sqrt{N_1 N_{10}}\,\sigma_{g1,10}}{M}\,\ell(j) + \dfrac{\rho_{1,10} N_{s1,10}}{\sqrt{N_1 N_{10}}} & \dfrac{\sqrt{N_2 N_{10}}\,\sigma_{g2,10}}{M}\,\ell(j) + \dfrac{\rho_{2,10} N_{s2,10}}{\sqrt{N_2 N_{10}}} & \ldots & \dfrac{N_{10} h_{10}^2}{M}\,\ell(j) + 1 + a_{10}
\end{bmatrix} \tag{9}$$

and $[Z_{1j}, Z_{2j}, \ldots, Z_{10j}]$ reflects the Z statistics for the 10 GWAS cohorts. The matrix is expressed in condensed form: only its lower triangle is shown (the matrix is symmetric), and cohorts 3 to 9 from the current simulations are not depicted for display reasons. Here $M$ is the number of SNPs from the LD file (1,184,461), $N_s$ is the number of overlapping individuals, $N$ is the sample size of each individual GWAS, $\ell(j)$ is the LD score of SNP $j$, and $1 + a$ is the univariate LDSC intercept, which captures unmeasured confounds such as population stratification. The bivariate LDSC intercept, expressed as $\rho_{1,2} N_{s1,2} / \sqrt{N_1 N_2}$ for cohorts 1 and 2, was 0 owing to setting the sample overlap ($N_s$) to 0 for all simulations. GWAS z statistics were simulated following the equation above and using the mvrnorm R
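As an illustration of the data-generating model above, the following is a minimal Python sketch that builds the covariance matrix for a single SNP and draws one vector of z statistics from it. It is not the authors' code: the source names mvrnorm in R, whereas this sketch uses a hand-written Cholesky factorization with the standard library only. The function names (`ldsc_cov`, `cholesky`, `draw_z`), the three-cohort setup, and all numeric inputs other than M = 1,184,461 are hypothetical values chosen for illustration.

```python
import math
import random

M_SNPS = 1_184_461  # number of SNPs in the LD file, as stated in the text


def ldsc_cov(n, h2, sigma_g, a, ld_score, m=M_SNPS, rho=None, n_overlap=None):
    """Covariance of z statistics across cohorts for one SNP.

    n, h2, a   : per-cohort sample size, heritability, intercept inflation
    sigma_g    : K x K genetic covariances (off-diagonal entries are used)
    rho, n_overlap : optional K x K matrices; omitted means no sample overlap
    """
    k = len(n)
    cov = [[0.0] * k for _ in range(k)]
    for r in range(k):
        for c in range(k):
            if r == c:
                # Diagonal: N h^2 / M * l(j) + 1 + a
                cov[r][c] = n[r] * h2[r] / m * ld_score + 1.0 + a[r]
            else:
                root = math.sqrt(n[r] * n[c])
                genetic = root * sigma_g[r][c] / m * ld_score
                # Bivariate intercept: rho * Ns / sqrt(N1 N2); zero without overlap
                if rho is None or n_overlap is None:
                    intercept = 0.0
                else:
                    intercept = rho[r][c] * n_overlap[r][c] / root
                cov[r][c] = genetic + intercept
    return cov


def cholesky(mat):
    """Lower-triangular factor L with L L^T = mat (mat must be positive definite)."""
    k = len(mat)
    low = [[0.0] * k for _ in range(k)]
    for r in range(k):
        for c in range(r + 1):
            s = sum(low[r][i] * low[c][i] for i in range(c))
            if r == c:
                low[r][c] = math.sqrt(mat[r][r] - s)
            else:
                low[r][c] = (mat[r][c] - s) / low[c][c]
    return low


def draw_z(cov, rng):
    """One draw from N(0, cov): multiply the Cholesky factor by standard normals."""
    low = cholesky(cov)
    e = [rng.gauss(0.0, 1.0) for _ in range(len(cov))]
    return [sum(low[r][i] * e[i] for i in range(r + 1)) for r in range(len(cov))]


# Hypothetical example: 3 cohorts instead of 10, no sample overlap, a = 0.
n = [50_000, 50_000, 50_000]
h2 = [0.10, 0.10, 0.10]
sigma_g = [[0.10, 0.08, 0.08],
           [0.08, 0.10, 0.08],
           [0.08, 0.08, 0.10]]
a = [0.0, 0.0, 0.0]

cov = ldsc_cov(n, h2, sigma_g, a, ld_score=100.0)
z = draw_z(cov, random.Random(1))
```

In a full simulation, this draw would be repeated for every SNP with that SNP's own LD score. With sample overlap set to 0, as in the source, the off-diagonal entries reduce to the genetic covariance term alone.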