Chunk #38 — Methods — Two-Step Estimator

Source: An atlas of genetic correlations across human diseases and traits.
Embedded: yes

Text

As noted in [19], SNPs with very large effect sizes can result in large LD Score regression standard errors for single-trait LD Score regression with unconstrained intercept; cross-trait LD Score regression with unconstrained intercept behaves similarly. This is due to the well-known fact that linear regression deals poorly with outliers in the response variable (LD Score regression with constrained intercept is not nearly as adversely affected by large-effect SNPs). The solution proposed in [19] was to remove SNPs with χ² > 80 from the LD Score regression. This is a satisfactory solution when the goal is to estimate the LD Score regression intercept. If the goal is to distinguish polygenicity from population stratification, and we are willing to assume that the population stratification is subtle, such that SNPs with χ² > 80 are much more likely to be real causal SNPs rather than artifacts, then we can make the task much easier by removing those SNPs. However, this is unsatisfactory if the goal is to estimate h²: ignoring large-effect SNPs with χ² > 80 would bias estimates of h² and ρ_g towards zero. Therefore, for estimating