Chunk #8 — Results — Overview of Methods

Source: An atlas of genetic correlations across human diseases and traits.
Embedded: yes

Text

Sample overlap creates spurious correlation between z1j and z2j, which inflates the product z1jz2j. The expected magnitude of this inflation is uniform across all markers and, in particular, does not depend on LD Score. As a result, sample overlap affects only the intercept of this regression (the term ρNs/√(N1N2)) and not the slope, so estimates of genetic correlation are not biased by sample overlap. Similarly, shared population stratification will alter the intercept but have minimal impact on the slope, because the correlation between LD Score and the rate of genetic drift is minimal [19]. If we are willing to assume no shared population stratification, and we know the amount of sample overlap and the phenotypic correlation in advance (i.e., the true value of ρNs/√(N1N2)), we can constrain the intercept to this value. We refer to this approach as constrained intercept LD Score regression. Constrained intercept LD Score regression has a lower standard error (often by as much as 30%) than LD Score regression with an unconstrained intercept, but it will yield biased and misleading estimates if the intercept is misspecified, e.g., if we specify the wrong value of Nsρ or do not completely control for population stratification.
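
The argument above can be checked with a small simulation. The following is a minimal sketch, not the authors' implementation: it assumes the expected product of z-scores is linear in LD Score with slope √(N1N2)ρg/M and intercept ρNs/√(N1N2), assumes each z-score has variance 1 + N·h²·LD/M, draws LD Scores from an arbitrary gamma distribution, and fits by unweighted least squares rather than the weighted regression used in practice. All function names and parameter values are hypothetical and chosen only for illustration.

```python
import numpy as np

def simulate_z_products(ld, n1, n2, n_s, h2_1, h2_2, rho_g, rho, rng):
    """Draw z1j * z2j for each SNP under an assumed bivariate normal model."""
    m = ld.size
    var1 = 1.0 + n1 * h2_1 * ld / m          # assumed variance of z1j
    var2 = 1.0 + n2 * h2_2 * ld / m          # assumed variance of z2j
    # Expected product: slope * LD Score + intercept from sample overlap.
    cov = np.sqrt(n1 * n2) * rho_g * ld / m + rho * n_s / np.sqrt(n1 * n2)
    e1 = rng.standard_normal(m)
    e2 = rng.standard_normal(m)
    z1 = np.sqrt(var1) * e1
    # Conditional construction so that Cov(z1, z2) = cov and Var(z2) = var2.
    z2 = cov / np.sqrt(var1) * e1 + np.sqrt(var2 - cov**2 / var1) * e2
    return z1 * z2

def fit_free(ld, y):
    """Unweighted least squares with a free intercept: (slope, intercept)."""
    slope, intercept = np.polyfit(ld, y, 1)
    return slope, intercept

def fit_constrained(ld, y, intercept):
    """Least-squares slope with the intercept fixed to a given value."""
    return np.sum(ld * (y - intercept)) / np.sum(ld**2)

# Illustrative (made-up) parameters.
rng = np.random.default_rng(0)
M = 1_000_000                      # number of SNPs
N1 = N2 = 100_000                  # study sample sizes
N_S = 100_000                      # overlapping samples (complete overlap)
H2_1 = H2_2 = 0.5                  # heritabilities
RHO_G = 0.25                       # genetic covariance
RHO = 0.5                          # phenotypic correlation
ld = rng.gamma(shape=2.0, scale=50.0, size=M)   # synthetic LD Scores

true_slope = np.sqrt(N1 * N2) * RHO_G / M
true_icpt = RHO * N_S / np.sqrt(N1 * N2)

y_none = simulate_z_products(ld, N1, N2, 0, H2_1, H2_2, RHO_G, RHO, rng)
y_full = simulate_z_products(ld, N1, N2, N_S, H2_1, H2_2, RHO_G, RHO, rng)

slope_none, icpt_none = fit_free(ld, y_none)     # no overlap
slope_full, icpt_full = fit_free(ld, y_full)     # complete overlap
slope_right = fit_constrained(ld, y_full, true_icpt)  # correct constraint
slope_wrong = fit_constrained(ld, y_full, 0.0)        # misspecified constraint

print(f"true slope                 {true_slope:.5f}")
print(f"free, no overlap           {slope_none:.5f} (intercept {icpt_none:.3f})")
print(f"free, full overlap         {slope_full:.5f} (intercept {icpt_full:.3f})")
print(f"constrained, correct value {slope_right:.5f}")
print(f"constrained, wrong value   {slope_wrong:.5f}")
```

Under these assumptions, the free-intercept fits should recover roughly the same slope with and without overlap, with the overlap absorbed into the intercept. Constraining the intercept to zero when the samples overlap pushes the slope upward, which is the misspecification bias described above.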