Chunk #48 — Methods — Summary Statistics — Using the UK Biobank split-sample PGI

Source
Resource profile and user guide of the Polygenic Index Repository.
Embedded
yes

Text

the data, $\operatorname{Cov}(X_i, \varepsilon_j) \neq 0$ whenever individuals $i$ and $j$ are in different partitions. As a result,

\begin{align}
\operatorname{Var}(\hat{\beta}) &= \operatorname{Var}\left((X'X)^{-1}X'Y\right) \notag \\
&= \operatorname{Var}\left[(X'X)^{-1}X'\varepsilon\right] \tag{5} \\
&\neq (X'X)^{-1}X'\operatorname{Var}(\varepsilon)X(X'X)^{-1}. \tag{6}
\end{align}

The expression (6) is the standard general formula for the sampling variance of OLS estimates. It is not equal to (5) because $(X'X)^{-1}X'$ and $\varepsilon$ are correlated. If we knew the correlation between these two quantities, we could calculate correct standard errors in this setting, but the correlation structure is complex, and we are unaware of any current method that produces correct standard errors.

For this reason, we recommend that researchers restrict their analyses to sets of individuals within a single partition. If researchers choose to analyze individuals across different partitions, they should include the strong caveat that their standard errors may be poorly calibrated.
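As an illustration of the within-partition recommendation, the following is a minimal Python sketch, not the authors' code. It fits OLS separately in each partition and computes standard errors from the sandwich expression (6) under the additional assumption that $\operatorname{Var}(\varepsilon) = \sigma^2 I$ within a partition. The function names (`ols_with_classical_se`, `fit_within_partitions`), the three partition labels, and the simulated data are all hypothetical. The toy data only demonstrates the mechanics and does not reproduce how the Repository's split-sample PGIs are constructed.

```python
import numpy as np


def ols_with_classical_se(y, X):
    """OLS coefficients and standard errors from expression (6).

    Assumes Var(eps) = sigma^2 * I, so the sandwich collapses to
    sigma^2 * (X'X)^{-1}. The full sandwich is written out to mirror (6).
    """
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - k)  # unbiased residual variance estimate
    var_eps = sigma2 * np.eye(n)      # assumed error covariance
    cov = XtX_inv @ X.T @ var_eps @ X @ XtX_inv
    return beta, np.sqrt(np.diag(cov))


def fit_within_partitions(y, pgi, partition):
    """Run a separate regression of y on the PGI inside each partition."""
    results = {}
    for label in np.unique(partition):
        mask = partition == label
        # Design matrix: intercept plus the PGI for this partition only.
        X = np.column_stack([np.ones(mask.sum()), pgi[mask]])
        results[label] = ols_with_classical_se(y[mask], X)
    return results


# Hypothetical toy data: 300 individuals in 3 equally sized partitions.
rng = np.random.default_rng(0)
n = 300
partition = np.repeat([1, 2, 3], n // 3)
pgi = rng.normal(size=n)
y = 0.2 * pgi + rng.normal(size=n)

results = fit_within_partitions(y, pgi, partition)
for label, (beta, se) in results.items():
    print(f"partition {label}: slope = {beta[1]:.3f}, SE = {se[1]:.3f}")
```

Each partition yields its own estimate and standard error. The sketch deliberately stops there: how (or whether) to combine the per-partition estimates is a separate question that the passage above does not address.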