paperKB
coga / coga-kb
Help
Sign in

Chunk #23 — Materials and Methods — Statistical analysis — Selection of PC for PS correction

Source
Population substructure and control selection in genome-wide association studies.
Embedded
yes

Text

To limit the searching space, we focus on only PCs with significantly large genetic variation (with P-values less than 0.05 based on the Tracy-Widom test). The number of significant PCs, represented by L, is either 4 or 5 among the four considered scans (the two original and the two reconstructed). Let u l = (ul ,,1,ul ,2,…,ul ,N)′, 1≤l≤L be the vector of the lth PCs for all subjects. To reduce the computing time further, we order those L PCs u 1, u 2, …, u L according to their Wilcoxon rank-sum test statistics that compare the distributions between cases and controls along individual PCs, and define them in that order as u (1), u (2), …, u (L), with u (1) being the PC with the largest Wilcoxon rank-sum test statistic. We use the following greedy search algorithm to choose a subset of PCs for the correction of PS by sequentially evaluating u (1), u (2), …, u (L).