Chunk #13 — Materials and methods — Dataset merging

Source
A comprehensive survey of genetic variation in 20,691 subjects from four large cohorts.

Text

After removing duplicate IDs and one member from each flagged pair of related individuals, we used EIGENSTRAT [48] to run principal component analysis (PCA) on each dataset. For the Affymetrix and HumanHap datasets, we used approximately 12,000 SNPs from Yu et al. [49] that were filtered to ensure low pairwise linkage disequilibrium (LD). For the OmniExpress dataset, we used approximately 33,000 SNPs that were similarly filtered. The top principal components were manually checked for outliers.
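
As an illustration of the kind of analysis described above, the following is a minimal NumPy sketch of PCA on a genotype matrix followed by a simple outlier screen. It is not EIGENSTRAT and does not reproduce the authors' pipeline. The function names, the scaling of each SNP by its binomial standard deviation, and the 6-standard-deviation cutoff are all assumptions made for this example. The study itself checked the top components manually, so the automated flag here is only a possible aid to that inspection. The input is assumed to be an LD-pruned matrix of 0/1/2 allele counts with no missing values.

```python
import numpy as np


def top_principal_components(genotypes, n_components=10):
    """Return per-sample scores on the top principal components.

    genotypes: (n_samples, n_snps) array of 0/1/2 allele counts.
    Assumes SNPs were already LD-pruned and contain no missing calls.
    """
    g = np.asarray(genotypes, dtype=float)
    # Estimated allele frequency of each SNP.
    p = g.mean(axis=0) / 2.0
    # Monomorphic SNPs carry no information and would divide by zero.
    keep = (p > 0) & (p < 1)
    g, p = g[:, keep], p[keep]
    # Center each SNP and scale by its binomial standard deviation
    # (an assumed normalization for this sketch).
    z = (g - 2.0 * p) / np.sqrt(2.0 * p * (1.0 - p))
    # Left singular vectors times singular values give the PC scores.
    u, s, _ = np.linalg.svd(z, full_matrices=False)
    k = min(n_components, len(s))
    return u[:, :k] * s[:k]


def flag_outliers(pcs, n_sd=6.0):
    """Return indices of samples more than n_sd standard deviations
    from the mean on any component (n_sd is a hypothetical cutoff)."""
    centered = pcs - pcs.mean(axis=0)
    sd = pcs.std(axis=0)
    sd[sd == 0] = 1.0  # guard against constant components
    return np.where((np.abs(centered) > n_sd * sd).any(axis=1))[0]
```

In practice, the flagged indices would serve as candidates for review alongside scatter plots of the top components, rather than as an automatic exclusion list.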