For the GWAS data, association with disease was tested in a logistic regression model with gender, dummy-coded nationality and principal components (to correct for ancestry) as covariates. To determine the number of principal components to include in the model, the first ten principal components from the EIGENSTRAT [29] analysis were tested for association with case/control status (threshold p<0.05). On this basis, the first eight principal components were included in the logistic model for the GWAS discovery set and two principal components for the GWAS replication set. Analyses were performed in PLINK v1.07 [28] and R (2009, The R Foundation for Statistical Computing).
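The principal-component screening step can be illustrated with a minimal Python sketch. The text does not state which test was used to relate each component to case/control status, so the sketch assumes a univariate logistic regression per component with a two-sided Wald test. The function names `logistic_wald_p` and `select_pcs` are hypothetical and are not part of PLINK, EIGENSTRAT or R.

```python
import math


def _score_and_info(b0, b1, x, y):
    """Score vector and information matrix of the model y ~ intercept + x."""
    g0 = g1 = h00 = h01 = h11 = 0.0
    for xi, yi in zip(x, y):
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
        w = p * (1.0 - p)
        g0 += yi - p
        g1 += (yi - p) * xi
        h00 += w
        h01 += w * xi
        h11 += w * xi * xi
    return g0, g1, h00, h01, h11


def logistic_wald_p(x, y, max_iter=50, tol=1e-10):
    """Two-sided Wald p-value for the slope of a univariate logistic model.

    Assumption: this is an illustrative choice of test, fitted by
    Newton-Raphson. It assumes the data are not perfectly separable.
    """
    b0 = b1 = 0.0
    for _ in range(max_iter):
        g0, g1, h00, h01, h11 = _score_and_info(b0, b1, x, y)
        det = h00 * h11 - h01 * h01
        # Newton step: beta += inverse(information) * score (2x2 case)
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < tol:
            break
    # Standard error of the slope from the information at the final estimate
    _, _, h00, h01, h11 = _score_and_info(b0, b1, x, y)
    se_b1 = math.sqrt(h00 / (h00 * h11 - h01 * h01))
    z = b1 / se_b1
    return math.erfc(abs(z) / math.sqrt(2.0))


def select_pcs(pcs, status, alpha=0.05):
    """Return 1-based indices of components with p < alpha.

    pcs: list of per-component score lists (hypothetical input layout).
    status: list of 0/1 case/control labels in the same sample order.
    """
    return [k for k, pc in enumerate(pcs, start=1)
            if logistic_wald_p(pc, status) < alpha]
```

Under these assumptions, `select_pcs` would be called with the first ten components and the case/control labels, and the returned indices would identify the components to carry forward as covariates.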