When selecting controls in step 1, it is crucial to choose samples whose ancestral composition matches that of the cases. Population stratification is a strong confounder in GWAS, because ancestry differences between cases and controls can produce spurious associations. Depending on which variables the combined datasets record, controls should also be matched on demographic variables (e.g., age, sex) and relevant clinical variables (e.g., smoking status).
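As a rough illustration of ancestry matching, the sketch below assumes that genotype principal components (PCs), a common proxy for ancestry, have already been computed jointly for the cases and the candidate controls. It greedily assigns each case its nearest unused control(s) of the same sex in PC space. The `Sample` type, the `match_controls` function, and the `max_dist` threshold are hypothetical names used only for illustration. Real pipelines may instead filter controls by PC ranges or use dedicated matching tools, and other covariates such as smoking status could be added as further exact-match keys.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    """Hypothetical sample record: ID, sex, and precomputed ancestry PCs."""
    sample_id: str
    sex: str                 # e.g. "F" or "M"
    pcs: tuple               # top genotype principal components (assumed precomputed)


def pc_distance(a: Sample, b: Sample) -> float:
    """Euclidean distance between two samples in principal-component space."""
    return math.dist(a.pcs, b.pcs)


def match_controls(cases, controls, k=1, max_dist=None):
    """Greedily pick up to k sex-matched controls per case, nearest in PC space.

    Controls are drawn without replacement. If max_dist is given, controls
    farther than that from the case are rejected, so a case may receive
    fewer than k matches (or none).
    """
    available = list(controls)
    matches = {}
    for case in cases:
        # Exact-match on sex, then rank remaining candidates by PC distance.
        candidates = [c for c in available if c.sex == case.sex]
        candidates.sort(key=lambda c: pc_distance(case, c))
        if max_dist is not None:
            candidates = [c for c in candidates if pc_distance(case, c) <= max_dist]
        chosen = candidates[:k]
        # Remove chosen controls so no control is reused for another case.
        for c in chosen:
            available.remove(c)
        matches[case.sample_id] = [c.sample_id for c in chosen]
    return matches


if __name__ == "__main__":
    cases = [Sample("case1", "F", (0.01, -0.02)), Sample("case2", "M", (0.05, 0.03))]
    controls = [
        Sample("ctl1", "F", (0.02, -0.01)),
        Sample("ctl2", "F", (0.30, 0.40)),
        Sample("ctl3", "M", (0.04, 0.02)),
        Sample("ctl4", "M", (-0.50, 0.10)),
    ]
    print(match_controls(cases, controls, k=1))
    # {'case1': ['ctl1'], 'case2': ['ctl3']}
```

Greedy matching is simple but order-dependent: an early case can claim a control that would have suited a later case better. An optimal assignment (for example, the Hungarian algorithm on the distance matrix) avoids this at a higher computational cost.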