Chunk #27 — Discussion

Source: A comprehensive survey of genetic variation in 20,691 subjects from four large cohorts.
Embedded: yes

Text

Our pipeline for combining and imputing twelve different GWAS datasets can overcome both technical and methodological issues. We created three datasets defined by platform family (in our case, Illumina HumanHap, Illumina OmniExpress and Affymetrix) because the SNP overlap across platforms was low on a genome-wide scale (75,285 SNPs). Imputing a genome-wide dataset from a starting point of only about 75,000 SNPs would have reduced imputation accuracy in regions of the genome with sparse genotype data. Moreover, it has been shown that different platforms may call SNPs differently and that SNP-specific allele frequencies can differ between platforms (see [41] for further discussion). Within each dataset, we conducted multiple case-control GWAS among control subjects only (i.e. multiple “null” GWAS) and identified and excluded more than 100 SNPs that showed spurious associations. These results emphasize that even when datasets are merged by platform family, problematic SNPs that give rise to spurious associations may remain, so it is important to check for them carefully.
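
The "null" GWAS check described above can be illustrated with a minimal sketch. It is not the pipeline used in this study: it assumes that control subjects within one platform-family dataset are split into two pseudo groups (for example, by originating study), that genotypes are coded as alternate-allele counts (0, 1 or 2), and that a simple allelic chi-square test is used. The function names, the SNP identifiers and the significance threshold of 1e-6 are all hypothetical.

```python
import math


def allelic_chi2_p(alt_a, ref_a, alt_b, ref_b):
    """P-value of a 1-df chi-square test on a 2x2 allele-count table."""
    n = alt_a + ref_a + alt_b + ref_b
    row_a, row_b = alt_a + ref_a, alt_b + ref_b
    col_alt, col_ref = alt_a + alt_b, ref_a + ref_b
    if min(row_a, row_b, col_alt, col_ref) == 0:
        return 1.0  # monomorphic SNP or empty group: nothing to test
    chi2 = n * (alt_a * ref_b - ref_a * alt_b) ** 2 / (
        row_a * row_b * col_alt * col_ref
    )
    # Survival function of the chi-square distribution with 1 degree of freedom
    return math.erfc(math.sqrt(chi2 / 2.0))


def null_gwas_flags(genotypes, pseudo_case, threshold=1e-6):
    """Return SNPs whose allele counts differ between two groups of controls.

    genotypes:   dict mapping SNP id -> list of alt-allele counts (0/1/2, or
                 None for a missing call), one entry per control subject
    pseudo_case: list of booleans, True for the pseudo "case" group
    threshold:   hypothetical p-value cut-off for calling an association
    """
    flagged = []
    for snp, dosages in genotypes.items():
        counts = {True: [0, 0], False: [0, 0]}  # [alt alleles, ref alleles]
        for dosage, is_case in zip(dosages, pseudo_case):
            if dosage is None:
                continue  # skip missing genotype calls
            counts[bool(is_case)][0] += dosage
            counts[bool(is_case)][1] += 2 - dosage
        p_value = allelic_chi2_p(*counts[True], *counts[False])
        if p_value < threshold:
            flagged.append(snp)
    return flagged


# Toy data: 20 controls from each of two hypothetical studies
pseudo_case = [True] * 20 + [False] * 20
genotypes = {
    "rs_ok": [0, 1, 2, 1] * 10,      # same allele frequency in both groups
    "rs_bad": [2] * 20 + [0] * 20,   # calls differ completely between groups
}
flagged_snps = null_gwas_flags(genotypes, pseudo_case)
```

Because every subject is a control, no true disease association is expected, so any SNP flagged in such a comparison is a candidate for exclusion before the datasets are imputed. A real analysis would typically use an established tool and adjust for covariates such as population structure, which this sketch omits.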