Chunk #14 — Preparing data for association testing — Assigning data elements to genes

Source: Pathway analysis of genomic data: concepts, methods, and prospects for future development.
Embedded: yes

Text

Genomic data has historically been integrated into pathways by mapping assayed elements to genes. For SNP-based genotyping arrays, this is not straightforward because many array SNPs are not located in known coding or regulatory regions. In one study, all SNPs that could not be mapped to a single gene through a reference genome build were discarded, but this resulted in a loss of more than 25% of assayed SNPs [33]. Alternatively, each unmapped SNP can be assigned to its nearest gene [34]. However, evolving theories suggest that sequences may not be associated with genes based on closest proximity, and may not even be associated with only one gene [35, 36]. Hence, many studies assign unmapped SNPs to all genes within a distance window, with window sizes ranging from 10 kb to 500 kb [13, 25, 26, 37]. Studies taking this approach should be aware that some SNPs may not be functionally related to their assigned gene(s). In addition, SNPs that map to multiple genes in the same pathway can yield spurious pathway associations. This issue is particularly important for genes (such as the MHC/HLA