Chunk #5 — Methodological issues — From SNPs to genes

Source
Gene set analysis of genome-wide association studies: methodological issues and perspectives.

Text

Different criteria for defining gene boundaries have been proposed in the literature, e.g., extending 500kb [5], 200kb [10], 20kb [11], or 5kb [12] both upstream and downstream of the gene coding region. Considering linkage disequilibrium (LD) and gene regulation patterns, investigators often define a gene region to include both the genic region (core part) and the boundary regions (upstream and downstream of the gene). More sophisticated approaches, such as including SNPs that are in LD with the gene, have also been developed [13,14]. These strategies aim to capture SNP markers that play regulatory roles in gene expression and/or are linked to causal variants within the same LD block. However, these approaches also include more irrelevant SNPs. Thus, they may not only dilute the potential signal strength of a gene set but also increase the computational burden dramatically, especially for gene sets with a large number of genes. One potentially promising strategy is to take advantage of information from gene expression studies. Veyrieras et al. [15] estimated that the majority of genetic variants influencing gene expression are located within 20kb of the genes. Recently, to
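The boundary-based mapping described above can be illustrated with a minimal Python sketch. It is not the method of any cited study: the function name `map_snps_to_genes`, the tuple layouts, and the toy coordinates are all hypothetical, and the sketch assumes a simple symmetric window in base pairs with no LD information. It shows only how a wider flanking window assigns more SNPs to each gene, which is the dilution and computational cost the text refers to.

```python
from collections import defaultdict

# Hypothetical toy data (not from any real annotation).
# Genes: (gene_id, chromosome, start_bp, end_bp)
GENES = [
    ("GENE_A", "chr1", 100_000, 150_000),
    ("GENE_B", "chr1", 400_000, 420_000),
]
# SNPs: (snp_id, chromosome, position_bp)
SNPS = [
    ("snp1", "chr1", 95_000),   # 5 kb upstream of GENE_A
    ("snp2", "chr1", 120_000),  # inside GENE_A
    ("snp3", "chr1", 165_000),  # 15 kb downstream of GENE_A
    ("snp4", "chr1", 300_000),  # 100 kb upstream of GENE_B
    ("snp5", "chr2", 120_000),  # different chromosome
]


def map_snps_to_genes(snps, genes, flank_kb):
    """Assign each SNP to every gene whose region, extended by
    flank_kb kilobases on both sides, contains the SNP position.

    Assumption: symmetric window, inclusive boundaries, no LD data.
    Genes with no SNP in their window do not appear in the result.
    """
    flank = flank_kb * 1000
    gene_to_snps = defaultdict(list)
    for gene_id, gene_chrom, start, end in genes:
        window_start = max(0, start - flank)  # do not run past the chromosome start
        window_end = end + flank
        for snp_id, snp_chrom, pos in snps:
            if snp_chrom == gene_chrom and window_start <= pos <= window_end:
                gene_to_snps[gene_id].append(snp_id)
    return dict(gene_to_snps)


if __name__ == "__main__":
    # Compare the window sizes mentioned in the text.
    for flank_kb in (5, 20, 200, 500):
        mapping = map_snps_to_genes(SNPS, GENES, flank_kb)
        counts = {gene: len(ids) for gene, ids in mapping.items()}
        print(f"{flank_kb} kb flank: {counts}")
```

The nested loop is quadratic in the number of genes and SNPs, which is acceptable for a toy example; for genome-wide data one would typically sort positions or use an interval index instead.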