Chunk #44 — Materials and methods — Identification of candidate splicing regulatory motifs

Source: Variation in alternative splicing across human tissues.
Embedded: yes

Text

Over-represented sequence motifs (k-mers) were identified by comparing the number of occurrences of each k-mer (for k in the range of 4 to 6 bases) in a test set of alternative exons versus a control set. Monomeric tandem repeats (for example, poly(A) sequences) were excluded from this analysis. The enrichment score of each candidate k-mer in the test set versus the control set was evaluated by computing χ² (chi-squared) values with a Yates correction term [75], using an approach similar in spirit to that described by Brudno et al. [59]. To estimate significance, we randomly sampled 500 subsets of the same size as the test set from the control set. For each sampled subset, the enrichment scores of k-mers over-represented in that subset versus the remainder of the control set were computed as above. The estimated p-value for observing a given enrichment score (χ² value) associated with an over-represented sequence motif of length k was defined as the fraction of sampled subsets that contained any k-mer with an enrichment score (χ² value) higher than that of the tested motif. Correcting for multiple testing is not required as the p-value was defined relative to the
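
The procedure described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. All function names are hypothetical, and several details are assumptions not stated in the text: the 2x2 contingency table compares occurrences of one k-mer against all other counted k-mers in the test and control sets, k-mers are counted at overlapping positions, subset size is measured in number of sequences, and the Yates-corrected term is clamped at zero.

```python
import random
from collections import Counter


def count_kmers(seqs, k):
    """Count overlapping k-mers, skipping monomeric repeats such as 'AAAA'."""
    counts = Counter()
    for seq in seqs:
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if len(set(kmer)) > 1:  # exclude monomeric tandem repeats
                counts[kmer] += 1
    return counts


def yates_chi2(a, b, c, d):
    """Yates-corrected chi-squared for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0
    # Assumption: clamp the corrected difference at zero to avoid over-correction.
    return n * max(abs(a * d - b * c) - n / 2.0, 0.0) ** 2 / denom


def enrichment_scores(test_seqs, control_seqs, k):
    """Score every k-mer that is more frequent in test_seqs than in control_seqs."""
    test = count_kmers(test_seqs, k)
    ctrl = count_kmers(control_seqs, k)
    n_test, n_ctrl = sum(test.values()), sum(ctrl.values())
    scores = {}
    for kmer, a in test.items():
        c = ctrl.get(kmer, 0)
        # Over-represented: relative frequency in test exceeds that in control.
        if n_ctrl and a * n_ctrl > c * n_test:
            scores[kmer] = yates_chi2(a, n_test - a, c, n_ctrl - c)
    return scores


def empirical_p_value(observed, control_seqs, test_size, k, n_samples=500, seed=0):
    """Fraction of random control subsets whose best k-mer score exceeds `observed`."""
    rng = random.Random(seed)
    exceed = 0
    for _ in range(n_samples):
        chosen = set(rng.sample(range(len(control_seqs)), test_size))
        subset = [s for i, s in enumerate(control_seqs) if i in chosen]
        rest = [s for i, s in enumerate(control_seqs) if i not in chosen]
        scores = enrichment_scores(subset, rest, k)
        if scores and max(scores.values()) > observed:
            exceed += 1
    return exceed / n_samples
```

In this sketch, a motif's score would come from `enrichment_scores(test_exons, control_exons, k)`, and that score would then be passed as `observed` to `empirical_p_value` together with the control sequences and the number of test sequences. Because each sampled subset contributes only its maximum score over all k-mers, the resulting p-value is already defined against the best-scoring k-mer in each random subset.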