Chunk #37 — Materials and Methods — RNA-seq and Ribosomal Profiling Read Alignment and Processing

Source: Pervasive transcription of the human genome produces thousands of previously unidentified long intergenic noncoding RNAs.
Embedded: yes

Text

A total of 127 RNA-seq sequence files (5 novel and 122 publicly available datasets; Table S1) were aligned to hg18 with TopHat v1.1.4, retaining only uniquely mapped reads by means of the option -g 1 (all other parameters were left at their defaults; see the TopHat manual, http://tophat.cbcb.umd.edu/manual.html). Detailed information on each dataset, including the novel datasets, is available in the sources listed in Table S1. These RNA-seq datasets were chosen because they sampled a wide breadth of human tissues and cell types, had well-documented experimental methods, and were publicly available. Although datasets with longer reads and greater read depth were preferred because they allow more complete de novo transcript assembly, some datasets with short reads and shallow read depth were included in order to sample as many tissue types as possible. Datasets derived from tissues with mutated genomes, such as cancers, were included to capture tissue-specific expression, even though some reads from mutated genomic positions would fail to map to the hg18 reference genome. SAMtools v0.1.7 and BEDTools v2.12.0 were used to process the aligned read files.
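As an illustration of the alignment step, the following is a minimal sketch of how a TopHat invocation with -g 1 could be assembled for one dataset. It is not the authors' actual script. The helper name build_tophat_command, the Bowtie index name "hg18", the read file name, and the output directory are hypothetical, and the -o option is added here only to keep per-dataset outputs separate (the text above states that all parameters other than -g 1 were left at their defaults). Single-end reads are assumed.

```python
import shlex


def build_tophat_command(bowtie_index, read_files, output_dir):
    """Build the argument list for a TopHat run restricted to unique alignments.

    bowtie_index: base name of a Bowtie index for the reference (assumed "hg18" here).
    read_files:   list of single-end read files for one dataset (hypothetical names).
    output_dir:   directory for TopHat output (assumption; not stated in the text).
    """
    if not read_files:
        raise ValueError("at least one read file is required")
    return [
        "tophat",
        "-g", "1",             # --max-multihits 1: reads with more than one alignment are suppressed
        "-o", output_dir,      # illustrative only; keeps each dataset's output separate
        bowtie_index,
        ",".join(read_files),  # TopHat accepts multiple read files as a comma-separated list
    ]


# Hypothetical example for a single dataset.
cmd = build_tophat_command("hg18", ["sample_reads.fastq"], "tophat_out/sample")
print(shlex.join(cmd))  # shell-ready command line (requires Python 3.8+)
```

The list returned by build_tophat_command could be passed to subprocess.run on a system where TopHat and a matching Bowtie index are installed. Building the command as a list rather than a single string avoids shell quoting problems with unusual file names.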