
Chunk #38 — Materials and Methods — Quantitation of the Transcribed Fraction of the Genome

Source
Pervasive transcription of the human genome produces thousands of previously unidentified long intergenic noncoding RNAs.

Text

The uniquely mappable human genome, defined here as the portion of the genome to which RNA-seq reads can be uniquely mapped, was derived for hg18 from http://www.imagenix.com/uniqueome/downloads/hg18_uniqueome.unique_starts.base-space.50.2.positive.BED.gz [45]. It contains 2,570,174,327 bp, or 83.4% of the total human genomic sequence. To determine the genomic coverage of the RNA-seq data, all aligned RNA-seq reads were combined, and read coverage at each genomic base position was determined with the BEDTools function genomeCoverageBed. Split reads (i.e., reads spanning exon-exon junctions) were counted such that the intervening intronic sequence was included as part of the read. In Figure 1A, “All genes, ESTs, cDNAs” includes GENCODE v10 genes (excluding pseudogenes), RefSeq NM and NR genes, UCSC Known Genes, spliced H-Invitational cDNAs, spliced ESTs (UCSC Genome Browser “Spliced EST” track), and previously annotated spliced lincRNAs [16]. In all cases, intronic sequences of genes, cDNAs, and ESTs were included.
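The coverage calculation described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline, which used BEDTools genomeCoverageBed. It only mirrors the idea on toy data: each read is treated as one interval from its first to its last aligned base, so a split read also covers the intron it spans, and the number of bases covered by at least one read is divided by a chosen denominator. The function names, the toy reads, and the use of the uniquely mappable genome size as the denominator are assumptions made for illustration.

```python
from collections import defaultdict

# Size of the uniquely mappable hg18 genome, as stated in the text.
UNIQUELY_MAPPABLE_BP = 2_570_174_327


def covered_bases(intervals):
    """Count bases covered by at least one interval.

    `intervals` is an iterable of (chrom, start, end) tuples using
    BED-style 0-based, half-open coordinates. A split read is passed
    as a single interval spanning its intron (hypothetical convention
    chosen here to match the description in the text).
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in intervals:
        by_chrom[chrom].append((start, end))

    total = 0
    for spans in by_chrom.values():
        spans.sort()
        cur_start, cur_end = spans[0]
        for start, end in spans[1:]:
            if start <= cur_end:
                # Overlapping or adjacent: extend the current merged block.
                cur_end = max(cur_end, end)
            else:
                # Gap: close the current block and start a new one.
                total += cur_end - cur_start
                cur_start, cur_end = start, end
        total += cur_end - cur_start
    return total


def covered_fraction(intervals, denominator_bp=UNIQUELY_MAPPABLE_BP):
    """Fraction of `denominator_bp` covered by at least one interval."""
    return covered_bases(intervals) / denominator_bp


# Toy example (made-up coordinates, not data from the paper).
toy_reads = [
    ("chr1", 100, 150),   # unspliced read
    ("chr1", 120, 400),   # split read, counted across its intron
    ("chr2", 0, 50),
]
print(covered_bases(toy_reads))      # 350
print(covered_fraction(toy_reads))
```

Merging sorted intervals keeps memory proportional to the number of reads rather than the genome length; a per-base depth array, which is closer to what a coverage tool reports, would give the same count of covered bases.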