Chunk #12 — Materials and methods — Samples — Data preparation

Source: Genome- and transcriptome-wide splicing associations with alcohol use disorder.
Embedded: yes

Text

RNA-seq data were processed using a uniform pipeline. First, we assessed RNA-seq data quality using FastQC (https://www.bioinformatics.babraham.ac.uk/projects/fastqc/). We then removed Illumina adapters and poor-quality sequence using Trimmomatic (version 0.39) [19]: leading and trailing bases with a Phred score below 3 were trimmed, reads shorter than 36 bp were discarded, and a maximum of 2 mismatches per read was allowed. Next, we aligned the trimmed reads to either the human hg19 genome or the rhesus macaque Mmul_10 genome using STAR aligner version 2.5.3.a [20]. We followed the guidelines outlined by leafcutter (https://davidaknowles.github.io/leafcutter) to align RNA-seq reads and prepare data for differential splicing analyses. RNA-seq read alignment yielded an average of 78,955,738 paired-end reads in humans (s.d. = 29,804,777; M_alignment = 86.16%; M_read_size = 188.36) and an average of 34,551,920 paired-end reads in non-human primates (s.d. = 8,202,258; M_alignment = 79.71%; M_read_size = 127.59).
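
The trimming and alignment steps described above can be sketched as command lines. The following is a minimal illustration, not the authors' actual scripts. The Trimmomatic steps LEADING:3, TRAILING:3, and MINLEN:36 follow the thresholds stated in the text, and the seed-mismatch value of 2 in ILLUMINACLIP is our reading of "a maximum of 2 mismatches". Everything else is an assumption: the adapter file name, the ILLUMINACLIP clip thresholds (30 and 10), the `trimmomatic` wrapper executable, the output file names, the thread count, and the STAR options (two-pass mode and the intron-motif strand field, which we understand the leafcutter documentation to suggest). The helper functions `trimmomatic_cmd` and `star_cmd` are hypothetical and only build argument lists; they do not run anything.

```python
from typing import List


def trimmomatic_cmd(r1: str, r2: str, prefix: str,
                    adapters: str = "TruSeq3-PE.fa") -> List[str]:
    """Build a paired-end Trimmomatic command as an argument list.

    The adapter file and the 30/10 clip thresholds are assumed values,
    not taken from the paper.
    """
    return [
        "trimmomatic", "PE",              # assumed wrapper executable name
        r1, r2,
        f"{prefix}_R1.paired.fq.gz", f"{prefix}_R1.unpaired.fq.gz",
        f"{prefix}_R2.paired.fq.gz", f"{prefix}_R2.unpaired.fq.gz",
        f"ILLUMINACLIP:{adapters}:2:30:10",  # 2 = seed mismatches allowed
        "LEADING:3",                      # trim leading bases below Phred 3
        "TRAILING:3",                     # trim trailing bases below Phred 3
        "MINLEN:36",                      # drop reads shorter than 36 bp
    ]


def star_cmd(r1: str, r2: str, genome_dir: str, prefix: str,
             threads: int = 4) -> List[str]:
    """Build a STAR alignment command as an argument list.

    All option choices here are assumptions; the paper states only that
    the leafcutter guidelines were followed.
    """
    return [
        "STAR",
        "--runThreadN", str(threads),
        "--genomeDir", genome_dir,        # pre-built hg19 or Mmul_10 index
        "--readFilesIn", r1, r2,
        "--readFilesCommand", "zcat",     # inputs are gzip-compressed
        "--twopassMode", "Basic",
        "--outSAMstrandField", "intronMotif",
        "--outSAMtype", "BAM", "Unsorted",
        "--outFileNamePrefix", f"{prefix}.",
    ]


if __name__ == "__main__":
    # Hypothetical sample name and paths, for illustration only.
    trim = trimmomatic_cmd("s1_R1.fq.gz", "s1_R2.fq.gz", "s1")
    align = star_cmd("s1_R1.paired.fq.gz", "s1_R2.paired.fq.gz",
                     "star_index_hg19", "s1")
    print(" ".join(trim))
    print(" ".join(align))
```

Building the commands as lists keeps each threshold visible and lets them be passed to `subprocess.run` without shell quoting issues, should one choose to execute them.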