Chunk #44 — Methods — RNA-seq alignment and QC

Source: Brain expression quantitative trait locus and network analyses reveal downstream effects and putative drivers for brain-related diseases.
Embedded: yes

Text

To assess FASTQ and alignment quality, we used FastQC (ref. 84; v0.11.3), STAR metrics and Picard Tools (ref. 85) metrics (v2.18.26; MultipleMetrics and RNAseqMetrics). Samples were excluded if their aligned reads had <10% coding bases (Supplementary Fig. 4a), if <60% of reads aligned (Supplementary Fig. 4b) or if <60% of reads mapped uniquely. Among the RNA-seq samples, 117 did not pass these filters, most of them from GTEx (ref. 81). The remaining quality metrics were inspected visually and showed no outliers. To identify outliers not captured by these statistics, we applied a PCA-based filtering approach, after which 8,868 samples remained (Supplementary Note and Supplementary Fig. 5a–c). To adjust for the between-dataset differences observed in the data (Supplementary Fig. 6a), we correlated the RNA-seq data with 77 covariates from the different QC tools and regressed out the 20 most strongly correlated covariates using ordinary least squares (OLS; Supplementary Note), after which the datasets no longer clustered along PC1 and PC2 (Supplementary Fig. 6b).
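
The threshold filter and the covariate correction described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names and argument names are hypothetical, the covariate ranking criterion (largest absolute Pearson correlation with the first two expression PCs) is an assumption because the exact procedure is given only in the Supplementary Note, and the residuals returned here are mean-centred per gene because the design matrix includes an intercept.

```python
import numpy as np

def passes_qc(pct_coding, pct_aligned, pct_unique):
    """Boolean mask: True for samples that meet all three thresholds in the text."""
    pct_coding = np.asarray(pct_coding, dtype=float)
    pct_aligned = np.asarray(pct_aligned, dtype=float)
    pct_unique = np.asarray(pct_unique, dtype=float)
    # A sample is removed if any metric falls below its threshold.
    return (pct_coding >= 10) & (pct_aligned >= 60) & (pct_unique >= 60)

def top_correlated_covariates(expr, covs, n_top=20, n_pcs=2):
    """Indices of the n_top covariates most correlated with the leading PCs.

    expr: samples x genes matrix; covs: samples x covariates matrix.
    Assumption: covariates are ranked by their largest absolute Pearson
    correlation with the first n_pcs expression PCs. Covariates are assumed
    to be non-constant (non-zero standard deviation).
    """
    centered = expr - expr.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    pcs = u[:, :n_pcs] * s[:n_pcs]  # sample scores on the leading PCs
    cov_z = (covs - covs.mean(axis=0)) / covs.std(axis=0)
    pc_z = (pcs - pcs.mean(axis=0)) / pcs.std(axis=0)
    corr = cov_z.T @ pc_z / expr.shape[0]  # covariates x PCs
    score = np.abs(corr).max(axis=1)
    return np.argsort(score)[::-1][:n_top]

def regress_out(expr, covs):
    """OLS residuals of every gene after fitting an intercept plus covs."""
    design = np.column_stack([np.ones(covs.shape[0]), covs])
    beta, *_ = np.linalg.lstsq(design, expr, rcond=None)
    return expr - design @ beta
```

In use, one would subset the expression and covariate matrices with the mask from `passes_qc`, select columns with `top_correlated_covariates`, and pass those columns to `regress_out`. Because OLS residuals are orthogonal to the fitted covariates, the corrected matrix carries no remaining linear association with them.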