
Chunk #57 — STAR*METHODS — METHOD DETAILS — RNA-seq data generation and quality assessments

Source
Large, Diverse Population Cohorts of hiPSCs and Derived Hepatocyte-like Cells Reveal Functional Genetic Variation at Blood Lipid-Associated Loci.

Text

Raw FASTQ files were assessed for quality using FastQC (Andrews, 2010). Adaptor trimming for Illumina paired-end libraries was performed with TrimGalore! (Krueger, 2012). We then aligned all remaining reads to the hg19/GRCh37 reference genome using the STAR aligner in 2-pass mode (Dobin et al., 2013) and quantified gene-level expression (transcripts per million; TPM) using RSEM (Li and Dewey, 2011). We downloaded raw FASTQ files from GTEx (dbGaP Accession phs000424.v6.p1) and processed the whole-liver samples with the same pipeline used for iPSCs and HLCs. For additional tissue-specificity assessments, we downloaded gene expression quantification files from the GTEx Portal (www.gtexportal.org). Because the data processing pipelines differed greatly between our samples and GTEx (Aguet et al., 2016), we limited our use of the GTEx processed data to quality control steps. To maximize compatibility and reduce batch effects between the two cohorts, we calculated TPMs from the RPKMs provided by the GTEx Consortium with the following formula:

$$\mathrm{TPM}_{t} = \frac{\mathrm{RPKM}_{t}}{\sum_{i=1}^{m} \mathrm{RPKM}_{i}} \times 10^{6}$$

where t is a given transcript and m is the number of transcripts for a given individual. Subsequently, TPMs across all individuals and all genes were scaled to a normal distribution.
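The RPKM-to-TPM rescaling described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: the function name `rpkm_to_tpm`, the dictionary input layout, and the example values are all hypothetical, and the subsequent scaling to a normal distribution is not shown because the text does not specify how it was done.

```python
from typing import Dict


def rpkm_to_tpm(rpkm: Dict[str, float]) -> Dict[str, float]:
    """Rescale one individual's RPKM values so that they sum to one million.

    Implements TPM_t = RPKM_t / sum_i(RPKM_i) * 1e6, where the sum runs
    over all transcripts measured in that individual.
    """
    total = sum(rpkm.values())
    if total <= 0:
        raise ValueError("RPKM values must sum to a positive number")
    return {tid: value / total * 1_000_000 for tid, value in rpkm.items()}


# Hypothetical RPKM values for a single individual (illustrative only).
example_rpkm = {"tx_A": 10.0, "tx_B": 30.0, "tx_C": 60.0}
example_tpm = rpkm_to_tpm(example_rpkm)
```

Because the conversion is applied per individual, every sample ends up with the same total (one million), which is what makes TPM values comparable across samples processed with different library sizes.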