Chunk #45 — Methods — RNA-seq alignment and QC

Source
Brain expression quantitative trait locus and network analyses reveal downstream effects and putative drivers for brain-related diseases.

Text

Our collection of RNA-seq samples carried 36 different tissue labels, many of which were represented by only a few samples. We therefore defined the major brain regions present in our dataset: amygdala, basal ganglia, cerebellum, cortex, hippocampus, hypothalamus and spinal cord. Some samples (especially those from the ENA) were not annotated with a specific major brain region. To resolve this, we performed PCA over the sample correlation matrix and then applied k-nearest neighbors classification (k = 7) to the first two PCs to assign samples to the major brain regions. Using this approach, we defined a set of 86 amygdala, 574 basal ganglia, 723 cerebellum, 6,601 cortex, 206 hippocampus, 252 hypothalamus and 285 spinal cord samples (Fig. 2a and Supplementary Table 1).
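
The classification step described above can be illustrated with a minimal sketch. It is not the authors' code: the text does not say which software was used, how the correlation matrix was preprocessed before PCA, or which distance and voting rule the k-nearest neighbors step applied. The sketch assumes column-centering of the correlation matrix, Euclidean distance and a plain majority vote, and it runs on synthetic toy data. The function names `pca_scores` and `knn_classify` and all variable names are hypothetical.

```python
from collections import Counter

import numpy as np


def pca_scores(corr, n_components=2):
    """Project each sample (row of the sample correlation matrix) onto the top PCs."""
    # Assumption: center each column before PCA; the source does not specify this.
    centered = corr - corr.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :n_components] * s[:n_components]


def knn_classify(train_xy, train_labels, query_xy, k=7):
    """Label each query point by majority vote among its k nearest training points."""
    predictions = []
    for point in query_xy:
        # Assumption: Euclidean distance in the PC1/PC2 plane.
        distances = np.linalg.norm(train_xy - point, axis=1)
        nearest = np.argsort(distances)[:k]
        votes = Counter(train_labels[i] for i in nearest)
        predictions.append(votes.most_common(1)[0][0])
    return predictions


# Synthetic toy data (not from the paper): two regions, 20 samples each, 200 genes.
rng = np.random.default_rng(0)
n_genes = 200
region_profiles = {
    "cortex": rng.normal(size=n_genes),
    "cerebellum": rng.normal(size=n_genes),
}
labels = ["cortex"] * 20 + ["cerebellum"] * 20
expr = np.array(
    [region_profiles[lab] + 0.3 * rng.normal(size=n_genes) for lab in labels]
)  # shape: samples x genes

# Sample-by-sample correlation matrix, then the first two PCs.
corr = np.corrcoef(expr)
scores = pca_scores(corr, n_components=2)

# Pretend every fifth sample lacks a region annotation and must be classified.
unannotated = np.array([i for i in range(len(labels)) if i % 5 == 0])
annotated = np.array([i for i in range(len(labels)) if i % 5 != 0])
train_labels = [labels[i] for i in annotated]

predicted = knn_classify(scores[annotated], train_labels, scores[unannotated], k=7)
for idx, region in zip(unannotated, predicted):
    print(f"sample {idx}: predicted {region}")
```

An odd k such as 7 rules out tied votes only when there are two classes. With seven regions, ties are possible, and this sketch resolves them by whichever label `Counter` encountered first, which is a simplification.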