Chunk #78 — Method Details — Invariant genes as controls for data QC and normalization

Source
A Next Generation Connectivity Map: L1000 Platform and the First 1,000,000 Profiles.

Text

We developed a set of internal controls to assess quality, to provide real-time feedback during the scanning process, and to use in normalization. Importantly, rather than using a single “housekeeping” gene (e.g., GAPDH), we adopted an approach that utilizes control values across the entire spectrum of gene expression. We adapted the approach described in the Illumina BeadChip studio (Illumina Inc., 2007) by defining a set of genes that are rank invariant across all samples. To identify these genes, we analyzed human gene expression profiles from DSGEO and selected genes whose expression is relatively invariant (coefficient of variation < 10%) across a variety of tissue types and experimental conditions. To further minimize the variance, rather than picking single genes as invariants, we grouped the genes into 10 sets of 8 genes each based on their level of expression across all samples. The 10 gene sets were ordered by increasing level of expression, with the first level corresponding to the genes with the lowest expression and the tenth level to the most highly expressed genes. Because these gene sets exhibit a consistent expression pattern,
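
The selection and grouping steps described in this chunk can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function name `select_invariant_sets`, its parameters, and the rule for picking 8 genes within each expression level (here, the lowest-CV genes in each of 10 equal-sized bins of mean expression) are assumptions made for illustration; the text specifies only the CV threshold, the 10 levels, and the 8 genes per level.

```python
import numpy as np


def select_invariant_sets(expr, gene_ids, cv_max=0.10, n_levels=10, genes_per_level=8):
    """Pick low-variance genes and group them into ordered expression levels.

    expr      : genes x samples array of expression values (hypothetical input)
    gene_ids  : sequence of gene identifiers, one per row of expr
    Returns a list of n_levels lists, ordered from lowest to highest expression.
    """
    expr = np.asarray(expr, dtype=float)
    mean = expr.mean(axis=1)
    sd = expr.std(axis=1, ddof=1)  # sample standard deviation
    cv = sd / mean                 # coefficient of variation per gene

    # Keep genes whose CV is below the threshold (10% in the text).
    keep = np.flatnonzero(cv < cv_max)
    if keep.size < n_levels * genes_per_level:
        raise ValueError("not enough invariant genes to fill every level")

    # Order the candidates by mean expression, lowest first.
    keep = keep[np.argsort(mean[keep])]

    # Assumption: split candidates into equal-sized contiguous bins and
    # take the lowest-CV genes in each bin as that level's gene set.
    levels = []
    for bin_idx in np.array_split(keep, n_levels):
        if bin_idx.size < genes_per_level:
            raise ValueError("an expression bin has too few candidate genes")
        chosen = bin_idx[np.argsort(cv[bin_idx])[:genes_per_level]]
        levels.append([gene_ids[i] for i in chosen])
    return levels
```

With the default arguments the function returns 10 lists of 8 gene identifiers each, with level 1 holding the lowest-expressed invariant genes and level 10 the highest.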