Quality control (QC) was conducted on the NTR and NESDA data in parallel. Expression data were required to pass standard Affymetrix Expression Console quality metrics before further QC. The array superset consisted of 6,526 U219 arrays (3,516 NTR samples, 2,783 NESDA samples, divided into baseline samples and a smaller portion collected after 2-year follow-up, and 227 controls) on 69 plates. This total includes 417 samples that were identified as having reduced quality (D < −5.0, described below) and were re-hybridized. Expression values were obtained using robust multichip averaging (RMA) normalization (Affymetrix Power Tools, v1.12.0). Probe sequences were mapped to the human genome (hg19) using BOWTIE,76 and probes with sequences that did not map, that mapped to multiple locations, or that intersected a polymorphic SNP (HapMap3 and 1000 Genomes Project data) were removed.77,78 Because we were dissatisfied with the standard Affymetrix annotations, we mapped and annotated all Affymetrix U219 probesets with reference to GENCODE (v14) gene models.
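The three probe-removal criteria (no mapping, multiple mappings, overlap with a polymorphic SNP) can be illustrated with a minimal sketch. This is not the pipeline used in the study: the names `Alignment` and `filter_probes`, the 0-based half-open coordinate convention, and the toy data are all assumptions made for illustration, and the alignments are taken as already parsed from aligner output.

```python
from collections import Counter
from typing import NamedTuple


class Alignment(NamedTuple):
    """One genomic hit for a probe (hypothetical record, not an aligner format)."""
    probe_id: str
    chrom: str
    start: int  # 0-based, inclusive (assumed convention)
    end: int    # 0-based, exclusive (assumed convention)


def filter_probes(all_probe_ids, alignments, snp_positions):
    """Return probe IDs that map exactly once and overlap no listed SNP.

    snp_positions is a set of (chrom, position) tuples in the same
    0-based coordinate system as the alignments.
    """
    hits = Counter(a.probe_id for a in alignments)
    kept = set()
    for a in alignments:
        if hits[a.probe_id] != 1:
            continue  # probe maps to multiple locations
        overlaps_snp = any(
            chrom == a.chrom and a.start <= pos < a.end
            for chrom, pos in snp_positions
        )
        if not overlaps_snp:
            kept.add(a.probe_id)
    # Probes with no alignment at all never enter `kept`, so they are dropped.
    return [p for p in all_probe_ids if p in kept]


# Toy example: p2 is multi-mapping, p3 overlaps a SNP, p4 is unmapped.
probes = ["p1", "p2", "p3", "p4"]
alignments = [
    Alignment("p1", "chr1", 100, 125),
    Alignment("p2", "chr1", 200, 225),
    Alignment("p2", "chr5", 900, 925),
    Alignment("p3", "chr2", 300, 325),
]
snps = {("chr2", 310)}
print(filter_probes(probes, alignments, snps))  # ['p1']
```

The linear scan over SNP positions keeps the sketch short; for a genome-wide SNP catalogue, an interval index or a sorted per-chromosome lookup would be a more practical choice.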