We obtained publicly available RNA-Seq data from 421 lymphoblastoid cell lines (LCLs) generated by the GEUVADIS consortium [15], along with genotype data generated by the 1000 Genomes Project. We used GEUVADIS as an independent validation dataset to test the gene expression prediction models trained in the DGN cohort.
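
As an illustration of what such a validation involves, the following is a minimal sketch, not the procedure used in the study. It assumes that each DGN-trained model for a gene is a set of per-SNP weights, that predicted expression is the weighted sum of allele dosages, and that performance is summarized as the squared Pearson correlation between predicted and observed expression. The function names, SNP identifiers and toy values are hypothetical.

```python
import math


def predict_expression(dosages, weights):
    """Return one predicted expression value per sample.

    dosages: list of dicts mapping SNP id -> allele dosage (0 to 2).
    weights: dict mapping SNP id -> model weight (assumed model format).
    SNPs missing from a sample are skipped.
    """
    return [
        sum(w * sample[snp] for snp, w in weights.items() if snp in sample)
        for sample in dosages
    ]


def pearson_r(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical weights for one gene, standing in for a model trained in DGN.
weights = {"rs1": 0.5, "rs2": -0.2}

# Hypothetical genotype dosages for four validation samples.
dosages = [
    {"rs1": 0, "rs2": 2},
    {"rs1": 1, "rs2": 1},
    {"rs1": 2, "rs2": 0},
    {"rs1": 1, "rs2": 2},
]

# Hypothetical observed expression for the same four samples.
observed = [-0.3, 0.2, 1.1, 0.0]

predicted = predict_expression(dosages, weights)
r2 = pearson_r(predicted, observed) ** 2
print(f"validation R^2 = {r2:.3f}")
```

In practice this calculation would be repeated for every gene with a trained model, using the 1000 Genomes genotypes as input and the GEUVADIS RNA-Seq measurements as the observed values.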