Chunk #45 — Methods — Statistical methods. — Deep learning annotations from DeepSEA and Basenji.

Source: Improving the trans-ancestry portability of polygenic risk scores by prioritizing variants in predicted cell-type-specific regulatory elements.
Embedded: yes

Text

We also trained new allelic-effect DeepSEA models on the TF ChIP–seq datasets used to train the lead IMPACT annotations (13 unique) that we identified for the 21 traits investigated in the PRS analysis. We ran DeepSEA as described previously, using default parameters, one NVIDIA Quadro GV100 GPU, Selene (v.0.4.7) and PyTorch (v.1.3.1) [54,55]. To train the DeepSEA model, we used the genomic sequences corresponding to each of the 13 TF ChIP–seq peak sets, as well as any regions in which the ENCODE or Roadmap Epigenomics DeepSEA dataset contained at least one TF binding event. As in the original DeepSEA study, we randomly sampled 1-kb sequences (hg19) from regions included in the ENCODE, Roadmap or our TF ChIP–seq data. Considering each training TF ChIP–seq dataset separately, we assigned positive labels as in the original DeepSEA study: if more than 100 bp of the center 200 bp of the 1-kb sequence fell within our provided TF ChIP–seq peaks, the sequence was labeled 1; otherwise, it was labeled 0. DeepSEA accurately predicted TF binding, average area under the receiver operating characteristic curve = 0.93,
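
The labeling rule described above can be illustrated with a minimal sketch. This is not the Selene sampler used in the study; the function names (`merge_intervals`, `label_window`), the half-open coordinate convention, and the choice to merge overlapping peaks before counting covered bases are assumptions made for illustration only.

```python
from typing import List, Tuple

# Hypothetical convention: half-open intervals [start, end) on one chromosome.
Interval = Tuple[int, int]


def merge_intervals(intervals: List[Interval]) -> List[Interval]:
    """Merge overlapping or touching intervals so no base is counted twice."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def label_window(
    window_start: int,
    peaks: List[Interval],
    window_len: int = 1000,   # 1-kb sampled sequence
    center_len: int = 200,    # center bin that is scored
    min_overlap: int = 100,   # label is 1 only if overlap is strictly greater
) -> int:
    """Return 1 if more than `min_overlap` bp of the center bin lie in peaks."""
    flank = (window_len - center_len) // 2
    center_start = window_start + flank
    center_end = center_start + center_len

    covered = 0
    for peak_start, peak_end in merge_intervals(peaks):
        # Length of the intersection between the peak and the center bin.
        covered += max(0, min(center_end, peak_end) - max(center_start, peak_start))
    return 1 if covered > min_overlap else 0
```

For a window starting at position 0, the center bin spans positions 400 to 600. A peak covering 450 to 600 overlaps 150 bp and yields a label of 1, whereas a peak covering 500 to 600 overlaps exactly 100 bp and yields 0, because the rule requires more than 100 bp.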