
Chunk #37 — METHODS — Subjects. — PTSD case-control (binary) electronic health record derived phenotype.

Source
Genome-wide association analyses of post-traumatic stress disorder and its symptom subdomains in the Million Veteran Program.

Text

Details on the validation and psychometric properties of this phenotype are reported in our recent publication [23]. In brief, we used manual chart review (n = 500) as the gold standard. For both the algorithm and chart review, three classifications were possible: likely PTSD, possible PTSD, or no PTSD. We used Lasso regression with cross-validation, first to select statistically significant predictors of PTSD from the electronic health record (EHR) and then to generate, for every participant in the study population, a predicted probability of being a PTSD case. Probability scores ranged from 0 to 1.00. Compared with a rule-based approach (ICD algorithm), our probabilistic approach (Lasso algorithm) showed modestly higher overall percent agreement with chart review (80% vs. 75%), higher sensitivity (0.95 vs. 0.84), and higher overall accuracy (AUC = 0.95 vs. 0.90). For the binary case-control EHR-derived phenotype used here, we applied a probability cut point of 0.7 to the Lasso results to determine final PTSD case and control status; we also selected a threshold score of 30 on
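The cut-point step described above can be illustrated with a minimal sketch. It is not the authors' code: the function names, the toy scores, and the toy chart-review labels are hypothetical, the gold standard is collapsed to two classes for simplicity (the text describes three), and treating a score exactly equal to 0.7 as a case is an assumption, since the text does not say whether the cut point is inclusive.

```python
from typing import List, Sequence

CUT_POINT = 0.7  # probability cut point stated in the text


def classify(scores: Sequence[float], cut_point: float = CUT_POINT) -> List[int]:
    """Label each score 1 (case) or 0 (control).

    Assumption: scores equal to the cut point count as cases.
    """
    return [1 if s >= cut_point else 0 for s in scores]


def sensitivity(predicted: Sequence[int], gold: Sequence[int]) -> float:
    """Fraction of gold-standard cases that the algorithm also labels as cases."""
    true_pos = sum(1 for p, g in zip(predicted, gold) if p == 1 and g == 1)
    false_neg = sum(1 for p, g in zip(predicted, gold) if p == 0 and g == 1)
    return true_pos / (true_pos + false_neg)


def percent_agreement(predicted: Sequence[int], gold: Sequence[int]) -> float:
    """Percentage of participants on whom algorithm and gold standard agree."""
    matches = sum(1 for p, g in zip(predicted, gold) if p == g)
    return 100.0 * matches / len(gold)


# Hypothetical toy data, made up for illustration only.
toy_scores = [0.92, 0.71, 0.65, 0.10, 0.70, 0.30]
toy_chart_review = [1, 1, 1, 0, 1, 0]  # 1 = PTSD on chart review, 0 = no PTSD

toy_predicted = classify(toy_scores)
print(toy_predicted)
print(sensitivity(toy_predicted, toy_chart_review))
print(round(percent_agreement(toy_predicted, toy_chart_review), 1))
```

Raising the cut point trades sensitivity for fewer false-positive cases, which is why the metrics are computed against the chart-review labels rather than read off the scores alone.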