
Chunk #58 — Methods — Partners Biobank curated disease populations and quantitative traits

Source
Polygenic prediction via Bayesian regression and continuous shrinkage priors.

Text

For a number of common complex diseases, the Partners Biobank trained and validated a classification algorithm on a gold-standard training set created by expert chart review. The algorithm uses both structured and unstructured EHR data and combines natural language processing with statistical methods. It was then applied to all Biobank participants to identify cases and controls, creating curated disease populations. We selected six curated diseases: breast cancer (BRCA), coronary artery disease (CAD), depression (DEP), inflammatory bowel disease (IBD; Crohn’s disease or ulcerative colitis), rheumatoid arthritis (RA), and type 2 diabetes (T2DM). Each has more than 500 genotyped cases in the Biobank, and each has large-scale external GWAS summary statistics that are publicly available. For all six diseases, algorithm-identified cases have a positive predictive value (PPV) greater than 0.90 for a current or past history of the disease. Algorithm-identified controls have a negative predictive value (NPV) greater than 0.99 for having no history of the disease.
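The two curation thresholds can be made concrete with a short calculation. The sketch below computes PPV and NPV from counts obtained by comparing algorithm labels against expert chart review. It then checks them against the 0.90 and 0.99 cutoffs. The `ChartReviewCounts` class, the helper functions, and the example counts are hypothetical illustrations. They are not the Partners Biobank's actual validation code or data.

```python
from dataclasses import dataclass


@dataclass
class ChartReviewCounts:
    """Hypothetical counts from comparing algorithm labels with expert chart review."""
    true_pos: int   # algorithm says case, chart review confirms disease history
    false_pos: int  # algorithm says case, chart review finds no disease history
    true_neg: int   # algorithm says control, chart review confirms no disease history
    false_neg: int  # algorithm says control, chart review finds disease history


def ppv(c: ChartReviewCounts) -> float:
    """Positive predictive value: share of algorithm-labelled cases that are true cases."""
    labelled_cases = c.true_pos + c.false_pos
    if labelled_cases == 0:
        raise ValueError("no algorithm-labelled cases to evaluate")
    return c.true_pos / labelled_cases


def npv(c: ChartReviewCounts) -> float:
    """Negative predictive value: share of algorithm-labelled controls that are true controls."""
    labelled_controls = c.true_neg + c.false_neg
    if labelled_controls == 0:
        raise ValueError("no algorithm-labelled controls to evaluate")
    return c.true_neg / labelled_controls


def meets_curation_thresholds(
    c: ChartReviewCounts, min_ppv: float = 0.90, min_npv: float = 0.99
) -> bool:
    """True if PPV and NPV both strictly exceed the cutoffs stated in the text."""
    return ppv(c) > min_ppv and npv(c) > min_npv


if __name__ == "__main__":
    # Illustrative counts only, not Biobank data.
    example = ChartReviewCounts(true_pos=95, false_pos=5, true_neg=199, false_neg=1)
    print(ppv(example), npv(example), meets_curation_thresholds(example))
```

With these illustrative counts, PPV is 95/100 = 0.95 and NPV is 199/200 = 0.995, so both cutoffs are exceeded. The comparisons are strict (`>`) to match the "greater than" wording above.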