Chunk #1 — Large datasets allow identification of defined subsets

Source: Integrating electronic health record genotype and phenotype datasets to transform patient care.
Embedded: yes

Text

Recent efforts have generated several very large datasets that link EHR data of various types to dense genomic information, including the Electronic Medical Records and Genomics (eMERGE) Network,3 the Veterans Administration’s Million Veteran Program,4 the Kaiser Permanente GERA program,5 the UK Biobank,6 and the Icelandic deCODE resource.7 Taken together, these have generated dense genotype information (genome-wide association study (GWAS)-level or more) in over a million patients. Initial studies in these datasets demonstrated their value for discovering, through GWAS, common genetic loci associated with common human diseases. More recent work has shown that they can be exploited for many other applications, such as identifying rare genetic variants with large effect sizes, pleiotropic effects of common and rare genetic variants, and potential drug targets. Although these systems have been expensive to establish, they hold the promise of improving efficiency in both discovery and implementation, since data generated in the course of clinical care are reused for research purposes.8 Further, the genetic datasets, once generated, can be reused for multiple analyses. This idea was initially developed by the Wellcome Trust