Chunk #1 — Main

Source: The UK Biobank resource with deep phenotyping and genomic data.

In this paper, we summarize the existing and planned content of the phenotype resource and describe the genetic dataset on the full 500,000 participants. To facilitate its wider use, we applied a range of quality control procedures and conducted a set of analyses that reveal properties of the genetic data (such as population structure and relatedness) that can be important for downstream analyses. In addition, we estimated haplotypes and imputed genotypes into the dataset, which increases the number of testable variants by more than 100-fold to approximately 96 million. We also imputed classical allelic variation at 11 human leukocyte antigen (HLA) genes and replicated signals of known associations between HLA alleles and many common diseases. We describe tools that allow efficient genome-wide association studies (GWAS) of multiple traits and fast phenome-wide association studies, which work together with a new compressed file format that has been used to distribute the dataset. As a further check of the genotyped and imputed datasets, we performed a test-case genome-wide association scan on a well-studied human trait, standing height.
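As a rough illustration of what a genome-wide association scan computes at each variant, the sketch below regresses a quantitative trait on genotype counts and reports the slope, its standard error and a z-statistic. It is not the tool or the model used for the analyses described here: the data are simulated, covariates and relatedness are ignored, and the function names, sample size, allele frequency and effect size are all assumptions chosen for illustration.

```python
import math
import random


def simulate_genotypes(n, maf, rng):
    """Draw n genotypes (0, 1 or 2 minor alleles) under Hardy-Weinberg proportions."""
    return [int(rng.random() < maf) + int(rng.random() < maf) for _ in range(n)]


def association_test(genotypes, trait):
    """Ordinary least-squares regression of trait on genotype count.

    Returns (beta, standard_error, z) for the per-allele slope.
    """
    n = len(genotypes)
    mean_g = sum(genotypes) / n
    mean_y = sum(trait) / n
    sxx = sum((g - mean_g) ** 2 for g in genotypes)
    if sxx == 0:
        raise ValueError("variant is monomorphic in this sample")
    sxy = sum((g - mean_g) * (y - mean_y) for g, y in zip(genotypes, trait))
    beta = sxy / sxx
    intercept = mean_y - beta * mean_g
    # Residual sum of squares gives the noise variance estimate.
    rss = sum((y - intercept - beta * g) ** 2 for g, y in zip(genotypes, trait))
    se = math.sqrt(rss / (n - 2) / sxx)
    # A perfect fit has zero standard error; report an infinite z in that case.
    z = beta / se if se > 0 else math.copysign(math.inf, beta)
    return beta, se, z


rng = random.Random(0)  # fixed seed so the simulation is repeatable
n = 5000                # hypothetical sample size
causal = simulate_genotypes(n, 0.3, rng)
null = simulate_genotypes(n, 0.3, rng)
# Simulated trait: 0.5 units per allele of the causal variant plus unit-variance noise.
trait = [0.5 * g + rng.gauss(0.0, 1.0) for g in causal]

for name, g in (("causal", causal), ("null", null)):
    beta, se, z = association_test(g, trait)
    print(f"{name}: beta={beta:.3f} se={se:.3f} z={z:.2f}")
```

A real scan repeats a test of this kind across millions of genotyped and imputed variants, typically with covariates such as age, sex and principal components, and with imputed dosages in place of integer genotype counts.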