
Chunk #5 — Introduction

Source
Integrating sequence and array data to create an improved 1000 Genomes Project haplotype reference panel.

Text

Here we present a new version of SHAPEIT2 that estimates haplotypes from genotype likelihoods (GLs) generated from low-coverage sequencing data. Our new method can also take advantage of SNP microarray genotypes on the same samples. Most of the ~2,500 sequenced 1000GP samples have been genotyped on either the Illumina Omni2.5 or the Affymetrix 6.0 microarray, as has an additional set of 1,198 unsequenced samples, many of whom are close relatives of the sequenced samples. Our approach has two steps. First, the SNP array data are phased to build a backbone of haplotypes across each chromosome, which we refer to as the scaffold. Second, we take the GLs at the sequenced variant sites and jointly phase these data ‘onto’ this haplotype scaffold.
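The two-step idea can be illustrated with a deliberately tiny toy. This is a sketch under stated assumptions, not the SHAPEIT2 model. It assumes a short window in which some sites are array sites whose phase is already fixed by the scaffold step, while the remaining sites carry only GLs, and the two sets of sites do not overlap. The hypothetical functions `phase_onto_scaffold` and `haplotype_prior` try every allele assignment at the sequenced sites. Each haplotype pair is scored by a smoothed frequency prior from a hypothetical reference panel, multiplied by the GLs of the genotypes the pair implies, and the best pair is returned. Because the scaffold alleles are held fixed, the two haplotypes cannot be swapped, so the phase at a heterozygous sequenced site is decided by which haplotypes the panel supports.

```python
from itertools import product
from math import prod


def haplotype_prior(hap, panel, pseudo=0.01):
    """Hypothetical toy prior: frequency of `hap` among panel haplotypes,
    smoothed so that unseen haplotypes keep a small nonzero probability."""
    matches = sum(1 for ref in panel if tuple(ref) == hap)
    return (matches + pseudo) / (len(panel) + pseudo * 2 ** len(hap))


def phase_onto_scaffold(n_sites, scaffold, gls, panel):
    """Return the highest-scoring haplotype pair (hap_a, hap_b) for one window.

    scaffold: {site: (allele_a, allele_b)}, phase fixed by the array step.
    gls:      {site: (L0, L1, L2)}, P(reads | genotype = number of alt alleles).
    Assumes array sites and sequenced sites are disjoint.
    """
    assert not set(scaffold) & set(gls), "toy assumes disjoint site sets"
    seq_sites = sorted(gls)
    best_pair, best_score = None, -1.0
    # Brute force: both alleles at every sequenced site (4**k candidates).
    for alleles in product((0, 1), repeat=2 * len(seq_sites)):
        hap_a, hap_b = [None] * n_sites, [None] * n_sites
        for site, (a, b) in scaffold.items():
            hap_a[site], hap_b[site] = a, b  # scaffold alleles never change
        for i, site in enumerate(seq_sites):
            hap_a[site], hap_b[site] = alleles[2 * i], alleles[2 * i + 1]
        hap_a, hap_b = tuple(hap_a), tuple(hap_b)
        # Data likelihood: GL of the genotype implied at each sequenced site.
        likelihood = prod(gls[s][hap_a[s] + hap_b[s]] for s in seq_sites)
        score = (haplotype_prior(hap_a, panel)
                 * haplotype_prior(hap_b, panel) * likelihood)
        if score > best_score:
            best_pair, best_score = (hap_a, hap_b), score
    return best_pair


if __name__ == "__main__":
    # Sites 0 and 2 are array (scaffold) sites; site 1 is sequenced only.
    scaffold = {0: (0, 1), 2: (0, 1)}
    gls = {1: (0.05, 0.90, 0.05)}  # strongly suggests a heterozygote
    panel = [(0, 0, 0), (1, 1, 1)] * 3
    print(phase_onto_scaffold(3, scaffold, gls, panel))  # ((0, 0, 0), (1, 1, 1))
```

In this example the GLs at site 1 favour a heterozygote but carry no phase information. The panel, which contains only (0, 0, 0) and (1, 1, 1) haplotypes, places the alternate allele on the scaffold's second haplotype. Exhaustive enumeration grows as 4^k in the number k of sequenced sites, so the sketch only shows what phasing ‘onto’ a scaffold means, not how to do it at chromosome scale.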