Methods: Complete Genomics (CG) validation data

Source: Integrating sequence and array data to create an improved 1000 Genomes Project haplotype reference panel.

As validation data, we used two data sets: the 69 genomes from Complete Genomics (CG1) and an additional set of 250 samples (CG2), also sequenced by Complete Genomics. All of these samples were sequenced with Complete Genomics technology at an average depth of 80×. CG1 is available at http://www.completegenomics.com/public-data/69-Genomes/ and CG2 at ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/working/20130524_cgi_combined_calls/. In both data sets, we removed all variants with a call rate below 66%, excluded them from all subsequent validation analyses, and used the called SNPs for validation. We found 15,060,295 and 17,399,956 1000GP SNPs overlapping CG1 and CG2, respectively. In addition, we found 554,886 1000GP indels in CG2.
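The call-rate filter and the overlap count can be illustrated with a minimal Python sketch. It assumes genotypes are held in memory as a dict keyed by a hypothetical (chromosome, position, ref, alt) tuple, that missing calls are encoded as None or VCF-style no-call strings, and that a 1000GP site overlaps a validation site only when the keys match exactly. None of these choices come from the original pipeline, and the toy data are invented for illustration.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical variant key: (chromosome, position, ref allele, alt allele).
VariantKey = Tuple[str, int, str, str]

# Assumption: a missing call is stored as None or a VCF-style no-call string.
MISSING = {None, ".", "./.", ".|."}


def call_rate(genotypes: List[Optional[str]]) -> float:
    """Fraction of samples with a non-missing genotype call at one site."""
    if not genotypes:
        return 0.0
    called = sum(1 for gt in genotypes if gt not in MISSING)
    return called / len(genotypes)


def filter_by_call_rate(
    calls: Dict[VariantKey, List[Optional[str]]],
    min_rate: float = 0.66,
) -> Dict[VariantKey, List[Optional[str]]]:
    """Drop sites whose call rate is below min_rate; sites at or above it are kept."""
    return {key: gts for key, gts in calls.items() if call_rate(gts) >= min_rate}


def overlapping_sites(
    reference_sites: Set[VariantKey],
    validation_calls: Dict[VariantKey, List[Optional[str]]],
) -> Set[VariantKey]:
    """Reference (e.g. 1000GP) sites that are also present in the filtered validation set."""
    return reference_sites.intersection(validation_calls.keys())


# Toy example with three samples per site (invented data).
cg_calls = {
    ("1", 1000, "A", "G"): ["0/1", "0/0", "./."],  # 2/3 called (~0.667): kept
    ("1", 2000, "C", "T"): ["./.", "./.", "1/1"],  # 1/3 called (~0.333): dropped
    ("2", 500, "G", "A"): ["0/0", "0/1", "1/1"],   # 3/3 called: kept
}
kg_sites = {("1", 1000, "A", "G"), ("1", 2000, "C", "T"), ("3", 10, "T", "C")}

kept = filter_by_call_rate(cg_calls)
print(len(kept))  # 2
print(sorted(overlapping_sites(kg_sites, kept)))  # [('1', 1000, 'A', 'G')]
```

Exact key matching is a simplification: indels in particular usually need left-alignment and allele normalization before sites from two call sets can be compared reliably.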