We initially identified 5.5 Mb of the human reference sequence (HG19) with a relative coverage of 0.1 or less. If the data were unbiased, a relative coverage of 0.1 would lie more than 9 standard deviations below the expected coverage at each base, so we would expect no bases in the human genome to have such low coverage in the absence of sequencing bias. We then applied two filters to this 'undercovered set' to remove sequence that is unlikely to be present in the NA12878 genome (see Materials and methods). These filters, one based on analysis of the NA12878 assembly and the other based on a comparison between NA12878 and a diverse population of other samples, excluded 8.7% (23 Mb) of the autosomal reference from further consideration. After this filtering, 3.6 Mb of undercovered reference genome remained.
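The standard-deviation argument can be illustrated with a short calculation. The sketch below assumes that unbiased per-base coverage follows a Poisson distribution, so that the standard deviation is the square root of the mean. The Poisson model, the 100x mean coverage, the 3 Gb genome size, and the function names are all illustrative assumptions, not values or methods taken from the text.

```python
import math


def sd_below_mean(mean_coverage: float, relative_coverage: float = 0.1) -> float:
    """Distance, in Poisson standard deviations, between the mean coverage
    and relative_coverage * mean_coverage (assumes SD = sqrt(mean))."""
    sd = math.sqrt(mean_coverage)
    return (mean_coverage - relative_coverage * mean_coverage) / sd


def poisson_cdf(k: int, lam: float) -> float:
    """P(X <= k) for X ~ Poisson(lam), with each term computed in log space."""
    return sum(
        math.exp(-lam + i * math.log(lam) - math.lgamma(i + 1))
        for i in range(k + 1)
    )


def expected_undercovered_bases(genome_size: float, mean_coverage: float,
                                relative_coverage: float = 0.1) -> float:
    """Expected number of bases at or below the relative-coverage threshold
    if every base were an independent Poisson draw (a simplification)."""
    threshold = math.floor(relative_coverage * mean_coverage)
    return genome_size * poisson_cdf(threshold, mean_coverage)


if __name__ == "__main__":
    mean_cov = 100.0     # hypothetical mean coverage, not from the text
    genome_size = 3.0e9  # approximate genome size, used only for illustration
    print("SDs below mean:", sd_below_mean(mean_cov))
    print("Expected undercovered bases:",
          expected_undercovered_bases(genome_size, mean_cov))
```

Under these assumptions the distance is 0.9 times the square root of the mean coverage, so it exceeds 9 standard deviations once the mean coverage is above 100x, and the expected number of undercovered bases is far below one.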