To evaluate the potential of our data to generate even more comprehensive variation datasets, we developed and applied a method based on de novo assembly of unmapped and mismapped read pairs, which recovers sequences that are present in a sample but absent from, or improperly represented in, the reference. As the majority of non-reference human sequence is present in the assembled genomes of other primates (refs 40, 41), we leveraged available hominid references (see Methods) to specifically discover retained ancestral sequences that have been deleted in some human lineages, including on the reference haplotype.
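As an illustration of the first step described above, the following minimal Python sketch selects read pairs that could feed a de novo assembly, working from SAM-format alignment records. The actual selection criteria are those given in Methods and are not reproduced here. The sketch assumes that "unmapped" means either mate carries the unmapped flag and that "mismapped" means a paired read not flagged as a proper pair. The function names `is_assembly_candidate` and `candidate_read_names` are hypothetical.

```python
# SAM flag bits, as defined in the SAM format specification.
PAIRED = 0x1
PROPER_PAIR = 0x2
UNMAPPED = 0x4
MATE_UNMAPPED = 0x8
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def is_assembly_candidate(flag: int) -> bool:
    """Return True if a read with this SAM flag belongs to a pair worth assembling.

    Assumed criteria (illustrative only, not the paper's):
      - the read is paired and is a primary alignment, and
      - either mate is unmapped, or the pair is not flagged as properly paired.
    """
    if not flag & PAIRED:
        return False
    if flag & (SECONDARY | SUPPLEMENTARY):
        return False
    if flag & (UNMAPPED | MATE_UNMAPPED):
        return True
    return not flag & PROPER_PAIR


def candidate_read_names(sam_lines):
    """Collect the names of read pairs that pass is_assembly_candidate."""
    names = set()
    for line in sam_lines:
        if line.startswith("@"):  # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:  # a SAM record has 11 mandatory fields
            continue
        if is_assembly_candidate(int(fields[1])):
            names.add(fields[0])
    return names
```

The selected read names could then be used to extract both mates of each pair and pass them to an assembler. Comparing the resulting contigs against other hominid references is a separate step that this sketch does not cover.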