matching between ATAC-seq and Illumina genotyping (see below). These samples were therefore excluded, together with nineteen additional samples identified as possibly contaminated, leaving a final total of 269 samples (Fig. 4). Using this dataset, we generated a set of 272,424 peaks covering 4.96% of the genome (Fig. 5a). Finally, we quantified read counts within these peaks for each individual (non-merged) sample and used these counts for MDS clustering (Fig. 5b).
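
To illustrate the last step, the following is a minimal sketch of how an MDS embedding could be computed from a peak-by-sample count matrix. It is not the pipeline used here: the log-CPM normalization, the Euclidean distance, the classical (Torgerson) MDS variant, and the names `log_cpm`, `classical_mds`, and `toy_counts` are all assumptions made for illustration, and the count values are toy data.

```python
import numpy as np


def log_cpm(counts, prior=1.0):
    """Convert a peaks x samples count matrix to log2 counts per million.

    Assumption: library size is the column sum over peaks; `prior` is a
    pseudocount that avoids taking the log of zero.
    """
    counts = np.asarray(counts, dtype=float)
    lib_size = counts.sum(axis=0, keepdims=True)
    return np.log2((counts + prior) / lib_size * 1e6)


def classical_mds(features, n_components=2):
    """Classical (Torgerson) MDS on Euclidean distances between rows.

    `features` is samples x features; returns samples x n_components.
    """
    x = np.asarray(features, dtype=float)
    # Squared Euclidean distance between every pair of samples.
    sq_norm = np.sum(x ** 2, axis=1)
    d2 = np.maximum(sq_norm[:, None] + sq_norm[None, :] - 2.0 * x @ x.T, 0.0)
    # Double-center the squared distances to obtain a Gram matrix.
    n = d2.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ d2 @ centering
    # Keep the leading eigenvectors, scaled by the square root of their eigenvalues.
    eigvals, eigvecs = np.linalg.eigh(gram)
    order = np.argsort(eigvals)[::-1][:n_components]
    top_vals = np.clip(eigvals[order], 0.0, None)
    return eigvecs[:, order] * np.sqrt(top_vals)


# Toy data (hypothetical): 4 peaks x 4 samples, two groups of two samples.
toy_counts = np.array([
    [100, 110, 10, 12],
    [90, 95, 11, 9],
    [10, 12, 100, 105],
    [11, 9, 95, 98],
])

# Samples become rows before embedding.
coords = classical_mds(log_cpm(toy_counts).T, n_components=2)
```

Each row of `coords` gives the two-dimensional position of one sample. In a plot of these coordinates, samples with similar accessibility profiles fall close together, so outlying samples stand out.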