Each set of paired-end reads was processed with Trimmomatic [10] to remove low-quality bases and sequencing adapters. Reads were then aligned with the STAR aligner [11] to the GRCh38 analysis set reference genome (http://hgdownload.cse.ucsc.edu/goldenPath/hg38/bigZips/analysisSet/hg38.analysisSet.chroms.tar.gz), with the pseudoautosomal region on chromosome Y masked. This yielded, for each sample, a BAM file of mapped paired-end reads sorted by genomic coordinate. From these files, reads that mapped to multiple loci or to the mitochondrial genome were removed using samtools [23], and duplicate reads were removed with Picard (http://broadinstitute.github.io/picard).
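
The read-filtering logic described above can be illustrated with a minimal sketch that works on SAM-format text lines. It is not the pipeline that was actually run, which used samtools directly; the exact filtering criteria are not stated in the text. The sketch assumes that multi-mapped reads are identified by the `NH` tag that STAR writes by default (number of reported alignments) and that the mitochondrial contig is named `chrM`, as in the hg38 analysis set. The function names `is_multimapped` and `filter_sam_lines` are hypothetical. Mate consistency and duplicate removal are ignored for brevity.

```python
def is_multimapped(fields):
    """Return True if the NH tag reports more than one alignment.

    Assumption: the aligner emits an NH:i:<n> tag (STAR does so by default).
    Records without an NH tag are treated as uniquely mapped.
    """
    for tag in fields[11:]:  # optional SAM tags start at column 12
        if tag.startswith("NH:i:"):
            return int(tag[5:]) > 1
    return False


def filter_sam_lines(lines, mito_name="chrM"):
    """Keep header lines and uniquely mapped, non-mitochondrial records."""
    kept = []
    for line in lines:
        if line.startswith("@"):  # header lines pass through unchanged
            kept.append(line)
            continue
        fields = line.rstrip("\n").split("\t")
        if fields[2] == mito_name:  # column 3 is the reference name (RNAME)
            continue
        if is_multimapped(fields):
            continue
        kept.append(line)
    return kept


# Toy records for illustration only; they are not data from the study.
example = [
    "@SQ\tSN:chr1\tLN:248956422",
    "r1\t99\tchr1\t100\t255\t50M\t=\t300\t250\t*\t*\tNH:i:1",
    "r2\t99\tchr1\t500\t3\t50M\t=\t700\t250\t*\t*\tNH:i:2",
    "r3\t99\tchrM\t100\t255\t50M\t=\t300\t250\t*\t*\tNH:i:1",
]
filtered = filter_sam_lines(example)
```

In this toy input, only the header line and read `r1` survive: `r2` is dropped because its `NH` tag reports two alignments, and `r3` is dropped because it maps to `chrM`.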