We define gene duplicability as in Rambaldi et al. (11). In brief, we first align the protein sequences of all human genes to the human genome reference assembly (hg18) using BLAT (30). We then retrieve the best hit of each gene, defined as the genomic locus with the highest coverage score. By default, all genes with additional genomic matches that cover at least 60% of the query length are considered duplicable, while genes with no additional hits above this threshold are considered singletons (11). In addition to the results at the default threshold of 60%, we also allow users to inspect additional hits of the same gene that cover a higher or lower percentage of the original protein length. For each duplicated locus, we refer to the genome annotation provided by the UCSC Table Browser (31) to assess whether it corresponds to a known gene or instead to a non-genic region.
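The classification rule described above can be illustrated with a minimal Python sketch. It is not the pipeline used in this work: the function name `classify_duplicability`, the `(gene, locus, coverage)` tuple layout, and the example gene names and coordinates are all hypothetical, and coverage is assumed to have already been computed from the BLAT alignments as the fraction of the query protein length covered by each hit.

```python
from collections import defaultdict


def classify_duplicability(hits, threshold=0.60):
    """Label each gene as 'duplicable' or 'singleton'.

    hits: iterable of (gene_id, locus, coverage) tuples, where coverage is
    the fraction (0 to 1) of the query protein length covered by the hit.
    This input layout is an assumption made for illustration only.
    """
    by_gene = defaultdict(list)
    for gene, locus, coverage in hits:
        by_gene[gene].append((coverage, locus))

    result = {}
    for gene, gene_hits in by_gene.items():
        # The best hit is the locus with the highest coverage.
        ranked = sorted(gene_hits, key=lambda h: h[0], reverse=True)
        best_locus = ranked[0][1]
        # Additional hits are all other loci at or above the threshold.
        additional = [loc for cov, loc in ranked[1:] if cov >= threshold]
        result[gene] = {
            "best_hit": best_locus,
            "additional_hits": additional,
            "status": "duplicable" if additional else "singleton",
        }
    return result


# Hypothetical hits for two made-up genes.
example_hits = [
    ("GENE_A", "chr1:100-200", 1.00),
    ("GENE_A", "chr5:300-400", 0.72),
    ("GENE_A", "chr9:10-50", 0.40),
    ("GENE_B", "chr2:500-900", 0.98),
    ("GENE_B", "chr7:1-80", 0.35),
]

default_calls = classify_duplicability(example_hits)
relaxed_calls = classify_duplicability(example_hits, threshold=0.30)
```

At the default threshold, the hypothetical `GENE_A` is called duplicable because one additional locus covers 72% of the query, while `GENE_B` is a singleton. Lowering the threshold to 30% also makes `GENE_B` duplicable, which mirrors the option of inspecting hits at thresholds other than 60%.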