We evaluated the use of the GAsP dataset in disease-associated genetic studies and medically relevant applications to determine how the results of larger, continuing GenomeAsia studies can be used to improve human health (Supplementary Table 4a). We annotated high-quality variants using public databases including ExAC (Exome Aggregation Consortium)29, gnomAD29, 1000 Genomes Project4, ESP (NHLBI GO Exome Sequencing Project)30 and dbSNP (Extended Data Fig. 2), and focused on coding-sequence variants. Overall, 23% of protein-altering variants in GAsP were not found in these data sources. As expected, the majority of coding variants were singletons or very rare (Extended Data Fig. 2). However, the absolute number of novel variants with a minor allele frequency (MAF) ≥ 0.1% within our pan-Asian dataset is large (n = 194,585), and these are frequent enough to be relevant for large-scale genetic association studies. We also searched for variants that are present at low frequency in the overall dataset but at significantly higher allele frequencies in one or more of the population groups. We found an additional 144,329 novel variants with MAF > 1% in the full GAsP