Chunk #4 — The GAsP dataset

Source: The GenomeAsia 100K Project enables genetic discoveries across Asia.
Text

For the GAsP project, we generated 1,267 high-coverage (average 36×) whole-genome sequences and analysed these together with 472 publicly available human genome sequences from previous sequencing studies (Supplementary Information 1, 2 and Supplementary Tables 1a–c, 2a). The 1,739 samples were enriched for individuals from population isolates to capture the broadest wealth of genetic diversity; the dataset includes 598 sequences from India, 156 from Malaysia, 152 from South Korea, 113 from Pakistan, 100 from Mongolia, 70 from China, 70 from Papua New Guinea, 68 from Indonesia, 52 from the Philippines, 35 from Japan and 32 from Russia (Fig. 1a–c and Supplementary Tables 1a–c). To facilitate comprehensive and comparative analysis of human genetic variation, we included sequencing data from African, European and American samples (Supplementary Tables 1a, b). The sequenced samples originate from 7 global regions, 64 different countries of origin and 219 population groups. About 80% of the samples come from Asia, with an emphasis on population groups that are underrepresented in previous genetic studies (Fig. 1a, b, Supplementary Tables 1a–c, 2b and Supplementary Information 1, 2). Each global region and population group