As of 3 September 2009, the KEGG GENES database contains 4.8 million genes in 1049 genomes. In comparison, the UniProt database (9) contains 9.4 million proteins from about half a million species. KEGG thus already covers about half of the known protein universe and >90% of protein sequence families (Kanehisa, M., unpublished data). As the number of complete genomes increases, coverage of the protein universe will also increase, but some protein families, such as those of plant and viral proteins, will remain uncovered. These protein families are useful for analyzing, for example, EST and metagenomics data, and they will be incorporated into the KO system.