Figure 1 depicts the system flow of the proposed method. The experimentally validated phosphorylation sites are extracted from Phospho.ELM (release 6.0) (15) and Swiss-Prot (release 50) (16). These sources contain 13 612 phosphorylation sites within 3674 proteins and 6832 sites within 3148 proteins, respectively. After the redundant sites shared between Phospho.ELM and Swiss-Prot are removed, the data set contains 11 888 serine (S), 2433 threonine (T), 2179 tyrosine (Y) and 43 histidine (H) phosphorylation sites, as given in Table 1. The flanking sequences (positions −4 ∼ +4) of the phosphorylation sites (position 0) are visualized as sequence logos (17), which show the conservation of amino acids surrounding the phosphorylation sites. The 9-mer sequences (−4 ∼ +4) of kinase-specific phosphorylation sites are extracted and used to construct the training sets. Table S1 (see Supplementary Data) summarizes the statistics of the 60 sets of kinase-specific phosphorylation sites in the constructed data set.

Table 1. The statistics of phosphorylation sites obtained from Phospho.ELM and Swiss-Prot

| Data source | Number of phosphorylated proteins | Serine (S) sites | Threonine (T) sites | Tyrosine (Y) sites | Histidine (H) sites | Total sites |
|---|---|---|---|---|---|---|
| Phospho.ELM | 3674 | 9917 | 1890 | 1804 | 1 | 13 612 |
| Swiss-Prot* | 3148 | 4846 | 1035 | 901 | 42 | 6832 |
| Combined (non-redundant) | 5842 | 11 888 | 2433 | 2179 | 43 | 16 551 |

Note that the sum of serine, threonine, tyrosine and