Chunk #8 — MATERIALS AND METHODS — Feature extraction

Source
KinasePhos 2.0: a web server for identifying protein kinase-specific phosphorylation sites based on sequences and coupling patterns.

Text

The differences in coupling strength CXdZ between the training set of phosphorylation sites and a background set are computed and used to determine how many coupling patterns are used to train the SVM. The background set is extracted from all 9-mer sequences centered on serine, threonine, tyrosine or histidine residues in Swiss-Prot protein sequences. A larger difference in CXdZ indicates that the coupling pattern [XdZ] is a more informative feature for separating the training set from the background set; therefore, the cutoff on the difference in CXdZ between the training and background sets is tuned to determine the number of coupling patterns used to train an SVM model. Each selected coupling pattern forms one feature dimension of the SVM. For instance, with the cutoff set to 1.5, about 400 coupling patterns have a difference above the cutoff, so the SVM is trained on about 400 feature dimensions, one per selected coupling pattern.
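The selection procedure described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the KinasePhos 2.0 implementation: it treats [XdZ] as residues X and Z with d residues between them, uses a hypothetical coupling strength (the percentage of windows that contain the pattern at least once), takes the signed training-minus-background difference, and encodes each window as a binary presence vector. All function names are hypothetical, and the sequences in the usage block are made-up toy strings, not real data. Because the strength scale here is a stand-in, a cutoff of 1.5 on it is not comparable to the cutoff reported in the text.

```python
from collections import Counter

# Central residues named in the text: serine, threonine, tyrosine, histidine.
CENTRAL_RESIDUES = frozenset("STYH")
WINDOW = 9  # 9-mer windows, as in the text


def extract_windows(sequences, residues=CENTRAL_RESIDUES, width=WINDOW):
    """Return every full-length window centred on one of `residues`."""
    half = width // 2
    windows = []
    for seq in sequences:
        # Skip positions too close to either terminus for a full window.
        for i in range(half, len(seq) - half):
            if seq[i] in residues:
                windows.append(seq[i - half:i + half + 1])
    return windows


def patterns_in(window):
    """Set of coupling patterns (X, d, Z): X and Z with d residues between them."""
    return {
        (window[i], j - i - 1, window[j])
        for i in range(len(window))
        for j in range(i + 1, len(window))
    }


def coupling_strength(windows):
    """Hypothetical strength: % of windows containing each pattern at least once."""
    if not windows:
        return {}
    counts = Counter()
    for w in windows:
        counts.update(patterns_in(w))  # each pattern counted once per window
    return {p: 100.0 * c / len(windows) for p, c in counts.items()}


def select_patterns(train, background, cutoff):
    """Keep patterns whose training-minus-background strength exceeds `cutoff`."""
    c_train = coupling_strength(train)
    c_bg = coupling_strength(background)
    return sorted(
        p for p, s in c_train.items() if s - c_bg.get(p, 0.0) > cutoff
    )


def encode(window, selected):
    """Binary feature vector: one dimension per selected coupling pattern."""
    present = patterns_in(window)
    return [1 if p in present else 0 for p in selected]


if __name__ == "__main__":
    # Toy sequences for illustration only, not real phosphorylation data.
    train = ["LRRASLGAA", "KRRASVAGL", "RRRASPVAE"]
    proteins = ["MKTAYIAKQRQISFVKSHFSRQ", "MSTNPKPQRKTKRNTNRRPQDVKFPGG"]
    background = extract_windows(proteins)
    # Raising the cutoff keeps fewer patterns, i.e. fewer SVM dimensions.
    for cutoff in (1.5, 50.0):
        selected = select_patterns(train, background, cutoff)
        X = [encode(w, selected) for w in train]
        print(f"cutoff={cutoff}: {len(selected)} patterns -> "
              f"{len(X[0])} feature dimensions")
```

The loop over cutoffs mirrors the tuning step: the number of selected patterns, and hence the SVM's feature dimensionality, is controlled entirely by the cutoff. The resulting vectors would then be passed to an SVM trainer, which is not shown here.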