Introduction:
This tool uses support vector machines (SVM) to rank the peaks in the input
peak table by SVM-RFE (recursive feature elimination). Group information is
supplied by a group design file (tab-delimited text file).
The SVM-RFE algorithm proposed by Guyon et al. returns a ranking of the
features of a classification problem by training an SVM with a linear kernel
and repeatedly removing the feature with the smallest ranking criterion. The
criterion is the weight w of each feature in the decision hyperplane found by
the SVM. For more detailed information, please see the original paper.
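The elimination loop described above can be sketched with scikit-learn's RFE wrapper around a linear SVM (a minimal illustration on random data, not the tool's own implementation; the data shapes are made up):

```python
# Minimal SVM-RFE sketch: rank features by repeatedly dropping the one
# with the smallest |w| in the linear SVM's decision hyperplane.
import numpy as np
from sklearn.svm import SVC
from sklearn.feature_selection import RFE

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 8))        # 6 samples x 8 peaks (illustrative)
y = np.array([0, 1, 0, 1, 0, 1])   # two-class group labels

svm = SVC(kernel="linear")         # linear kernel exposes the weights w
rfe = RFE(estimator=svm, n_features_to_select=1, step=1)
rfe.fit(X, y)

# ranking_[i] == 1 marks the feature eliminated last (most important).
print(rfe.ranking_)
```

With `step=1` and `n_features_to_select=1`, one feature is removed per iteration, so `ranking_` is a full permutation of 1..n_features.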
Input files:
1. Peak table file in tab-delimited txt format, with the first column as the
compound identifier and the remaining columns as samples.
For example:
|                                        | HU_011   | HU_014   | HU_015   | HU_017   | HU_018   | HU_019   |
| (2-methoxyethoxy)propanoic acid isomer | 3.019766 | 3.814339 | 3.519691 | 2.562183 | 3.781922 | 4.161074 |
| (gamma)Glu-Leu/Ile                     | 3.888479 | 4.277149 | 4.195649 | 4.32376  | 4.629329 | 4.412266 |
| 1-Methyluric acid                      | 3.869006 | 3.837704 | 4.102254 | 4.53852  | 4.178829 | 4.516805 |
| 1-Methylxanthine                       | 3.717259 | 3.776851 | 4.291665 | 4.432216 | 4.11736  | 4.562052 |
| 1,3-Dimethyluric acid                  | 3.535461 | 3.932581 | 3.955376 | 4.228491 | 4.005545 | 4.320582 |
| 1,7-Dimethyluric acid                  | 3.325199 | 4.025125 | 3.972904 | 4.109927 | 4.024092 | 4.326856 |
| 2-acetamido-4-methylphenyl acetate     | 4.204754 | 5.181858 | 3.88568  | 4.237915 | 1.852994 | 4.080681 |
| 2-Aminoadipic acid                     | 4.080204 | 4.359246 | 4.249111 | 4.231404 | 4.323679 | 4.244485 |
2. Group design file in tab-delimited text format with two columns
(samplename, groupname).
For example:
| HU_011 | M |
| HU_014 | F |
| HU_015 | M |
| HU_017 | M |
| HU_018 | M |
| HU_019 | M |
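A minimal sketch of how the two input files line up, assuming pandas, with inline strings standing in for the actual files (file contents here are a shortened version of the examples above): samples are columns in the peak table and rows in the design file, so the peak table is transposed before attaching group labels.

```python
# Sketch: load a peak table and a group design file, then align them
# so each sample row carries its group label.
import io
import pandas as pd

peak_txt = (
    "compound\tHU_011\tHU_014\tHU_015\n"
    "1-Methyluric acid\t3.869006\t3.837704\t4.102254\n"
    "1-Methylxanthine\t3.717259\t3.776851\t4.291665\n"
)
design_txt = "HU_011\tM\nHU_014\tF\nHU_015\tM\n"

peaks = pd.read_csv(io.StringIO(peak_txt), sep="\t", index_col=0)
design = pd.read_csv(io.StringIO(design_txt), sep="\t",
                     header=None, names=["samplename", "groupname"])

# Transpose so rows = samples, then look up each sample's group.
X = peaks.T
y = design.set_index("samplename").loc[X.index, "groupname"]
print(X.shape, list(y))
```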
Parameters:
1. Kernel function: The kernel function reflects the similarity between the
input data points. The correct choice of kernel and kernel parameters is
crucial for obtaining good results, which in practice means that an extensive
search of the parameter space must be conducted before results can be trusted.

2. Linear kernel: Simple and safe; try it first. The model is interpretable:
it indicates which features and data points are important. However, it is not
suitable when the data are not linearly separable.
3. Polynomial kernel: Less restrictive than the linear kernel, it can handle
data that are not linearly separable, but it is more complicated to tune,
having three parameters.
4. Radial basis function (RBF): Usually defined as a monotonic function of the
Euclidean distance from any point in space to a fixed center. It implicitly
maps the original features into an infinite-dimensional space, so it can
achieve nonlinear mapping while suffering fewer numerical difficulties.
5. Sigmoid: Squashes values into the range [0, 1]. Historically popular
because of its nice interpretation as the saturating "firing rate" of a
neuron, but it has serious drawbacks: saturated units "kill" the gradients,
sigmoid outputs are not zero-centered, and exp() is somewhat computationally
expensive.
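The four kernel choices above can be compared side by side with scikit-learn's SVC (a toy illustration on synthetic data; parameter names follow scikit-learn, and the tool's own option names may differ):

```python
# Sketch: fit an SVM with each of the four kernels on the same toy
# data and report training accuracy for each.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)
X = rng.normal(size=(20, 5))
y = (X[:, 0] + X[:, 1] > 0).astype(int)  # linearly separable labels

scores = {}
for kernel in ("linear", "poly", "rbf", "sigmoid"):
    # poly additionally takes degree/coef0; rbf and sigmoid take gamma.
    clf = SVC(kernel=kernel).fit(X, y)
    scores[kernel] = clf.score(X, y)
print(scores)
```

On this linearly separable toy problem the linear kernel already does well, matching the advice above to try it first.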
Output files:
1. 'SVM_Prediction.txt': SVM model predictions for the input samples.
2. 'SVM_Prediction_Summary.txt': prediction summary.
3. 'SVM_Imp_Rank.txt': feature ranking produced by SVM-RFE.
4. 'SVM_Imp.pdf': scatter plot of feature importance.
5. 'SVM_Top10_Imp.pdf': plot of the top 10 features.
Note:
Character or string group names are preferred. Numeric group names are also
supported but not recommended.
Reference:
Marchiori E, Sebag M. Bayesian learning with local support vector machines
for cancer classification with gene expression data. In: Applications of
Evolutionary Computing. Springer Berlin Heidelberg, 2005: 74-83.