Introduction:
This tool implements Breiman's random forest algorithm (via the R
randomForest package) for classification and peak ranking based on the input
peak table. Peaks are ranked by the mean decrease in Gini index. Group
information is supplied in a group design file (tab-delimited text file).
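To make the ranking criterion concrete, the following is a minimal sketch of how the Gini decrease of a single split can be computed. The function names and the toy values are hypothetical and not part of this tool; the randomForest package accumulates such decreases over every split that uses a variable and averages them over all trees, and its exact scaling may differ from this illustration.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum(p_k ** 2)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def gini_decrease(values, labels, threshold):
    """Impurity decrease obtained by splitting samples at value <= threshold.

    Each child node's impurity is weighted by its share of the samples.
    """
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    return (gini(labels)
            - (len(left) / n) * gini(left)
            - (len(right) / n) * gini(right))

# Hypothetical toy data: one peak measured in four samples from two groups.
toy_values = [1.0, 2.0, 3.0, 4.0]
toy_labels = ["M", "M", "F", "F"]
parent_impurity = gini(toy_labels)                        # 0.5
decrease = gini_decrease(toy_values, toy_labels, 2.0)     # 0.5 (perfect split)
```

A peak whose splits separate the groups cleanly produces large decreases and therefore ranks high.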
Input files:
1. Peak table file in tab-delimited text format, with the first column as the
compound identifier and the remaining columns as samples.
For example:
|  | HU_011 | HU_014 | HU_015 | HU_017 | HU_018 | HU_019 |
| (2-methoxyethoxy)propanoic acid isomer | 3.019766 | 3.814339 | 3.519691 | 2.562183 | 3.781922 | 4.161074 |
| (gamma)Glu-Leu/Ile | 3.888479 | 4.277149 | 4.195649 | 4.32376 | 4.629329 | 4.412266 |
| 1-Methyluric acid | 3.869006 | 3.837704 | 4.102254 | 4.53852 | 4.178829 | 4.516805 |
| 1-Methylxanthine | 3.717259 | 3.776851 | 4.291665 | 4.432216 | 4.11736 | 4.562052 |
| 1,3-Dimethyluric acid | 3.535461 | 3.932581 | 3.955376 | 4.228491 | 4.005545 | 4.320582 |
| 1,7-Dimethyluric acid | 3.325199 | 4.025125 | 3.972904 | 4.109927 | 4.024092 | 4.326856 |
| 2-acetamido-4-methylphenyl acetate | 4.204754 | 5.181858 | 3.88568 | 4.237915 | 1.852994 | 4.080681 |
| 2-Aminoadipic acid | 4.080204 | 4.359246 | 4.249111 | 4.231404 | 4.323679 | 4.244485 |
2. Group design file in tab-delimited text format with two columns (sample
name, group name).
For example:
| HU_011 | M |
| HU_014 | F |
| HU_015 | M |
| HU_017 | M |
| HU_018 | M |
| HU_019 | M |
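As a rough illustration of how the two input files fit together, the sketch below parses both formats and pairs each sample column with its group label. It is not the tool's own code: the function names are hypothetical, and it assumes the peak table has a header row of sample names while the group design file has no header row, as in the examples above.

```python
import csv
import io

def read_peak_table(text):
    """Parse a tab-delimited peak table into (sample_names, {compound: values})."""
    reader = csv.reader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    rows = [row for row in reader if row]          # skip blank lines
    header, body = rows[0], rows[1:]
    samples = header[1:]                           # first cell is the identifier column
    peaks = {row[0]: [float(v) for v in row[1:]] for row in body}
    return samples, peaks

def read_group_design(text):
    """Parse a two-column, tab-delimited group design file into {sample: group}."""
    reader = csv.reader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    return {row[0]: row[1] for row in reader if len(row) >= 2}

def build_matrix(samples, peaks, groups):
    """Return (X, y, features) with one row per sample and one column per compound."""
    missing = [s for s in samples if s not in groups]
    if missing:
        raise ValueError("samples without a group: " + ", ".join(missing))
    features = list(peaks)
    X = [[peaks[f][i] for f in features] for i in range(len(samples))]
    y = [groups[s] for s in samples]
    return X, y, features

# Two samples and one compound taken from the example tables above.
peak_text = "\tHU_011\tHU_014\n1-Methyluric acid\t3.869006\t3.837704\n"
group_text = "HU_011\tM\nHU_014\tF\n"

samples, peaks = read_peak_table(peak_text)
groups = read_group_design(group_text)
X, y, features = build_matrix(samples, peaks, groups)
```

The check for missing samples matters in practice: a sample name that differs between the two files (for example a space in place of an underscore) would otherwise go unnoticed.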
Parameters:
1. number of trees: The number of decision trees grown in the random forest.
The default is 500.
2. mtry: The number of variables randomly sampled as candidates at each split
of a tree. The default is the square root of the number of variables
(classification model) or one-third of the number of variables (regression
model). In general, mtry needs to be tuned manually, step by step, to find the
optimal value.
3. replacement: Whether the bootstrap sample for each tree is drawn with
replacement. The default is sampling with replacement.
4. nodesize: The minimum size of the terminal nodes of each tree. The default
is 1 for a classification model and 5 for a regression model.
5. maxnodes: The maximum number of terminal nodes a tree may have.
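The defaults for mtry and replacement can be illustrated with a short sketch. The helper names below are hypothetical, and the 0.632 fraction used when sampling without replacement reflects the default sample size of the R randomForest package as far as it is documented; treat it as an assumption rather than a statement about this tool's behavior.

```python
import math
import random

def default_mtry(n_features, classification=True):
    """Default number of candidate variables tried at each split."""
    if classification:
        return max(math.isqrt(n_features), 1)   # floor(sqrt(p))
    return max(n_features // 3, 1)              # floor(p / 3), at least 1

def draw_sample(n_samples, replace=True, rng=None):
    """Draw the sample indices used to grow one tree.

    With replacement: a bootstrap sample of the same size as the data.
    Without replacement: a subset of about 63.2% of the samples (assumed
    default, see the note above).
    """
    rng = rng or random.Random()
    if replace:
        return [rng.randrange(n_samples) for _ in range(n_samples)]
    size = math.ceil(0.632 * n_samples)
    return rng.sample(range(n_samples), size)

mtry_classification = default_mtry(100)                    # 10
mtry_regression = default_mtry(100, classification=False)  # 33
bootstrap = draw_sample(6, replace=True, rng=random.Random(0))
subset = draw_sample(6, replace=False, rng=random.Random(0))
```

When tuning mtry by hand, a common approach is to try a few values around the default and keep the one with the lowest out-of-bag error.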
Output files:
1. 'RF_Prediction.txt': predictions of the RF model for the samples in the
input data.
2. 'RF_Prediction_Summary.txt': prediction summary.
3. 'RF_Imp_Rank.txt': features ranked by MeanDecreaseGini.
4. 'RF_Imp.pdf': scatter plot of feature importance.
5. 'RF_Top10_Imp.pdf': plot of the top 10 features.