Introduction:

This tool implements Breiman's random forest algorithm (using the R randomForest package) for classification and peak ranking based on the input table. Peaks are ranked by the mean decrease in Gini index. Group information is supplied in a group design file (tab-delimited text file).
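As a rough illustration of the ranking criterion: the Gini decrease of a single split is the impurity of the parent node minus the size-weighted impurity of its two child nodes, and a variable's mean decrease in Gini accumulates these decreases over all splits on that variable, averaged over the trees. The sketch below only shows the per-split quantity; the function names are hypothetical and this is not the package's internal code.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity of a list of class labels: 1 - sum(p_k^2)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_decrease(parent, left, right):
    """Impurity of the parent minus the size-weighted impurity of the children."""
    n = len(parent)
    weighted_children = (
        (len(left) / n) * gini_impurity(left)
        + (len(right) / n) * gini_impurity(right)
    )
    return gini_impurity(parent) - weighted_children


# Hypothetical node with four samples that one peak separates perfectly.
parent = ["M", "M", "F", "F"]
print(gini_decrease(parent, ["M", "M"], ["F", "F"]))  # 0.5
```

A peak that often produces large decreases like this across the forest ends up near the top of the ranking.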

Input files:

1.      Peak table file in tab-delimited text format, with the first column as the compound identifier and the remaining columns as samples.

For example:

                                        HU_011    HU_014    HU_015    HU_017    HU_018    HU_019
(2-methoxyethoxy)propanoic acid isomer  3.019766  3.814339  3.519691  2.562183  3.781922  4.161074
(gamma)Glu-Leu/Ile                      3.888479  4.277149  4.195649  4.32376   4.629329  4.412266
1-Methyluric acid                       3.869006  3.837704  4.102254  4.53852   4.178829  4.516805
1-Methylxanthine                        3.717259  3.776851  4.291665  4.432216  4.11736   4.562052
1,3-Dimethyluric acid                   3.535461  3.932581  3.955376  4.228491  4.005545  4.320582
1,7-Dimethyluric acid                   3.325199  4.025125  3.972904  4.109927  4.024092  4.326856
2-acetamido-4-methylphenyl acetate      4.204754  5.181858  3.88568   4.237915  1.852994  4.080681
2-Aminoadipic acid                      4.080204  4.359246  4.249111  4.231404  4.323679  4.244485
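To make the expected layout concrete, here is a minimal sketch of reading such a peak table with Python's standard csv module. The function name `read_peak_table` is hypothetical, and the sketch assumes that the first cell of the header row (above the compound identifiers) is empty, which the example does not confirm.

```python
import csv
import io

# Two rows taken from the example above, tab-delimited. The empty first
# header cell is an assumption about the file layout.
PEAK_TABLE = (
    "\tHU_011\tHU_014\tHU_015\n"
    "1-Methyluric acid\t3.869006\t3.837704\t4.102254\n"
    "1-Methylxanthine\t3.717259\t3.776851\t4.291665\n"
)


def read_peak_table(handle):
    """Return (sample_names, {compound: [one value per sample]})."""
    rows = csv.reader(handle, delimiter="\t")
    header = next(rows)
    samples = header[1:]
    peaks = {}
    for row in rows:
        if not row:
            continue  # skip blank lines
        compound = row[0]
        values = [float(v) for v in row[1:]]
        if len(values) != len(samples):
            raise ValueError(
                f"{compound}: expected {len(samples)} values, got {len(values)}"
            )
        peaks[compound] = values
    return samples, peaks


samples, peaks = read_peak_table(io.StringIO(PEAK_TABLE))
print(samples)
print(peaks["1-Methylxanthine"])
```

Splitting on tabs only keeps compound names that contain commas or spaces, such as "1,3-Dimethyluric acid", in a single field.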

2.      Group design file in tab-delimited text format with two columns (sample name, group name).

For example:

HU_011    M
HU_014    F
HU_015    M
HU_017    M
HU_018    M
HU_019    M
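Sample names in the group design file presumably have to match the column names of the peak table exactly, so a small consistency check before running the tool can catch mismatches such as "HU 014" versus "HU_014". The sketch below is one way to do that; the function names are hypothetical, and it assumes the design file has no header row, as in the example.

```python
import csv
import io

# Hypothetical inputs for illustration: sample names from the peak table
# header, and the text of a two-column group design file without a header.
peak_samples = ["HU_011", "HU_014", "HU_015"]
GROUP_DESIGN = "HU_011\tM\nHU_014\tF\nHU_015\tM\n"


def read_group_design(handle):
    """Return {sample_name: group_name} from a two-column tab-delimited file."""
    groups = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row:
            continue  # skip blank lines
        if len(row) != 2:
            raise ValueError(f"expected 2 columns, got {len(row)}: {row!r}")
        sample, group = (field.strip() for field in row)
        groups[sample] = group
    return groups


def check_samples(peak_samples, groups):
    """Return (samples without a group, design entries absent from the peak table)."""
    missing = [s for s in peak_samples if s not in groups]
    extra = [s for s in groups if s not in peak_samples]
    return missing, extra


groups = read_group_design(io.StringIO(GROUP_DESIGN))
print(check_samples(peak_samples, groups))  # ([], [])
```

Two empty lists mean every sample in the peak table has exactly one group assignment and the design file names no unknown samples.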

Parameters:

1.        number of trees: Specifies the number of decision trees in the random forest. The default is 500.

2.        mtry: Specifies the number of variables randomly sampled as candidates at each node split. The default is the square root of the number of variables (classification model) or one-third of the number of variables (regression model). In practice, the optimal value usually has to be found manually by trying several values step by step.

3.        replacement: Specifies whether the bootstrap samples are drawn with replacement. The default is sampling with replacement.

4.        nodesize: The minimum size of terminal nodes. The default is 1 for a classification model and 5 for a regression model.

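5.        maxnodes: The maximum number of terminal nodes that each tree in the forest can have. If not set, trees are grown to the maximum possible size (subject to nodesize).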
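The default mtry rule can be written out as a small calculation. The sketch below follows the defaults documented for R's randomForest (floor of the square root for classification, floor of one-third with a minimum of 1 for regression). The helper `mtry_candidates` is a hypothetical starting grid for step-by-step tuning, not something the tool provides; which candidate is best still has to be judged from the model's error.

```python
import math


def default_mtry(n_variables, classification=True):
    """Default mtry for n_variables predictors (n_variables >= 1)."""
    if classification:
        return math.isqrt(n_variables)  # floor(sqrt(p))
    return max(n_variables // 3, 1)     # max(floor(p / 3), 1)


def mtry_candidates(n_variables, classification=True):
    """Hypothetical tuning grid: half, default, and double the default mtry."""
    base = default_mtry(n_variables, classification)
    return sorted({max(base // 2, 1), base, min(base * 2, n_variables)})


print(default_mtry(8))         # 2
print(default_mtry(8, False))  # 2
print(mtry_candidates(100))    # [5, 10, 20]
```

For the eight peaks in the example table, both defaults work out to 2 variables tried per split.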

Output files:

1.      'RF_Prediction.txt', per-sample predictions of the RF model for the input data.

2.      'RF_Prediction_Summary.txt', summary of the predictions.

3.      'RF_Imp_Rank.txt', features ranked by MeanDecreaseGini.

4.      'RF_Imp.pdf', scatter plot of feature importance.

5.      'RF_Top10_Imp.pdf', plot of the top 10 features.
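The ranking behind the importance outputs amounts to sorting features by MeanDecreaseGini in descending order and, for the top-10 plot, keeping the first ten. A minimal sketch follows; the function name and the importance values are made up for illustration and do not reflect the actual file layout or any real result.

```python
def rank_by_importance(importance, top_n=10):
    """Sort (feature, MeanDecreaseGini) pairs from most to least important."""
    ranked = sorted(importance.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top_n]


# Made-up importance values, for illustration only.
importance = {"peak_A": 0.12, "peak_B": 0.85, "peak_C": 0.40}
for rank, (feature, score) in enumerate(rank_by_importance(importance), start=1):
    print(rank, feature, score)
```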