Introduction:
This tool is a wrapper for the R package 'biosigner' and aims to find the
significant peaks in the input table. biosigner jointly uses three binary
classifiers, namely Partial Least Squares Discriminant Analysis (PLS-DA),
Random Forest (RF) and Support Vector Machines (SVM), to achieve high levels
of prediction accuracy. Group information is given by a group design file
(Tab-delimited text file).
Input files:
1. Peak table file in Tab-delimited text format, with the first column as the
compound identifier and the remaining columns as samples.
For example:
|                                        | HU_011   | HU_014   | HU_015   | HU_017   | HU_018   | HU_019   |
| (2-methoxyethoxy)propanoic acid isomer | 3.019766 | 3.814339 | 3.519691 | 2.562183 | 3.781922 | 4.161074 |
| (gamma)Glu-Leu/Ile                     | 3.888479 | 4.277149 | 4.195649 | 4.32376  | 4.629329 | 4.412266 |
| 1-Methyluric acid                      | 3.869006 | 3.837704 | 4.102254 | 4.53852  | 4.178829 | 4.516805 |
| 1-Methylxanthine                       | 3.717259 | 3.776851 | 4.291665 | 4.432216 | 4.11736  | 4.562052 |
| 1,3-Dimethyluric acid                  | 3.535461 | 3.932581 | 3.955376 | 4.228491 | 4.005545 | 4.320582 |
| 1,7-Dimethyluric acid                  | 3.325199 | 4.025125 | 3.972904 | 4.109927 | 4.024092 | 4.326856 |
| 2-acetamido-4-methylphenyl acetate     | 4.204754 | 5.181858 | 3.88568  | 4.237915 | 1.852994 | 4.080681 |
| 2-Aminoadipic acid                     | 4.080204 | 4.359246 | 4.249111 | 4.231404 | 4.323679 | 4.244485 |
2. Group design file in Tab-delimited text format with two columns (sample
name, group name).
For example:
| HU_011 | M |
| HU_014 | F |
| HU_015 | M |
| HU_017 | M |
| HU_018 | M |
| HU_019 | M |
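Before running the tool, it can help to confirm that the two input files agree with each other. The following is a minimal Python sketch, not part of the tool itself. The function names `read_peak_table`, `read_group_design` and `check_sample_names` are hypothetical, and the sketch assumes that the peak table starts with a header row of sample names while the group design file has no header row.

```python
import csv
import io


def read_peak_table(handle):
    """Read a tab-delimited peak table: header row of sample names, then one row per compound."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    samples = [name.strip() for name in header[1:]]  # first column holds compound identifiers
    peaks = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        peaks[row[0].strip()] = [float(value) for value in row[1:]]
    return samples, peaks


def read_group_design(handle):
    """Read a two-column, tab-delimited group design file (sample name, group name)."""
    groups = {}
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 2:
            continue  # skip blank or incomplete lines
        groups[row[0].strip()] = row[1].strip()
    return groups


def check_sample_names(samples, groups):
    """Return (samples without a group, grouped names absent from the peak table)."""
    missing = [s for s in samples if s not in groups]
    extra = [s for s in groups if s not in samples]
    return missing, extra


# Small in-memory example; "HU 014" deliberately lacks the underscore (assumed typo).
peak_text = "\tHU_011\tHU_014\n1-Methyluric acid\t3.869006\t3.837704\n"
group_text = "HU_011\tM\nHU 014\tF\n"

samples, peaks = read_peak_table(io.StringIO(peak_text))
groups = read_group_design(io.StringIO(group_text))
missing, extra = check_sample_names(samples, groups)
print("samples without a group:", missing)
print("unknown names in group file:", extra)
```

A sample name that differs between the two files only by a space or an underscore is reported in both lists, which makes this kind of mismatch easy to spot.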
Parameter:
2. pvalN: To speed up the selection, only variables which significantly
improve the model up to two times this threshold (to take into account
potential fluctuations) are computed.
3. Selection tiers: Tiers run from S and A down to E, by decreasing
relevance. The S tier corresponds to the final signature, i.e. the features
which passed through all the backward selection steps. In contrast, features
from the other tiers were discarded during the last (A) or previous (B to E)
selection rounds. Note that the tierMaxC = 'A' argument of the print and plot
methods can be used to view the features from the larger S+A signatures
(especially when no S features have been found, or when the performance of
the S model is much lower than that of the S+A model).
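The tier ordering can be illustrated with a short Python sketch. This is not biosigner code: the function `features_up_to` and the feature-to-tier mapping are hypothetical, and the sketch only mirrors the idea of viewing the S signature versus the larger S+A signature.

```python
# Tiers ordered by decreasing relevance, as described above.
TIER_ORDER = ["S", "A", "B", "C", "D", "E"]


def features_up_to(tiers, tier_max="S"):
    """Return the features whose tier is tier_max or better, sorted by name."""
    cutoff = TIER_ORDER.index(tier_max)
    return sorted(
        feature for feature, tier in tiers.items()
        if TIER_ORDER.index(tier) <= cutoff
    )


# Hypothetical tier assignments, for illustration only (not real biosigner output).
example_tiers = {
    "feature_1": "S",
    "feature_2": "A",
    "feature_3": "E",
    "feature_4": "A",
}

print(features_up_to(example_tiers))                # final signature only
print(features_up_to(example_tiers, tier_max="A"))  # larger S+A signature
```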
Output files:
1. 'biosigner_variable_results.txt': feature rank results from the biosigner
algorithm.
2. 'biosigner_variable_significant_results.txt': significant feature results.
3. 'biosigner_figure-tier.pdf': displays the classifier tiers of the selected
features.
4. 'biosigner_figure-boxplot.pdf': individual boxplots of the selected
features.
Note:
1. The sample group file must contain exactly two groups.
2. Group names made of characters or strings are preferred. Numbers are also
supported but not recommended.
3. The algorithm returns the tier of each feature for the selected
classifier(s): tier S corresponds to the final signature, i.e., features
which have been found significant in all the selection steps; features with
tier A have been found significant in all but the last selection, and so on
for tiers B to D. Tier E regroups the features from all previous rounds of
selection.
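The first two notes can be checked before submitting a job. The following is a minimal Python sketch, assuming the group names have already been read into a list; `check_groups` is a hypothetical helper and not part of the tool.

```python
def check_groups(group_names):
    """Validate group labels: exactly two distinct groups, preferably non-numeric."""
    distinct = sorted(set(group_names))
    if len(distinct) != 2:
        raise ValueError(f"expected exactly 2 groups, found {len(distinct)}: {distinct}")
    warnings = []
    for name in distinct:
        try:
            float(name)
        except ValueError:
            continue  # not parseable as a number, which is the preferred case
        warnings.append(f"group name {name!r} is numeric; a text label is recommended")
    return distinct, warnings


labels = ["M", "F", "M", "M", "M", "M"]  # group column of a six-sample design
print(check_groups(labels))
```

Numeric labels such as "0" and "1" pass the two-group check but produce one warning each, in line with the recommendation above.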
Reference:
Rinaudo P, Boudah S, Junot C, et al. biosigner: A New Method for the Discovery
of Significant Molecular Signatures from Omics Data. Frontiers in Molecular
Biosciences, 2016, 3.