Introduction:

This tool wraps the R package 'biosigner' and identifies the significant peaks in the input table. biosigner jointly uses three binary classifiers, namely Partial Least Squares Discriminant Analysis (PLS-DA), Random Forest (RF) and Support Vector Machines (SVM), to achieve high prediction accuracy.

Group information is supplied in a group design file (tab-delimited text file).

Input files:

1.      Peak table file in tab-delimited text format. The first column holds the compound identifier and each remaining column holds one sample.

For example:

                                        HU_011    HU_014    HU_015    HU_017    HU_018    HU_019
(2-methoxyethoxy)propanoic acid isomer  3.019766  3.814339  3.519691  2.562183  3.781922  4.161074
(gamma)Glu-Leu/Ile                      3.888479  4.277149  4.195649  4.32376   4.629329  4.412266
1-Methyluric acid                       3.869006  3.837704  4.102254  4.53852   4.178829  4.516805
1-Methylxanthine                        3.717259  3.776851  4.291665  4.432216  4.11736   4.562052
1,3-Dimethyluric acid                   3.535461  3.932581  3.955376  4.228491  4.005545  4.320582
1,7-Dimethyluric acid                   3.325199  4.025125  3.972904  4.109927  4.024092  4.326856
2-acetamido-4-methylphenyl acetate      4.204754  5.181858  3.88568   4.237915  1.852994  4.080681
2-Aminoadipic acid                      4.080204  4.359246  4.249111  4.231404  4.323679  4.244485
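
As a rough illustration of the peak table layout described above, the following Python sketch parses such a file using only the standard library. The function name read_peak_table and the inline excerpt are hypothetical, and the sketch assumes that the header row has an empty (or label) first cell followed by the sample names; it is not part of the tool itself.

```python
import csv
import io

# Hypothetical excerpt of a peak table: a header row of sample names,
# then one row per compound (identifier first, intensities after).
# Assumption: the first header cell is empty or a label.
PEAK_TABLE = (
    "\tHU_011\tHU_014\tHU_015\n"
    "1-Methyluric acid\t3.869006\t3.837704\t4.102254\n"
    "1-Methylxanthine\t3.717259\t3.776851\t4.291665\n"
)


def read_peak_table(handle):
    """Return (sample_names, {compound: [values]}) from a tab-delimited table."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    samples = header[1:]  # the first column holds the compound identifier
    peaks = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        compound, values = row[0], row[1:]
        if len(values) != len(samples):
            raise ValueError(
                f"{compound!r}: expected {len(samples)} values, got {len(values)}"
            )
        peaks[compound] = [float(v) for v in values]
    return samples, peaks


samples, peaks = read_peak_table(io.StringIO(PEAK_TABLE))
```

Checking that every row has one value per sample catches the most common formatting problem (a missing or extra tab) before the table is submitted.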

 

2.      Group design file in tab-delimited text format with two columns: sample name and group name.

For example:

HU_011    M
HU_014    F
HU_015    M
HU_017    M
HU_018    M
HU_019    M
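
A group design file can be checked before running the tool. The Python sketch below is only an illustration: the function name read_group_design, its expected_samples parameter and the inline data are hypothetical, and it assumes the file has no header row. It enforces the two-column layout and the requirement that exactly two groups are present.

```python
import csv
import io

# Hypothetical group design content: sample name, then group name.
# Assumption: the file has no header row.
GROUP_DESIGN = "HU_011\tM\nHU_014\tF\nHU_015\tM\n"


def read_group_design(handle, expected_samples=None):
    """Parse a two-column design file and check that it defines two groups."""
    groups = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row:
            continue  # skip blank lines
        if len(row) != 2:
            raise ValueError(f"expected 2 columns, got {len(row)}: {row}")
        sample, group = row
        if sample in groups:
            raise ValueError(f"duplicate sample name: {sample!r}")
        groups[sample] = group
    if len(set(groups.values())) != 2:
        raise ValueError("the design file must contain exactly two groups")
    if expected_samples is not None:
        # Optionally confirm that every peak table sample has a group.
        missing = sorted(set(expected_samples) - set(groups))
        if missing:
            raise ValueError(f"samples without a group: {missing}")
    return groups


groups = read_group_design(
    io.StringIO(GROUP_DESIGN),
    expected_samples=["HU_011", "HU_014", "HU_015"],
)
```

Passing the sample names from the peak table header as expected_samples catches mismatched names between the two input files.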

Parameters:

1.      bootstraps for resampling: Number of bootstrap resamplings. A low value such as 5 speeds up computation (the biosigner vignette uses it for that reason), but we recommend keeping the default value of 50 for real analyses; otherwise signatures may be less stable.

2.      pvalN: Significance threshold. To speed up the selection, only variables that significantly improve the model at up to twice this threshold (to allow for potential fluctuations) are computed.

3.      Selection tiers: Tiers run from S, then A, down to E, in order of decreasing relevance. Tier S is the final signature, i.e. the features that passed all the backward selection steps. Features in the other tiers were discarded during the last (A) or earlier (B to E) selection rounds. Note that the tierMaxC = 'A' argument of the print and plot methods can be used to view the features of the larger S+A signature (especially when no S features have been found, or when the S model performs much worse than the S+A model).
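
The tier ordering can be illustrated with a short Python sketch. It mimics the idea behind a maximum-tier cutoff such as tierMaxC by keeping every feature whose tier is at least as relevant as the chosen one. The function name select_up_to_tier and the tier assignments below are hypothetical and are not output of biosigner.

```python
# Tiers in order of decreasing relevance, as described above.
TIER_ORDER = ["S", "A", "B", "C", "D", "E"]


def select_up_to_tier(feature_tiers, max_tier="S"):
    """Return the features whose tier is max_tier or more relevant."""
    if max_tier not in TIER_ORDER:
        raise ValueError(f"unknown tier: {max_tier!r}")
    allowed = set(TIER_ORDER[: TIER_ORDER.index(max_tier) + 1])
    return [name for name, tier in feature_tiers.items() if tier in allowed]


# Hypothetical tier assignments, for illustration only.
feature_tiers = {
    "1-Methyluric acid": "S",
    "1-Methylxanthine": "A",
    "2-Aminoadipic acid": "E",
}

signature_s = select_up_to_tier(feature_tiers)        # tier S only
signature_sa = select_up_to_tier(feature_tiers, "A")  # the larger S+A signature
```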

Output files:

1.        'biosigner_variable_results.txt': feature ranking produced by the biosigner algorithm.

2.        'biosigner_variable_significant_results.txt': results for the significant features.

3.        'biosigner_figure-tier.pdf': classifier tiers of the selected features.

4.        'biosigner_figure-boxplot.pdf': individual boxplots of the selected features.

Notes:

1.        The sample group file must contain exactly two groups.

2.        Character-string group names are preferred. Numeric group names are supported but not recommended.

3.        The algorithm returns the tier of each feature for the selected classifier(s). Tier S corresponds to the final signature, i.e. features that were found significant in all the selection steps. Features in tier A were found significant in all but the last selection step, and so on for tiers B to D. Tier E groups together the features from all earlier selection rounds.

Reference:

Rinaudo P, Boudah S, Junot C, et al. biosigner: A New Method for the Discovery of Significant Molecular Signatures from Omics Data. Frontiers in Molecular Biosciences, 2016, 3.