Introduction:

This tool runs the OPLS-DA algorithm to rank the peaks in the input table by variable importance in projection (VIP). Group information is supplied in a group design file (Tab-delimited text file). OPLS-DA is only available for binary classification, so the number of groups must be 2.

The orthogonal partial least-squares (OPLS) algorithm was introduced by Trygg and Wold (2002) to model separately the variations of the predictors that are correlated with the response and those that are orthogonal to it. Its predictive capacity is similar to that of PLS, and it improves the interpretation of the predictive components and of the systematic variation (Pinto, Trygg, and Gottfries 2012). In particular, OPLS modeling of a single response requires only one predictive component. Diagnostics such as the Q2Y metric and permutation testing are important for avoiding overfitting and assessing the statistical significance of the model. The VIP, which reflects both the loading weights of each component and the variability of the response explained by that component (Pinto, Trygg, and Gottfries 2012; Mehmood et al. 2012), can be used for feature ranking and selection (Trygg and Wold 2002; Pinto, Trygg, and Gottfries 2012).
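To make the VIP description concrete, here is a minimal sketch of the classic PLS form of the VIP (the form reviewed in Mehmood et al. 2012). It is an illustration only: the function name and its inputs are hypothetical, and the OPLS-specific VIP computed by this tool may differ in detail (see Galindo-Prieto et al. 2014).

```python
import math

def vip_scores(weights, ssy):
    """Classic PLS VIP (illustrative sketch, not this tool's implementation).

    weights: one list per component, each holding the loading weights
             of all p features for that component.
    ssy:     response sum of squares explained by each component.
    """
    n_features = len(weights[0])
    total_ssy = sum(ssy)
    vip = []
    for j in range(n_features):
        acc = 0.0
        for w_a, ssy_a in zip(weights, ssy):
            norm_sq = sum(w * w for w in w_a)  # squared norm of the weight vector
            acc += ssy_a * (w_a[j] ** 2) / norm_sq
        vip.append(math.sqrt(n_features * acc / total_ssy))
    return vip

# Hypothetical single-component model with three features.
example_vip = vip_scores([[0.6, 0.8, 0.0]], [2.0])
```

With this definition the squared VIP values average to 1 across features, which is why a cutoff near 1 is a common, though not universal, choice.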

Input files:

1.      Peak table file in Tab-delimited text format, with the first column as the compound identifier and the others as samples.

For example:

                                        HU_011    HU_014    HU_015    HU_017    HU_018    HU_019
(2-methoxyethoxy)propanoic acid isomer  3.019766  3.814339  3.519691  2.562183  3.781922  4.161074
(gamma)Glu-Leu/Ile                      3.888479  4.277149  4.195649  4.32376   4.629329  4.412266
1-Methyluric acid                       3.869006  3.837704  4.102254  4.53852   4.178829  4.516805
1-Methylxanthine                        3.717259  3.776851  4.291665  4.432216  4.11736   4.562052
1,3-Dimethyluric acid                   3.535461  3.932581  3.955376  4.228491  4.005545  4.320582
1,7-Dimethyluric acid                   3.325199  4.025125  3.972904  4.109927  4.024092  4.326856
2-acetamido-4-methylphenyl acetate      4.204754  5.181858  3.88568   4.237915  1.852994  4.080681
2-Aminoadipic acid                      4.080204  4.359246  4.249111  4.231404  4.323679  4.244485

2.      Group design file in Tab-delimited text format with two columns (samplename     groupname).

For example:

HU_011    M
HU_014    F
HU_015    M
HU_017    M
HU_018    M
HU_019    M
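The two input files can be checked against each other before running the tool. The sketch below is only an illustration using the Python standard library: the function names are hypothetical, and it assumes that the peak table has a header row of sample names and that the group design file has no header row.

```python
import csv
import io

def read_peak_table(handle):
    """Parse a tab-delimited peak table into (sample_names, {compound: values})."""
    rows = [row for row in csv.reader(handle, delimiter="\t") if row]
    samples = rows[0][1:]  # the first header cell belongs to the compound column
    peaks = {row[0]: [float(v) for v in row[1:]] for row in rows[1:]}
    return samples, peaks

def read_group_design(handle):
    """Parse a two-column tab-delimited group design into {sample: group}."""
    return {row[0]: row[1] for row in csv.reader(handle, delimiter="\t") if row}

def check_inputs(samples, groups):
    """Check that every sample has a group and that exactly two groups exist."""
    missing = [s for s in samples if s not in groups]
    if missing:
        raise ValueError("samples without a group: " + ", ".join(missing))
    labels = {groups[s] for s in samples}
    if len(labels) != 2:
        raise ValueError("expected exactly 2 groups, found %d" % len(labels))
    return sorted(labels)

# Small excerpt of the example files shown above.
PEAKS = (
    "\tHU_011\tHU_014\tHU_015\tHU_017\tHU_018\tHU_019\n"
    "1-Methyluric acid\t3.869006\t3.837704\t4.102254\t4.53852\t4.178829\t4.516805\n"
    "1-Methylxanthine\t3.717259\t3.776851\t4.291665\t4.432216\t4.11736\t4.562052\n"
)
GROUPS = "HU_011\tM\nHU_014\tF\nHU_015\tM\nHU_017\tM\nHU_018\tM\nHU_019\tM\n"

samples, peaks = read_peak_table(io.StringIO(PEAKS))
groups = read_group_design(io.StringIO(GROUPS))
labels = check_inputs(samples, groups)
```

A sample name that differs between the two files (for example 'HU 014' instead of 'HU_014') would be reported by check_inputs as a sample without a group.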

Parameter:

1.        VIP-value threshold: A numeric value giving the cutoff for Variable Importance in Projection (VIP).
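As an illustration of how such a cutoff is typically applied, the sketch below ranks features by VIP and keeps those that reach the threshold. The function name, the default of 1.0, the inclusive comparison, and the VIP values are all assumptions made for this example; they are not taken from the tool.

```python
def rank_and_filter(vip_by_feature, threshold=1.0):
    """Rank features by descending VIP and keep those at or above the threshold."""
    ranked = sorted(vip_by_feature.items(), key=lambda item: item[1], reverse=True)
    return [(name, vip) for name, vip in ranked if vip >= threshold]

# Made-up VIP values, for illustration only.
example_vip = {
    "1-Methylxanthine": 1.42,
    "1-Methyluric acid": 0.87,
    "2-Aminoadipic acid": 1.05,
}
selected = rank_and_filter(example_vip, threshold=1.0)
```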

Output files:

1.      'OPLSDA_Score.txt', component (scores) matrix; P1 is the first predictive score and O1 is the first orthogonal score.

2.      'OPLSDA_VIP.txt', columns: feature name, VIP, Corr.Coeffs (correlation coefficient between the raw data and the first score), Corr.P, FDR.

3.      'OPLSDA_VIP_Sig.txt', significant results.

4.      'OPLSDA_Permutation.txt', permutation results.

5.      'Fitted_Curve_Parameter.txt', parameters of the fitted curve in the permutation plot.

6.      'OPLSDA_R2X_R2Y_Q2.txt', data frame with the model overview.

7.      'OPLSDA_VPlot.pdf', visualization of the 'OPLSDA_VIP.txt' data.

8.      'OPLSDA_Score_2D_Label.pdf', OPLS-DA scatter plot of the P1 and O1 values with sample name labels.

9.      'OPLSDA_Score_2D.pdf', OPLS-DA scatter plot of the P1 and O1 values without sample name labels.

10.  'OPLSDA_Permutation.pdf', visualization of the 'OPLSDA_Permutation.txt' data.

11.  'OPLSDA_R2X_R2Y_Q2.pdf', visualization of the 'OPLSDA_R2X_R2Y_Q2.txt' data.
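Permutation results are usually summarized as an empirical p-value: the share of models fitted on permuted group labels that score at least as well as the real model. The sketch below shows one common convention for this calculation. The function name and the Q2 values are made up for illustration, and the exact formula used by this tool may differ.

```python
def permutation_p_value(observed, permuted):
    """Empirical p-value: share of permuted statistics at least as large as the observed one."""
    at_least = sum(1 for value in permuted if value >= observed)
    # Adding 1 to both counts treats the observed model as one of the permutations.
    return (1 + at_least) / (1 + len(permuted))

# Made-up Q2 values, for illustration only.
observed_q2 = 0.62
permuted_q2 = [0.10, -0.05, 0.31, 0.02, -0.20, 0.15, 0.08, -0.11, 0.25]
p_q2 = permutation_p_value(observed_q2, permuted_q2)
```

With more permutations the smallest attainable p-value decreases, so the number of permutations limits how significant a model can appear.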

Note:

The sample group file must contain exactly 2 groups.

Group names should preferably be character strings. Numeric group names are also supported but not recommended.

References:

[1]     Thevenot, E.A., Roux, A., Xu, Y., Ezan, E., Junot, C. 2015. Analysis of the human adult urinary metabolome variations with age, body mass index and gender by implementing a comprehensive workflow for univariate and OPLS statistical analyses. Journal of Proteome Research 14: 3322-3335.

[2]     Trygg, J., Wold, S. 2002. Orthogonal projections to latent structures (O-PLS). Journal of Chemometrics 16: 119-128.

[3]     Pinto, R.C., Trygg, J., Gottfries, J. 2012. Advantages of orthogonal inspection in chemometrics. Journal of Chemometrics 26(6): 231-235.

[4]     Mehmood, T., Liland, K.H., Snipen, L., Saebo, S. 2012. A review of variable selection methods in partial least squares regression. Chemometrics and Intelligent Laboratory Systems 118: 62-69.

[5]     Galindo-Prieto, B., Eriksson, L., Trygg, J. 2014. Variable influence on projection (VIP) for orthogonal projections to latent structures (OPLS). Journal of Chemometrics 28: 623-632.