Introduction:

This tool annotates compounds in a GC-MS peak table by matching the mass spectra and/or retention times of detected peaks against a public or custom library. To use a custom library for annotation, a standard MSP format file is required.

Input files:

1.      Raw peak table file in tab-delimited txt format. It can be obtained from the tool "metaMS.runGC" or the tool "eRah". The first column is the compound identifier; the remaining columns are samples.

For example:

AlignID    STDmix_GC_01    STDmix_GC_02    STDmix_GC_03
EC1        1486892478      561322777       3448620272
EC2        5492977592      684434115       3265669981
EC3        2265686433      4182838129      4365291513
EC4        13390154        12612932        21155307
EC5        14588107        8510918         7224351

 

2.      Corresponding compound mass spectrum file in MSP format. It can be obtained from the tool "metaMS.runGC" or the tool "eRah". The identifiers must be the same in both files. Only the fields 'Name', 'rt' and the mass spectrum information are used; other fields are ignored.

For example:

Name: EC1
rt: 3.4253
FoundIn: 3
Comments: MSP spectra exported by eRah
Num Peaks: 10
32 838; 33 60; 40 42; 41 54; 42 815; 43 713;
47 1000; 48 36; 49 6; 77 20;

Name: EC2
rt: 3.7521
FoundIn: 3
Comments: MSP spectra exported by eRah
Num Peaks: 13
30 1000; 31 335; 32 91; 33 11; 40 12; 41 47;
42 232; 43 299; 45 189; 46 831; 47 348;
48 6; 77 11;
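The two input files described above can be read and cross-checked with a short script. The following is only a minimal sketch in Python, not the tool's own code: the function names read_peak_table and read_msp are hypothetical, and the parser assumes that each MSP record starts with a 'Name' line and that peaks are written as "m/z intensity;" pairs, as in the examples.

```python
import csv
import io


def read_peak_table(handle):
    """Read a tab-delimited peak table into (sample names, {identifier: intensities})."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)              # AlignID, then one column per sample
    samples = header[1:]
    table = {}
    for row in reader:
        if row:                        # skip empty lines
            table[row[0]] = [float(v) for v in row[1:]]
    return samples, table


def read_msp(handle):
    """Parse MSP records, keeping only Name, rt and the peak list."""
    records = []
    current = None
    for raw in handle:
        line = raw.strip()
        if not line:
            continue
        key, sep, value = line.partition(":")
        field = key.strip().lower()
        if sep and field == "name":
            current = {"name": value.strip(), "rt": None, "peaks": {}}
            records.append(current)
        elif current is None or (sep and field != "rt"):
            continue                   # text before the first record, or an ignored field
        elif sep:
            current["rt"] = float(value)
        else:
            # Peak line such as "32 838; 33 60;" (m/z, intensity pairs)
            for pair in line.split(";"):
                parts = pair.split()
                if len(parts) == 2:
                    current["peaks"][float(parts[0])] = float(parts[1])
    return records


# Shortened copies of the example inputs shown above.
PEAK_TABLE = (
    "AlignID\tSTDmix_GC_01\tSTDmix_GC_02\tSTDmix_GC_03\n"
    "EC1\t1486892478\t561322777\t3448620272\n"
    "EC2\t5492977592\t684434115\t3265669981\n"
)
MSP = """Name: EC1
rt: 3.4253
FoundIn: 3
Comments: MSP spectra exported by eRah
Num Peaks: 10
32 838; 33 60; 40 42; 41 54; 42 815; 43 713;
47 1000; 48 36; 49 6; 77 20;

Name: EC2
rt: 3.7521
FoundIn: 3
Comments: MSP spectra exported by eRah
Num Peaks: 13
30 1000; 31 335; 32 91; 33 11; 40 12; 41 47;
42 232; 43 299; 45 189; 46 831; 47 348;
48 6; 77 11;
"""

samples, table = read_peak_table(io.StringIO(PEAK_TABLE))
records = read_msp(io.StringIO(MSP))

# Identifiers present in the peak table but absent from the MSP file.
missing = sorted(set(table) - {r["name"] for r in records})
print("identifiers without a spectrum:", missing)
```

Storing the peaks in a dictionary keyed by m/z means that a peak written twice in the file is kept only once.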

Parameters:

1.        normalized dot product: Matching factor function for mass spectra. The function weights the input spectra before comparing them.

2.        normalized Euclidean distance: Matching factor function for mass spectrum.

3.        mass spectrum similarity cutoff: A value from 0 to 1. More similar spectra give a larger matching factor.

4.        RT window: The maximum retention time difference allowed between a detected peak and a library compound.

5.        NSEN: An integrated library derived from NIST/EPA/NIH. It is the default public library.

6.        GMD_ALK: A public database from the Golm Metabolome Database (GMD). ALK - based on 9 n-alkanes (C10-C36).

7.        GMD_FAME: A public database from the Golm Metabolome Database (GMD). FAME - based on 13 fatty acid methyl esters (C8 ME-C30 ME).

8.        GMD_MSIR: The 'Q_MSRI_ID' GC-Quadrupole-MS MSRI Database of Golm Metabolome library.

9.        MoNA-HMDB: It is derived from MassBank of North America, with 4620 spectra (http://mona.fiehnlab.ucdavis.edu/downloads).

10.    MoNA-MetaboBASE: It is derived from MassBank of North America, with 1254 spectra (http://mona.fiehnlab.ucdavis.edu/downloads).

11.    MoNA-ReSpect: It is derived from MassBank of North America, with 6290 spectra (http://mona.fiehnlab.ucdavis.edu/downloads).
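To illustrate how the two matching factor functions, the similarity cutoff and the RT window fit together, here is a minimal Python sketch. It is an assumption about the general technique, not the tool's implementation: the exact weighting scheme and normalization used by the tool are not stated here, so the spectra are left unweighted, the dot product is computed as a cosine similarity, and the Euclidean distance between unit-length spectra is rescaled so that 1 means identical. All function names are hypothetical.

```python
import math


def align(query, ref):
    """Put two {m/z: intensity} spectra on a shared m/z axis."""
    mzs = sorted(set(query) | set(ref))
    return ([query.get(mz, 0.0) for mz in mzs],
            [ref.get(mz, 0.0) for mz in mzs])


def normalized_dot_product(query, ref):
    """Cosine similarity of two spectra (unweighted in this sketch)."""
    a, b = align(query, ref)
    norm = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return sum(x * y for x, y in zip(a, b)) / norm if norm else 0.0


def normalized_euclidean_similarity(query, ref):
    """1 minus the scaled Euclidean distance between unit-length spectra."""
    a, b = align(query, ref)
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if not na or not nb:
        return 0.0
    dist = math.sqrt(sum((x / na - y / nb) ** 2 for x, y in zip(a, b)))
    # For non-negative unit vectors the distance is at most sqrt(2).
    return 1.0 - dist / math.sqrt(2.0)


def passes(match_factor, cutoff, query_rt=None, db_rt=None, rt_window=None):
    """Apply the similarity cutoff and, when both RTs are known, the RT window."""
    if match_factor < cutoff:
        return False
    if rt_window is not None and query_rt is not None and db_rt is not None:
        return abs(query_rt - db_rt) <= rt_window
    return True


spectrum = {32: 838.0, 42: 815.0, 47: 1000.0}
other = {30: 1000.0, 46: 831.0}
print(normalized_dot_product(spectrum, spectrum))
print(normalized_euclidean_similarity(spectrum, other))
```

With both functions a larger value means a closer match, so the same cutoff between 0 and 1 can be applied to either.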

Output files:

1.      'identified_pkTable.txt', identified peak table file in tab-delimited txt format. It may contain duplicate row names.

For example:

AlignID                              STDmix_GC_01    STDmix_GC_02    STDmix_GC_03
EC1                                  1486892478      561322777       3448620272
Nitrogen dioxide                     5492977592      684434115       3265669981
Ethanol, 2-fluoro-                   2265686433      4182838129      4365291513
3-Pentanone, 2,2,4,4-tetramethyl-    13390154        12612932        21155307
Hydrazine                            14588107        8510918         7224351

 

2.      'identified_uniq_pkTable.txt', identified unique peak table file in tab-delimited txt format. When row names are duplicated, only the row with the maximum intensity is retained.

 

3.      'detailed_information.txt', detailed information about query and database relationship in library searching.

Columns are:

'Query'          Query name in the input files.
'DB'             Matched compound name in the database.
'matchFactor'    Matching factor, from 0 to 1; more similar spectra have a larger matching factor.
'rt_diff'        Retention time difference between the query and the matched compound.
'query_rt'       Retention time of the query.
'query_ms'       Mass spectrum information of the query.
'db_rt'          Retention time of the matched compound.
'db_ms'          Mass spectrum information of the matched compound.
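The step from 'identified_pkTable.txt' to 'identified_uniq_pkTable.txt' can be sketched as follows. This is a hedged illustration only: how the tool measures "maximum intensity" is not specified, so the sketch assumes the summed intensity across samples, and both the function name keep_most_intense and the rows are made up for the example.

```python
def keep_most_intense(rows):
    """Keep one row per name: the one with the largest summed intensity.

    rows is a list of (name, [intensities]) pairs. Using the sum across
    samples as the intensity measure is an assumption of this sketch.
    """
    best = {}
    for name, values in rows:
        if name not in best or sum(values) > sum(best[name]):
            best[name] = values
    return list(best.items())


# Made-up rows: "Compound A" was assigned to two different peaks.
rows = [
    ("Compound A", [10.0, 20.0, 30.0]),
    ("Compound B", [1.0, 2.0, 3.0]),
    ("Compound A", [40.0, 5.0, 5.0]),
]
unique = keep_most_intense(rows)
for name, values in unique:
    print(name, values)
```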

Note:

The public libraries contain no retention time field, so only mass spectrum information is used for annotation with them. For a custom library, this tool supports joint annotation by mass spectrum and retention time: users can provide an in-house library file in MSP format containing the optional field 'rt'. A compound can appear more than once in the database with the same 'Name' but a different mass spectrum. In this case, the best hit is reported.
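The behaviour described in this note can be sketched in Python as below. It is an illustration under stated assumptions rather than the tool's code: best_hits and cosine are hypothetical names, the score is a plain unweighted cosine similarity used as a stand-in, and the retention time filter is skipped whenever either side has no 'rt' value.

```python
import math


def cosine(a, b):
    """Stand-in match factor: cosine similarity of two {m/z: intensity} spectra."""
    dot = sum(v * b.get(mz, 0.0) for mz, v in a.items())
    norm = math.sqrt(sum(v * v for v in a.values()) * sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def best_hits(query, library, score, cutoff, rt_window=None):
    """Return {library name: (match factor, rt_diff)} for one query.

    Library entries that share a name are collapsed to the best-scoring one.
    """
    hits = {}
    for entry in library:
        rt_diff = None
        if (rt_window is not None and query.get("rt") is not None
                and entry.get("rt") is not None):
            rt_diff = abs(query["rt"] - entry["rt"])
            if rt_diff > rt_window:
                continue               # outside the RT window
        factor = score(query["peaks"], entry["peaks"])
        if factor < cutoff:
            continue                   # below the similarity cutoff
        if entry["name"] not in hits or factor > hits[entry["name"]][0]:
            hits[entry["name"]] = (factor, rt_diff)
    return hits


# Made-up library: "Compound A" appears twice with different spectra and no rt.
query = {"name": "EC1", "rt": 3.4, "peaks": {43: 100.0, 47: 50.0}}
library = [
    {"name": "Compound A", "rt": None, "peaks": {43: 100.0}},
    {"name": "Compound A", "rt": None, "peaks": {43: 100.0, 47: 50.0}},
    {"name": "Compound B", "rt": 5.0, "peaks": {43: 100.0, 47: 50.0}},
]
hits = best_hits(query, library, cosine, cutoff=0.8, rt_window=0.1)
print(hits)
```

Here "Compound B" has a matching spectrum but is rejected by the RT window, while the two "Compound A" entries are reduced to the one with the higher match factor.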

Reference:

[1]     Schauer, N., Steinhauser, D., Strelkov, S., Schomburg, D., Allison, G., Moritz, T., Lundgren, K., Roessner-Tunali, U., Forbes, M.G., Willmitzer, L., Fernie, A.R. and Kopka, J. (2005) GC-MS libraries for the rapid identification of metabolites in complex biological samples, FEBS Lett, 579(6), 1332-1337. doi:10.1016/j.febslet.2005.01.029

[2]     Kopka, J., Schauer, N., Krueger, S., Birkemeyer, C., Usadel, B., Bergmuller, E., Dormann, P., Weckwerth, W., Gibon, Y., Stitt, M., Willmitzer, L., Fernie, A.R. and Steinhauser, D. (2005) GMD@CSB.DB: the Golm Metabolome Database, Bioinformatics, 21, 1635-1638.