Introduction:
This tool annotates compounds in a GC-MS peak table by matching the mass
spectra and/or retention times of detected peaks against a public or custom
library. To annotate against a custom library, supply it as a standard MSP
format file.
Input files:
1. Raw peak table file in tab-delimited txt format, which can be obtained from the tool "metaMS.runGC"
or the tool "eRah". The first column is the compound identifier; the remaining
columns are samples.
For example:
| AlignID | STDmix_GC_01 | STDmix_GC_02 | STDmix_GC_03 |
| EC1 | 1486892478 | 561322777 | 3448620272 |
| EC2 | 5492977592 | 684434115 | 3265669981 |
| EC3 | 2265686433 | 4182838129 | 4365291513 |
| EC4 | 13390154 | 12612932 | 21155307 |
| EC5 | 14588107 | 8510918 | 7224351 |
2. Corresponding compound mass spectrum information file in MSP format, which can be obtained
from the tool "metaMS.runGC" or the tool "eRah".
The identifiers must be the same in both files. Only the 'Name' and 'rt' fields and the mass spectrum information are used; other fields are ignored.
For example:
Name: EC1
rt: 3.4253
FoundIn: 3
Comments: MSP spectra exported by eRah
Num Peaks: 10
32 838; 33 60; 40 42; 41 54; 42 815; 43 713;
47 1000; 48 36; 49 6; 77 20;
Name: EC2
rt: 3.7521
FoundIn: 3
Comments: MSP spectra exported by eRah
Num Peaks: 13
30 1000; 31 335; 32 91; 33 11; 40 12; 41 47;
42 232; 43 299; 45 189; 46 831; 47 348;
48 6; 77 11;
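The two input files can be read with a few lines of Python. The following is a minimal sketch, not the tool's own parser: the function names (read_peak_table, parse_msp, missing_spectra) are hypothetical, and it assumes each MSP header line has the form "Field: value" while each peak line holds "m/z intensity" pairs separated by semicolons, as in the examples above.

```python
import csv
import io

# Example inputs copied from the help text (tab-delimited table, MSP records).
PEAK_TABLE = (
    "AlignID\tSTDmix_GC_01\tSTDmix_GC_02\tSTDmix_GC_03\n"
    "EC1\t1486892478\t561322777\t3448620272\n"
    "EC2\t5492977592\t684434115\t3265669981\n"
)

MSP_TEXT = """\
Name: EC1
rt: 3.4253
FoundIn: 3
Comments: MSP spectra exported by eRah
Num Peaks: 10
32 838; 33 60; 40 42; 41 54; 42 815; 43 713;
47 1000; 48 36; 49 6; 77 20;
Name: EC2
rt: 3.7521
FoundIn: 3
Comments: MSP spectra exported by eRah
Num Peaks: 13
30 1000; 31 335; 32 91; 33 11; 40 12; 41 47;
42 232; 43 299; 45 189; 46 831; 47 348;
48 6; 77 11;
"""


def read_peak_table(text):
    """Return (sample_names, {identifier: [intensities]}) from tab-delimited text."""
    rows = [row for row in csv.reader(io.StringIO(text), delimiter="\t") if row]
    header, body = rows[0], rows[1:]
    return header[1:], {row[0]: [float(v) for v in row[1:]] for row in body}


def parse_msp(text):
    """Return {name: {"rt": float or None, "peaks": [(mz, intensity), ...]}}."""
    records = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        field, sep, value = line.partition(":")
        if sep and line[0].isalpha():
            # Header line: only 'Name' and 'rt' are used, other fields are ignored.
            key = field.strip().lower()
            if key == "name":
                current = {"rt": None, "peaks": []}
                records[value.strip()] = current
            elif key == "rt" and current is not None:
                current["rt"] = float(value)
        elif current is not None:
            # Peak line: "m/z intensity" pairs separated by semicolons.
            for pair in line.split(";"):
                parts = pair.split()
                if len(parts) == 2:
                    current["peaks"].append((float(parts[0]), float(parts[1])))
    return records


def missing_spectra(table, spectra):
    """List peak-table identifiers that have no spectrum in the MSP file."""
    return [name for name in table if name not in spectra]


samples, table = read_peak_table(PEAK_TABLE)
spectra = parse_msp(MSP_TEXT)
print(samples, len(spectra["EC1"]["peaks"]), missing_spectra(table, spectra))
```

Checking identifiers up front, as missing_spectra does, catches the case where the two files were exported from different runs.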
Parameter:
1. normalized dot product: Matching factor function for mass spectra. The function
applies weights to the input spectrum to obtain weighted values.
2. normalized Euclidean distance: Matching factor function for mass spectra.
3. mass spectrum similarity cutoff: A value from 0 to 1; more similar spectra have a larger matching factor.
4. RT window: The maximum retention time difference that is allowed.
5. NSEN: An integrated library derived from NIST/EPA/NIH. It is the default public library.
6. GMD_ALK: A public database from the Golm Metabolome Database (GMD). ALK - based on 9 n-alkanes (C10-C36).
7. GMD_FAME: A public database from the Golm Metabolome Database (GMD). FAME - based on 13 fatty acid methyl esters (C8 ME-C30 ME).
8. GMD_MSIR: The 'Q_MSRI_ID' GC-Quadrupole-MS MSRI Database of Golm Metabolome library.
9. MoNA-HMDB: It is derived from MassBank of North America, with 4620 spectra (http://mona.fiehnlab.ucdavis.edu/downloads).
10. MoNA-MetaboBASE: It is derived from MassBank of North America, with 1254 spectra (http://mona.fiehnlab.ucdavis.edu/downloads).
11. MoNA-ReSpect: It is derived from MassBank of North America, with 6290 spectra (http://mona.fiehnlab.ucdavis.edu/downloads).
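To make the matching parameters concrete, here is a minimal sketch of how the two matching factor functions, the similarity cutoff and the RT window could fit together. It is an illustration under stated assumptions, not the tool's implementation: the function names, the weighting exponents (mz_power, intensity_power), the default cutoff of 0.6, the rounding of m/z to integers and the rescaling of the Euclidean distance to a 0-1 similarity are all hypothetical choices.

```python
import math


def to_vector(peaks, mz_power=0.0, intensity_power=1.0):
    """Bin peaks by integer m/z and apply an (assumed) weighting mz^a * intensity^b."""
    vec = {}
    for mz, intensity in peaks:
        key = round(mz)
        vec[key] = vec.get(key, 0.0) + (mz ** mz_power) * (intensity ** intensity_power)
    return vec


def _norm(vec):
    return math.sqrt(sum(v * v for v in vec.values()))


def normalized_dot_product(a, b):
    """Cosine of the angle between two spectra; 1 for identical, 0 for disjoint."""
    norm = _norm(a) * _norm(b)
    if norm == 0:
        return 0.0
    return sum(a[mz] * b[mz] for mz in a.keys() & b.keys()) / norm


def normalized_euclidean_similarity(a, b):
    """Euclidean distance between unit-length spectra, rescaled to a 0-1 similarity.

    Assumption: for non-negative unit vectors the distance is at most sqrt(2),
    so 1 - d / sqrt(2) lies between 0 and 1.
    """
    na, nb = _norm(a), _norm(b)
    if na == 0 or nb == 0:
        return 0.0
    dist = math.sqrt(sum((a.get(mz, 0.0) / na - b.get(mz, 0.0) / nb) ** 2
                         for mz in a.keys() | b.keys()))
    return max(0.0, 1.0 - dist / math.sqrt(2))


def is_match(score, query_rt, db_rt, cutoff=0.6, rt_window=None):
    """Accept a hit if the score reaches the cutoff and, when given, RT is in the window."""
    if score < cutoff:
        return False
    if rt_window is not None and db_rt is not None:
        return abs(query_rt - db_rt) <= rt_window
    return True


query = to_vector([(32, 838), (42, 815), (47, 1000)])
library = to_vector([(32, 800), (42, 790), (47, 1000), (77, 15)])
score = normalized_dot_product(query, library)
print(round(score, 3), is_match(score, query_rt=3.43, db_rt=3.50, rt_window=0.2))
```

Both functions return values between 0 and 1, so the same cutoff can be applied regardless of which one is selected.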
Output files:
1. 'identified_pkTable.txt': identified peak table file in tab-delimited txt format.
It may contain duplicate row names.
For example:
| AlignID | STDmix_GC_01 | STDmix_GC_02 | STDmix_GC_03 |
| EC1 | 1486892478 | 561322777 | 3448620272 |
| Nitrogen dioxide | 5492977592 | 684434115 | 3265669981 |
| Ethanol, 2-fluoro- | 2265686433 | 4182838129 | 4365291513 |
| 3-Pentanone, 2,2,4,4-tetramethyl- | 13390154 | 12612932 | 21155307 |
| Hydrazine | 14588107 | 8510918 | 7224351 |
2. 'identified_uniq_pkTable.txt': identified unique peak table file in tab-delimited txt format.
When row names are duplicated, the row with the maximum intensity is retained.
3. 'detailed_information.txt': detailed information about the relationship between
each query and its database match in the library search.
Columns are:
| 'Query' | Query name in the inputs. |
| 'DB' | Matched compound name in the database. |
| 'matchFactor' | Matching factor, from 0 to 1; more similar spectra have a larger matching factor. |
| 'rt_diff' | Retention time difference between the query and the matched compound. |
| 'query_rt' | Query retention time. |
| 'query_ms' | Query mass spectrum information. |
| 'db_rt' | Matched compound retention time. |
| 'db_ms' | Matched compound mass spectrum information. |
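The step from 'identified_pkTable.txt' to 'identified_uniq_pkTable.txt' can be sketched as follows. This is only an illustration: the function name unique_peak_table and the sample rows are hypothetical, and "maximum intensity" is assumed here to mean the largest intensity summed across all samples, which the help text does not specify.

```python
def unique_peak_table(rows):
    """Keep one row per name: the one with the largest summed intensity (assumption).

    rows is a list of (name, [intensity per sample]) tuples; the result keeps
    names in order of first appearance.
    """
    best = {}
    for name, values in rows:
        if name not in best or sum(values) > sum(best[name]):
            best[name] = values
    return list(best.items())


# Made-up rows for illustration: 'Hydrazine' was assigned to two peaks.
identified = [
    ("Hydrazine", [14588107, 8510918, 7224351]),
    ("Nitrogen dioxide", [5492977592, 684434115, 3265669981]),
    ("Hydrazine", [120, 95, 110]),
]
for name, values in unique_peak_table(identified):
    print(name, values)
```

If a different rule is intended (for example the maximum in any single sample), only the comparison inside the loop needs to change.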
References:
[1] Schauer N, Steinhauser D, Strelkov S, Schomburg D, Allison G, Moritz T, Lundgren K, Roessner-Tunali U, Forbes MG, Willmitzer L, Fernie AR, Kopka J. GC-MS libraries for the rapid identification of metabolites in complex biological samples. FEBS Lett 2005, 579(6):1332-1337. doi:10.1016/j.febslet.2005.01.029
[2] Kopka J, Schauer N, Krueger S, Birkemeyer C, Usadel B, Bergmuller E, Dormann P, Weckwerth W, Gibon Y, Stitt M, Willmitzer L, Fernie AR, Steinhauser D. GMD@CSB.DB: the Golm Metabolome Database. Bioinformatics 2005, 21:1635-1638.