Introduction:

This tool takes a peak table file as input and performs cluster analysis on it. Metabolites are grouped into clusters according to the distance between their profiles or the similarity of their variation across samples.

Input files:

1.      A peak table file in tab-delimited text format, with the compound identifier in the first column and one column per sample after it.

For example:

                                        HU_011    HU_014    HU_015    HU_017    HU_018    HU_019
(2-methoxyethoxy)propanoic acid isomer  3.019766  3.814339  3.519691  2.562183  3.781922  4.161074
(gamma)Glu-Leu/Ile                      3.888479  4.277149  4.195649  4.32376   4.629329  4.412266
1-Methyluric acid                       3.869006  3.837704  4.102254  4.53852   4.178829  4.516805
1-Methylxanthine                        3.717259  3.776851  4.291665  4.432216  4.11736   4.562052
1,3-Dimethyluric acid                   3.535461  3.932581  3.955376  4.228491  4.005545  4.320582
1,7-Dimethyluric acid                   3.325199  4.025125  3.972904  4.109927  4.024092  4.326856
2-acetamido-4-methylphenyl acetate      4.204754  5.181858  3.88568   4.237915  1.852994  4.080681
2-Aminoadipic acid                      4.080204  4.359246  4.249111  4.231404  4.323679  4.244485
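
As an illustration of the input layout described above, the following minimal Python sketch reads a tab-delimited peak table using only the standard library. The function name read_peak_table and the three-sample excerpt are assumptions made for this example; they are not part of the tool itself.

```python
import csv
import io

def read_peak_table(handle):
    """Read a tab-delimited peak table.

    Returns (sample_names, table), where table maps each compound
    identifier to its list of values, one per sample.
    """
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    header = next(reader)
    samples = header[1:]              # the first header cell belongs to the compound column
    table = {}
    for row in reader:
        if not row:                   # skip blank lines
            continue
        table[row[0]] = [float(value) for value in row[1:]]
    return samples, table

# Hypothetical excerpt: two compounds and three samples from the example table.
example = (
    "\tHU_011\tHU_014\tHU_015\n"
    "1-Methyluric acid\t3.869006\t3.837704\t4.102254\n"
    "1-Methylxanthine\t3.717259\t3.776851\t4.291665\n"
)
samples, table = read_peak_table(io.StringIO(example))
print(samples)
print(table["1-Methylxanthine"])
```

With a real file, the same function can be called on an open file handle, for example open(path, newline="").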

Parameters:

Distance calculation methods:

1.        Euclidean:

The Euclidean distance between points p and q is the length of the line segment connecting them.

2.        Correlation distance: A distance based on the correlation coefficient between p and q.

3.        Canberra distance: sum(|p_i - q_i| / |p_i + q_i|). Terms with zero numerator and denominator are omitted from the sum and treated as if the values were missing. This measure is intended for non-negative values (e.g., counts); the absolute value of the denominator is taken.

4.        Binary distance: The vectors are regarded as binary bits, so non-zero elements are 'on' and zero elements are 'off'. The distance is the proportion of bits in which only one is on amongst those in which at least one is on.

5.        Minkowski distance: The p norm, the pth root of the sum of the pth powers of the differences between the components.

6.        Manhattan: The sum of the absolute differences between the components of the two vectors p and q.

7.        Maximum distance: The largest absolute difference between corresponding components of p and q (supremum norm).
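
The distance measures listed above can be sketched in plain Python as follows. This is an illustration only, not the tool's own implementation. Two points are assumptions: the correlation distance is taken here as 1 minus the Pearson correlation coefficient, which is a common convention but is not stated in this document, and the Canberra sketch skips zero terms without rescaling the remaining sum. All function names and the vectors p and q are chosen for this example.

```python
import math

def euclidean(p, q):
    # Length of the line segment connecting p and q.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def manhattan(p, q):
    # Sum of absolute component differences.
    return sum(abs(a - b) for a, b in zip(p, q))

def maximum(p, q):
    # Largest absolute component difference (supremum norm).
    return max(abs(a - b) for a, b in zip(p, q))

def minkowski(p, q, power):
    # The power-th root of the sum of the power-th powers of the differences.
    return sum(abs(a - b) ** power for a, b in zip(p, q)) ** (1.0 / power)

def canberra(p, q):
    # Intended for non-negative values, where a zero denominator implies
    # a zero numerator; such terms are skipped (no rescaling in this sketch).
    total = 0.0
    for a, b in zip(p, q):
        denominator = abs(a + b)
        if denominator == 0:
            continue
        total += abs(a - b) / denominator
    return total

def binary(p, q):
    # Proportion of positions where exactly one vector is non-zero,
    # among positions where at least one is non-zero.
    on_either = sum(1 for a, b in zip(p, q) if a != 0 or b != 0)
    on_one = sum(1 for a, b in zip(p, q) if (a != 0) != (b != 0))
    return on_one / on_either if on_either else 0.0

def correlation_distance(p, q):
    # ASSUMPTION: distance = 1 - Pearson r. Undefined for constant vectors.
    n = len(p)
    mean_p, mean_q = sum(p) / n, sum(q) / n
    cov = sum((a - mean_p) * (b - mean_q) for a, b in zip(p, q))
    sd_p = math.sqrt(sum((a - mean_p) ** 2 for a in p))
    sd_q = math.sqrt(sum((b - mean_q) ** 2 for b in q))
    return 1.0 - cov / (sd_p * sd_q)

# Hypothetical vectors for illustration.
p = [1.0, 2.0, 3.0]
q = [4.0, 6.0, 3.0]
for name, value in [
    ("euclidean", euclidean(p, q)),
    ("manhattan", manhattan(p, q)),
    ("maximum", maximum(p, q)),
    ("minkowski (power 3)", minkowski(p, q, 3)),
    ("canberra", canberra(p, q)),
    ("correlation", correlation_distance(p, q)),
]:
    print(name, value)
```

Note that Minkowski distance with power 1 equals the Manhattan distance and with power 2 equals the Euclidean distance.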

 

Cluster methods:

1.        ward: Ward's minimum variance method aims at finding compact, spherical clusters.

2.        complete: The complete linkage method finds similar clusters.

3.        single: The single linkage method (which is closely related to the minimal spanning tree) adopts a 'friends of friends' clustering strategy.

The other methods can be regarded as aiming for clusters with characteristics somewhere between the single and complete link methods.

4.        centroid: Method "centroid" is typically meant to be used with squared Euclidean distances.

5.        average: The average linkage method uses the average distance between all pairs of observations, one from each of the two clusters.

6.        mcquitty: McQuitty's method (WPGMA), a weighted variant of average linkage.

7.        median: The median linkage method (WPGMC), a weighted variant of the centroid method.
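
To show how the linkage choice enters the clustering, here is a minimal Python sketch of agglomerative clustering covering the single, complete and average methods only (ward, centroid, mcquitty and median are not covered). It is a naive illustration under stated assumptions, not the tool's implementation: Euclidean distance is fixed, and the function names, the n_clusters argument and the sample rows are all hypothetical.

```python
import math

def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def linkage_distance(c1, c2, rows, method):
    """Distance between two clusters, each given as a list of row indices."""
    pairwise = [euclidean(rows[i], rows[j]) for i in c1 for j in c2]
    if method == "single":        # closest pair ('friends of friends')
        return min(pairwise)
    if method == "complete":      # farthest pair
        return max(pairwise)
    if method == "average":       # mean over all cross-cluster pairs
        return sum(pairwise) / len(pairwise)
    raise ValueError("unsupported method: " + method)

def agglomerate(rows, n_clusters, method="average"):
    """Merge the two closest clusters until n_clusters remain."""
    clusters = [[i] for i in range(len(rows))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = linkage_distance(clusters[a], clusters[b], rows, method)
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]           # b > a, so index a is unaffected
    return clusters

# Hypothetical profiles: rows 0 and 1 are similar, rows 2 and 3 are similar.
rows = [[1.0, 1.0], [1.1, 0.9], [5.0, 5.0], [5.2, 4.9]]
for method in ("single", "complete", "average"):
    print(method, agglomerate(rows, 2, method))
```

On well-separated data such as this, the three methods give the same grouping; they differ mainly when clusters are elongated or overlap.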

Output files:

1.      'subcluster_1.matrix.txt', subcluster data information.

2.      'subcluster_1.pdf', subcluster line chart.