Introduction:

This tool takes a peak table file as input and fills the missing values (zeros, null values, 'NA', or negative values) using one of five methods: 1) a*min, where 'a' is a user-defined coefficient and 'min' is the minimum non-negative value in the peak table; 2) 'min' itself; 3) a user-specified value; 4) values computed by 'KNN' (k-nearest neighbors); or 5) values computed by 'qirlc' (the QRILC algorithm, see reference [3]). The 'qirlc' method is especially suitable for left-censored data.
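
The three constant-fill methods can be sketched in a few lines of Python. This is only an illustration, not the tool's own code: the function names, the method labels ("a*min", "min", "value") and the default coefficient are hypothetical, and 'min' is assumed here to be the smallest observed value that is not itself treated as missing. The 'KNN' and 'qirlc' methods are not covered.

```python
def is_missing(value):
    """Return True for None, empty/'NA'-like text, NaN, zero, or negative values."""
    if value is None:
        return True
    if isinstance(value, str):
        text = value.strip()
        if text == "" or text.upper() in ("NA", "NULL", "NAN"):
            return True
        try:
            value = float(text)
        except ValueError:
            return True  # non-numeric text is treated as missing (assumption)
    # value != value is True only for NaN
    return value != value or value <= 0


def fill_missing(table, method="min", a=0.5, user_value=1.0):
    """Fill missing cells of a list-of-rows table with a single constant.

    method: "a*min", "min", or "value" (hypothetical labels).
    """
    observed = [float(v) for row in table for v in row if not is_missing(v)]
    if not observed:
        raise ValueError("the table has no observed values")
    smallest = min(observed)  # assumed meaning of 'min'

    if method == "a*min":
        fill = a * smallest
    elif method == "min":
        fill = smallest
    elif method == "value":
        fill = user_value
    else:
        raise ValueError("unknown method: %r" % (method,))

    return [[fill if is_missing(v) else float(v) for v in row] for row in table]


# Intensities only; the compound identifier column is left out.
example = [[10.0, 0, "NA"], [4.0, -3, None]]
filled = fill_missing(example, method="a*min", a=0.5)
```

With a = 0.5 and a smallest observed value of 4.0, every missing cell in the example is replaced by 2.0.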

Input files:

1.      A peak table file in tab-delimited text format. The first column holds the compound identifier; each remaining column holds one sample.

For example:

AlignID                              STDmix_GC_01    STDmix_GC_02    STDmix_GC_03
1                                    1486892478      561322777       3448620272
Nitrogen dioxide                     5492977592      684434115       3265669981
Ethanol, 2-fluoro-                   2265686433      4182838129      4365291513
3-Pentanone, 2,2,4,4-tetramethyl-    13390154        12612932        21155307
Hydrazine                            14588107        8510918         7224351
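
A peak table in this layout can be read with Python's standard csv module. The sketch below is illustrative only and the function name read_peak_table is hypothetical. Because the delimiter is a tab, compound names that contain commas (such as "Ethanol, 2-fluoro-") stay in one field.

```python
import csv
import io


def read_peak_table(handle):
    """Split a tab-delimited peak table into sample names, compound IDs, and raw cells."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    samples = header[1:]          # every column after the identifier is a sample
    compounds, values = [], []
    for row in reader:
        if not row:               # skip blank lines
            continue
        compounds.append(row[0])  # first column: compound identifier
        values.append(row[1:])    # cells stay as text so 'NA' is preserved
    return samples, compounds, values


# In practice the handle would come from open(path, newline="").
text = (
    "AlignID\tSTDmix_GC_01\tSTDmix_GC_02\tSTDmix_GC_03\n"
    "Ethanol, 2-fluoro-\t2265686433\t4182838129\t4365291513\n"
    "Hydrazine\t14588107\tNA\t7224351\n"
)
samples, compounds, values = read_peak_table(io.StringIO(text))
```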

Output files:

1.      'zero_filled_pkTable.txt': the peak table with its missing values filled, in tab-delimited text format.
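
For reference, a table in the same tab-delimited layout can be written with Python's csv module. This is a minimal sketch, not the tool's own writer; the function name write_peak_table and its arguments are hypothetical.

```python
import csv
import io


def write_peak_table(handle, id_header, samples, compounds, values):
    """Write a header row plus one row per compound, separated by tabs."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow([id_header] + list(samples))
    for name, row in zip(compounds, values):
        writer.writerow([name] + list(row))


# In practice the handle would come from
# open("zero_filled_pkTable.txt", "w", newline="").
buffer = io.StringIO()
write_peak_table(
    buffer,
    "AlignID",
    ["STDmix_GC_01", "STDmix_GC_02"],
    ["Ethanol, 2-fluoro-", "Hydrazine"],
    [[2265686433, 4182838129], [14588107, 8510918]],
)
output_text = buffer.getvalue()
```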

References:

If you use the 'KNN' method, refer to:

[1]     Hastie, T., Tibshirani, R., Sherlock, G., Eisen, M., Brown, P. and Botstein, D., Imputing Missing Data for Gene Expression Arrays, Stanford University Statistics Department Technical report (1999).

[2]     Troyanskaya, O., Cantor, M., Sherlock, G., Brown, P., Hastie, T., Tibshirani, R., Botstein, D. and Altman, R. B., Missing value estimation methods for DNA microarrays, Bioinformatics, Vol. 17, no. 6 (2001), pp. 520-525.

If you use the 'qirlc' method, refer to:

[3]     Lazar, C. et al., QRILC: a quantile regression approach for the imputation of left-censored missing data in quantitative proteomics.

[4]     Wei, R., Wang, J., Su, M., et al., Missing Value Imputation Approach for Mass Spectrometry-based Metabolomics Data, Scientific Reports, 8(1) (2018).

[5]     Wei, R., Wang, J., Jia, E., et al., GSimp: A Gibbs sampler based left-censored missing value imputation approach for metabolomics studies, PLoS Computational Biology, 14(1):e1005973 (2018).