Introduction:
This tool takes a peak table file as input and fills the missing values
(zeros, null values or 'NA', and negative values) with one of the following:
1) a*min, where 'a' is a user-defined coefficient and 'min' is the minimum
non-negative value in the peak table; 2) 'min' itself; 3) a user-specified
value; 4) values computed by 'KNN' (k-nearest neighbors); or 5) values
computed by 'qirlc'.
The 'qirlc' method (the QRILC algorithm, see reference [3]) is especially
suitable for left-censored data.
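The three simplest fill options described above (a*min, 'min', and a user-specified value) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the tool's actual code: the function names `is_missing`, `table_min`, and `fill_missing` are hypothetical, unparseable text is assumed to count as missing, and because zeros are treated as missing, 'min' is taken here to be the smallest positive value in the table.

```python
import math


def is_missing(cell):
    """Return True for zero, null/empty, 'NA', or negative entries."""
    if cell is None:
        return True
    text = str(cell).strip()
    if text == "" or text.upper() == "NA":
        return True
    try:
        value = float(text)
    except ValueError:
        return True  # assumption: unparseable text counts as missing
    return math.isnan(value) or value <= 0


def table_min(rows):
    """Smallest non-missing value across all rows (assumed meaning of 'min')."""
    present = [float(c) for row in rows for c in row if not is_missing(c)]
    if not present:
        raise ValueError("peak table has no non-missing values")
    return min(present)


def fill_missing(rows, method="min", a=0.5, value=None):
    """Fill missing cells using 'a*min', 'min', or a user-specified 'value'."""
    if method == "value":
        if value is None:
            raise ValueError("method 'value' needs a user-specified value")
        fill = float(value)
    elif method == "min":
        fill = table_min(rows)
    elif method == "a*min":
        fill = a * table_min(rows)
    else:
        raise ValueError("unsupported method: " + method)
    return [[fill if is_missing(c) else float(c) for c in row] for row in rows]


# Hypothetical intensities with a zero, an 'NA', and a negative entry.
rows = [[1486892478, 0, 3448620272], ["NA", 684434115, -5]]
filled = fill_missing(rows, method="a*min", a=0.5)
```

The 'KNN' and 'qirlc' options are model-based and are not covered by this sketch.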
Input files:
1. Peak table file in tab-delimited text format, with the first column as the
compound identifier and the remaining columns as samples.
For example:
| AlignID                           | STDmix_GC_01 | STDmix_GC_02 | STDmix_GC_03 |
| 1                                 | 1486892478   | 561322777    | 3448620272   |
| Nitrogen dioxide                  | 5492977592   | 684434115    | 3265669981   |
| Ethanol, 2-fluoro-                | 2265686433   | 4182838129   | 4365291513   |
| 3-Pentanone, 2,2,4,4-tetramethyl- | 13390154     | 12612932     | 21155307     |
| Hydrazine                         | 14588107     | 8510918      | 7224351      |
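A file in this layout can be read with Python's standard `csv` module, as in the minimal sketch below. The helper name `read_peak_table` is hypothetical and not part of the tool. Splitting on tabs matters here because compound names such as "Ethanol, 2-fluoro-" contain commas.

```python
import csv
import io


def read_peak_table(handle):
    """Split a tab-delimited peak table into sample names, compound IDs, values."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    samples = header[1:]          # every column after the identifier is a sample
    compounds, values = [], []
    for record in reader:
        if not record:            # skip blank lines
            continue
        compounds.append(record[0])
        values.append(record[1:])  # kept as text so 'NA' entries survive
    return samples, compounds, values


# Two rows taken from the example table, joined with tabs.
example = (
    "AlignID\tSTDmix_GC_01\tSTDmix_GC_02\tSTDmix_GC_03\n"
    "Ethanol, 2-fluoro-\t2265686433\t4182838129\t4365291513\n"
    "Hydrazine\t14588107\t8510918\t7224351\n"
)
samples, compounds, values = read_peak_table(io.StringIO(example))
```

For a file on disk, the same function can be given a handle opened with `open(path, newline="")`.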
Output files:
1. 'zero_filled_pkTable.txt', the zero-filled peak table file in tab-delimited text format.
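As an illustration of the output layout, the sketch below writes a filled table back out as tab-delimited text. The helper `write_peak_table` and its `id_header` parameter are hypothetical. The example writes to an in-memory buffer, and the values shown are only placeholders in the same format as the input example.

```python
import csv
import io


def write_peak_table(handle, samples, compounds, values, id_header="AlignID"):
    """Write a header row plus one tab-delimited row per compound."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow([id_header] + list(samples))
    for name, row in zip(compounds, values):
        writer.writerow([name] + list(row))


buffer = io.StringIO()
write_peak_table(
    buffer,
    ["STDmix_GC_01", "STDmix_GC_02"],
    ["Hydrazine"],
    [[14588107, 8510918]],
)
output_text = buffer.getvalue()
```

To produce a file on disk, the buffer can be replaced with `open("zero_filled_pkTable.txt", "w", newline="")`.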
References:
If you use the 'KNN' method, cite:
[1] Hastie, T., Tibshirani, R., Sherlock, G., Eisen, M., Brown, P. and
Botstein, D. Imputing Missing Data for Gene Expression Arrays. Stanford
University Statistics Department Technical Report (1999).
[2] Troyanskaya, O., Cantor, M., Sherlock, G., Brown, P., Hastie, T.,
Tibshirani, R., Botstein, D. and Altman, R. B. Missing value estimation
methods for DNA microarrays. Bioinformatics, Vol. 17, no. 6, 2001,
pages 520-525.
If you use the 'qirlc' method, cite:
[3] Lazar, C. et al. QRILC: a quantile regression approach for the imputation
of left-censored missing data in quantitative proteomics.
[4] Wei R, Wang J, Su M, et al. Missing Value Imputation Approach for Mass
Spectrometry-based Metabolomics Data. Scientific Reports, 2018, 8(1).
[5] Wei R, Wang J, Jia E, et al. GSimp: A Gibbs sampler based left-censored
missing value imputation approach for metabolomics studies. PLoS
Computational Biology, 2018, 14(1):e1005973.