knnimpute
Impute missing data using nearest-neighbor method
Syntax
Description
imputedData = knnimpute(data) returns imputedData after replacing NaNs in the input data with the corresponding value from the nearest-neighbor column. If the corresponding value from the nearest-neighbor column is also NaN, the next nearest column is used. The function calculates the Euclidean distance between observation columns by using only the rows with no NaN values. Thus, the data must have at least one row that contains no NaN.
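The nearest-neighbor rule described above can be checked by hand on a small matrix. The following is a minimal sketch, not the toolbox implementation: the matrix values and the variable names (completeRows, distToCol1, manualValue) are made up for illustration, and the knnimpute call is guarded so that the manual part still runs when the toolbox is not installed.

```matlab
% Small matrix: rows are features, columns are observations.
% Rows 1-3 are complete; row 4 has a NaN in column 1.
data = [1   2  5;
        4   5  7;
        7   8  9;
        NaN 11 20];

% Manual check in base MATLAB: Euclidean distances from column 1 to
% columns 2 and 3, computed only over rows that contain no NaN.
completeRows = ~any(isnan(data), 2);
X = data(completeRows, :);
distToCol1 = sqrt(sum((X(:, [2 3]) - X(:, 1)).^2, 1));  % implicit expansion (R2016b or later)
[~, idx] = min(distToCol1);
nearestCol = idx + 1;                 % offset because column 1 was excluded
manualValue = data(4, nearestCol);    % value the description says should fill the NaN

% Toolbox call, run only if knnimpute is on the path.
if exist('knnimpute', 'file')
    imputedData = knnimpute(data);
    disp(imputedData)
end
```

In this sketch column 2 is closer to column 1 than column 3 is, so by the description the NaN in row 4 would be filled from column 2 of the same row.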
imputedData = knnimpute(data,k) replaces NaNs in data with a weighted mean of the k nearest-neighbor columns. The weights are inversely proportional to the distances from the neighboring columns.
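To make the inverse-distance weighting concrete, here is a minimal sketch in base MATLAB for a single missing value with k = 2. It assumes that the weights are 1/distance normalized to sum to 1; the source only says the weights are inversely proportional to distance, so the exact normalization and any handling of zero distances inside knnimpute are assumptions here. The matrix and variable names are illustrative.

```matlab
% Rows are features, columns are observations; row 4, column 1 is missing.
data = [1   2  5;
        4   5  7;
        7   8  9;
        NaN 11 20];
k = 2;

% Distances from column 1 to the other columns, over complete rows only.
completeRows = ~any(isnan(data), 2);
X = data(completeRows, :);
neighbors = [2 3];
d = sqrt(sum((X(:, neighbors) - X(:, 1)).^2, 1));

% Keep the k nearest columns and weight them by inverse distance
% (assumed normalization: weights sum to 1).
[dSorted, order] = sort(d);
nearest = neighbors(order(1:k));
w = 1 ./ dSorted(1:k);
w = w / sum(w);
manualValue = sum(w .* data(4, nearest));

% Toolbox call for comparison, run only if knnimpute is on the path.
if exist('knnimpute', 'file')
    imputedData = knnimpute(data, k);
    disp(imputedData(4, 1))
end
```

Because the closer column receives the larger weight, the manual estimate lands nearer to that column's value than a plain average of the two neighbors would.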
imputedData = knnimpute(data,k,Name,Value) uses additional options specified by one or more name-value arguments. For example, imputedData = knnimpute(data,k,'Distance','mahalanobis') uses the Mahalanobis distance to compute the nearest-neighbor columns.
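A call with the 'Distance' option might look like the sketch below. The random data, the choice of k = 3, and the placement of the missing values are illustrative assumptions. The example uses many more columns than complete rows on the assumption that a Mahalanobis distance needs a well-conditioned covariance estimate; the source does not state that requirement.

```matlab
rng(0);                        % reproducible random data
data = randn(4, 30);           % 4 rows (features), 30 columns (observations)
data(1, [3 10 17]) = NaN;      % missing values confined to row 1

% Toolbox call, run only if knnimpute is on the path.
if exist('knnimpute', 'file')
    k = 3;
    imputedData = knnimpute(data, k, 'Distance', 'mahalanobis');
    fprintf('Remaining NaNs: %d\n', nnz(isnan(imputedData)));
end
```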
Examples
Input Arguments
Name-Value Arguments
Output Arguments
References
[1] Speed, T. (2003). Statistical Analysis of Gene Expression Microarray Data (Chapman & Hall/CRC).
[2] Hastie, T., Tibshirani, R., Sherlock, G., Eisen, M., Brown, P., and Botstein, D. (1999). Imputing missing data for gene expression arrays. Technical Report, Division of Biostatistics, Stanford University.
[3] Troyanskaya, O., Cantor, M., Sherlock, G., Brown, P., Hastie, T., Tibshirani, R., Botstein, D., and Altman, R. (2001). Missing value estimation methods for DNA microarrays. Bioinformatics 17(6), 520–525.
Version History
Introduced before R2006a