Uniform Manifold Approximation and Projection (UMAP)

version 1.5.2 (760 KB) by Stephen Meehan
An algorithm for manifold learning and dimension reduction.

Updated 04 Aug 2020

Given a set of high-dimensional data, run_umap.m produces a lower-dimensional representation of the data for purposes of data visualization and exploration. See the comments at the top of the file run_umap.m for documentation and many examples of how to use this code.

The UMAP algorithm is the invention of Leland McInnes, John Healy, and James Melville. See their original paper for a long-form description (https://arxiv.org/pdf/1802.03426.pdf). Also see the documentation for the original Python implementation (https://umap-learn.readthedocs.io/en/latest/index.html).

This MATLAB implementation follows a very similar structure to the Python implementation, and many of the function descriptions are nearly identical.

Here are some major differences in this MATLAB implementation:
1) All nearest-neighbour searches are performed through the built-in MATLAB function knnsearch.m. The original Python implementation uses random projection trees and nearest-neighbour descent to approximate the nearest neighbours of data points. The function knnsearch.m uses either an exhaustive search or k-d trees, both of which are slow for high-dimensional data. As a result, this implementation is likely to slow down more than the Python implementation on large, high-dimensional data sets.

2) The MATLAB function eigs.m does not appear to be as fast as the function "eigsh" in the Python package SciPy. For large data sets, we initialize a low-dimensional transform by binning the data using an algorithm known as probability binning. If the user downloads and installs the function lobpcg.m, made available here (https://www.mathworks.com/matlabcentral/fileexchange/48-locally-optimal-block-preconditioned-conjugate-gradient) by Andrew Knyazev, it can be used to find exact eigenvectors for medium-sized data sets.

3) We have built in the optional ability to detect clusters in the low-dimensional output of UMAP. For users with MATLAB R2019a or later, DBSCAN can be used to produce cluster IDs as explained in the code examples. We have also added the ability to match new clusters to old supervisors using quadratic form matching (described at https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5818510/), in the case that test data is transformed using a template created by supervised dimension reduction.
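
As a rough illustration of the first difference, the sketch below calls knnsearch with each of its two search methods on random stand-in data and compares timings. It assumes the Statistics and Machine Learning Toolbox is installed; the data sizes and variable names are arbitrary, and this is not the package's own code.

```matlab
% Stand-in data: 2000 points in 8 dimensions (arbitrary sizes for illustration).
rng(0);
X = rand(2000, 8);
k = 15;

% Exhaustive search: compares every query point against every data point.
tic;
[idxEx, distEx] = knnsearch(X, X, 'K', k, 'NSMethod', 'exhaustive');
tExhaustive = toc;

% k-d tree search: builds a tree first, then queries it.
tic;
[idxKd, distKd] = knnsearch(X, X, 'K', k, 'NSMethod', 'kdtree');
tKdTree = toc;

% Both methods are exact, so the neighbour distances should agree.
maxDiff = max(abs(distEx(:) - distKd(:)));
fprintf('exhaustive: %.3f s, kdtree: %.3f s, max distance difference: %g\n', ...
    tExhaustive, tKdTree, maxDiff);
```

Increasing the number of columns in X is a simple way to see how both methods degrade as the dimension grows.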

Overall, this MATLAB UMAP implementation tends to be faster than the original Python implementation. This version (1.5) makes all UMAP reductions faster with a new C++ MEX implementation of stochastic gradient descent. Thus 'MEX' is the new default for the input argument 'method'. Due to File Exchange requirements, users must download the .MEX file separately (a link to the file on Google Drive is provided upon calling "run_umap"). As examples 13 to 15 show, you can now test the speed difference between the implementations for yourself on your computer by setting the 'python' argument to true.
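
A minimal sketch of such a timing comparison follows. It uses only the 'method' and 'python' arguments named above; the stand-in data and variable names are illustrative, and the Python run assumes that a Python installation with the original UMAP package is reachable from MATLAB. Examples 13 to 15 in the run_umap.m header remain the authoritative reference.

```matlab
% Stand-in data; replace with your own matrix (rows = observations).
rng(0);
data = rand(5000, 20);

hasUmap = exist('run_umap', 'file') == 2;
if hasUmap
    % Default in version 1.5: C++ MEX stochastic gradient descent.
    tic;
    reductionMex = run_umap(data, 'method', 'MEX');
    tMex = toc;

    % Same data through the original Python implementation.
    tic;
    reductionPy = run_umap(data, 'python', true);
    tPython = toc;

    fprintf('MEX: %.1f s, Python: %.1f s\n', tMex, tPython);
else
    disp('run_umap.m is not on the MATLAB path.');
end
```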

Additionally, version 1.5 enables users of supervised templates to request the post reduction services of supervisor matching, QF tree, and QF dissimilarity regardless of their input arguments for 'n_components' and 'verbose'. The function run_umap.m returns the results of these services via the new 4th output argument: extras. The properties of extras are documented in the file umap/UMAP_extra_results.m.
Our thanks to Dr. Julie Elie from UC Berkeley for these supervised template improvements.
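
The sketch below shows one way the 4th output might be requested. The argument name 'template_file' and the file name myTemplate.mat are assumptions made for illustration; the test data must have the same columns as the data used to build the template. Check the run_umap.m header for the exact arguments.

```matlab
% Hypothetical template path; replace with a template you saved earlier.
templatePath = 'myTemplate.mat';
rng(0);
testData = rand(500, 10);   % stand-in; real test data must match the template

canRun = exist('run_umap', 'file') == 2 && exist(templatePath, 'file') == 2;
if canRun
    % The third output is skipped here; extras is the 4th output described above.
    [reduction, umap, ~, extras] = run_umap(testData, 'template_file', templatePath);
    % List whatever properties the returned object exposes.
    disp(properties(extras));
else
    disp('Skipping: run_umap.m or the template file was not found.');
end
```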

Optional toolbox dependencies:
-The Bioinformatics Toolbox is required to change the 'qf_tree' argument.
-The Curve Fitting Toolbox is required to change the 'min_dist' argument.

This implementation is a work in progress. It has been looked over by Leland McInnes, who considers it "a fairly faithful direct translation of the original Python code (except for the nearest neighbour search)". We hope to continue improving it in the future.

Provided by the Herzenberg Lab at Stanford University.

We appreciate any and all help in finding bugs. Our priority has been determining the suitability of our concepts for research publications in flow cytometry on the use of UMAP supervised templates. Thus our testing of run_umap.m has been limited to combinations of correct parameters on suitable data similar to the 19 examples in the run_umap.m header comment. This means that the majority of possible combinations of correct input and output parameters have not been tested. Moreover, testing has not been done extensively on incorrect parameters, incorrect data, or even less suitable "edge" data. However, the Herzenberg Lab has begun and is continuing independent testing to improve this implementation.

Cite As

Connor Meehan, Stephen Meehan, and Wayne Moore (2020). Uniform Manifold Approximation and Projection (UMAP) (https://www.mathworks.com/matlabcentral/fileexchange/71902), MATLAB Central File Exchange.

Comments and Ratings (39)

Stephen Meehan

run_umap can now access all resources and examples on the Herzenberg lab servers.

Stephen Meehan

Correction to my comment below: run_umap can not access examples on the Herzenberg servers.

Hayley Song

Hi, thank you for sharing this library. It's very useful for my project. I noticed that the 'run_umap' function currently errors out when 'metric' is set to 'spearman'. The cause is that 'spearman' is missing as a (key, value) pair in `U.METRIC_DICT` in the 'UMap' class definition. Adding the (key, value) pair ('spearman', 'spearman') in `Umap.m` (lines 153-161) fixed the error.
It'd be great if you could incorporate this change in the next version. Thanks again for sharing the code!

Hayley Song

Following up with my question below:

It seems there was a minor mistake: 'spearman' is missing from the 'UMap' class definition. I was able to make it work with the 'spearman' metric by adding the (key, value) pair ('spearman', 'spearman') in `Umap.m` (lines 153-161).
It'd be great if you could incorporate this change in the next version. Thanks again for sharing the code!
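
The missing-key behaviour described in this comment can be reproduced with a small containers.Map. The map below is a hypothetical stand-in, not the package's actual METRIC_DICT, which has more entries.

```matlab
% Hypothetical stand-in for the metric lookup table.
metricDict = containers.Map({'euclidean', 'cosine'}, {'euclidean', 'cosine'});

requested = 'spearman';
% Indexing a containers.Map with an absent key throws
% "The specified key is not present in this container."
hadKey = isKey(metricDict, requested);

if ~hadKey
    % Map the user-facing metric name to the distance name used downstream.
    metricDict(requested) = 'spearman';
end

resolved = metricDict(requested);
fprintf('metric "%s" resolves to "%s"\n', requested, resolved);
```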

Ziwei Liu

Hi, I just downloaded this implementation and tried to run the example file (run_umap), but I got the following error:

Error using websave (line 98)
Could not access server. http://cgworkspace.cytogenie.org/GetDown2/demo/samples.zip.

Error in run_umap/downloadCsv (line 902)
websave(zipFile, ...

Error in run_umap (line 352)
csv_file_or_data=downloadCsv;

Could anyone help out with this? Thank you.
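
One way to check whether the failure is a network problem rather than a problem in run_umap is to try the same download directly. The sketch below is not part of the package: it calls websave on the URL from the error message with a longer timeout and reports the error text instead of stopping.

```matlab
% URL copied from the error message above.
url = 'http://cgworkspace.cytogenie.org/GetDown2/demo/samples.zip';
zipFile = fullfile(tempdir, 'samples.zip');

% Allow up to 30 seconds before giving up on the connection.
opts = weboptions('Timeout', 30);

try
    websave(zipFile, url, opts);
    downloaded = true;
    fprintf('Saved %s\n', zipFile);
catch err
    downloaded = false;
    fprintf('Download failed: %s\n', err.message);
end
```

If the download also fails here, the server or a local firewall or proxy is the likely cause.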

Frants Jensen

Hi Stephen, I have the same issue as Cortexlab - would love to use pre-computed (non-euclidean) distance matrix. Do those changes he suggested (Metric-Dict option precomputed, and insert dmat later instead of calculating) make sense? Thanks! -Frants

Yuchun Ding

Hi, I'm trying to reduce a data set from 100D to 2D. The data set has around 100k points. Realistically, how long should this take? With the default settings, performance seems very slow.

Michal Kvasnicka

Update 1.5.0 with default MEX files works very well ... thanks for your effort!!!

Richard Gardner

Thanks for sharing this implementation – it's working great for me so far. The only problem I've experienced is in reproducing results with non-default parameter sets, even when I set the 'randomize' argument to false. I wonder if this issue might originate from the curve-fitting process in find_ab_params.m. The fit() function appears to use a random initial state (I see a warning about this every time I execute run_umap.m), and this occurs before UMAP.m sets the random seed. Could this be a bug, or am I getting something wrong?
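
A minimal sketch of how a fixed start point removes this source of randomness is shown below. It assumes the curve fitted in find_ab_params.m has the same form as in the Python implementation, 1/(1 + a*x^(2b)), and it requires the Curve Fitting Toolbox; the variable names are illustrative and this is not a patch to the package.

```matlab
spread = 1;
min_dist = 0.3;

% Target curve: 1 up to min_dist, then exponential decay.
xv = linspace(0, 3*spread, 300)';
yv = ones(size(xv));
far = xv >= min_dist;
yv(far) = exp(-(xv(far) - min_dist) / spread);

curve = @(a, b, x) 1 ./ (1 + a * x.^(2*b));

hasToolbox = ~isempty(which('fittype'));
if hasToolbox
    % An explicit StartPoint avoids the random initial state and its warning.
    f1 = fit(xv, yv, curve, 'StartPoint', [1 1]);
    f2 = fit(xv, yv, curve, 'StartPoint', [1 1]);
    sameResult = isequal(coeffvalues(f1), coeffvalues(f2));
    fprintf('Repeated fits identical: %d\n', sameResult);
else
    disp('Curve Fitting Toolbox not found; skipping the fit.');
end
```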

Cortexlab

Hi, thanks so much for porting UMAP from Python to MATLAB. I have a question about running UMAP with precomputed distance matrices (which are supported in the Python version). I believe these are almost supported by your code, but one needs to make two modifications in the file UMAP.m: (1) change METRIC_DICT so that it includes the option 'precomputed'; (2) prevent the code from calculating dmat in that case (dmat is what the user passes). I *think* that by making these two changes I got things to work, but would you give me your opinion on whether this makes sense? Many thanks
-Matteo
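
For context, the sketch below shows how nearest neighbours can be read straight out of a precomputed distance matrix, which is the information a 'precomputed' option would need to supply. It is independent of UMAP.m and does not show how the package handles dmat; the metric and sizes are arbitrary, and pdist requires the Statistics and Machine Learning Toolbox.

```matlab
rng(0);
X = rand(50, 6);    % stand-in data
k = 5;              % neighbours per point, including the point itself

% Full pairwise distance matrix; any metric accepted by pdist could be used.
D = squareform(pdist(X, 'cityblock'));

% Sort each row so the closest points come first.
[sortedD, order] = sort(D, 2);
knnIdx = order(:, 1:k);      % indices of the k nearest points
knnDist = sortedD(:, 1:k);   % matching distances

fprintf('Nearest neighbour of point 1 (after itself): %d\n', knnIdx(1, 2));
```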

Stephen Meehan

Both exceptions that Bryan Bates found were reproduced and fixed this week on Feb 21 in update 1.4.1

Stephen Meehan

Thanks, Bryan Bates. I suspect run_umap does need more testing of combinations of input parameters. Please send details to me at swmeehan@stanford.edu. I need the input files plus an exact copy of the command you typed. I look forward to getting a fix to you quickly. Thanks again.

Bryan Bates

Hi there! So far this function is awesome and has helped my project loads! However, a few more bugs keep appearing that I'm having some trouble squashing. When adding a 'label_column' input argument to the run_umap() function (and ensuring that the last column of my input data has the labels), I get the following error:

"
Matrix index is out of range for deletion.
Error in run_umap (line 611)
parameter_names(args.label_column)=[];

611 parameter_names(args.label_column)=[];
"

I thought that simply commenting this out was enough of a fix, but then after UMAP runs, right before the last plotting execution, I get the next error:

"
Dot indexing is not supported for variables of this type.
Error in run_umap/updatePlot (line 938)
umap.supervisors.prepareForTemplate;
Error in run_umap (line 822)
updatePlot(reduction, true)

938 umap.supervisors.prepareForTemplate;
"

Could you guys help out with this? Thanks!

Stephen Meehan

Hi Michal. Thanks for your comment. Our overview indicates that the run_umap.m file is the starting place for effectively using this package. If you type "doc run_umap" on the command line AFTER downloading, you see a similar amount of textual information to what you see when you type "doc tsne". Can you (or anyone) send us a "how to" link that documents comprehensively how to add additional tabs like "Examples" to File Exchange so users can see the comprehensive documentation BEFORE they download? And is there a similar link explaining how to enrich documentation in .m files to include pictures and web formatting? Sorry for not knowing this. Thanks again for your interest in improving our submission.

Michal Kvasnicka

I think it is really important to create some comprehensive documentation and/or a tutorial with examples. The upgrade from UMAP 1.3.4 to 1.4.0 significantly changed the whole UMAP concept (Python codes). I am really not sure how to use this package effectively. I am just guessing ...

Stephen Meehan

Hi Mohammed,
We have updated the accepted metrics for UMAP in the latest update, 1.4.0. You can try running the new version and seeing if it fixes your problem.
If you are still receiving an error, would you mind sending us exactly what commands you are calling to receive this error so that we can try reproducing the error on our computers? You can e-mail it to us at swmeehan@stanford.edu or connor.gw.meehan@gmail.com.

Camden MacDowell

Really appreciate this contribution. Thank you. It is also easy to modify (logical flow). One occasional inconvenience is the restriction that the template file must be a saved .mat file with parameter names, etc. There is an easy workaround, though: I removed lines 405-422 (i.e., the two if/then checks for the template_file parameters) and replaced them with

if ischar(template_file)
    [umap, ~, canLoad, reOrgData] = Template.Get(inData, parameter_names, ...
        template_file, 3);
else
    % template_file is already a umap variable rather than a file name
    umap = template_file;
    canLoad = [];
    reOrgData = [];
end

Messy but a quick fix. Now template_file can just be the umap variable when calling run_umap.

Caleb Stoltzfus

Works great, but I ran into an issue. I was running the algorithm when it terminated midway. Now, whenever I run it, I get this error:

"Error using containers.Map/subsref
The specified key is not present in this container.

Error in UMAP/fit (line 340)
U.metric = U.METRIC_DICT(U.metric);

Error in UMAP/fit_transform (line 496)
U = fit(U, X, y);

Error in run_umap (line 542)
reduction = umap.fit_transform(inData);"

How to fix this? Thanks!

Rasmus Bro

Thanks a lot. With the curve-fitting toolbox installed it works perfectly

Adam Sciambi

This code is fantastic. Thanks for putting it together. I use it daily.

One error that I've encountered though is in function "smooth_knn_dist" around line 81, reproduced below.

rho = aug_dists(idx) + interpolation*(aug_dists(idx) - aug_dists(idx+height));

Sometimes "idx+height" is out of bounds of "aug_dists". Since "idx" itself is defined to go up to numel(aug_dists), this makes sense that it could go over when added to. I just put in a corrective factor shown below and it seems to work. At the edge case, it interpolates one column inward, rather than outward.

correction = zeros(size(idx));
correction(idx+height>numel(aug_dists)) = -height;
rho = aug_dists(idx) + interpolation*(aug_dists(idx+correction) - aug_dists(idx+height+correction));

jiaxin

Error using -
Matrix dimensions must agree.

Error in smooth_knn_dist (line 84)
d = distances(:,2:end) - rho;

Error in fuzzy_simplicial_set (line 108)
[sigmas, rhos] = smooth_knn_dist(knn_dists, n_neighbors, local_connectivity);

Error in UMAP/fit (line 420)
U.graph = fuzzy_simplicial_set(X, U.n_neighbors, randomState, U.metric,
'metric_kwds', U.metric_kwds,...

Error in UMAP/fit_transform (line 486)
U = fit(U, X, y);

Error in run_umap (line 495)
reduction = umap.fit_transform(inData);

Beatriz Moya

Thanks for the code, it's been very useful! However, I have tried to reduce the model to a 3-dimensional system, but I come up with this error:

Error using UMAP/validate_parameters (line 303)
The Java and C methods currently only support reducing to 2 dimensions

Error in UMAP/fit (line 358)
validate_parameters(U);

Error in UMAP/fit_transform (line 470)
U = fit(U, X, y);

Error in run_umap (line 441)
reduction = umap.fit_transform(inData);

When is this option going to be available?

Stephen Meehan

Hi ageorge and Rasmus,

We've looked into the error that you are both receiving. We realized that one of the MATLAB functions that we call, fit.m, actually requires the MATLAB Curve Fitting Toolbox (https://www.mathworks.com/products/curvefitting.html) and we mistakenly did not list this requirement on the download page. If you do not have the Curve Fitting Toolbox installed, this would explain the errors that you are receiving. We have now listed this requirement on the download page.

As a workaround for MATLAB users who do not have the Curve Fitting Toolbox, we have now hard-coded in values for the outputs of find_ab_params.m when the inputs have particular default inputs. In particular, all the examples in the documentation of run_umap.m should now run in the current version 1.2.1 without any problems for users without the Curve Fitting Toolbox.
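
For parameter values that are not hard-coded, one toolbox-free alternative is a least-squares fit with fminsearch from base MATLAB. The sketch below is an assumption-laden illustration, not the package's method: it takes the curve form 1/(1 + a*x^(2b)) and the target curve from the Python implementation, and it optimizes log(a) and log(b) so that both parameters stay positive.

```matlab
spread = 1;
min_dist = 0.3;

% Target curve: 1 up to min_dist, then exponential decay.
xv = linspace(0, 3*spread, 300)';
yv = ones(size(xv));
far = xv >= min_dist;
yv(far) = exp(-(xv(far) - min_dist) / spread);

% p = [log(a), log(b)]; exp() keeps a and b positive during the search.
model = @(p, x) 1 ./ (1 + exp(p(1)) * x.^(2*exp(p(2))));
sse = @(p) sum((model(p, xv) - yv).^2);

% Deterministic start at a = 1, b = 1, so repeated runs give the same answer.
p0 = [0 0];
pBest = fminsearch(sse, p0);

a = exp(pBest(1));
b = exp(pBest(2));
fprintf('a = %.4f, b = %.4f, sum of squared errors = %.3g\n', a, b, sse(pBest));
```

Because the start point is fixed, this variant also avoids the random initial state that fit() uses by default.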

Rasmus Bro

Hi there

I am really interested in trying this, but I am also running into problems. I tried your updated version here and the one on your homepage. I get the following error (on MATLAB R2019a):

[reduction,umap] = run_umap(rand(10,100));

ans =

20

ans =

20

java.awt.Point[x=793,y=53] java.awt.Dimension[width=1146,height=1006]
DUDE [UMAP for 10x100
n\_neighbors=\color{blue}30\color{black}, min\_dist=\color{blue}0.3\color{black}, metric=\color{blue}euclidean\color{black},randomize=\color{blue}0\color{black}, labels=\color{blue}0
Undefined function 'fit' for input arguments of type 'function_handle'.

Error in find_ab_params (line 43)
f = fit(xv', yv', curve);

Error in UMAP/fit (line 352)
[U.a, U.b] = find_ab_params(U.spread, U.min_dist);

Error in UMAP/fit_transform (line 470)
U = fit(U, X, y);

Error in run_umap (line 441)
reduction = umap.fit_transform(inData);

Seng Bum Yoo

Thank you so much for the code. I wonder whether your set of codes includes re-embedding of new data to old embedding without modifying the old embeddings. Is init_transform relevant to that purpose?

One question: unless you change the input parsing, 'n_epochs' seems quite inflexible (I changed it myself). It would be great to have it as a free parameter, like n_neighbors.

ageorge

I'm getting the following error when I run it:

Undefined function 'fit' for input arguments of type 'function_handle'.

Error in find_ab_params (line 43)
f = fit(xv', yv', curve);

Error in UMAP/fit (line 352)
[U.a, U.b] = find_ab_params(U.spread, U.min_dist);

Error in UMAP/fit_transform (line 470)
U = fit(U, X, y);

Error in run_umap (line 441)
reduction = umap.fit_transform(inData);

Stephen Meehan

Hi Joanna,
Sorry for the delayed response to your issue. We have just uploaded a major update (version 1.2.0) that may resolve the issue, so try downloading the latest version and seeing if it is fixed! What was previously line 273 in version 1.1.0 should now be line 391 in 1.2.0.
If you are still receiving an error, would you mind sending us the full text of the exception so that we can investigate it further? We are having trouble reproducing the error. You can e-mail it to us at swmeehan@stanford.edu or connor.meehan@shaw.ca.
If you require a temporary workaround, we recommend downloading our UMAP distribution directly from our Web site at http://cgworkspace.cytogenie.org/GetDown2/demo/umapDistribution.zip. We are able to include some additional code in this distribution that does not meet File Exchange criteria. If an exception occurs with this version, it will switch to running the algorithm in C instead.

Joanna Polanska

I got the same error as Damon. Line 273: nTh=edu.stanford.facs.swing.Umap.EPOCH_REPORTS+3;
How to deal with it?
Thanks, Joanna

Henri Johansson

How much slower is it than the Python implementation?

Damon Clark

Thanks for putting this together! Line 273 in run_umap throws an error for me -- it looks like it may be calling some local variable. I believe I had everything in the path correctly.

nTh=edu.stanford.facs.swing.Umap.EPOCH_REPORTS+3;

Updates

1.5.2

-Removed .exe and .MEX files to comply with File Exchange requirements. Users are now encouraged to download these from our Google Drive if they wish to significantly speed up run_umap.
-Added examples 17 to 19 in run_umap header comment.

1.3.4

-Fixed a bug in SGD in Java where data was unintentionally stored as two distinct objects
-Added QF trees and dissimilarity plots
-Added an experimental joined_transform method that outperforms transform() when training data is missing populations

1.3.3

-Fixed some minor cosmetic issues such as suboptimal plot scaling

1.3.2

-If applying a UMAP template on data that appears to have new populations, a warning occurs and the option is given to perform a re-supervised reduction
-Fixed an indexing error occurring in smooth_knn_dist.m if data had too many identical points

1.3.1

-Fixed a GUI bug that would occur for users with MATLAB R2018b or earlier

1.3.0

-Data can now be reduced to any number of dimensions by changing the 'n_components' parameter; if reducing to more than 2 dimensions, a 3D plot is shown
-DBSCAN can be used to cluster UMAP output
-The 'n_epochs' parameter can now be manually changed
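
The following sketch illustrates these 1.3.0 features under stated assumptions. The clustering part uses MATLAB's built-in dbscan (R2019a or later, Statistics and Machine Learning Toolbox) on synthetic stand-in data rather than real UMAP output, with arbitrary epsilon and minpts values. The run_umap call uses only the 'n_components' and 'n_epochs' parameters named above and is skipped when the package is not on the path.

```matlab
rng(0);
% Stand-in for a 2-D UMAP embedding: two well-separated blobs.
embedding = [0.1*randn(200, 2); 0.1*randn(200, 2) + 5];

% dbscan(X, epsilon, minpts) labels noise as -1 and clusters as 1, 2, ...
clusterIds = dbscan(embedding, 0.5, 5);
nClusters = numel(unique(clusterIds(clusterIds > 0)));
fprintf('%d clusters, %d noise points\n', nClusters, sum(clusterIds == -1));

if exist('run_umap', 'file') == 2
    % Reduce stand-in data to 3 dimensions with a manually chosen epoch count.
    data = rand(1000, 10);
    reduction3d = run_umap(data, 'n_components', 3, 'n_epochs', 300);
end
```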

1.2.1

-Added precomputed parameter values for users without the Curve Fitting Toolbox
-Fixed an issue when using transform() on new data sets of the same size as the previous embedding and improved the adjacency matrix for transform()
-Improved progress bars

1.2.0

-Added 2 examples (run_umap.m) showing how to perform supervised dimension reduction with UMAP
-Improved labelling of plots; for supervised UMAP, the plot includes a legend with labels from the categorical data
-Explained proper MATLAB path settings

MATLAB Release Compatibility
Created with R2020a
Compatible with R2017a to R2020a
Platform Compatibility
Windows macOS Linux
Acknowledgements

Inspired: CytoMAP
