Given a set of high-dimensional data, run_umap.m produces a lower-dimensional representation of the data for purposes of data visualization and exploration. See the comments at the top of the file run_umap.m for documentation and many examples of how to use this code.
The UMAP algorithm is the invention of Leland McInnes, John Healy, and James Melville. See their original paper for a long-form description (https://arxiv.org/pdf/1802.03426.pdf). Also see the documentation for the original Python implementation (https://umap-learn.readthedocs.io/en/latest/index.html).
This MATLAB implementation follows a very similar structure to the Python implementation, and many of the function descriptions are nearly identical.
Here are some major differences in this MATLAB implementation:
1) All nearest-neighbour searches are performed through the built-in MATLAB function knnsearch.m. The original Python implementation uses random projection trees and nearest-neighbour descent to approximate nearest neighbours of data points. The function knnsearch.m uses either an exhaustive approach or k-d trees, both of which are slow for high-dimensional data. As such, this implementation may slow down more in the case of large, high-dimensional data sets.
2) The MATLAB function eigs.m does not appear to be as fast as the function "eigsh" in the Python package SciPy. For large data sets, we initialize a low-dimensional transform by binning the data using an algorithm known as probability binning. If the user downloads and installs the function lobpcg.m, made available here (https://www.mathworks.com/matlabcentral/fileexchange/48-locally-optimal-block-preconditioned-conjugate-gradient) by Andrew Knyazev, this can be used to find exact eigenvectors for medium-sized data sets.
3) We have built in the optional ability to detect clusters in the low-dimensional output of UMAP. For users with MATLAB R2019a or later, DBSCAN can be used to produce cluster IDs as explained in the code examples. We have also added the ability to match new clusters to old supervisors using quadratic form matching (described at https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5818510/), in the case that test data is transformed using a template created by supervised dimension reduction.
Overall, this MATLAB UMAP implementation tends to be faster than the original Python implementation. This version (1.5) makes all UMAP reductions faster with a new C++ MEX implementation of stochastic gradient descent. Thus 'MEX' is the new default for the input argument 'method'. Due to File Exchange requirements, users must download the .MEX file separately (a link to the file on Google Drive is provided upon calling "run_umap"). As examples 13 to 15 show, you can now test the speed difference between the implementations for yourself on your computer by setting the 'python' argument to true.
Additionally, version 1.5 enables users of supervised templates to request the post-reduction services of supervisor matching, QF tree, and QF dissimilarity regardless of their input arguments for 'n_components' and 'verbose'. The function run_umap.m returns the results of these services via the new 4th output argument: extras. The properties of extras are documented in the file umap/UMAP_extra_results.m.
Our thanks to Dr. Julie Elie from UC Berkeley for these supervised template improvements.
Optional toolbox dependencies:
The Bioinformatics Toolbox is required to change the 'qf_tree' argument.
The Curve Fitting Toolbox is required to change the 'min_dist' argument.
This implementation is a work in progress. It has been looked over by Leland McInnes, who considers it "a fairly faithful direct translation of the original Python code (except for the nearest neighbour search)". We hope to continue improving it in the future.
Provided by the Herzenberg Lab at Stanford University.
We appreciate any and all help in finding bugs. Our priority has been determining the suitability of our concepts for research publications in flow cytometry for the use of UMAP supervised templates. Thus our testing of run_umap.m has been limited to combinations of correct parameters on suitable data similar to the 19 examples in the run_umap.m header comment. This means that the majority of possible combinations of correct input and output parameters have not been tested. Moreover, there has been no extensive testing on incorrect parameters, incorrect data, or even less suitable "edge" data. However, the Herzenberg Lab has begun and is continuing independent testing to improve this implementation.
Connor Meehan, Stephen Meehan, and Wayne Moore (2020). Uniform Manifold Approximation and Projection (UMAP) (https://www.mathworks.com/matlabcentral/fileexchange/71902), MATLAB Central File Exchange.
1.5.2  Removed .exe and .MEX files to comply with File Exchange requirements. Users are now encouraged to download these from our Google Drive if they wish to significantly speed up run_umap.


1.3.4  Fixed a bug in SGD in Java where data was unintentionally stored as two distinct objects


1.3.3  Fixed some minor cosmetic issues such as suboptimal plot scaling 

1.3.2  If applying a UMAP template on data that appears to have new populations, a warning occurs and the option is given to perform a resupervised reduction


1.3.1  Fixed a GUI bug that would occur for users with MATLAB R2018b or earlier 

1.3.0  Data can now be reduced to any number of dimensions by changing the 'n_components' parameter; if reducing to more than 2 dimensions, a 3D plot is shown


1.2.1  Added precomputed parameter values for users without the Curve Fitting Toolbox


1.2.0  Added 2 examples (run_umap.m) showing how to perform supervised dimension reduction with UMAP

Inspired: CytoMAP
run_umap can now access all resources and examples on the Herzenberg Lab servers.
Correction to my comment below: run_umap can not access examples on the Herzenberg servers.
Hi, thank you for sharing this library. It's very useful for my project. I noticed that the 'run_umap' function currently errors out with 'metric' set to 'spearman', because 'spearman' is absent as a (key, value) pair from `U.METRIC_DICT` in the 'UMAP' class definition. Adding the (key, value) pair ('spearman', 'spearman') in `UMAP.m` (lines 153-161) fixed the error.
It'd be great if you could incorporate this change in the next version. Thanks again for sharing the code!
Following up with my question below:
It seems there was a minor mistake: 'spearman' is missing from the 'UMAP' class definition. I was able to make it work with the 'spearman' metric by adding the (key, value) pair ('spearman', 'spearman') in `UMAP.m` (lines 153-161).
It'd be great if you could incorporate this change in the next version. Thanks again for sharing the code!
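The fix described in the two comments above amounts to registering one more key in a containers.Map. A minimal sketch follows; the variable name metricDict and its initial keys are stand-ins chosen for illustration, not the actual contents of METRIC_DICT in UMAP.m.

```matlab
% Hypothetical stand-in for the metric lookup table; the real keys and
% values in UMAP.m may differ.
metricDict = containers.Map({'euclidean', 'cosine'}, {'euclidean', 'cosine'});

requested = 'spearman';
if ~isKey(metricDict, requested)
    % Register the missing metric under its own name, as the comment describes.
    metricDict(requested) = requested;
end

% The lookup now succeeds instead of raising
% "The specified key is not present in this container."
resolved = metricDict(requested);
```

Checking isKey before indexing also gives a place to raise a clearer error message for metrics that are genuinely unsupported.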
Hi, I just downloaded this implementation and tried to run the example file (run_umap), but I got the following error:
Error using websave (line 98)
Could not access server. http://cgworkspace.cytogenie.org/GetDown2/demo/samples.zip.
Error in run_umap/downloadCsv (line 902)
websave(zipFile, ...
Error in run_umap (line 352)
csv_file_or_data=downloadCsv;
Could anyone help out with this? Thank you.
Hi Stephen, I have the same issue as Cortexlab: I would love to use a precomputed (non-Euclidean) distance matrix. Do the changes he suggested (adding a 'precomputed' option to the metric dictionary, and inserting dmat later instead of calculating it) make sense? Thanks! Frants
Hi, I'm trying to reduce from 100D to 2D. The size of the dataset is around 100k. Realistically, how long should it take? With the default settings the performance seems really slow.
Update 1.5.0 with default MEX files works very well ... thanks for your effort!!!
Thanks for sharing this implementation; it's working great for me so far. The only problem I've experienced is in reproducing results with non-default parameter sets, even when I set the 'randomize' argument to false. I wonder if this issue might originate from the curve-fitting process in find_ab_params.m. The fit() function appears to use a random initial state (I see a warning about this every time I execute run_umap.m), and this occurs before UMAP.m sets the random seed. Could this be a bug, or am I getting something wrong?
Hi, thanks so much for porting UMAP from Python to MATLAB. I have a question about running UMAP with precomputed distance matrices (which are supported in the Python version). I believe these are almost supported by your code, but one needs to make two modifications in the file UMAP.m: (1) change METRIC_DICT so that it includes the option 'precomputed'; (2) prevent the code from calculating dmat in that case (dmat is what the user passes). I *think* that by making these two changes I got things to work, but would you give me your opinion on whether this makes sense? Many thanks
Matteo
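To illustrate what a precomputed path has to provide, here is a minimal sketch of extracting nearest-neighbour indices and distances directly from a user-supplied distance matrix using only base MATLAB. The names knnIdx and knnDists and the toy data are assumptions made for this example; how UMAP.m would consume them is not shown in the comments above.

```matlab
% Toy data: 6 points in 3 dimensions (illustrative values only).
X = [0 0 0; 1 0 0; 0 1 0; 5 5 5; 6 5 5; 5 6 5];
n = size(X, 1);
k = 2;  % number of neighbours to keep, excluding the point itself

% Build a full Euclidean distance matrix; any symmetric dissimilarity
% matrix supplied by the user could take its place.
dmat = zeros(n);
for i = 1:n
    diffs = X - repmat(X(i, :), n, 1);
    dmat(i, :) = sqrt(sum(diffs .^ 2, 2))';
end

% Sort each row; column 1 is the point itself (distance 0).
[sortedDists, sortedIdx] = sort(dmat, 2);
knnIdx = sortedIdx(:, 2:k+1);
knnDists = sortedDists(:, 2:k+1);
```

Sorting the full matrix costs memory quadratic in the number of points, so this approach only suits data sets small enough for the distance matrix to fit in memory.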
Both exceptions that Bryan Bates found were reproduced and fixed this week on Feb 21 in update 1.4.1
Thanks, Bryan Bates. I suspect run_umap does need more testing of combinations of input parameters. Please send details to me at swmeehan@stanford.edu. I need the input files plus an exact copy of the command you typed. I look forward to getting a fix to you quickly. Thanks again.
Hi there! So far this function is awesome and has helped my project loads! However, there are a few more bugs that keep appearing that I'm having some trouble squashing. When adding a 'label_column' input argument to the run_umap() function (and ensuring that the last column of my input data has the labels), I get the following error:
"
Matrix index is out of range for deletion.
Error in run_umap (line 611)
parameter_names(args.label_column)=[];
611 parameter_names(args.label_column)=[];
"
I thought that simply commenting this out was enough of a fix, but then after UMAP runs, right before the last plotting execution, I get the next error:
"
Dot indexing is not supported for variables of this type.
Error in run_umap/updatePlot (line 938)
umap.supervisors.prepareForTemplate;
Error in run_umap (line 822)
updatePlot(reduction, true)
938 umap.supervisors.prepareForTemplate;
"
Could you guys help out with this? Thanks!
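For context on the first error above: MATLAB raises "Matrix index is out of range for deletion" whenever the index being deleted exceeds the length of the array. The sketch below reproduces that situation with made-up values and guards against it. It assumes the parameter names were supplied without an entry for the label column, which may or may not be what happened in the report.

```matlab
parameter_names = {'CD3', 'CD4', 'CD8'};   % hypothetical column names
label_column = 4;                          % labels sit in a 4th data column

% Deleting index 4 from a 3-element cell array raises the reported error,
% so only delete when the index actually exists.
if label_column >= 1 && label_column <= numel(parameter_names)
    parameter_names(label_column) = [];
end
```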
Hi Michal. Thanks for your comment. Our overview indicates that the run_umap.m file is the starting place for effectively using this package. If you type "doc run_umap" on the command line AFTER downloading, you see a similar amount of textual information to what you see when you type "doc tsne". Can you (or anyone) send us a "how to" link that documents comprehensively how to add additional tabs like "Examples" to File Exchange so users can see the comprehensive documentation BEFORE they download? And is there a similar link explaining how to enrich documentation in .m files to include pictures and web formatting? Sorry for not knowing this. Thanks again for your interest in improving our submission.
I think it is really important to create some comprehensive documentation and/or a tutorial with examples. The upgrade of UMAP from 1.3.4 to 1.4.0 significantly changed the whole UMAP concept (Python code). I am really not sure how to use this package effectively. I am just guessing ...
Hi Mohammed,
We have updated the accepted metrics for UMAP in the latest update, 1.4.0. You can try running the new version and seeing if it fixes your problem.
If you are still receiving an error, would you mind sending us exactly what commands you are calling to receive this error so that we can try reproducing the error on our computers? You can email it to us at swmeehan@stanford.edu or connor.gw.meehan@gmail.com.
Really appreciate this contribution. Thank you. Also easy to modify (logical flow). One occasional inconvenience is the restriction on the template file being a saved-off mat file with parameter names, etc. Easy workaround though: I removed lines 405-422 (the two if/then checks for template_file parameters) and replaced them with
if ischar(template_file)
    [umap, ~, canLoad, reOrgData]=Template.Get(inData, parameter_names, ...
        template_file, 3);
else
    umap = template_file;
    canLoad = [];
    reOrgData = [];
end
Messy but a quick fix. Now template_file can just be the umap variable when calling run_umap.
Works great, but I ran into an issue. I was running the algorithm when it terminated midway. Now whenever I run it, I get this error:
"Error using containers.Map/subsref
The specified key is not present in this container.
Error in UMAP/fit (line 340)
U.metric = U.METRIC_DICT(U.metric);
Error in UMAP/fit_transform (line 496)
U = fit(U, X, y);
Error in run_umap (line 542)
reduction = umap.fit_transform(inData);"
How to fix this? Thanks!
Thanks a lot. With the Curve Fitting Toolbox installed it works perfectly.
This code is fantastic. Thanks for putting it together. I use it daily.
One error that I've encountered though is in function "smooth_knn_dist" around line 81, reproduced below.
rho = aug_dists(idx) + interpolation*(aug_dists(idx) - aug_dists(idx+height));
Sometimes "idx+height" is out of bounds of "aug_dists". Since "idx" itself is defined to go up to numel(aug_dists), this makes sense that it could go over when added to. I just put in a corrective factor shown below and it seems to work. At the edge case, it interpolates one column inward, rather than outward.
correction = zeros(size(idx));
correction(idx+height>numel(aug_dists)) = height;
rho = aug_dists(idx) + interpolation*(aug_dists(idx+correction) - aug_dists(idx+height+correction));
Error using -
Matrix dimensions must agree.
Error in smooth_knn_dist (line 84)
d = distances(:,2:end) - rho;
Error in fuzzy_simplicial_set (line 108)
[sigmas, rhos] = smooth_knn_dist(knn_dists, n_neighbors, local_connectivity);
Error in UMAP/fit (line 420)
U.graph = fuzzy_simplicial_set(X, U.n_neighbors, randomState, U.metric,
'metric_kwds', U.metric_kwds,...
Error in UMAP/fit_transform (line 486)
U = fit(U, X, y);
Error in run_umap (line 495)
reduction = umap.fit_transform(inData);
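One possible cause of the "Matrix dimensions must agree" error above is that subtracting a column vector from a matrix relies on implicit expansion, which MATLAB only supports from R2016b onward. Another is that rho does not have one entry per row. The report does not say which applies, so the following is only a sketch with made-up values showing a subtraction that works in both old and new releases.

```matlab
distances = [0 1 2 3; 0 2 4 6; 0 1 3 5];   % 3 points, self + 3 neighbours
rho = [1; 2; 1];                           % one offset per point

% Defensive: force rho into a column so it lines up with the rows.
rho = rho(:);
assert(numel(rho) == size(distances, 1), 'rho needs one entry per row');

% bsxfun expands rho across the columns on releases without implicit expansion.
d = bsxfun(@minus, distances(:, 2:end), rho);
```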
Thanks for the code, it's been very useful! However, I have tried to reduce the model to a 3-dimensional system, but I get this error:
Error using UMAP/validate_parameters (line 303)
The Java and C methods currently only support reducing to 2 dimensions
Error in UMAP/fit (line 358)
validate_parameters(U);
Error in UMAP/fit_transform (line 470)
U = fit(U, X, y);
Error in run_umap (line 441)
reduction = umap.fit_transform(inData);
When is this option going to be available?
Hi ageorge and Rasmus,
We've looked into the error that you are both receiving. We realized that one of the MATLAB functions that we call, fit.m, actually requires the MATLAB Curve Fitting Toolbox (https://www.mathworks.com/products/curvefitting.html) and we mistakenly did not list this requirement on the download page. If you do not have the Curve Fitting Toolbox installed, this would explain the errors that you are receiving. We have now listed this requirement on the download page.
As a workaround for MATLAB users who do not have the Curve Fitting Toolbox, we have now hardcoded the outputs of find_ab_params.m for particular default input values. In particular, all the examples in the documentation of run_umap.m should now run in the current version 1.2.1 without any problems for users without the Curve Fitting Toolbox.
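For parameter values outside the hardcoded defaults, the same kind of fit can be approximated without the Curve Fitting Toolbox by using fminsearch from base MATLAB. The sketch below assumes the target curve used by the Python reference implementation (equal to 1 below min_dist, then decaying exponentially) and the model 1/(1 + a*x^(2b)). It is not the code in find_ab_params.m, and the variable names are chosen for illustration.

```matlab
spread = 1.0;
min_dist = 0.1;

% Target curve: flat at 1 up to min_dist, exponential decay afterwards.
xv = linspace(0, 3 * spread, 300);
yv = ones(size(xv));
far = xv >= min_dist;
yv(far) = exp(-(xv(far) - min_dist) / spread);

% Model and sum-of-squares objective over the parameters p = [a, b].
curve = @(p, x) 1 ./ (1 + p(1) .* x .^ (2 * p(2)));
sse = @(p) sum((curve(p, xv) - yv) .^ 2);

% Derivative-free least-squares fit starting from a = b = 1.
opts = optimset('TolX', 1e-8, 'TolFun', 1e-10, ...
    'MaxFunEvals', 5000, 'MaxIter', 5000);
p = fminsearch(sse, [1, 1], opts);
a = p(1);
b = p(2);
```

Because fminsearch starts from a fixed point and uses no random numbers, repeated runs with the same inputs give the same a and b.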
Hi there
I am really interested in trying this, but I am also running into problems. I tried your updated version here and the one on your homepage. I get the following error (on MATLAB R2019a):
[reduction,umap] = run_umap(rand(10,100));
ans =
20
ans =
20
java.awt.Point[x=793,y=53] java.awt.Dimension[width=1146,height=1006]
DUDE [UMAP for 10x100
n\_neighbors=\color{blue}30\color{black}, min\_dist=\color{blue}0.3\color{black}, metric=\color{blue}euclidean\color{black},randomize=\color{blue}0\color{black}, labels=\color{blue}0
Undefined function 'fit' for input arguments of type 'function_handle'.
Error in find_ab_params (line 43)
f = fit(xv', yv', curve);
Error in UMAP/fit (line 352)
[U.a, U.b] = find_ab_params(U.spread, U.min_dist);
Error in UMAP/fit_transform (line 470)
U = fit(U, X, y);
Error in run_umap (line 441)
reduction = umap.fit_transform(inData);
Thank you so much for the code. I wonder whether your code supports re-embedding new data into an old embedding without modifying the old embedding. Is init_transform relevant to that purpose?
One question: unless you change the input parsing, the 'n_epochs' setting seems quite inflexible (I changed it myself). It would be great to have it as a free parameter, like n_neighbors.
I'm getting the following error when run:
Undefined function 'fit' for input arguments of type 'function_handle'.
Error in find_ab_params (line 43)
f = fit(xv', yv', curve);
Error in UMAP/fit (line 352)
[U.a, U.b] = find_ab_params(U.spread, U.min_dist);
Error in UMAP/fit_transform (line 470)
U = fit(U, X, y);
Error in run_umap (line 441)
reduction = umap.fit_transform(inData);
Hi Joanna,
Sorry for the delayed response to your issue. We have just uploaded a major update (version 1.2.0) that may resolve the issue, so try downloading the latest version and seeing if it is fixed! What was previously line 273 in version 1.1.0 should now be line 391 in 1.2.0.
If you are still receiving an error, would you mind sending us the full text of the exception so that we can investigate it further? We are having trouble reproducing the error. You can email it to us at swmeehan@stanford.edu or connor.meehan@shaw.ca.
If you require a temporary workaround, we recommend downloading our UMAP distribution directly from our Web site at http://cgworkspace.cytogenie.org/GetDown2/demo/umapDistribution.zip. We are able to include some additional code in this distribution that does not meet File Exchange criteria. If an exception occurs with this version, it will switch to running the algorithm in C instead.
I got the same error as Damon. Line 273: nTh=edu.stanford.facs.swing.Umap.EPOCH_REPORTS+3;
How to deal with it?
Thanks, Joanna
How much slower is it than the python implementation?
Thanks for putting this together! Line 273 in run_umap throws an error for me; it looks like it may be calling some local variable. I believe I had everything in the path correctly.
nTh=edu.stanford.facs.swing.Umap.EPOCH_REPORTS+3;