How can I use the pdist2 function for big data?
I want to implement k-means in MATLAB. My data set is a 9,000,000-by-1 matrix. When I used the Euclidean distance to compute the distances between the points and the centroids, I got the following error:
Error using pdist2mex
Out of memory. Type HELP MEMORY for your options.
Error in pdist2 (line 343)
D = pdist2mex(X',Y',dist,additionalArg,smallestLargestFlag,radius);
Error in k_means_new (line 38)
dist = pdist2(d,centroids,distance); % distance between all data points and centroids
I should mention that I am running MATLAB on a Windows 8 system with the following configuration:
RAM: 8 GB
CPU: Intel Core i5-3230M
Could you please help me?
Thanks in advance.
Answers (2)
Image Analyst
on 30 Apr 2016
Chances are you don't need all of that in memory at the same time. What are you really trying to do? Find the two points farthest from each other, for example? If so, a simple double for loop that stores only the max distance (one value) instead of an 18 gigapixel array would work. Or you might be able to get what you need by taking a subsample of your original 9 million element array. So tell us the big picture: what are you really trying to accomplish? Then we can advise you on a better, less memory-intensive approach.
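As a rough illustration of the subsampling idea, here is a minimal sketch. The stand-in data, the variable names, and the sample size `nSample` are assumptions for the example, not values from the question.

```matlab
% Sketch: draw a random subsample of the data to experiment on first.
d = rand(100000, 1);                 % stand-in for the 9,000,000-by-1 data
nSample = 5000;                      % hypothetical sample size; pick what fits in RAM
idx = randperm(numel(d), nSample);   % nSample unique indices, drawn without replacement
dSample = d(idx(:));                 % column vector holding the sampled points
```

Whether a subsample is good enough depends on what the distances are needed for, so treat this as a starting point only.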
Walter Roberson
on 30 Apr 2016
Why are you bothering with Euclidean distance between one-dimensional objects? That is the same as abs() of the difference between them:
abs(bsxfun(@minus, d, centroids(:).'))
This is only going to be 9000000 * 240 entries, each of 8 bytes, which is only 17.28 gigabytes. An additional working storage of 9000000 * 8 bytes (72 megabytes) would also be required. Just make sure your swap space is set large enough to hold the array, and set your preferences to not prevent large arrays. It should probably only take 5 or 6 hours to compute.
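The arithmetic behind those figures can be checked with a few lines. This is only a sketch; the variable names are made up for the example, and the sizes are the ones quoted above.

```matlab
% Rough memory estimate for the full distance matrix (sizes taken from the thread).
nPoints        = 9000000;   % number of data points
nCentroids     = 240;       % number of centroids
bytesPerDouble = 8;         % a double-precision value occupies 8 bytes

fullBytes    = nPoints * nCentroids * bytesPerDouble;  % full n-by-k matrix
workingBytes = nPoints * bytesPerDouble;               % one extra column of doubles

fprintf('Full matrix:     %.2f GB\n', fullBytes / 1e9);     % 17.28 GB
fprintf('Working storage: %.0f MB\n', workingBytes / 1e6);  % 72 MB
```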
6 Comments
Walter Roberson
on 2 May 2016
For k-means, you do not need to retain those distances; you only need to figure out which centroid is the closest one. That takes the long-term storage requirement down by a factor of length(centroids).
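One way to do that is to process the data in blocks and keep only the index of the nearest centroid for each point. The following is a minimal sketch under assumptions: the stand-in data, the block size `chunkSize`, and the variable names are illustrative and not from the original code.

```matlab
% Sketch: assign each 1-D point to its nearest centroid without ever
% storing the full n-by-k distance matrix.
d = rand(1000, 1);                 % stand-in for the 9,000,000-by-1 data
centroids = linspace(0, 1, 5).';   % stand-in for the current centroids
chunkSize = 256;                   % hypothetical block size; tune to available RAM

n = numel(d);
labels = zeros(n, 1);              % index of the nearest centroid for each point
for first = 1:chunkSize:n
    last = min(first + chunkSize - 1, n);
    % Distances for this block only (block-by-k), overwritten on the next pass
    D = abs(bsxfun(@minus, d(first:last), centroids(:).'));
    [~, labels(first:last)] = min(D, [], 2);   % keep only the arg-min per row
end
```

With this layout the peak temporary storage is roughly chunkSize * length(centroids) doubles, plus one label per point.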