Implementing the k-means algorithm on spike sorting data
cameron lord
on 11 Feb 2021
Answered: Aditya Patil
on 17 Feb 2021
Hi there, I am trying to implement my own k-means function without using the existing function 'kmeans'.
I started with some complex waveform data, reduced its dimensionality to 2 principal components (PCs), and plotted them as a scatter plot; 3 distinct clusters emerge.
To run k-means, I first set random centroids within the range of the data, e.g.:
k = 3; % number of clusters
centroids = min(wav_pca) + (max(wav_pca) - min(wav_pca)) .* rand(k,1); % random centroids within the range of the data
scatter(wav_pca(:,1),wav_pca(:,2))
hold on
scatter(centroids(:,1),centroids(:,2),'x');
hold off
This gives me starting centroids; however, I don't think the distribution is as random as I'd like.
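One likely reason the starting centroids look less random than expected: rand(k,1) draws a single random fraction per centroid, and implicit expansion multiplies both columns of the range by that same fraction, so every centroid lands on the diagonal line from min(wav_pca) to max(wav_pca). Below is a minimal sketch of two alternatives, assuming wav_pca is an n-by-2 matrix (synthetic blobs stand in for the real data here): draw an independent fraction for each coordinate, or use k distinct data points as the starting centroids (a common initialisation, not the only option).

```matlab
% Hypothetical stand-in for wav_pca: three 2-D Gaussian blobs
wav_pca = [randn(50,2); randn(50,2) + [6 0]; randn(50,2) + [3 6]];
k = 3;
dataMin = min(wav_pca);          % 1-by-2 per-column minimum
dataMax = max(wav_pca);          % 1-by-2 per-column maximum

% Option 1: an independent random fraction for each coordinate (k-by-2)
centroids_box = dataMin + (dataMax - dataMin) .* rand(k, size(wav_pca, 2));

% Option 2: k distinct data points, chosen at random, as starting centroids
centroids_pts = wav_pca(randperm(size(wav_pca, 1), k), :);
```

Option 2 guarantees that every starting centroid sits inside the data cloud, which makes an empty cluster on the first iteration less likely than with a point drawn anywhere in the bounding box.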
Then I have to compute the Euclidean distance from each point to each centroid and assign the point to the centroid with the shortest distance:
for j=1:k
for i=1:length(wav_pca)
distance=sqrt( (centroids(j,1)- wav_pca(i,1))^2 + (centroids(j,2)- wav_pca(i,2)^2) )
end
end
I tried this for loop, but it doesn't create the matrix of distances that I need.
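The loop above writes every result into the same scalar, distance, so each iteration overwrites the previous value and no matrix is ever built (the second term also squares only wav_pca(i,2) rather than the difference). A minimal sketch of a loop that does build the matrix, assuming wav_pca is n-by-2 (synthetic data stands in for it): preallocate an n-by-k matrix D and store each distance at D(i,j). It also uses size(wav_pca,1) instead of length(wav_pca), because length returns the largest dimension, which is only the number of rows when there are more points than columns.

```matlab
% Hypothetical stand-in for wav_pca: three 2-D Gaussian blobs
wav_pca = [randn(50,2); randn(50,2) + [6 0]; randn(50,2) + [3 6]];
k = 3;
centroids = wav_pca(randperm(size(wav_pca, 1), k), :);

n = size(wav_pca, 1);            % number of points (rows)
D = zeros(n, k);                 % D(i,j) = distance from point i to centroid j
for j = 1:k
    for i = 1:n
        % Square the whole x and y differences before summing
        D(i, j) = sqrt((centroids(j,1) - wav_pca(i,1))^2 + ...
                       (centroids(j,2) - wav_pca(i,2))^2);
    end
end
```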
Next, each point must be assigned to its closest centroid, giving it a cluster ID.
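Once an n-by-k distance matrix exists, the assignment step is a single call: min along dimension 2 returns, for each row, the column index of the smallest distance, and that index is the cluster ID. A minimal sketch, assuming wav_pca is n-by-2 (synthetic data stands in for it) and starting centroids drawn from the data:

```matlab
% Hypothetical stand-in for wav_pca: three 2-D Gaussian blobs
wav_pca = [randn(50,2); randn(50,2) + [6 0]; randn(50,2) + [3 6]];
k = 3;
n = size(wav_pca, 1);
sel = randperm(n, k);            % indices of the points used as centroids
centroids = wav_pca(sel, :);

% n-by-k distance matrix (n-by-1 minus 1-by-k expands to n-by-k)
D = sqrt((wav_pca(:,1) - centroids(:,1)').^2 + (wav_pca(:,2) - centroids(:,2)').^2);

% idx(i) is the column (cluster ID) holding the smallest distance in row i
[~, idx] = min(D, [], 2);
```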
Finally, the cluster centroids need to be recomputed as the average of all the assigned points, and the points reassigned. This needs to be iterated until the assignments stop changing, and I am unsure how to do this.
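The assign, recompute, repeat cycle described above is commonly known as Lloyd's algorithm. A minimal sketch, assuming wav_pca is n-by-2 (synthetic blobs stand in for it), starting centroids drawn from the data, a hypothetical iteration cap maxIter, and the policy that a cluster which loses all its points keeps its previous centroid (one of several possible choices):

```matlab
% Hypothetical stand-in for wav_pca: three 2-D Gaussian blobs
wav_pca = [randn(50,2); randn(50,2) + [6 0]; randn(50,2) + [3 6]];
k = 3;
n = size(wav_pca, 1);
maxIter = 100;                                  % safety cap on iterations (assumed value)

centroids = wav_pca(randperm(n, k), :);         % start from k distinct data points
idx = zeros(n, 1);                              % no assignments yet

for iter = 1:maxIter
    % Assignment step: squared distance from every point to every centroid (n-by-k)
    D = (wav_pca(:,1) - centroids(:,1)').^2 + (wav_pca(:,2) - centroids(:,2)').^2;
    [~, newIdx] = min(D, [], 2);

    % Stop as soon as no point changes cluster
    if isequal(newIdx, idx)
        break
    end
    idx = newIdx;

    % Update step: each centroid becomes the mean of its assigned points
    for j = 1:k
        members = wav_pca(idx == j, :);
        if ~isempty(members)                    % keep the old centroid if the cluster is empty
            centroids(j, :) = mean(members, 1);
        end
    end
end
```

To check the result visually, the points can be coloured by cluster ID, e.g. scatter(wav_pca(:,1), wav_pca(:,2), 10, idx, 'filled'). Because the outcome depends on the random starting centroids, running the loop a few times and keeping the solution with the smallest total squared distance is a common safeguard.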
Thanks for any help. If you need more info, let me know, and apologies for being new to MATLAB.
Accepted Answer
Aditya Patil
on 17 Feb 2021
Note that the parentheses are wrong in the second term of the equation: the square must be applied to the whole y1 - y2 difference, not just to y2 (wav_pca(i,2) in your case).
The correct expression is:
sqrt((centroids(j,1) - wav_pca(i,1))^2 + (centroids(j,2) - wav_pca(i,2))^2)
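A quick numeric check of why the original parenthesisation goes wrong, using a hypothetical centroid at (0, 0) and a hypothetical point at (1, 2): the original expression subtracts 2^2 = 4 from the centroid's y coordinate instead of squaring the difference, so the argument of sqrt becomes 1 + (0 - 4) = -3 and the result is complex, while the corrected expression gives sqrt(1 + 4) = sqrt(5).

```matlab
centroid = [0 0];   % hypothetical centroid
point    = [1 2];   % hypothetical data point

% Original parenthesisation: only point(2) is squared, not the difference
wrong   = sqrt((centroid(1) - point(1))^2 + (centroid(2) - point(2)^2));

% Corrected: the whole y difference is squared
correct = sqrt((centroid(1) - point(1))^2 + (centroid(2) - point(2))^2);
```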
You can simplify the code further with vectorization:
sqrt((centroids(:,1) - wav_pca(i,1)).^2 + (centroids(:,2) - wav_pca(i,2)).^2)
This computes the distance from point i to all centroids at once, rather than to one centroid at a time. You can also do it the other way around: for each centroid, compute the distance to all points at once.
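A minimal sketch of the "other way around" variant, assuming wav_pca is n-by-2 (synthetic data stands in for it): the loop runs over the k centroids and each pass fills one whole column of the n-by-k matrix, so it iterates k times instead of n, which is usually far fewer since k is much smaller than the number of points.

```matlab
% Hypothetical stand-in for wav_pca: three 2-D Gaussian blobs
wav_pca = [randn(50,2); randn(50,2) + [6 0]; randn(50,2) + [3 6]];
k = 3;
n = size(wav_pca, 1);
centroids = wav_pca(randperm(n, k), :);

D = zeros(n, k);
for j = 1:k
    % Distances from all n points to centroid j at once (n-by-1 column)
    D(:, j) = sqrt((centroids(j,1) - wav_pca(:,1)).^2 + (centroids(j,2) - wav_pca(:,2)).^2);
end
```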
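Further, the sqrt is unnecessary. Because sqrt is monotonically increasing, the centroid with the smallest squared distance is also the one with the smallest distance, and you only need to compare distances, not know their exact values: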
(centroids(:,1) - wav_pca(i,1)).^2 + (centroids(:,2) - wav_pca(i,2)).^2
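Combining both ideas, the whole squared-distance matrix can be built in one line by subtracting a 1-by-k row of centroid coordinates from an n-by-1 column of point coordinates. A minimal sketch, assuming wav_pca is n-by-2 (synthetic data stands in for it) and MATLAB R2016b or later for implicit expansion (older releases can build the same matrix with bsxfun(@minus, ...)); it also checks that assigning with or without sqrt gives the same cluster IDs.

```matlab
% Hypothetical stand-in for wav_pca: three 2-D Gaussian blobs
wav_pca = [randn(50,2); randn(50,2) + [6 0]; randn(50,2) + [3 6]];
k = 3;
n = size(wav_pca, 1);
centroids = wav_pca(randperm(n, k), :);

% n-by-1 column minus 1-by-k row expands to an n-by-k matrix of squared distances
D2 = (wav_pca(:,1) - centroids(:,1)').^2 + (wav_pca(:,2) - centroids(:,2)').^2;

[~, idxSquared] = min(D2, [], 2);         % assign using squared distances
[~, idxEuclid]  = min(sqrt(D2), [], 2);   % assign using true Euclidean distances
```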