How can I reassign clusters based on similarity or any other method?
Hi @Med Future,
Can you share your code on this forum?
Also, please elaborate on what you meant when you mentioned:
- I have already tried the K means clustering but it does not provide a results
Hi @Med Future,
I have modified the code you shared on the forum so that it can reassign clusters based on similarity.
% Define cell1 and cell2
cell1 = [1, 2, 3; 4, 5, 6]; % Example data for cell1
cell2 = [7, 8, 9; 10, 11, 12]; % Example data for cell2
% Normalize the rows of the cells for cosine similarity
cell1_norm = cell1 ./ sqrt(sum(cell1.^2, 2));
cell2_norm = cell2 ./ sqrt(sum(cell2.^2, 2));
% Compute the cosine similarity matrix
similarity_matrix = cell1_norm * cell2_norm';
% Average similarity score
similarity_score = mean(similarity_matrix(:));
% Display the similarity score
fprintf('Average Cosine Similarity Score: %f\n', similarity_score);
% Define the threshold for similarity to reassign clusters
similarity_threshold = 0.9;
if similarity_score > similarity_threshold
% Combine the data from both cells
combinedData = [cell1; cell2];
% Apply K-means clustering
k = 2; % Define the number of clusters 'k'
[idx, C] = kmeans(combinedData, k);
% Calculate centroid distances for cluster reassignment
centroid_distances = pdist(C); % Calculate pairwise distances between centroids
avg_distance = mean(centroid_distances); % Calculate the average centroid distance
% Reassign clusters if centroid distances exceed a certain threshold
centroid_threshold = 5; % Define a threshold for centroid distances
if avg_distance > centroid_threshold
% Calculate the pairwise distances between data points and centroids
distances = pdist2(combinedData, C);
% Find the minimum distance for each data point
[~, min_indices] = min(distances, [], 2);
% Update the cluster assignments in 'idx' based on the minimum distances
idx = min_indices;
end
% Iterate over the clusters and check for different features
unique_clusters = unique(idx); % Get the unique cluster labels
num_clusters = numel(unique_clusters); % Get the number of clusters
for i = 1:num_clusters
cluster_data = combinedData(idx == unique_clusters(i), :); % Get the data points for the current cluster
% Check for different features within the cluster (a split needs at least 2 points)
if size(cluster_data, 1) >= 2 && any(range(cluster_data, 1) > 1)
% Split the cluster into subclusters with similar features
subclusters = kmeans(cluster_data, 2);
% Update the cluster assignments in 'idx' for the subclusters
idx(idx == unique_clusters(i)) = subclusters + max(idx);
end
end
% Merge clusters with similar features
unique_clusters = unique(idx); % Get the updated unique cluster labels
num_clusters = numel(unique_clusters); % Get the updated number of clusters
for i = 1:num_clusters
cluster_data = combinedData(idx == unique_clusters(i), :); % Get the data points for the current cluster
% Check for similar features with other clusters
for j = i+1:num_clusters
other_cluster_data = combinedData(idx == unique_clusters(j), :); % Get the data points for the other cluster
% Check for similar features using a threshold
if max(pdist2(cluster_data, other_cluster_data)) < 1
% Merge the clusters into a single cluster
idx(idx == unique_clusters(j)) = unique_clusters(i);
end
end
end
% Display the updated clustering results
figure;
gscatter(combinedData(:,1), combinedData(:,2), idx);
title('Modified Clustering Results');
% Save the modified clustering results
save('modified_clustered_data.mat', 'idx', 'combinedData');
else
fprintf('Similarity score is less than %f, not reassigning clusters.\n', similarity_threshold);
end
I will go through the code step by step to explain how it does this. First, the code defines two matrices, cell1 and cell2, which hold example data for clustering. Despite the names they are ordinary numeric arrays, not cell arrays, and each row is one data point. They stand in for the two clusters that are compared and, if similar enough, reassigned.
cell1 = [1, 2, 3; 4, 5, 6]; % Example data for cell1
cell2 = [7, 8, 9; 10, 11, 12]; % Example data for cell2
Next, the code scales each row to unit Euclidean length. Once the rows have unit length, the dot product of two rows equals their cosine similarity.
cell1_norm = cell1 ./ sqrt(sum(cell1.^2, 2));
cell2_norm = cell2 ./ sqrt(sum(cell2.^2, 2));
After normalizing, the code computes the cosine similarity matrix between cell1_norm and cell2_norm. Entry (i, j) of this matrix is the cosine similarity between row i of cell1 and row j of cell2.
similarity_matrix = cell1_norm * cell2_norm';
To determine the average similarity score between the clusters, the code calculates the mean of all elements in the similarity matrix.
similarity_score = mean(similarity_matrix(:));
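As a quick sanity check on the normalization step, the sketch below compares one entry of the similarity matrix with the textbook formula dot(a, b) / (norm(a) * norm(b)). It is only an illustration, not part of the answer's code: vecnorm (available from R2017b) is swapped in for sqrt(sum(x.^2, 2)), and the names a, b and direct are introduced here for the comparison.

```matlab
% Sketch: the dot product of unit-length rows equals the cosine similarity.
cell1 = [1, 2, 3; 4, 5, 6];
cell2 = [7, 8, 9; 10, 11, 12];

% vecnorm(X, 2, 2) returns the Euclidean norm of each row (R2017b or later)
cell1_norm = cell1 ./ vecnorm(cell1, 2, 2);
cell2_norm = cell2 ./ vecnorm(cell2, 2, 2);
similarity_matrix = cell1_norm * cell2_norm';

% Direct formula for the first row of each matrix, for comparison
a = cell1(1, :);
b = cell2(1, :);
direct = dot(a, b) / (norm(a) * norm(b));

fprintf('Matrix entry: %.4f, direct formula: %.4f\n', similarity_matrix(1, 1), direct);
fprintf('Average similarity: %.4f\n', mean(similarity_matrix(:)));
```

Both printed values in the first line should agree, which confirms that the matrix product is computing cosine similarity for every pair of rows at once.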
The code then displays the average cosine similarity score.
fprintf('Average Cosine Similarity Score: %f\n', similarity_score);
Next, the code defines a similarity threshold. If the similarity score is greater than the threshold, the clusters will be reassigned based on similarity.
similarity_threshold = 0.9;
If the similarity score exceeds the threshold, the code stacks both data sets into one matrix and runs K-means with k = 2 on the combined data.
if similarity_score > similarity_threshold
% Combine the data from both cells
combinedData = [cell1; cell2];
% Apply K-means clustering
k = 2; % Define the number of clusters 'k'
[idx, C] = kmeans(combinedData, k);
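The code then calculates the pairwise distances between the centroids. If the average centroid distance exceeds a threshold (5 in this example), every data point is assigned to its nearest centroid. Since kmeans normally returns each point already assigned to its nearest centroid, this step is expected to leave idx unchanged in most runs.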
centroid_distances = pdist(C); % Calculate pairwise distances between centroids
avg_distance = mean(centroid_distances); % Calculate the average centroid distance
% Reassign clusters if centroid distances exceed a certain threshold
centroid_threshold = 5; % Define a threshold for centroid distances
if avg_distance > centroid_threshold
% Calculate the pairwise distances between data points and centroids
distances = pdist2(combinedData, C);
% Find the minimum distance for each data point
[~, min_indices] = min(distances, [], 2);
% Update the cluster assignments in 'idx' based on the minimum distances
idx = min_indices;
end
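The nearest-centroid step can be tried in isolation. The sketch below uses fixed, hypothetical centroids (the means of the first two and last two rows) instead of calling kmeans, so the result is the same on every run. It assumes the Statistics and Machine Learning Toolbox for pdist2, as the original code does.

```matlab
% Sketch: assign each point to its nearest centroid with pdist2 and min.
combinedData = [1, 2, 3; 4, 5, 6; 7, 8, 9; 10, 11, 12];

% Hypothetical centroids, chosen by hand for this illustration
C = [2.5, 3.5, 4.5; 8.5, 9.5, 10.5];

% distances(i, j) is the Euclidean distance from point i to centroid j
distances = pdist2(combinedData, C);

% The second output of min along dimension 2 is the index of the closest centroid
[~, idx] = min(distances, [], 2);

disp(idx');
```

With these centroids the first two points go to cluster 1 and the last two to cluster 2.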
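The code then iterates over the clusters and checks how much the features vary within each one. If any feature (column) spans a range greater than 1 inside a cluster, that cluster is split into two subclusters with K-means. The subcluster labels are offset by max(idx) so they do not collide with existing labels.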
unique_clusters = unique(idx); % Get the unique cluster labels
num_clusters = numel(unique_clusters); % Get the number of clusters
for i = 1:num_clusters
cluster_data = combinedData(idx == unique_clusters(i), :); % Get the data points for the current cluster
% Check for different features within the cluster (a split needs at least 2 points)
if size(cluster_data, 1) >= 2 && any(range(cluster_data, 1) > 1)
% Split the cluster into subclusters with similar features
subclusters = kmeans(cluster_data, 2);
% Update the cluster assignments in 'idx' for the subclusters
idx(idx == unique_clusters(i)) = subclusters + max(idx);
end
end
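One thing worth checking in the split step is what happens when a cluster holds a single point, because kmeans cannot form two clusters from one point. The sketch below is an illustration with made-up variables (single_point, two_points, can_split); it shows that range on a single row measures the spread across the features rather than per feature, and that passing the dimension explicitly avoids this.

```matlab
% Sketch: how range() treats a cluster that contains a single point.
% Requires Statistics and Machine Learning Toolbox, like the code above.
single_point = [1, 2, 3];           % hypothetical one-point cluster
two_points   = [1, 2, 3; 4, 5, 6];  % hypothetical two-point cluster

r_default = range(single_point);     % row treated as a vector: max minus min of its elements
r_by_col  = range(single_point, 1);  % per-column range along dimension 1: all zeros
r_two     = range(two_points, 1);    % per-column range for two points

% A split only makes sense when the cluster has at least two points
can_split = size(single_point, 1) >= 2 && any(r_by_col > 1);

disp(r_default);
disp(r_by_col);
disp(r_two);
disp(can_split);
```

Checking the number of rows before splitting, and computing the range along dimension 1, keeps a one-point cluster from being passed to kmeans.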
After splitting, the code merges clusters that are close to each other. It compares every pair of clusters, and if the largest distance between any point of one cluster and any point of the other is below 1, the second cluster is relabeled with the label of the first.
unique_clusters = unique(idx); % Get the updated unique cluster labels
num_clusters = numel(unique_clusters); % Get the updated number of clusters
for i = 1:num_clusters
cluster_data = combinedData(idx == unique_clusters(i), :); % Get the data points for the current cluster
% Check for similar features with other clusters
for j = i+1:num_clusters
other_cluster_data = combinedData(idx == unique_clusters(j), :); % Get the data points for the other cluster
% Check for similar features using a threshold
if max(pdist2(cluster_data, other_cluster_data)) < 1
% Merge the clusters into a single cluster
idx(idx == unique_clusters(j)) = unique_clusters(i);
end
end
end
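After splitting and merging, the labels in idx are generally no longer consecutive, since split labels are offset by max(idx) and merged labels disappear. If consecutive labels are preferred for plotting or later indexing, one option is the third output of unique. This is an optional extra step, not part of the code above, and the label values below are made up for illustration.

```matlab
% Sketch: renumber cluster labels to 1..K using the third output of unique.
idx = [3; 3; 7; 5; 7];  % hypothetical labels with gaps after split and merge

% old_labels holds the sorted distinct labels; compact_idx maps each point
% to the position of its label in old_labels, so idx == old_labels(compact_idx)
[old_labels, ~, compact_idx] = unique(idx);

disp(old_labels');
disp(compact_idx');
```

The grouping of points is unchanged; only the label values are replaced by their rank among the distinct labels.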
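Finally, the code plots the first two features of each data point, grouped by cluster label, and saves the labels and data to a MAT-file. If the similarity score did not exceed the threshold, it prints a message and does not recluster.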
% Display the updated clustering results
figure;
gscatter(combinedData(:,1), combinedData(:,2), idx);
title('Modified Clustering Results');
% Save the modified clustering results
save('modified_clustered_data.mat', 'idx', 'combinedData');
else
fprintf('Similarity score is less than %f, not reassigning clusters.\n', similarity_threshold);
end
In a nutshell, this modified code reassigns clusters based on similarity. It pools the two data sets when their average cosine similarity is high, splits clusters whose features vary too much, and merges clusters whose points lie close together. It relies on K-means clustering and cosine similarity to do this. Please see the attached plot along with the test results.
Hope this answers your question.