problem with hierarchical clustering
2 views (last 30 days)
Show older comments
Hi guys i need your help to solve this problem. I need to make a Hierarchical Clustering for university purposes. I write the code:
a = [.2 0 0 .2 .6 0; 0 0 0 0 1 0; 0 .11 0 .11 .22 .56; 0 0 .25 0 0 .75; .1 0 0 .3 .5 .1; .06 .16 .08 .12 .04 .53]; b = [0 0 0 .2 .4 .4; 0 .24 .06 .06 .29 .35; 0 0 0 0 0 0; .33 .33 0 0 0 .33; .05 0 .09 .27 .45 .14; .21 .11 0 .11 .24 .34]; c = [.3 .1 0 .1 0 .5; 0 0 0 0 0 0; 0 0 0 0 0 0; .21 .12 .18 .24 .06 .18; 0 0 0 0 0 0; .29 .14 .04 .25 .11 .18]; d = [.18 .18 .09 .18 .09 .27; 0 0 .17 .33 0 .5; 0 0 .29 .29 0 .43; 0 .18 .09 .09 0 .64; 0 0 0 0 0 0; .15 .14 .05 .18 .01 .47]; e = [ 0 0 0 0 0 0; .18 .18 .27 .18 .18 0; 0 0 0 0 0 0; 0 0 0 0 0 0; .11 .2 .17 .03 .2 .29; .2 0 0 0 .1 .7]; f = [ 0 0 0 0 0 0; 0 .43 0 .14 .14 .29; 0 0 0 0 0 0; 0 0 0 1 0 0; 0 0 0 0 0 0; .18 .09 .05 .14 .14 .41]; g = [ .21 .07 0 .21 .21 .29; 0 0 .67 0 0 .33; .05 .11 .37 .21 .11 .16; .14 .09 .05 .23 .23 .27; .13 .11 .18 .11 .13 .33; 0 0 .25 .25 .50 .0]; h = [0 0 0 0 0 0; .15 .2 .15 .2 .15 .15; .11 .1 .11 0 0 .67; 0 0 0 0 0 1; .11 .11 .22 .11 .11 .33; .27 .18 .14 .14 .14 .14]; i = [.1 0 .1 .1 .4 .3; 0 0 0 0 0 0; .12 .04 .08 .08 .42 .25; 0 0 0 0 0 0; .32 .09 .12 .03 .27 .17; .17 0 .02 .13 .43 .26]; l = [.29 0 0 .07 0 .64; 0 0 0 0 0 0; 0 0 0 0 0 0; 0 0 0 0 0 1; 0 0 0 1 0 0; .31 .04 .00 .02 .13 .51]; m = [0 .5 0 0 0 .5; 0 0 0 0 0 1; 0 0 0 0 0 1; .09 .18 0 0 .09 .64; 0 0 0 0 0 1; .08 .06 .01 .13 .05 .68]; n = [.26 .09 .00 .04 .17 .43; 0 0 0 0 0 0; 0 0 0 0 0 0; .29 0 0 0 .29 .43; .17 0 0 .17 .17 .5; .20 .04 .02 .1 .24 .41]; o = [0 .25 0 .25 0 .5; 0 0 0 0 0 0;0 0 0 0 0 0; .27 0 0 .14 .09 .5; 0 0 0 0 0 0; .41 .05 .05 .19 .08 .22]; p = [.25 0 0 .17 .08 .5; 0 0 0 0 0 0;0 0 0 0 0 0;1 0 0 .33 0 .57; 0 0 0 0 0 0; .06 .06 00 .25 .00 .62]; q = [.29 .24 0 .1 .14 .24; 1 0 0 0 0 0; 0 0 0 0 0 0; 0 0 0 0 0 0; .33 .04 0 .06 .43 .14; 0 0 0 0 .33 .67]; r = [.22 .04 .09 .04 .35 .26; 0 0 0 0 0 0; .25 .05 .05 .05 .25 .35; 0 0 .14 .14 .33 .33; 0 0 .08 .15 .23 .54; .17 .06 .17 .11 .17 .33]; s = [.37 .21 00 .16 00 .26; 0 .5 0 0 0 .5; 0 0 0 0 0 0; 0 0 0 0 0 0; .45 0 0 .23 .05 .27; .05 .05 .05 0 .18 .68]; t = [ 0 0 0 .67 .17 .17; 0 0 0 0 0 0; 0 0 0 0 0 0;0 0 0 0 0 0; .20 0 0 .2 0 .6; .22 .02 .04 .07 0 .64]; u = [ .08 00 .08 .25 0 .58; 0 0 0 0 0 0; 0 0 0 0 0 0; .08 0 .08 .33 0 .5; 0 0 0 0 0 0; .13 .03 .06 .1 0 .67]; v = [.11 0 .11 0 .33 .44; 0 0 0 0 0 0; 0 0 0 0 0 1; 0 0 0 0 0 0; 0 0 .07 .07 .22 .63; .09 .04 .09 .07 .20 .52];
all_data = [a;b;c;d;e;f;g;h;i;l;m;n;o;p;q;r;s;t;u;v]; size(all_data) Y = pdist(all_data); Z = linkage(Y,'average'); dendrogram(Z);
I wrote twenty matrices, why diagram (dendrogram) returns me over twenty nodes? where am I wrong?
sorry for my bad english.
2 Comments
Brendan Hamm
on 30 Sep 2016
What happens on the first iteration is that 2 points are combined into a new node and then distance is measured from this node to others.
Answers (1)
Kelly Kearney
on 30 Sep 2016
You concatenated 20 6 x 6 matrices. This results in 120 rows in your matrix, hence 120 leaves in your dendrogram (assuming you actually show all the leaves... by default, dendrogram truncates the number of displayed leaves to 30).
2 Comments
Kelly Kearney
on 3 Oct 2016
Sure, if that makes sense for your application. You'll just need to reshape the matrix so all 36 property variables are in a single row for each leaf.
I would start by changing your variable naming system. The hard-coded alphabetical names are just asking for trouble. A 3D matrix or cell array will let you store the same data with much easier indexing.
So instead of
a = [.2 0 0 .2 .6 0; 0 0 0 0 1 0; 0 .11 0 .11 .22 .56; 0 0 .25 0 0 .75; .1 0 0 .3 .5 .1; .06 .16 .08 .12 .04 .53];
b = [0 0 0 .2 .4 .4; 0 .24 .06 .06 .29 .35; 0 0 0 0 0 0; .33 .33 0 0 0 .33; .05 0 .09 .27 .45 .14; .21 .11 0 .11 .24 .34];
c = [.3 .1 0 .1 0 .5; 0 0 0 0 0 0; 0 0 0 0 0 0; .21 .12 .18 .24 .06 .18; 0 0 0 0 0 0; .29 .14 .04 .25 .11 .18];
d = [.18 .18 .09 .18 .09 .27; 0 0 .17 .33 0 .5; 0 0 .29 .29 0 .43; 0 .18 .09 .09 0 .64; 0 0 0 0 0 0; .15 .14 .05 .18 .01 .47];
try this:
ndata = 4;
var = nan(6,6,ndata);
var(:,:,1) = [.2 0 0 .2 .6 0; 0 0 0 0 1 0; 0 .11 0 .11 .22 .56; 0 0 .25 0 0 .75; .1 0 0 .3 .5 .1; .06 .16 .08 .12 .04 .53];
var(:,:,2) = [0 0 0 .2 .4 .4; 0 .24 .06 .06 .29 .35; 0 0 0 0 0 0; .33 .33 0 0 0 .33; .05 0 .09 .27 .45 .14; .21 .11 0 .11 .24 .34];
var(:,:,3) = [.3 .1 0 .1 0 .5; 0 0 0 0 0 0; 0 0 0 0 0 0; .21 .12 .18 .24 .06 .18; 0 0 0 0 0 0; .29 .14 .04 .25 .11 .18];
var(:,:,4) = [.18 .18 .09 .18 .09 .27; 0 0 .17 .33 0 .5; 0 0 .29 .29 0 .43; 0 .18 .09 .09 0 .64; 0 0 0 0 0 0; .15 .14 .05 .18 .01 .47];
(I'm only demonstrating with 4 variables for now, but you should list all 20).
Now you can easily reshape that array into the nleaf x nproperty array needed by pdist (20 x 36, in your example):
Y = pdist(reshape(var,[],ndata)');
Z = linkage(Y,'average');
dendrogram(Z, 0);
See Also
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!