- Generate a new point (typically not already present in the dataset) such that the sum of the Euclidean distances to the three given points will be smallest?
- Generate a new point (typically not already present in the dataset) such that the sum of the squares of the Euclidean distances to the three points will be smallest?
- Generate a new point (typically not already present in the dataset) such that the new point would be one of the k closest neighbors of each of the three points and such that the sum of the Euclidean distances to the three given points will be smallest?
- Generate a new point (typically not already present in the dataset) such that the new point would be one of the k closest neighbors of each of the three points and such that the sum of the squares of the Euclidean distances to the three points will be smallest?
- Find an existing point already in the dataset such that the sum of the Euclidean distances to the three given points will be smallest?
- Find an existing point already in the dataset such that the sum of the squares of the Euclidean distances to the three given points will be smallest?
- Find an existing point already in the dataset such that point is one of the k closest neighbors for each of the three given points, and such that the sum of the Euclidean distances to the three given points will be smallest?
- Find an existing point already in the dataset such that point is one of the k closest neighbors for each of the three given points, and such that the sum of the squares of the Euclidean distances to the three given points will be smallest?

# Find knn (nearest neighbour) point give a data set

9 views (last 30 days)

Show older comments

monkey_matlab
on 16 Oct 2016

Answered: Walter Roberson
on 17 Oct 2016

Hello, I have this simple data set:

Class1 = [1 0; 0.5 1; 0.5 1.5];

How do I go about finding the "nearest neighbor" point of that data set using the euclidean distance?

That is, generate a point based on the knn algorithm that will be closest to the three points.

The point highlighted in orange should be where the calculated "nearest neighbor" point should be?

##### 5 Comments

### Accepted Answer

Walter Roberson
on 17 Oct 2016

In the below, I will assume that the coordinates of all points in your graph (including the three special points) are stored in x and y -- the J'th point being located at [x(J), y(J)]. Notice that these are X Y coordinates, not row and column coordinates -- in MATLAB, X corresponds to columns and Y corresponds to rows

The below assumes that the three distinguished points have their coordinates in special_x and special_y . The code does not assume that the coordinates are exact copies of what is in x and y, because of the potential for round-off error, but the code will "snap" each [special_x(J), special_y(J)] to its closest match in x, y

k = 5; %the number of closest neighbours eligible to be considered

XY = [x(:), y(:)];

special_XY = [special_x(:), special_y(:)];

%We are finding every point relative to every other point. That includes finding every

%point relative to itself. Which will be distance 0, which will be "nearest". We want

%to exclude the point to itself, so we need to find k+1 nearest neighbors and then

%ignore the first. When you bug-proof this code be sure to enhance it to take into

%account the possibility of duplicate points

[IDX, D] = knnsearch(XY, XY, 'K', k+1);

IDX = IDX(:, 2:end); %discard the point to itself

D = D(:, 2:end); %discard the distance of the point to itself

furthest_k_distance = D(:,end);

%now furthest_k_distance(J) is the distance from [X(J), Y(J)] to its K'th closest

%neighbour. If we insert a new point then it cannot be further away than this or else

%it will not be one of the K-closest neighbours.

%we do not count on the coordinates in special_x and special_y to be exact matches

%so we need to snap them to indices in x and y

special_IDX = knnsearch(XY, special_XY, 'K', 1);

%now we want coordinates of the special points corrected to be exactly in x, y

snapped_special_XY = XY(special_IDX, :);

k_closest_to_special = IDX(special_IDX,:); %indices into XY

special_furthest_distance = furthest_k_distance(special_IDX);

%now if the proposed point to be generated is not within the distance

%special_furthest_distance(J) of the corrected special points

%snapped_special_XY(J,:), then the proposed point would not be one of the K closest

%neighbours of [special_x(J), special_y(J)] and that would be disallowed

After this... you can be somewhat stuck. Consider for example,

x = [ 7 3 6 1 9 5 4 10 5 3 1 6 2 1 6];

y = [ 3 4 8 9 4 10 10 3 8 7 10 7 8 8 4];

special_x = x([4 7 8]);

special_y = y([4 7 8]);

scatter(x(:), y(:), 'b');

hold on

scatter(special_x(:), special_y(:), 70, 'r', 'filled', 'd')

The above shows the regular points in the graph as blue circles, and the three special points as red diamonds.

Then run the code I showed, but with k = 3. You will get

special_furthest_distance = [1.4142135623731; 2.82842712474619; 4.12310562561766]

(which is [sqrt(2), 2*sqrt(2), sqrt(17)])

Now,

viscircles(snapped_special_XY, special_furthest_distance, 'color', 'g')

hold off

this will show you, for each of the special points identified in red, a green circle around them that includes the 3 nearest neighbours. Two of the points will be obvious in each case; the third one is right on the circle.

Now, if the proposed point is not on or inside all of the circles simultaneously, then the proposed point is too far away to be one of the 3 nearest neighbours of the circle it is outside of.

But notice that in this example the large circle in the lower right does not intersect the two smaller circles in the upper left. Therefore no such point is possible -- any point that was close enough to the two special points in the upper left to be one of their three nearest neighbours, cannot possibly be one of the three nearest neighbours of the special point in the bottom right.

====

I was able to show that the centroid will not necessarily be within range for all points. I was also able to find code to get the solution if one is available:

%minimize sum of squares of euclidean distances to special points

G = @(Q)sum(sum((special_XY-Q).^2,2));

%under the constraint that the distances cannot be exceeded

nonlcon = @(Q)deal(sqrt(sum((special_XY-Q).^2,2))-special_furthest_distance,[]);

%use the centroid as the initial guess.

XY0 = mean(snapped_special_XY);

A = []; b = []; Aeq = []; beq = []; lb = []; ub = [];

try

[bestXY, ~, exitflag] = fmincon(G, XY0, A, b, Aeq, beq, lb, ub, nonlcon);

catch

exitflag = -2;

end

if exitflag < 0

fprintf('Search failed, no good trying again\n');

best_x = []; best_y = [];

elseif exitflag ~= 1

fprintf('Search failed, but perhaps a different starting point or more iterations would work\n');

best_x = []; best_y = [];

else

best_x = bestXY(1);

best_y = bestXY(2);

end

##### 0 Comments

### More Answers (0)

### See Also

### Categories

### Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!