Attempt on k-nearest neighbor pdf estimate in 1D

I'm trying to write a function that estimates a pdf in one dimension with the k-nearest-neighbours method. I've gone through it several times already and can't figure out what is wrong. The plot shows that my 'pdf' is clearly not what it should be: there is a peak on top of one sample, and a region where the samples are denser is flat. Any advice and corrections appreciated! Here is my code; the test data 't122' (stored in x) is a 1x10 vector, i.e. ten 1D samples:
x = [0.553766713954610,0.683388501459509,0.274115313899635,0.586217332036812,0.531876523985898,0.369231170369473,0.456640797769432,0.534262446653865,0.857839693972576,0.776943702988488];
d = size(x,1);
d2 = size(x,2);
% k samples inside the Parzen window
k = 3; % sqrt(N) is a good guess for optimal k
% plotting the samples and the estimated pdf
xAxis = linspace(0,1,100);
plot(xAxis,nnPdf(xAxis,x,k));
title('t122 on the real line with nn-estimated pdf');
hold on;
plot(x,0,'o','MarkerSize',25);
legend(sprintf('%d nearest neighbours pdf',k),'t122');
And here is the function:
% k nearest neighbours 1D pdf-estimator function nnPdf()
% inputs:
% x0 = interval for the pdf
% x = data for which the pdf is estimated
% k = number of samples in every Parzen window
% output:
% V = 1D-pdf estimated with k nearest neighbours
function V = nnPdf(x0,x,k)
    v = zeros(length(x0),size(x,2)); % for distances to all samples
    V = zeros(length(x0),1); % for distance needed to include k samples
    if k > size(x,2)
        disp('*Invalid value for k: not so many samples in the data.');
        return
    end
    standardize(x);
    for i = 1:length(x0)
        for j = 1:size(x,2)
            % distance from interval point to all samples
            v(i,j) = abs(x0(i)-x(j));
        end
        % sorted distances so v_ik is the distance for reaching to the
        % kth sample from the point x0_i
        sort(v,2);
        % window size V at point x0_i based on the distance (volume in 1D)
        V(i) = (k/size(x,2)) * 1/v(i,k);
    end
end
And the outcome:

Answers (1)

Akshat on 1 Sep 2023
Hi Jonne,
I have reproduced your code on my end using MATLAB R2023a. Please note the following differences; the full code is pasted below as well. The resulting pdf plot is attached.
  1. In the "nnPdf" function you call "standardize", which is not a built-in MATLAB function. The function "zscore" (Statistics and Machine Learning Toolbox) can do the standardizing.
  2. When sorting, you don't assign the result back to "v", so the sort has no effect. Use:
[v(i,:), ~] = sort(v(i,:));
  3. The line that calculates "V(i)" is incorrect, but not because of the index. Once row i is sorted in ascending order, "v(i,k)" already is the distance to the k-th nearest neighbour (MATLAB indexing starts from 1, so no offset is needed). What is missing is the window length: in 1D the window around x0_i that contains k samples has length 2*v(i,k), so k/N should be divided by "2*v(i,k)".
Finally, the complete code with these changes is:
x = [0.553766713954610,0.683388501459509,0.274115313899635,0.586217332036812,0.531876523985898,0.369231170369473,0.456640797769432,0.534262446653865,0.857839693972576,0.776943702988488];
d = size(x,1);
d2 = size(x,2);
% k samples inside the Parzen window
k = 3; % sqrt(N) is a good guess for optimal k
% plotting the samples and the estimated pdf
xAxis = linspace(0,1,100);
plot(xAxis,nnPdf(xAxis,x,k));
title('t122 on the real line with nn-estimated pdf');
hold on;
plot(x,0,'o','MarkerSize',25);
legend(sprintf('%d nearest neighbours pdf',k),'t122');
% k nearest neighbours 1D pdf-estimator function nnPdf()
% inputs:
% x0 = interval for the pdf
% x = data for which the pdf is estimated
% k = number of samples in every Parzen window
% output:
% V = 1D-pdf estimated with k nearest neighbours
function V = nnPdf(x0,x,k)
    v = zeros(length(x0),size(x,2)); % for distances to all samples
    V = zeros(length(x0),1); % pdf estimate at every point of x0
    if k > size(x,2)
        disp('*Invalid value for k: not so many samples in the data.');
        return
    end
    zscore(x); % note: the result is not assigned, so x itself is unchanged
    for i = 1:length(x0)
        for j = 1:size(x, 2)
            % distance from interval point to all samples
            v(i, j) = abs(x0(i) - x(j));
        end
        % sort the distances so that v(i,k) is the distance from x0_i
        % to its kth nearest sample
        [v(i, :), ~] = sort(v(i, :));
        % pdf estimate at x0_i: k/N divided by the window length 2*v(i,k)
        % (the volume in 1D)
        V(i) = (k / size(x, 2)) / (2 * v(i, k));
    end
end
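For comparison, here is a minimal vectorized sketch of the usual 1D k-NN density estimate, p(x0) = k / (N * 2 * r_k(x0)), where r_k(x0) is the distance from x0 to its k-th nearest sample and 2*r_k(x0) is the length of the window that holds k samples. Treat it as an illustration under my own assumptions rather than the exact code behind the attached plot: the variable names D, Ds, rk and p are mine, no standardization is applied, and it relies on implicit expansion (R2016b or later).

```matlab
% Samples from the question (ten 1D points)
x = [0.553766713954610,0.683388501459509,0.274115313899635,0.586217332036812, ...
     0.531876523985898,0.369231170369473,0.456640797769432,0.534262446653865, ...
     0.857839693972576,0.776943702988488];
k  = 3;                      % number of neighbours per window
x0 = linspace(0,1,100);      % points where the pdf is evaluated
N  = numel(x);

% Distance from every evaluation point (rows) to every sample (columns)
D = abs(x0(:) - x(:).');     % 100x10 via implicit expansion

% Sort each row in ascending order; the result must be assigned
Ds = sort(D, 2);

% Column k of the sorted matrix is the distance to the k-th nearest sample
rk = Ds(:, k);

% k-NN density estimate: k/N divided by the 1D window length 2*rk
p = k ./ (N * 2 * rk);
```

Calling plot(x0, p) afterwards draws the estimate on the same axis as in the script above. Keep in mind that a k-NN estimate is spiky by construction and does not integrate to exactly 1, so some peaks near the samples are expected even when the code is correct.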
Hope it helps!

Release

R2022a
