Check if 2D points are evenly distributed

46 views (last 30 days)
Hi,
I have a plane and a few data points (2 to 20) on that plane. Now I want to know how evenly they are distributed.
For example if I have 4 points and one is in every corner it would be perfect. If all 4 points are in the same spot it would be worst case.
I am not really interested in the perfect way to calculate this. I want to do this very often so my main focus is on speed.
I tried to find a solution to this problem but I think I don't know what I am even looking for. I hope someone can give me the right idea.
Best Regards

Accepted Answer

John D'Errico
John D'Errico on 1 Mar 2015
Edited: John D'Errico on 1 Mar 2015
Your goal is a measure of the how evenly points are distributed in what I assume is a rectangular domain. The problem is, you need to explain your goals here. How do you define goodness?
Suppose we had two points that were perfectly coincident, but all other points in the rectangle were perfectly evenly distributed. Would this be worse than having 3 points that were all moderately close together, and the remainder of the points all again perfectly evenly distributed? Or perhaps you might have two separate pairs that are each close together, but not coincident?
In order to define some measure of even-ness, it seems that you need to explain your goals far more clearly. How do we know that something is bad?
My suggestion, having now heard more of what you want, is to compute the interpoint distance matrix between all points in the set. I've got a tool on the file exchange that does this (IPDM), but it is trivial to compute, especially for such a small set of points.
For example, given a set of points in the unit square, lets see what we can do.
N = 20;
% the first column here is x, the second column y.
P = rand(N,2);
% D will be an NxN matrix.
D = sqrt(bsxfun(@minus,P(:,1),P(:,1).').^2 + bsxfun(@minus,P(:,2),P(:,2).').^2);
% compute the minimum interpoint distance, for
% each point. This is essentially the distance for
% each point to its nearest neighbor.
Dmin = min(D + realmax*eye(N));
Dmin
Dmin =
Columns 1 through 11
0.20294 0.11087 0.29086 0.11087 0.0066339 0.10327 0.13755 0.26768 0.23623 0.244 0.25469
Columns 12 through 20
0.10327 0.0066339 0.10841 0.13755 0.2456 0.13776 0.20294 0.10841 0.11378
As you can see, the minimum distance will in general appear twice in that list, because point 5 is closest to point 13, and point 13 is closest to point 5.
Alternatively, if you have the stats toolbox, you can use the function pdist. It too gives us the set of interpoint distances, although it returns only what is essentially the upper triangle of that distance matrix.
min(pdist(P))
ans =
0.0066339
One idea you might consider is to compare that overall minimum distance to the some measure of a best case possible. For this, think about the idea of circle packing in a rectangular region. While I could probably do some online research on the best case circle packing for N circles in a unit square, a simple measure is given by a ratio of areas.
% area of the rectangle, length*height
Arect = 1*1;
% for N circles, we assign each circle 1/N of that area,
% therefore compute an effective radius, IF we could pack
% N circles perfectly with no overlap.
Crad = sqrt(Arect/N/pi)
Crad =
0.12616
So if we could have N perfectly evenly distributed points (here, N=20 over a unit square) then the minimum interpoint distance could at best be roughly TWICE that radius. (This argument has some flaws in it, which are worst for a small number of points, where we would want to consider the optimum circle packing in some domain. I could spend some time and improve that estimate, were it of interest. This would take only a wee bit of algebra.)
The point is, we can now compare the minimum interpoint distance to that value. A truly even sampling of points will have no pair of points that are significantly closer together than twice Crad. As you can see, the random sampling I generated above is flawed in that respect, since
min(Dmin)/(2*Crad)
ans =
0.026292
is quite small. This is actually to be expected from the "uniform" random sampling, for anyone who has seen the birthday paradox in action. In fact, the closest point from such a random sampling will be much smaller than that predicted for an "even" sampling as I am choosing to define it.
As I said, if it is of interest, I can give you a better value to compare against.
Edit: I decided that I had some time to do this right.
Given a rectangle of length L, height H, with N points to be distributed around the rectangle. We want to find a measure of the best possible circle packing inside that rectangle, where the center of any circle can go as far as a corner of the rectangle, but no circle is allowed to overlap. Assume the circles have radius r.
syms N L H r
rmax = solve((L+2*r)*(H + 2*r) - N*(2*r)^2)
rmax =
(H + L - (H^2 + L^2 - 2*H*L + 4*H*L*N)^(1/2))/(4*(N - 1))
(H + L + (H^2 + L^2 - 2*H*L + 4*H*L*N)^(½))/(4*(N - 1))
I'll let the reader figure out how the expression I solve for arises. It seems clear to me. :)
As it turns out, only the second of those solutions is positive. I'll substitute 20 for N, and do it over the unit square, so L=H=1 here.
rmax = rmax(2);
subs(rmax,{N,L,H},[20 1 1])
ans =
80^(1/2)/76 + 1/38
vpa(ans)
ans =
0.14400357776314682612679861414375
So twice that radius would correspond to a maximally evenly packed set of 20 points over the unit square.
There are still some flaws with the above argument, wherein a truly optimal circle packing for N points would be a somewhat better result, but all we want is a basic measure to compare against. So it should be entirely adequate, although it will be only approximate for very small sets of points, like 2. So in fact, for two points in the unit square, the best packing would allow circles of radius sqrt(2)/2 = 0.7071...
subs(rmax,{N,L,H},[2 1 1])
ans =
8^(1/2)/4 + 1/2
vpa(ans)
ans =
1.2071067811865475244008443621048
It is only a simple measure as I said, and we need not have it be perfect.
  1 Comment
Patrick
Patrick on 2 Mar 2015
Thanks a lot for this detailed answer. I suppose I will go with this one for now. I already tried something like this but as you pointed out we need something to compare against which I was missing. Now I have that.

Sign in to comment.

More Answers (2)

Image Analyst
Image Analyst on 1 Mar 2015
Professor Adrian Baddeley is a leading researcher in spatial statistics and a Science Fellow with CSIRO and University of Western Australia. He wrote the bible on spatial statistics. He gives a workshop on "Analysing spatial point patterns in R". His book and papers may still be available online.
In his papers you'll see that a pattern or points can go from regular and periodic (like an array or grid), to random and uniform (Poisson distribution like raindrops on the ground), to clustered and clumped.
And of course there are states in between. The degree to which the pattern matches one of those classifications can be measured, and he gives algorithms for doing that. For example he says "One simple diagnostic for dependence between points is a Morishita plot. The spatial domain is divided into quadrats, and the chi square statistic based on quadrat counts is computed. The quadrats are repeatedly subdivided. The Morishita plot shows the chi square statistic against the linear size of the quadrats."
He uses real world cases like locations of trees and ants. You can even have multiple populations like 4 species of trees and 2 species of ants. Besides answering questions like how uniformly distributed are the individuals, you can also answer questions like is tree species 1 preferentially near tree species 2? Or are black ant hills preferentially near red ant hills? If you're interested in spatial statistics, I suggest you look up his books and papers.
  4 Comments
Patrick
Patrick on 2 Mar 2015
Edited: Patrick on 2 Mar 2015
I agree, this is very interesting. But I am not sure yet I want to put that much work in this project. But it is probably the best solution.
Edit: Seems like I can only select one as an answer. I wanted to mark this one too...
Joe
Joe on 25 Jun 2015
This paper was very useful and shows the equations to use to figure out how evenly points are distributed.
Paper titled: "Clustering, Randomness, and Regularity: Spatial Distributions and Human Performance on the Traveling Salesperson Problem and Minimum Spanning Tree Problem" from Purdue

Sign in to comment.


Star Strider
Star Strider on 28 Feb 2015
I don’t know how involved you want to get. I’m not aware of a statistical test for ‘evenly distributed’, so I created one that might work for you. It looks at the row and column distributions of the points on the plot, does a linear regression on that and does a simple statistical test (confidence intervals) on the slope. If the slope is not statistically different from zero (that is, it is not needed in the regression, so the confidence limits include zero), you can assume the row and column indices are essentially ‘evenly distributed’. (I used the normal distribution 95% confidence intervals, ±1.96. A t-distribution would be more accurate.)
Experiment with this to fit your application, interpret the results as you wish:
N = 9; % Number Of Points
P = randi(N, 20, 2); % Create Points
bins = [1:N]; % Bins For ‘histc’
den = [ones(N,1) bins']; % Regression Denominator
XTX = den'*den; % CovB Denominator
Kr = histc(P(:,1), bins); % Row Counts
Br = den\Kr; % Linear Regression On Row Counts
Kc = histc(P(:,2), bins); % Column Counts
Bc = den\Kc; % Linear Regression On Column Counts
df = N-2; % Degrees-Of-Freedom: Points-Parameters
CovBr = var(Kr-den*Br)./XTX; % Covariance Matrix For ‘Br’
CovBc = var(Kc-den*Bc)./XTX; % Covariance Matrix For ‘Bc’
MrCI = [-1.96 1.96]*CovBr(1,1) + Br(2); % Row-Slope Confidence Intervals
McCI = [-1.96 1.96]*CovBc(1,1) + Bc(2); % Column-Slope Confidence Intervals
figure(1)
plot(P(:,1), P(:,2), 'bp')
grid
axis([0 10 0 10])
  6 Comments
Patrick
Patrick on 1 Mar 2015
Hi & thanks
I didn't work through your answers yet. Just wanted to add the missing information:
The data points are "measured in real world". They are points found in a picture so I have a x and y value for each point. The points can't be exactly in the same spot but they can be extremely close to each other (like 0.1 pixels close). Sorry I should have added this in my post.
Star Strider
Star Strider on 1 Mar 2015
OK. I’ll add ‘image processing’ to the tags to see if Image Analyst has any thoughts.

Sign in to comment.

Products

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!