rangesearch

Find all neighbors within specified distance using input data

Syntax

Idx = rangesearch(X,Y,r)

[Idx,D] =
rangesearch(X,Y,r)

[Idx,D] =
rangesearch(X,Y,r,Name,Value)

Description

Idx = rangesearch(X,Y,r) finds all the X points that are within distance r of the Y points. The rows of X and Y correspond to observations, and the columns correspond to variables.

[Idx,D] = rangesearch(X,Y,r) also returns the distances between the Y points and the X points that are within a distance of r.

example

[Idx,D] = rangesearch(X,Y,r,Name,Value) specifies additional options using one or more name-value pair arguments. For example, you can specify the nearest neighbor search method and the distance metric used in the search.

example

Examples

collapse all

Find All Points Within Specified Distance

Open Live Script

Find the X points that are within a Euclidean distance 1.5 of each Y point. Both X and Y are samples of five-dimensional normally distributed variables.

rng('default') % For reproducibility
X = randn(100,5);
Y = randn(10,5);
[Idx,D] = rangesearch(X,Y,1.5)

Idx=10×1 cell array
    {[             25 62 33 99 87 92 16]}
    {[                            92 25]}
    {[ 93 42 31 73 60 28 78 83 48 89 85]}
    {[                            92 41]}
    {[44 7 28 78 75 42 69 31 1 26 83 93]}
    {[       15 31 89 41 27 17 29 60 34]}
    {[                               89]}
    {1×0 double                         }
    {1×0 double                         }
    {1×0 double                         }

D=10×1 cell array
    {[                                   0.9546 1.0987 1.2730 1.3981 1.4140 1.4249 1.4822]}
    {[                                                                      1.4203 1.4558]}
    {[       0.7114 0.7552 1.0081 1.1324 1.1424 1.1637 1.2108 1.3824 1.3944 1.4116 1.4605]}
    {[                                                                      1.1244 1.4672]}
    {[0.7863 0.9326 0.9773 1.0508 1.1722 1.1934 1.3218 1.3623 1.3869 1.3919 1.4814 1.4978]}
    {[                     1.2824 1.2843 1.3342 1.3469 1.4154 1.4237 1.4625 1.4626 1.4744]}
    {[                                                                             1.1739]}
    {1×0 double                                                                           }
    {1×0 double                                                                           }
    {1×0 double                                                                           }

In this case, the last three Y points are more than 1.5 distance away from any X point. X(89,:) is 1.1739 distance away from Y(7,:), and no other X point is within distance 1.5 of Y(7,:). X contains 12 points within distance 1.5 of Y(5,:).

Find Nearest Points in Clustered Data

Open Live Script

Generate 5000 random points from each of three distinct multivariate normal distributions. Shift the means of the distributions so that the randomly generated points are likely to form three separate clusters.

rng('default')  % For reproducibility
N = 5000;
dist = 10;
X = [mvnrnd([0 0],eye(2),N);
     mvnrnd(dist*[1 1],eye(2),N);
     mvnrnd(dist*[-1 -1],eye(2),N)];

For each point in X, find the points in X that are within a radius dist away from the point. For faster computation, specify to keep the indices of the nearest neighbors unsorted. Select the first point in X, and find its nearest neighbors.

Idx = rangesearch(X,X,dist,'SortIndices',false);
x = X(1,:);
nearestPoints = X(Idx{1},:);

Find the values in X that are not the nearest neighbors of x. Display those points in one color and the nearest neighbors of x in a different color. Label the point x with a black, filled circle.

nonNearestIdx = true(size(X,1),1);
nonNearestIdx(Idx{1}) = false;

scatter(X(nonNearestIdx,1),X(nonNearestIdx,2))
hold on
scatter(nearestPoints(:,1),nearestPoints(:,2))
scatter(x(1),x(2),'black','filled')
hold off

Figure contains an axes object. The axes object contains 3 objects of type scatter.

Find Nearest Points Using Custom Distance Function

Open Live Script

Find the patients in the patients data set that are within a certain age and weight range of the patients in Y.

Load the patients data set. The Age values are in years, and the Weight values are in pounds.

load patients
X = [Age Weight];
Y = [20 162; 30 169; 40 168];   % New patients

Create a custom distance function distfun that determines the distance between patients in terms of age and weight. For example, according to distfun, two patients that are one year apart in age and have the same weight are one distance unit apart. Similarly, two patients that have the same age and are five pounds apart in weight are also one distance unit apart.

type distfun.m % Display contents of distfun.m file

function D2 = distfun(ZI,ZJ)
ageDifference = abs(ZI(1)-ZJ(:,1));
weightDifference = abs(ZI(2)-ZJ(:,2));
D2 = ageDifference + 0.2*weightDifference;
end

Note: If you click the button located in the upper-right section of this example and open the example in MATLAB®, then MATLAB opens the example folder. This folder includes the function file distfun.m.

Find the patients in X that are within the distance 2 of the patients in Y.

[Idx,D] = rangesearch(X,Y,2,'Distance',@distfun)

Idx=3×1 cell array
    {1×0 double}
    {1×0 double}
    {[      41]}

D=3×1 cell array
    {1×0 double}
    {1×0 double}
    {[  1.8000]}

The third patient in Y is the only one to have a patient in X within a distance of 2.

Display the Age and Weight values for the nearest patient in X to the patient with age 40 and weight 168.

X(Idx{3},:)

ans = 1×2

    39   164

Input Arguments

collapse all

`X` — Input data
numeric matrix

Input data, specified as an mx-by-n numeric matrix, where each row represents one n-dimensional point. The number of columns n must equal the number of columns in Y.

Data Types: single | double

`Y` — Query points
numeric matrix

Query points, specified as an my-by-n numeric matrix, where each row represents one n-dimensional point. The number of columns n must equal the number of columns in X.

Data Types: single | double

`r` — Search radius
nonnegative scalar

Search radius around each query point, specified as a nonnegative scalar. rangesearch finds all X points (rows) that are within distance r of each Y point. The meaning of distance depends on the 'Distance' name-value pair argument.

Data Types: single | double

Name-Value Arguments

collapse all

Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Before R2021a, use commas to separate each name and value, and enclose Name in quotes.

Example: rangesearch(X,Y,1.4,'Distance','seuclidean','Scale',iqr(X)) specifies to find all the observations in X within distance 1.4 of each observation in Y, using a standardized Euclidean distance scaled by the interquartile range of X.

`NSMethod` — Nearest neighbor search method
`'kdtree'` | `'exhaustive'`

Nearest neighbor search method, specified as the comma-separated pair consisting of 'NSMethod' and one of these values.

Value Description

Value	Description
`'kdtree'`	Create and use a Kd-tree to find nearest neighbors. `'kdtree'` is valid only when the distance metric is one of these options: `'chebychev'` `'cityblock'` `'euclidean'` `'minkowski'`
`'exhaustive'`	Use the exhaustive search algorithm. The software computes the distances from all `X` points to each `Y` point to find nearest neighbors.

'kdtree'

Create and use a Kd-tree to find nearest neighbors. 'kdtree' is valid only when the distance metric is one of these options:

'chebychev'
'cityblock'
'euclidean'
'minkowski'

'exhaustive' Use the exhaustive search algorithm. The software computes the distances from all X points to each Y point to find nearest neighbors.

'kdtree' is the default value when the number of columns in X is less than or equal to 10, X is not sparse, and the distance metric is one of the valid 'kdtree' metrics. Otherwise, the default value is 'exhaustive'.

Example: 'NSMethod','exhaustive'

`Distance` — Distance metric
`'euclidean'` (default) | `'seuclidean'` | `'mahalanobis'` | `'cityblock'` | `'minkowski'` | `'chebychev'` | function handle | ...

Distance metric that rangesearch uses, specified as the comma-separated pair consisting of 'Distance' and one of the values in this table.

Value	Description
`'euclidean'`	Euclidean distance.
`'seuclidean'`	Standardized Euclidean distance. Each coordinate difference between a row in `X` and a query point is scaled by dividing by the corresponding element of the standard deviation computed from `X`, `std(X,'omitnan')`. To specify another scaling, use the `'Scale'` name-value pair argument.
`'mahalanobis'`	Mahalanobis distance, computed using a positive definite covariance matrix `C`. The default value of `C` is the sample covariance matrix of `X`, as computed by `cov(X,'omitrows')`. To specify a different value for `C`, use the `'Cov'` name-value pair argument.
`'cityblock'`	City block distance.
`'minkowski'`	Minkowski distance. The default exponent is `2`. To specify a different exponent, use the `'P'` name-value pair argument.
`'chebychev'`	Chebychev distance (maximum coordinate difference).
`'cosine'`	One minus the cosine of the included angle between observations (treated as vectors).
`'correlation'`	One minus the sample linear correlation between observations (treated as sequences of values).
`'hamming'`	Hamming distance, the percentage of coordinates that differ.
`'jaccard'`	One minus the Jaccard coefficient, the percentage of nonzero coordinates that differ.
`'spearman'`	One minus the sample Spearman's rank correlation between observations (treated as sequences of values).
`@distfun`	Custom distance function handle. A distance function has the form function D2 = distfun(ZI,ZJ) % calculation of distance ... where `ZI` is a 1-by-n vector containing one row of `X` or `Y`. `ZJ` is an m-by-n matrix containing multiple rows of `X` or `Y`. `D2` is an m-by-1 vector of distances, and `D2(j)` is the distance between the observations `ZI` and `ZJ(j,:)`.

For more information, see Distance Metrics.

Example: 'Distance','minkowski'

`P` — Exponent for Minkowski distance metric
`2` (default) | positive scalar

Exponent for the Minkowski distance metric, specified as the comma-separated pair consisting of 'P' and a positive scalar. This argument is valid only if 'Distance' is 'minkowski'.

Example: 'P',4

Data Types: single | double

`Cov` — Covariance matrix for Mahalanobis distance metric
`cov(X,'omitrows')` (default) | positive definite matrix

Covariance matrix for the Mahalanobis distance metric, specified as the comma-separated pair consisting of 'Cov' and a positive definite matrix. This argument is valid only when 'Distance' is 'mahalanobis'.

Example: 'Cov',eye(4)

Data Types: single | double

`Scale` — Scale parameter value for standardized Euclidean distance metric
`std(X,'omitnan')` (default) | nonnegative vector

Scale parameter value for the standardized Euclidean distance metric, specified as the comma-separated pair consisting of 'Scale' and a nonnegative vector. Scale has length equal to the number of columns in X. Each coordinate difference between a row in X and a query point is scaled by the corresponding element of Scale. This argument is valid only when 'Distance' is 'seuclidean'.

Example: 'Scale',iqr(X)

Data Types: single | double

`BucketSize` — Maximum number of data points in leaf node of Kd-tree
`50` (default) | positive integer scalar

Maximum number of data points in the leaf node of the Kd-tree, specified as the comma-separated pair consisting of 'BucketSize' and a positive integer scalar. This argument is valid only when NSMethod is 'kdtree'.

Example: 'BucketSize',20

Data Types: single | double

`SortIndices` — Flag to sort returned indices according to distance
`true` (`1`) (default) | `false` (`0`)

Flag to sort returned indices according to distance, specified as the comma-separated pair consisting of 'SortIndices' and either true (1) or false (0).

For faster performance when Y contains many observations that have many nearest points in X, you can set SortIndices to false. In this case, rangesearch returns the indices of the nearest points in no particular order. When SortIndices is true, the function arranges the indices of the nearest points in ascending order by distance.

Example: 'SortIndices',false

Data Types: logical

Output Arguments

collapse all

`Idx` — Indices of nearest points
cell array of numeric vectors

Indices of nearest points, returned as a cell array of numeric vectors.

Idx is an my-by-1 cell array, where my is the number of rows in Y. The vector Idx{j} contains the indices of points (rows) in X whose distances to Y(j,:) are not greater than r. If SortIndices is true, then rangesearch arranges the indices in ascending order by distance.

`D` — Distances of nearest points to query points
cell array of numeric vectors

Distances of the nearest points to the query points, returned as a cell array of numeric vectors.

D is an my-by-1 cell array, where my is the number of rows in Y. D{j} contains the distance values between Y(j,:) and the points (rows) in X(Idx{j},:). If SortIndices is true, then rangesearch arranges the distances in ascending order.

Tips

For a fixed positive real value r, rangesearch finds all the X points that are within a distance r of each Y point. To find the k points in X that are nearest to each Y point, for a fixed positive integer k, use knnsearch.
rangesearch does not save a search object. To create a search object, use createns.

Algorithms

For an overview of the kd-tree algorithm, see k-Nearest Neighbor Search Using a Kd-Tree.
The exhaustive search algorithm finds the distance from each point in X to each point in Y.

Alternative Functionality

If you set the rangesearch function 'NSMethod' name-value pair argument to the appropriate value ('exhaustive' for an exhaustive search algorithm or 'kdtree' for a Kd-tree algorithm), then the search results are equivalent to the results obtained by conducting a distance search using the rangesearch object function. Unlike the rangesearch function, the rangesearch object function requires an ExhaustiveSearcher or KDTreeSearcher model object.

Extended Capabilities

expand all

Tall Arrays
Calculate with arrays that have more rows than fit in memory.

The rangesearch function supports tall arrays with the following usage notes and limitations:

If X is a tall array, then Y cannot be a tall array. Similarly, if Y is a tall array, then X cannot be a tall array.

For more information, see Tall Arrays.

C/C++ Code Generation
Generate C and C++ code using MATLAB® Coder™.

Usage notes and limitations:

For code generation, the default value of the 'NSMethod' name-value pair argument is 'exhaustive' when the number of columns in X is greater than 7.
The value of the 'Distance' name-value pair argument must be a compile-time constant and cannot be a custom distance function.
The 'SortIndices' name-value pair argument is not supported. The output arguments are always sorted.
Names in name-value arguments must be compile-time constants. For example, to allow a user-defined exponent for the Minkowski distance in the generated code, include {coder.Constant('Distance'),coder.Constant('Minkowski'),coder.Constant('P'),0} in the -args value of codegen (MATLAB Coder).
The sorted order of tied distances in the generated code can be different from the order in MATLAB^® due to numerical precision.
When rangesearch uses the kd-tree search algorithm, and the code generation build type is a MEX function, codegen (MATLAB Coder) generates a MEX function using Intel^® Threading Building Blocks (TBB) for parallel computation. Otherwise, codegen generates code using parfor (MATLAB Coder).
- MEX function for the kd-tree search algorithm — codegen generates an optimized MEX function using Intel TBB for parallel computation on multicore platforms. You can use the MEX function to accelerate MATLAB algorithms. For details on Intel TBB, see https://www.intel.com/content/www/us/en/developer/tools/oneapi/onetbb.html.
  If you generate the MEX function to test the generated code of the parfor version, you can disable the usage of Intel TBB. Set the ExtrinsicCalls property of the MEX configuration object to false. For details, see coder.MexCodeConfig (MATLAB Coder).
- MEX function for the exhaustive search algorithm and standalone C/C++ code for both algorithms — The generated code of rangesearch uses parfor (MATLAB Coder) to create loops that run in parallel on supported shared-memory multicore platforms in the generated code. If your compiler does not support the Open Multiprocessing (OpenMP) application interface or you disable OpenMP library, MATLAB Coder™ treats the parfor-loops as for-loops. To find supported compilers, see Supported Compilers. To disable OpenMP library, set the EnableOpenMP property of the configuration object to false. For details, see coder.CodeConfig (MATLAB Coder).
rangesearch returns integer-type (int32) indices in generated standalone C/C++ code. Therefore, the function allows for strict single-precision support when you use single-precision inputs. For MEX code generation, the function still returns double-precision indices to match the MATLAB behavior.

For more information on code generation, see Introduction to Code Generation for Statistics and Machine Learning Functions and Overview of Code Generation Using MATLAB Coder (MATLAB Coder).

Version History

Introduced in R2011b

rangesearch

Syntax

Description

Examples

Find All Points Within Specified Distance

Find Nearest Points in Clustered Data

Find Nearest Points Using Custom Distance Function

Input Arguments

`X` — Input data
numeric matrix

`Y` — Query points
numeric matrix

`r` — Search radius
nonnegative scalar

Name-Value Arguments

`NSMethod` — Nearest neighbor search method
`'kdtree'` | `'exhaustive'`

`Distance` — Distance metric
`'euclidean'` (default) | `'seuclidean'` | `'mahalanobis'` | `'cityblock'` | `'minkowski'` | `'chebychev'` | function handle | ...

`P` — Exponent for Minkowski distance metric
`2` (default) | positive scalar

`Cov` — Covariance matrix for Mahalanobis distance metric
`cov(X,'omitrows')` (default) | positive definite matrix

`Scale` — Scale parameter value for standardized Euclidean distance metric
`std(X,'omitnan')` (default) | nonnegative vector

`BucketSize` — Maximum number of data points in leaf node of Kd-tree
`50` (default) | positive integer scalar

`SortIndices` — Flag to sort returned indices according to distance
`true` (`1`) (default) | `false` (`0`)

Output Arguments

`Idx` — Indices of nearest points
cell array of numeric vectors

`D` — Distances of nearest points to query points
cell array of numeric vectors

Tips

Algorithms

Alternative Functionality

Extended Capabilities

Tall Arrays
Calculate with arrays that have more rows than fit in memory.

C/C++ Code Generation
Generate C and C++ code using MATLAB® Coder™.

Version History

See Also

Topics

rangesearch

Syntax

Description

Examples

Find All Points Within Specified Distance

Find Nearest Points in Clustered Data

Find Nearest Points Using Custom Distance Function

Input Arguments

X — Input data numeric matrix

Y — Query points numeric matrix

r — Search radius nonnegative scalar

Name-Value Arguments

NSMethod — Nearest neighbor search method 'kdtree' | 'exhaustive'

Distance — Distance metric 'euclidean' (default) | 'seuclidean' | 'mahalanobis' | 'cityblock' | 'minkowski' | 'chebychev' | function handle | ...

P — Exponent for Minkowski distance metric 2 (default) | positive scalar

Cov — Covariance matrix for Mahalanobis distance metric cov(X,'omitrows') (default) | positive definite matrix

Scale — Scale parameter value for standardized Euclidean distance metric std(X,'omitnan') (default) | nonnegative vector

BucketSize — Maximum number of data points in leaf node of Kd-tree 50 (default) | positive integer scalar

SortIndices — Flag to sort returned indices according to distance true (1) (default) | false (0)

Output Arguments

Idx — Indices of nearest points cell array of numeric vectors

D — Distances of nearest points to query points cell array of numeric vectors

Tips

Algorithms

Alternative Functionality

Extended Capabilities

Tall Arrays Calculate with arrays that have more rows than fit in memory.

C/C++ Code Generation Generate C and C++ code using MATLAB® Coder™.

Version History

See Also

Topics

`X` — Input data
numeric matrix

`Y` — Query points
numeric matrix

`r` — Search radius
nonnegative scalar

`NSMethod` — Nearest neighbor search method
`'kdtree'` | `'exhaustive'`

`Distance` — Distance metric
`'euclidean'` (default) | `'seuclidean'` | `'mahalanobis'` | `'cityblock'` | `'minkowski'` | `'chebychev'` | function handle | ...

`P` — Exponent for Minkowski distance metric
`2` (default) | positive scalar

`Cov` — Covariance matrix for Mahalanobis distance metric
`cov(X,'omitrows')` (default) | positive definite matrix

`Scale` — Scale parameter value for standardized Euclidean distance metric
`std(X,'omitnan')` (default) | nonnegative vector

`BucketSize` — Maximum number of data points in leaf node of Kd-tree
`50` (default) | positive integer scalar

`SortIndices` — Flag to sort returned indices according to distance
`true` (`1`) (default) | `false` (`0`)

`Idx` — Indices of nearest points
cell array of numeric vectors

`D` — Distances of nearest points to query points
cell array of numeric vectors

Tall Arrays
Calculate with arrays that have more rows than fit in memory.

C/C++ Code Generation
Generate C and C++ code using MATLAB® Coder™.