# createns

Create nearest neighbor searcher object

## Description

creates either an `NS`

= createns(`X`

)`ExhaustiveSearcher`

or `KDTreeSearcher`

model object using the
*n*-by-*K* numeric matrix of the training data
`X`

.

specifies additional options using one or more name-value pair arguments. For
example, you can specify `NS`

= createns(`X`

,`Name,Value`

)`NSMethod`

to determine which type of
object to create.

## Examples

### Train Default Exhaustive Nearest Neighbor Searcher

Load Fisher's iris data set.

```
load fisheriris
X = meas;
[n,k] = size(X)
```

n = 150

k = 4

`X`

has 150 observations and 4 predictors.

Prepare an exhaustive nearest neighbor searcher using the entire data set as training data.

Mdl1 = ExhaustiveSearcher(X)

Mdl1 = ExhaustiveSearcher with properties: Distance: 'euclidean' DistParameter: [] X: [150x4 double]

`Mdl1`

is an `ExhaustiveSearcher`

model object, and its properties appear in the Command Window. The object contains information about the trained algorithm, such as the distance metric. You can alter property values using dot notation.

Alternatively, you can prepare an exhaustive nearest neighbor searcher by using `createns`

and specifying `'exhaustive'`

as the search method.

Mdl2 = createns(X,'NSMethod','exhaustive')

Mdl2 = ExhaustiveSearcher with properties: Distance: 'euclidean' DistParameter: [] X: [150x4 double]

`Mdl2`

is also an `ExhaustiveSearcher`

model object, and it is equivalent to `Mdl1`

.

To search `X`

for the nearest neighbors to a batch of query data, pass the `ExhaustiveSearcher`

model object and the query data to `knnsearch`

or `rangesearch`

.

### Grow Default *K*d-Tree

Grow a four-dimensional *K*d-tree that uses the Euclidean distance.

Load Fisher's iris data set.

```
load fisheriris
X = meas;
[n,k] = size(X)
```

n = 150

k = 4

`X`

has 150 observations and 4 predictors.

Grow a four-dimensional *K*d-tree using the entire data set as training data.

Mdl1 = KDTreeSearcher(X)

Mdl1 = KDTreeSearcher with properties: BucketSize: 50 Distance: 'euclidean' DistParameter: [] X: [150x4 double]

`Mdl1`

is a `KDTreeSearcher`

model object, and its properties appear in the Command Window. The object contains information about the grown four-dimensional *K*d-tree, such as the distance metric. You can alter property values using dot notation.

Alternatively, you can grow a *K*d-tree by using `createns`

.

Mdl2 = createns(X)

Mdl2 = KDTreeSearcher with properties: BucketSize: 50 Distance: 'euclidean' DistParameter: [] X: [150x4 double]

`Mdl2`

is also a `KDTreeSearcher`

model object, and it is equivalent to `Mdl1`

. Because `X`

has four columns and the default distance metric is Euclidean, `createns`

creates a `KDTreeSearcher`

model by default.

To find the nearest neighbors in `X`

to a batch of query data, pass the `KDTreeSearcher`

model object and the query data to `knnsearch`

or `rangesearch`

.

### Grow *K*d-Tree Using Minkowski Distance Metric

Grow a *K*d-tree that uses the Minkowski distance with an exponent of five.

Load Fisher's iris data set. Create a variable for the petal dimensions.

```
load fisheriris
X = meas(:,3:4);
```

Grow a *K*d-tree. Specify the Minkowski distance with an exponent of five.

Mdl = createns(X,'Distance','minkowski','P',5)

Mdl = KDTreeSearcher with properties: BucketSize: 50 Distance: 'minkowski' DistParameter: 5 X: [150x2 double]

Because `X`

has two columns and the distance metric is Minkowski, `createns`

creates a `KDTreeSearcher`

model object by default*.*

### Search for Nearest Neighbors of Query Data Using Mahalanobis Distance

Create an exhaustive searcher object by using the `createns`

function. Pass the object and query data to the `knnsearch`

function to find *k*-nearest neighbors.

Load Fisher's iris data set.

`load fisheriris`

Remove five irises randomly from the predictor data to use as a query set.

rng('default'); % For reproducibility n = size(meas,1); % Sample size qIdx = randsample(n,5); % Indices of query data X = meas(~ismember(1:n,qIdx),:); Y = meas(qIdx,:);

Prepare an exhaustive nearest neighbor searcher using the training data. Specify the Mahalanobis distance for finding nearest neighbors.

Mdl = createns(X,'Distance','mahalanobis')

Mdl = ExhaustiveSearcher with properties: Distance: 'mahalanobis' DistParameter: [4x4 double] X: [145x4 double]

Because the distance metric is Mahalanobis, `createns`

creates an `ExhaustiveSearcher`

model object by default.

The software uses the covariance matrix of the predictors (columns) in the training data for computing the Mahalanobis distance. To display this value, use `Mdl.DistParameter`

.

Mdl.DistParameter

`ans = `*4×4*
0.6547 -0.0368 1.2320 0.5026
-0.0368 0.1914 -0.3227 -0.1193
1.2320 -0.3227 3.0671 1.2842
0.5026 -0.1193 1.2842 0.5800

Find the indices of the training data (`Mdl.X`

) that are the two nearest neighbors of each point in the query data (`Y`

).

`IdxNN = knnsearch(Mdl,Y,'K',2)`

`IdxNN = `*5×2*
5 6
98 95
104 128
135 65
102 115

Each row of `IdxNN`

corresponds to a query data observation. The column order corresponds to the order of the nearest neighbors with respect to ascending distance. For example, based on the Mahalanobis metric, the second nearest neighbor of `Y(3,:)`

is `X(128,:)`

.

## Input Arguments

`X`

— Training data

numeric matrix

Training data, specified as a numeric matrix. `X`

has
*n* rows, each corresponding to an observation (that
is, an instance or example), and *K* columns, each
corresponding to a predictor (that is, a feature).

**Data Types: **`single`

| `double`

### Name-Value Arguments

Specify optional pairs of arguments as
`Name1=Value1,...,NameN=ValueN`

, where `Name`

is
the argument name and `Value`

is the corresponding value.
Name-value arguments must appear after other arguments, but the order of the
pairs does not matter.

*
Before R2021a, use commas to separate each name and value, and enclose*
`Name`

*in quotes.*

**Example: **`NS = createns(X,'Distance','mahalanobis')`

creates an
`ExhaustiveSearcher`

model object that uses the Mahalanobis
distance metric when searching for nearest neighbors.

**For Exhaustive and**

*K*d-Tree Nearest Neighbor Searchers`NSMethod`

— Nearest neighbor search method

`'kdtree'`

| `'exhaustive'`

Nearest neighbor search method used to define the type of object
created, specified as the comma-separated pair consisting of
`'NSMethod'`

and `'kdtree'`

or
`'exhaustive'`

.

`'kdtree'`

—`createns`

creates a`KDTreeSearcher`

model object using the*K*d-tree algorithm.`'exhaustive'`

—`createns`

creates an`ExhaustiveSearcher`

model object using the exhaustive search algorithm.

The default value is `'kdtree'`

when these three
conditions are true:

Otherwise, the default value is
`'exhaustive'`

.

**Example: **`'NSMethod','exhaustive'`

`Distance`

— Distance metric

`'euclidean'`

(default) | character vector or string scalar of distance metric | function handle

Distance metric used when you call `knnsearch`

or `rangesearch`

to find nearest
neighbors for future query points, specified as a character vector or
string scalar of the distance metric name, or a function handle.

For both types of nearest neighbor searchers,
`createns`

supports these distance
metrics.

Value | Description |
---|---|

`'chebychev'` | Chebychev distance (maximum coordinate difference) |

`'cityblock'` | City block distance |

`'euclidean'` | Euclidean distance |

`'minkowski'` | Minkowski distance. The default exponent is 2. To specify a different exponent, use the
`'P'` name-value
argument. |

If `createns`

uses the exhaustive search
algorithm (`'NSMethod'`

is
`'exhaustive'`

), then
`createns`

also supports these distance
metrics.

Value | Description |
---|---|

`'correlation'` | One minus the sample linear correlation between observations (treated as sequences of values) |

`'cosine'` | One minus the cosine of the included angle between observations (treated as row vectors) |

`'fasteuclidean'` | Euclidean distance computed by using an alternative
algorithm that saves time when the number of predictors
is at least 10. In some cases, this faster algorithm can
reduce accuracy. Algorithms starting with
`'fast'` do not support sparse
data. For details, see Algorithms. |

`'fastseuclidean'` | Standardized Euclidean distance computed by using an
alternative algorithm that saves time when the number of
predictors is at least 10. In some cases, this faster
algorithm can reduce accuracy. Algorithms starting with
`'fast'` do not support sparse
data. For details, see Algorithms. |

`'hamming'` | Hamming distance, which is the percentage of coordinates that differ |

`'jaccard'` | One minus the Jaccard coefficient, which is the percentage of nonzero coordinates that differ |

`'mahalanobis'` | Mahalanobis distance |

`'seuclidean'` | Standardized Euclidean distance |

`'spearman'` | One minus the sample Spearman's rank correlation between observations (treated as sequences of values) |

If `createns`

uses the exhaustive search algorithm
(`'NSMethod'`

is
`'exhaustive'`

), then you can also specify a function
handle for a custom distance metric by using `@`

(for
example, `@distfun`

). A custom distance function must:

Have the form

`function D2 = distfun(ZI,ZJ)`

.Take as arguments:

A 1-by-

*K*vector`ZI`

containing a single row from`X`

or from the query points`Y`

, where*K*is the number of columns in`X`

.An

*m*-by-*K*matrix`ZJ`

containing multiple rows of`X`

or`Y`

, where*m*is a positive integer.

Return an

*m*-by-1 vector of distances`D2`

, where`D2(`

is the distance between the observations)`j`

`ZI`

and`ZJ(`

.,:)`j`

For more details, see Distance Metrics.

**Example: **`'Distance','minkowski'`

**Data Types: **`char`

| `string`

| `function_handle`

`P`

— Exponent for Minkowski distance metric

`2`

(default) | positive scalar

Exponent for the Minkowski distance metric, specified as the comma-separated pair consisting
of `'P'`

and a positive scalar. This argument is valid only if
`'Distance'`

is `'minkowski'`

.

**Example: **`'P',3`

**Data Types: **`single`

| `double`

**For Exhaustive Nearest Neighbor Searchers**

`Cov`

— Covariance matrix for Mahalanobis distance metric

`cov(X,'omitrows')`

(default) | positive definite matrix

Covariance matrix for the Mahalanobis distance metric, specified as the comma-separated pair
consisting of `'Cov'`

and a
*K*-by-*K* positive definite matrix, where
*K* is the number of columns in `X`

. This
argument is valid only if `'Distance'`

is
`'mahalanobis'`

.

**Example: **`'Cov',eye(3)`

**Data Types: **`single`

| `double`

`Scale`

— Scale parameter value for standardized Euclidean distance metric

`std(X,'omitnan')`

(default) | nonnegative numeric vector

Scale parameter value for the standardized Euclidean distance metric, specified as the
comma-separated pair consisting of `'Scale'`

and a nonnegative numeric
vector of length *K*, where *K* is the number of
columns in `X`

. The software scales each difference between the
training and query data using the corresponding element of `Scale`

.
This argument is valid only if `'Distance'`

is
`'seuclidean'`

.

**Example: **`'Scale',quantile(X,0.75) - quantile(X,0.25)`

**Data Types: **`single`

| `double`

**For Nearest Neighbor Searchers Using**

*K*d-Tree`BucketSize`

— Maximum number of data points in each leaf node

`50`

(default) | positive integer

Maximum number of data points in each leaf node of the
*K*d-tree, specified as the comma-separated pair
consisting of `'BucketSize'`

and a positive
integer.

This argument is valid only when you create a
`KDTreeSearcher`

model object.

**Example: **`'BucketSize',10`

**Data Types: **`single`

| `double`

## Output Arguments

`NS`

— Nearest neighbor searcher

`ExhaustiveSearcher`

model object | `KDTreeSearcher`

model object

Nearest neighbor searcher, returned as an `ExhaustiveSearcher`

model object
or a `KDTreeSearcher`

model
object.

Once you create a nearest neighbor searcher model object, you can find the
neighboring points in the training data to the query data by performing a
nearest neighbor search using `knnsearch`

or a radius search
using `rangesearch`

.

## Algorithms

### Fast Euclidean Distance Algorithm

The values of the `Distance`

argument that begin `fast`

(such as `'fasteuclidean'`

and `'fastseuclidean'`

)
calculate Euclidean distances using an algorithm that uses extra memory to save
computational time. This algorithm is named "Euclidean Distance Matrix Trick" in Albanie
[1] and elsewhere. Internal
testing shows that this algorithm saves time when the number of predictors is at least 10.
Algorithms starting with `'fast'`

do not support sparse data.

To find the matrix D of distances between all the points *x _{i}* and

*x*, where each

_{j}*x*has

_{i}*n*variables, the algorithm computes distance using the final line in the following equations:

$$\begin{array}{c}{D}_{i,j}^{2}=\Vert {x}_{i}-{x}_{j}{\Vert}^{2}\\ ={(}^{{x}_{i}}({x}_{i}-{x}_{j})\\ =\Vert {x}_{i}{\Vert}^{2}-2{x}_{i}^{T}{x}_{j}+\Vert {x}_{j}{\Vert}^{2}.\end{array}$$

The matrix $${x}_{i}^{T}{x}_{j}$$ in the last line of the equations is called the *Gram
matrix*. Computing the set of squared distances is faster, but slightly less
numerically stable, when you compute and use the Gram matrix instead of computing the
squared distances by squaring and summing. For a discussion, see Albanie [1].

## References

[1] Albanie, Samuel. *Euclidean Distance Matrix Trick.* June, 2019. Available at https://www.robots.ox.ac.uk/%7Ealbanie/notes/Euclidean_distance_trick.pdf.

## Version History

**Introduced in R2010a**

### R2023a: Fast Euclidean distance using a cache

The `'fasteuclidean'`

and `'fastseuclidean'`

distance metrics accelerate the computation of Euclidean distances by using a cache
and a different algorithm (see Algorithms).

## Open Example

You have a modified version of this example. Do you want to open this example with your edits?

## MATLAB Command

You clicked a link that corresponds to this MATLAB command:

Run the command by entering it in the MATLAB Command Window. Web browsers do not support MATLAB commands.

Select a Web Site

Choose a web site to get translated content where available and see local events and offers. Based on your location, we recommend that you select: .

You can also select a web site from the following list:

## How to Get Best Site Performance

Select the China site (in Chinese or English) for best site performance. Other MathWorks country sites are not optimized for visits from your location.

### Americas

- América Latina (Español)
- Canada (English)
- United States (English)

### Europe

- Belgium (English)
- Denmark (English)
- Deutschland (Deutsch)
- España (Español)
- Finland (English)
- France (Français)
- Ireland (English)
- Italia (Italiano)
- Luxembourg (English)

- Netherlands (English)
- Norway (English)
- Österreich (Deutsch)
- Portugal (English)
- Sweden (English)
- Switzerland
- United Kingdom (English)