Documentation

Using KDTreeSearcher Objects

Nearest neighbor search using Kd-tree

KDTreeSearcher model objects store results of a nearest neighbors search using the Kd-tree algorithm. Results that you can store include the training data, the distance metric and its parameters, and the maximal number of data points in each leaf node (i.e., the bucket size). The Kd-tree algorithm partitions an n-by-K data set by recursively splitting n points in K-dimensional space into a binary tree. To find the nearest neighbors of a query observation, KDTreeSearcher restricts the training data space to the training observations in the leaf node that the query observation belongs to.

Once you create or train a KDTreeSearcher model object, you can search the stored tree to find all neighboring points to the query data by performing a nearest neighbors search using knnsearch or radius search using rangesearch. The Kd-tree algorithm is particularly useful when:

  • K is relatively small (i.e., K < 10).

  • The training and query sets are not sparse.

  • The training and query sets have many observations.

Examples

collapse all

Grow a Default K d-Tree

Load Fisher's iris data set.

load fisheriris
X = meas;
[n,k] = size(X)
n =

   150


k =

     4

X has 150 observations and 4 predictors.

Grow a 4-dimensional K d-tree using the entire data set as training data.

Mdl = KDTreeSearcher(X)
Mdl = 

  KDTreeSearcher with properties:

       BucketSize: 50
         Distance: 'euclidean'
    DistParameter: []
                X: [150x4 double]

Mdl is a KDTreeSearcher model object, and its properties appear in the Command Window. It contains information about the grown 4-dimensional K d-tree, such as the distance metric. You can alter property values using dot notation

To find the nearest neighbors in X to a batch of query data, pass Mdl and the query data to knnsearch or rangesearch.

Alter Properties of KDTreeSearcher Model

Load Fisher's iris data set.

load fisheriris
X = meas;

Grow a default four-dimensional K d-tree using the entire data set as training data.

Mdl = KDTreeSearcher(X)
Mdl = 

  KDTreeSearcher with properties:

       BucketSize: 50
         Distance: 'euclidean'
    DistParameter: []
                X: [150x4 double]

Specify that the neighbor searcher use the Minkowski metric to compute the distances between the training and query data.

Mdl.Distance = 'minkowski'
Mdl = 

  KDTreeSearcher with properties:

       BucketSize: 50
         Distance: 'minkowski'
    DistParameter: 2
                X: [150x4 double]

Pass Mdl and the query data to either knnsearch or rangesearch to find the nearest neighbors to the points in the query data using the Minkowski distance.

Search for Nearest Neighbors of Query Data Using the Minkowski Distance

Load Fisher's iris data set.

load fisheriris

Remove five irises randomly from the predictor data to use as a query set.

rng(1);                     % For reproducibility
n = size(meas,1);           % Sample size
qIdx = randsample(n,5);     % Indices of query data
tIdx = ~ismember(1:n,qIdx); % Indices of training data
Q = meas(qIdx,:);
X = meas(tIdx,:);

Grow a four-dimensional K d-tree using the training data. Specify to use the Minkowski distance for finding nearest neighbors later.

Mdl = createns(X,'NSMethod','kdtree','Distance','minkowski')
Mdl = 

  KDTreeSearcher with properties:

       BucketSize: 50
         Distance: 'minkowski'
    DistParameter: 2
                X: [145x4 double]

Mdl is a KDTreeSearcher model object. By default, the Minkowski distance exponent is 2.

Find the indices of the training data (X) that are the two nearest neighbors of each point in the query data (Q).

IdxNN = knnsearch(Mdl,Q,'K',2)
IdxNN =

    17     4
     6     2
     1    12
    89    66
   124   100

Each row of NN corresponds to a query data observation, and the column order corresponds to the order of the nearest neighbors. For example, using the Minkowski distance, the second nearest neighbor of Q(3,:) is X(12,:).

Properties

collapse all

DistanceDistance metric'chebychev' | 'cityblock' | 'euclidean' | 'minkowski'

Distance metric used to find nearest neighbors of query points, specified as one of these strings.

ValueDescription
'chebychev'Chebychev distance (maximum coordinate difference)
'cityblock'City block distance
'euclidean'Euclidean distance
'minkowski'Minkowski distance

For more details, see Distance Metrics.

The software does not use the distance metric for training the Kd-tree, so you can alter it after training using dot notation.

Data Types: char

DistParameterDistance metric parameter values[] | positive scalar

Distance metric parameter values, specified as empty ([]) or as a positive scalar.

If Distance is 'minkowski', then:

  • DistParameter is the exponent in the Minkowski distance formula.

  • You can alter DistParameter of a trained KDTreeSearcher model.

Otherwise, DistParameter is [], indicating that the specified distance metric formula has no parameters.

Data Types: single | double

XTraining datanumeric matrix

This property is read only.

Training data that grows the Kd-tree, specified as a numeric matrix. X has n rows, each corresponding to an observation (i.e., an instance or example), and K columns, each corresponding to a predictor or feature.

Data Types: single | double

Object Functions

knnsearch k-nearest neighbors search using Kd-tree or exhaustive search
rangesearch Find all neighbors within specified distance using exhaustive search or Kd-tree

Create Object

Train a KDTreeSearcher model object using KDTreeSearcher or createns.

Was this topic helpful?