Classical Multidimensional Scaling

This example shows how to use `cmdscale` to perform classical (metric) multidimensional scaling, also known as principal coordinates analysis.

`cmdscale` takes as an input a matrix of inter-point distances and creates a configuration of points. Ideally, those points are in two or three dimensions, and the Euclidean distances between them reproduce the original distance matrix. Thus, a scatter plot of the points created by `cmdscale` provides a visual representation of the original distances.

As a very simple example, you can reconstruct a set of points from only their inter-point distances. First, create some four dimensional points with a small component in their fourth coordinate, and reduce them to distances.

```rng default; % For reproducibility X = [normrnd(0,1,10,3),normrnd(0,.1,10,1)]; D = pdist(X,'euclidean');```

Next, use `cmdscale` to find a configuration with those inter-point distances. `cmdscale` accepts distances as either a square matrix, or, as in this example, in the vector upper-triangular form produced by `pdist`.

`[Y,eigvals] = cmdscale(D);`

`cmdscale` produces two outputs. The first output, `Y`, is a matrix containing the reconstructed points. The second output, `eigvals`, is a vector containing the sorted eigenvalues of what is often referred to as the "scalar product matrix," which, in the simplest case, is equal to `Y*Y'`. The relative magnitudes of those eigenvalues indicate the relative contribution of the corresponding columns of `Y` in reproducing the original distance matrix `D` with the reconstructed points.

```format short g [eigvals eigvals/max(abs(eigvals))]```
```ans = 10×2 35.41 1 11.158 0.31511 1.6894 0.04771 0.1436 0.0040553 2.9678e-15 8.3812e-17 1.7158e-15 4.8454e-17 1.5224e-15 4.2995e-17 -1.4626e-15 -4.1303e-17 -1.7759e-15 -5.0153e-17 -8.4529e-15 -2.3871e-16 ```

If `eigvals` contains only positive and zero (within round-off error) eigenvalues, the columns of `Y` corresponding to the positive eigenvalues provide an exact reconstruction of `D`, in the sense that their inter-point Euclidean distances, computed using `pdist`, for example, are identical (within round-off) to the values in `D`.

`maxerr4 = max(abs(D - pdist(Y))) % Exact reconstruction`
```maxerr4 = 5.107e-15 ```

If two or three of the eigenvalues in `eigvals` are much larger than the rest, then the distance matrix based on the corresponding columns of `Y` nearly reproduces the original distance matrix `D`. In this sense, those columns form a lower-dimensional representation that adequately describes the data. However it is not always possible to find a good low-dimensional reconstruction.

`maxerr3 = max(abs(D - pdist(Y(:,1:3)))) % Good reconstruction in 3D`
```maxerr3 = 0.043142 ```
`maxerr2 = max(abs(D - pdist(Y(:,1:2)))) % Poor reconstruction in 2D`
```maxerr2 = 0.98315 ```

The reconstruction in three dimensions reproduces `D` very well, but the reconstruction in two dimensions has errors that are of the same order of magnitude as the largest values in `D`.

`max(max(D))`
```ans = 5.8974 ```

Often, `eigvals` contains some negative eigenvalues, indicating that the distances in `D` cannot be reproduced exactly. That is, there might not be any configuration of points whose inter-point Euclidean distances are given by `D`. If the largest negative eigenvalue is small in magnitude relative to the largest positive eigenvalues, then the configuration returned by `cmdscale` might still reproduce `D` well.