The procrustes function analyzes the distribution of a set of shapes using Procrustes analysis. This analysis method
matches landmark data (geometric locations representing significant features in a
given shape) to calculate the best shape-preserving Euclidean transformations. These
transformations minimize the differences in location between compared landmark data.
Procrustes analysis is also useful in conjunction with multidimensional scaling. Construct a Map Using Multidimensional Scaling notes that the orientation of the reconstructed points is arbitrary. Two different applications of multidimensional scaling could produce reconstructed points that are very similar in principle, but that look different because they have different orientations. The procrustes function transforms one set of points to make them more comparable to the other.
The procrustes function takes two matrices as input:
The target shape matrix X has dimension n × p, where n is the number of landmarks in the shape and p is the number of measurements per landmark.
The comparison shape matrix Y has dimension n × q with q ≤ p. If there are fewer measurements per landmark for the comparison shape than the target shape (q < p), the function adds columns of zeros to Y, yielding an n × p matrix.
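The zero-padding step for the comparison shape can be illustrated with a small sketch. This is plain Python written for illustration, not the procrustes function itself; the helper name pad_with_zero_columns and the sample data are hypothetical.

```python
def pad_with_zero_columns(Y, p):
    """Return a copy of Y (a list of rows) widened to p columns with zeros."""
    q = len(Y[0])
    if q > p:
        raise ValueError("comparison shape has more columns than the target")
    # Append (p - q) zeros to every row; rows are copied, not modified in place.
    return [list(row) + [0.0] * (p - q) for row in Y]


# Hypothetical comparison shape: n = 3 landmarks, q = 2 measurements each.
Y = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

# Pad to match a target shape with p = 3 measurements per landmark.
Y_padded = pad_with_zero_columns(Y, 3)
```

After padding, both shapes have the same number of columns, so they can be compared landmark by landmark.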
The equation to obtain the transformed shape, Z, is
$$Z = bYT + c \tag{1}$$
where:
b is a scaling factor that stretches (b > 1) or shrinks (b < 1) the points.
T is the orthogonal rotation and reflection matrix.
c is a matrix with constant values in each column, used to shift the points.
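To make the roles of b, T, and c concrete, the following sketch applies a chosen scaling, rotation, and shift to a small set of 2-D points. It is an illustration in plain Python under stated assumptions, not the procrustes implementation: the helper name apply_transform and the sample values are hypothetical, and c is passed as one value per column because every row of c is the same.

```python
import math


def apply_transform(Y, b, T, c):
    """Compute Z = b*Y*T + c for Y (n x p), T (p x p), and a length-p shift c."""
    p = len(T)
    Z = []
    for row in Y:
        # Entry j of the transformed row: b * (row times column j of T) + c[j].
        Z.append([
            b * sum(row[k] * T[k][j] for k in range(p)) + c[j]
            for j in range(p)
        ])
    return Z


# Orthogonal matrix for a quarter-turn rotation (hypothetical choice).
theta = math.pi / 2
T = [[math.cos(theta), math.sin(theta)],
     [-math.sin(theta), math.cos(theta)]]

Y = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]

# Stretch by b = 2, rotate with T, then shift every point by (1, 1).
Z = apply_transform(Y, b=2.0, T=T, c=[1.0, 1.0])
```

Up to floating-point rounding, the three points land at (1, 3), (-1, 1), and (1, -1).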
The procrustes function chooses b, T, and c to minimize the distance between the target shape X and the transformed shape Z as measured by the least squares criterion:
$$\sum_{i=1}^{n} \sum_{j=1}^{p} \left( X_{ij} - Z_{ij} \right)^2$$
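The least squares criterion is the sum of squared differences between corresponding entries of X and Z. The sketch below only evaluates that sum for two candidate transformed shapes; it does not perform the minimization. The function name sum_squared_differences and the data are hypothetical.

```python
def sum_squared_differences(X, Z):
    """Sum of (X[i][j] - Z[i][j])**2 over all landmarks i and measurements j."""
    return sum(
        (x - z) ** 2
        for x_row, z_row in zip(X, Z)
        for x, z in zip(x_row, z_row)
    )


# Target shape and a comparison shape that is a half-size copy of it.
X = [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0]]
Y = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]

# Candidate 1: leave Y unchanged (b = 1, T = identity, c = 0).
loss_unscaled = sum_squared_differences(X, Y)         # 2.0

# Candidate 2: scale Y by b = 2, which reproduces X exactly.
Z_scaled = [[2.0 * value for value in row] for row in Y]
loss_scaled = sum_squared_differences(X, Z_scaled)    # 0.0
```

The second candidate has the smaller criterion value, so it is the better fit of the two.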
Procrustes analysis is appropriate when all p measurement dimensions have similar scales. The analysis would be inaccurate, for example, if the columns of Z had different scales:
The first column is measured in milliliters ranging from 2,000 to 6,000.
The second column is measured in degrees Celsius ranging from 10 to 25.
The third column is measured in kilograms ranging from 50 to 230.
In such cases, standardize your variables by:
Subtracting the sample mean from each variable.
Dividing each resultant variable by its sample standard deviation.
Use the zscore function to perform this standardization.
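The two standardization steps can be sketched as follows. This is a plain Python illustration of the arithmetic, not the zscore function; the helper name standardize_columns and the sample values (chosen to fall within the ranges listed above) are hypothetical. It assumes every column has a nonzero sample standard deviation.

```python
import statistics


def standardize_columns(data):
    """Subtract each column's sample mean, then divide by its sample SD."""
    columns = list(zip(*data))
    means = [statistics.mean(col) for col in columns]
    # statistics.stdev uses the n - 1 (sample) denominator.
    stdevs = [statistics.stdev(col) for col in columns]
    return [
        [(value - m) / s for value, m, s in zip(row, means, stdevs)]
        for row in data
    ]


# Hypothetical measurements: milliliters, degrees Celsius, kilograms.
data = [[2000.0, 10.0, 50.0],
        [4000.0, 17.5, 140.0],
        [6000.0, 25.0, 230.0]]

standardized = standardize_columns(data)
```

After standardization, every column has mean 0 and sample standard deviation 1, so no single measurement dominates the least squares fit.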