msalign
Align peaks in signal to reference peaks
Syntax
IntensitiesOut = msalign(X, Intensities, RefX)
... = msalign(..., 'Rescaling', RescalingValue, ...)
... = msalign(..., 'Weights', WeightsValue, ...)
... = msalign(..., 'MaxShift', MaxShiftValue, ...)
... = msalign(..., 'WidthOfPulses', WidthOfPulsesValue, ...)
... = msalign(..., 'WindowSizeRatio', WindowSizeRatioValue, ...)
... = msalign(..., 'Iterations', IterationsValue, ...)
... = msalign(..., 'GridSteps', GridStepsValue, ...)
... = msalign(..., 'SearchSpace', SearchSpaceValue, ...)
... = msalign(..., 'ShowPlot', ShowPlotValue, ...)
[IntensitiesOut, RefXOut] = msalign(..., 'Group', GroupValue, ...)
Input Arguments
X | Vector of separation-unit values for a set of signals with peaks. The number of elements in the vector equals the number of rows in the matrix Intensities. The separation unit can quantify wavelength, frequency, distance, time, or m/z depending on the instrument that generates the signal data. |
Intensities | Matrix of intensity values for a set of peaks that share the same separation-unit range. Each row corresponds to a separation-unit value, and each column corresponds to either a set of signals with peaks or a retention time. The number of rows equals the number of elements in vector X. |
RefX | Vector of separation-unit values of known reference masses in a sample signal. Tip: For reference peaks, select compounds that are not expected to have significant shifts among the different signals. For example, in mass spectrometry, select compounds that do not undergo structural transformation, such as phosphorylation. Doing so increases the accuracy of your alignment and lets you detect compounds that exhibit structural transformations among the sample signals. |
RescalingValue | Controls the rescaling of X. Choices are true (default) or false. When false, the output signal is aligned only to the reference peaks by using constant shifts. By default, msalign estimates a rescaling factor, unless RefX contains only one reference peak. |
WeightsValue | Vector of positive values, with the same number of elements as RefX. The default vector is ones(size(RefX)). |
MaxShiftValue | Two-element vector, in which the first element is negative and the second element is positive, that specifies the lower and upper limits of a range, in separation units, relative to each peak. No peak shifts beyond these limits. Default is [-100 100]. |
WidthOfPulsesValue | Positive value that specifies the width, in separation units, for all the Gaussian pulses used to build the correlating synthetic signal. The point of the peak where the Gaussian pulse reaches 60.65% of its maximum is set to the width specified by WidthOfPulsesValue. Default is 10. |
WindowSizeRatioValue | Positive value that specifies a scaling factor that determines the size of the window around every alignment peak. The synthetic signal is compared to the input signal only within these regions, which saves computation time. The size of the window is given in separation units by WidthOfPulsesValue * WindowSizeRatioValue. Default is 2.5, which means at the limits of the window, the Gaussian pulses have a value of 4.39% of their maximum. |
IterationsValue | Positive integer that specifies the number of refining iterations. At every iteration, the search grid is scaled down to improve the estimates. Default is 5. |
GridStepsValue | Positive integer that specifies the number of steps for the search grid. At every iteration, the search area is divided by GridStepsValue^2. Default is 20. |
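SearchSpaceValue | Character vector or string that specifies the type of search space. Choices are 'regular' (default), an evenly spaced lattice, or 'latin', a random Latin hypercube with GridStepsValue^2 samples. |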
ShowPlotValue | Controls the display of a plot of an original and aligned signal over the reference masses specified by RefX. Choices are true, false, or I, an integer specifying the index of a signal in Intensities. If you set it to true, the first signal in Intensities is plotted. Default is false when return values are specified and true when return values are not specified. |
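GroupValue | Controls the creation of RefXOut, a new vector of separation-unit values to be used as reference masses for aligning the peaks. This vector is created by adjusting the values in RefX, based on the sample data from multiple signals in Intensities, such that the overall shifting and scaling of the peaks is minimized. Choices are true or false (default). Tip: Set GroupValue to true only if Intensities contains data for a large number of signals, and you are not confident of the separation-unit values used for your reference peaks in RefX. Leave GroupValue set to false if you are confident of the separation-unit values used for your reference peaks in RefX. |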
Output Arguments
IntensitiesOut | Matrix of intensity values for a set of peaks that share the same separation-unit range. Each row corresponds to a separation-unit value, and each column corresponds to either a set of signals with peaks or a retention time. The intensity values represent a shifting and scaling of the data. |
RefXOut | Vector of separation-unit values of reference masses, calculated from RefX and the sample data from multiple signals in Intensities, when you set GroupValue to true. |
Description
Tip
Use the following syntaxes with data from any separation technique that produces signal data, such as spectroscopy, NMR, electrophoresis, chromatography, or mass spectrometry.
IntensitiesOut = msalign(X, Intensities, RefX) aligns the peaks in raw, noisy signal data, represented by Intensities and X, to reference peaks, provided by RefX. First, it creates a synthetic signal from the reference peaks using Gaussian pulses centered at the separation-unit values specified by RefX. Then, it shifts and scales the separation-unit scale to find the maximum alignment between the input signals and the synthetic signal. (It uses an iterative multiresolution grid search until it finds the best scale and shift factors for each signal.) Once the new separation-unit scale is determined, the corrected signals are created by resampling their intensities at the original separation-unit values, creating IntensitiesOut, a vector or matrix of corrected intensity values. The resampling method preserves the shape of the peaks.
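The steps described above can be illustrated with a small base-MATLAB sketch. It is not the toolbox implementation: it searches a single fixed grid rather than an iterative multiresolution grid, and the axis, peak positions, pulse widths, and grid ranges are hypothetical values chosen only for illustration. It assumes MATLAB R2016b or later (implicit expansion).

```matlab
% Illustrative sketch only; not the msalign implementation.
x     = (0:0.5:1000)';          % hypothetical separation-unit axis
refX  = [200 500 800];          % hypothetical reference peak positions
sigma = 10;                     % pulse width, playing the role of WidthOfPulses

% Synthetic signal: one Gaussian pulse centered at each reference peak.
synth = @(t) sum(exp(-(t - refX).^2 / (2*sigma^2)), 2);

% Simulated measurement whose axis is off by a known scale and shift.
trueScale = 1.01;
trueShift = -12;
obsPeaks  = (refX - trueShift) / trueScale;          % where the peaks appear
y = sum(exp(-(x - obsPeaks).^2 / (2*4^2)), 2);       % narrower observed peaks

% Single-resolution grid search over candidate (scale, shift) pairs.
scales = linspace(0.98, 1.02, 41);
shifts = linspace(-30, 30, 61);
best = -Inf;
for a = scales
    for b = shifts
        % Score: overlap between the signal and the synthetic pulses
        % evaluated on the corrected axis a*x + b.
        score = sum(y .* synth(a*x + b));
        if score > best
            best = score;
            bestScale = a;
            bestShift = b;
        end
    end
end

% Resample the intensities at the original axis values.
yAligned = interp1(x, y, (x - bestShift) / bestScale, 'linear', 0);
fprintf('Estimated scale %.3f, shift %.1f\n', bestScale, bestShift);
```

In this toy setup the search recovers the scale and shift that were used to distort the axis, and the peaks of yAligned sit at the reference positions. The documented function refines its grid over several iterations instead of using one dense grid.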
Tip
The msalign function works best with three to five reference peaks that you know will appear in the signal. If you use a single reference peak (internal standard), there is a possibility of aligning sample peaks to the incorrect reference peaks, because msalign both scales and shifts the X vector. If using a single reference peak, you might need to only shift the X vector. To do this, use IntensitiesOut = interp1(X, Intensities, X-(ReferencePeak-ExperimentalPeak)).
... = msalign(..., 'PropertyName', PropertyValue, ...) calls msalign with optional properties that use property name/property value pairs. You can specify one or more properties in any order. Each PropertyName must be enclosed in single quotation marks and is case insensitive. These property name/property value pairs are as follows:
... = msalign(..., 'Rescaling', RescalingValue, ...) controls the rescaling of X. Choices are true (default) or false. When false, the output signal is aligned only to the reference peaks by using constant shifts. By default, msalign estimates a rescaling factor, unless RefX contains only one reference peak.
... = msalign(..., 'Weights', WeightsValue, ...) specifies the relative weight for each mass in RefX, the vector of reference separation-unit values. WeightsValue is a vector of positive values, with the same number of elements as RefX. The default vector is ones(size(RefX)), which means each reference peak is weighted equally, so that more intense reference peaks have a greater effect in the alignment algorithm. If you have a less intense reference peak, you can increase its weight to emphasize it more in the alignment algorithm.
... = msalign(..., 'MaxShift', MaxShiftValue, ...) specifies the lower and upper limits of the range, in separation units, relative to each peak. No peak shifts beyond these limits. MaxShiftValue is a two-element vector, in which the first element is negative and the second element is positive. Default is [-100 100].
Note
Use these values to tune the robustness of the algorithm. Ideally, you should keep the range within the maximum expected shift. If you try to correct larger shifts by increasing the limits, you increase the possibility of picking incorrect peaks to align to the reference masses.
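As a hedged usage sketch of the note above, the call below narrows the allowed shift range. It reuses the sample data and reference masses from the Examples section of this page and requires Bioinformatics Toolbox. The limits [-50 50] are illustrative only, not a recommendation for this data set.

```matlab
% Requires Bioinformatics Toolbox and its sample data file.
load sample_lo_res
R = [3991.4 4598 7964 9160];     % reference masses used in the Examples section

% Illustrative limits: let each peak move at most 50 separation units in
% either direction, instead of the default [-100 100].
YA = msalign(MZ_lo_res, Y_lo_res, R, 'MaxShift', [-50 50]);
```

Choose the limits from the largest shift you expect the instrument to produce. A tighter range reduces the chance of locking onto a neighboring peak.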
... = msalign(..., 'WidthOfPulses', WidthOfPulsesValue, ...) specifies the width, in separation units, for all the Gaussian pulses used to build the correlating synthetic signal. The point of the peak where the Gaussian pulse reaches 60.65% of its maximum is set to the width you specify with WidthOfPulsesValue. Choices are any positive value. Default is 10. WidthOfPulsesValue may also be a function handle. The function is evaluated at the respective separation-unit values and returns a variable width for the pulses. Its evaluation should give reasonable values from 0 to max(abs(Range)); otherwise, the function returns an error.
Note
Tuning the spread of the Gaussian pulses controls a tradeoff between robustness (wider pulses) and precision (narrower pulses). However, the spread of the pulses is unrelated to the shape of the observed peaks in the signal. The purpose of the pulse spread is to drive the optimization algorithm.
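The function-handle form of WidthOfPulsesValue can be sketched as follows. The handle widthFcn and its proportionality constant are hypothetical, chosen so that the pulses widen with increasing m/z. The sample data and reference masses are the ones used in the Examples section, and the call requires Bioinformatics Toolbox.

```matlab
% Requires Bioinformatics Toolbox and its sample data file.
load sample_lo_res
R = [3991.4 4598 7964 9160];     % reference masses used in the Examples section

% Hypothetical width function: pulse width proportional to the
% separation-unit value (about 20 near 4000 m/z, about 46 near 9160 m/z).
widthFcn = @(mz) 0.005 * mz;

YA = msalign(MZ_lo_res, Y_lo_res, R, 'WidthOfPulses', widthFcn);
```

A width that grows with m/z favors robustness at the high end of the range, where shifts are often larger, and precision at the low end. Whether that tradeoff suits a given data set is something to check by inspecting the aligned signals.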
... = msalign(..., 'WindowSizeRatio', WindowSizeRatioValue, ...) specifies a scaling factor that determines the size of the window around every alignment peak. The synthetic signal is compared to the sample signal only within these regions, which saves computation time. The size of the window is given in separation units by WidthOfPulsesValue * WindowSizeRatioValue. Choices are any positive value. Default is 2.5, which means at the limits of the window, the Gaussian pulses have a value of 4.39% of their maximum.
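The two percentages quoted for the pulses (60.65% and 4.39%) are consistent with treating the pulse width as the standard deviation of a unit-height Gaussian. The short check below works under that assumption and uses only base MATLAB. The variable names are illustrative.

```matlab
w = 10;     % default WidthOfPulsesValue
r = 2.5;    % default WindowSizeRatioValue

% Unit-height Gaussian pulse as a function of distance d from its center,
% assuming the width w acts as the standard deviation.
pulse = @(d) exp(-d.^2 / (2*w^2));

atWidth  = pulse(w);        % exp(-1/2)     = 0.6065...
atWindow = pulse(w * r);    % exp(-2.5^2/2) = 0.0439...

fprintf('At one width from the center: %.2f%% of maximum\n', 100*atWidth);
fprintf('At the window limit:          %.2f%% of maximum\n', 100*atWindow);
```

Because both fractions depend only on the ratio of distance to width, changing WidthOfPulsesValue alone leaves them unchanged. Changing WindowSizeRatioValue moves the window limit to a different fraction of the pulse height.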
... = msalign(..., 'Iterations', IterationsValue, ...) specifies the number of refining iterations. At every iteration, the search grid is scaled down to improve the estimates. Choices are any positive integer. Default is 5.
... = msalign(..., 'GridSteps', GridStepsValue, ...) specifies the number of steps for the search grid. At every iteration, the search area is divided by GridStepsValue^2. Choices are any positive integer. Default is 20.
... = msalign(..., 'SearchSpace', SearchSpaceValue, ...) specifies the type of search space. Choices are:
- 'regular': Default. Evenly spaced lattice.
- 'latin': Random Latin hypercube with GridStepsValue^2 samples.
... = msalign(..., 'ShowPlot', ShowPlotValue, ...) controls the display of a plot of an original and aligned signal over the reference masses specified by RefX. Choices are true, false, or I, an integer specifying the index of a signal in Intensities. If set to true, the first signal in Intensities is plotted. Default is:
- false: When return values are specified.
- true: When return values are not specified.
[IntensitiesOut, RefXOut] = msalign(..., 'Group', GroupValue, ...) controls the creation of RefXOut, a new vector of separation-unit values to use as reference masses for aligning the peaks. This vector is created by adjusting the values in RefX, based on the sample data from multiple signals in Intensities, such that the overall shifting and scaling of the peaks is minimized. Choices are true or false (default).
Tip
Set GroupValue to true only if Intensities contains data for a large number of signals, and you are not confident of the separation-unit values used for your reference peaks in RefX. Leave GroupValue set to false if you are confident of the separation-unit values used for your reference peaks in RefX.
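A hedged usage sketch of the Group option follows. It reuses the sample data and reference masses from the Examples section and requires Bioinformatics Toolbox. Whether that data set contains enough signals to justify grouping is a judgment call, so treat the call as a demonstration of the syntax only. The variable name RNew is illustrative.

```matlab
% Requires Bioinformatics Toolbox and its sample data file.
load sample_lo_res
R = [3991.4 4598 7964 9160];     % reference masses used in the Examples section

% Request the adjusted reference masses as a second output.
[YA, RNew] = msalign(MZ_lo_res, Y_lo_res, R, 'Group', true);

% Columns: supplied reference, adjusted reference, adjustment.
disp([R(:), RNew(:), RNew(:) - R(:)])
```

Large adjustments in the last column suggest that the supplied reference values disagree with the data and are worth reviewing before you rely on the alignment.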
Examples
Load a MAT-file, included with the Bioinformatics Toolbox™ software, that contains sample data, reference masses, and parameter data for synthetic peak width.
    load sample_lo_res
    R = [3991.4 4598 7964 9160];
    W = [60 100 60 100];
Display a color image of the mass spectra before alignment.
    msheatmap(MZ_lo_res,Y_lo_res,'markers',R,'range',[3000 10000])
    title('before alignment')
Align spectra with reference masses and display a color image of mass spectra after alignment.
    YA = msalign(MZ_lo_res,Y_lo_res,R,'weights',W);
    msheatmap(MZ_lo_res,YA,'markers',R,'range',[3000 10000])
    title('after alignment')
It is not recommended to use the msalign function if you have only one reference peak. Instead, use the following procedure, which shifts the X input vector, but does not scale it.
Load sample data and view the first sample spectrum.
    load sample_lo_res
    MZ = MZ_lo_res;
    Y = Y_lo_res(:,1);
    msviewer(MZ, Y)
Use the tall peak around 4000 m/z as the reference peak. To determine the reference peak's m/z value, select the zoom-in tool in the viewer, and then click-drag to zoom in on the peak. Right-click in the center of the peak, and then click Add Marker to label the peak with its m/z value.
Shift a spectrum by the difference between RP, the known reference mass of 4000 m/z, and SP, the experimental mass of 4051.14 m/z.
    RP = 4000;
    SP = 4051.14;
    YOut = interp1(MZ, Y, MZ-(RP-SP));
Plot the original spectrum in red and the shifted spectrum in blue and zoom in on the reference peak.
    plot(MZ,Y,'r',MZ,YOut,'b:')
    xlabel('Mass/Charge (M/Z)')
    ylabel('Relative Intensity')
    legend('Y','YOut')
    axis([3600 4800 -2 60])
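The effect of the constant shift can also be checked numerically on synthetic data, without the toolbox. The sketch below builds a hypothetical spectrum with a single Gaussian peak and confirms that the interp1 call moves the peak from the experimental position to the reference position. The axis, peak height, peak width, and the rounded experimental mass of 4051 are assumptions for illustration.

```matlab
MZ = (3600:0.5:4800)';                      % hypothetical m/z axis
RP = 4000;                                  % reference peak position
SP = 4051;                                  % hypothetical experimental position
Y  = 50 * exp(-(MZ - SP).^2 / (2*15^2));    % one synthetic peak centered at SP

% Same constant-shift resampling as in the procedure above.
YOut = interp1(MZ, Y, MZ - (RP - SP));

[~, iIn]  = max(Y);
[~, iOut] = max(YOut);                      % max ignores NaN values by default
fprintf('Peak before shift: %.1f, after shift: %.1f\n', MZ(iIn), MZ(iOut));
```

With the default settings, interp1 returns NaN for query points outside the original axis, so the shifted spectrum has NaN values at one end. Pass an extrapolation value to interp1 if downstream processing cannot handle them.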
References
[1] Monchamp, P., Andrade-Cetto, L., Zhang, J.Y., and Henson, R. (2007) Signal Processing Methods for Mass Spectrometry. In Systems Bioinformatics: An Engineering Case-Based Approach, G. Alterovitz and M.F. Ramoni, eds. (Artech House Publishers).
Version History
Introduced before R2006a
See Also
mspalign | msbackadj | msdotplot | msheatmap | mslowess | msnorm | mspeaks | msresample | msppresample | mssgolay | msviewer
Topics
- Mass Spectrometry and Bioanalytics
- Preprocessing Raw Mass Spectrometry Data
- Visualizing and Preprocessing Hyphenated Mass Spectrometry Data Sets for Metabolite and Protein/Peptide Profiling
- Differential Analysis of Complex Protein and Metabolite Mixtures Using Liquid Chromatography/Mass Spectrometry (LC/MS)