nwalign
Globally align two sequences using the Needleman-Wunsch algorithm
Syntax
Score = nwalign(Seq1,Seq2)
Score = nwalign(Seq1,Seq2,Name=Value)
Description
Score = nwalign(Seq1,Seq2) returns the optimal global alignment score in bits after aligning two sequences Seq1 and Seq2. The scale factor used to calculate the score is provided by ScoringMatrix.
Score = nwalign(Seq1,Seq2,Name=Value) uses additional options specified by one or more name-value arguments.
Examples
Perform global alignment of two sequences
Globally align two amino acid sequences using the BLOSUM50 (default) scoring matrix and the default values for the GapOpen and ExtendGap name-value arguments. Return the optimal global alignment score in bits and the alignment character array.
seq1 = "VSPAGMASGYD";
seq2 = "IPGKASYD";
[Score,Alignment] = nwalign(seq1,seq2)
Score = 7.3333
Alignment = 3x11 char array
'VSPAGMASGYD'
': | | || ||'
'I-P-GKAS-YD'
Specify the PAM250 scoring matrix and a gap open penalty of 5.
[Score,Alignment] = nwalign(seq1,seq2,ScoringMatrix="PAM250",GapOpen=5)
Score = 6
Alignment = 3x11 char array
'VSPAGMASGYD'
': | |:|| ||'
'I-P-GKAS-YD'
Return the Score in nat units (nats) by specifying a scale factor of log(2).
[Score,Alignment] = nwalign(seq1,seq2,Scale=log(2))
Score = 5.0831
Alignment = 3x11 char array
'VSPAGMASGYD'
': | | || ||'
'I-P-GKAS-YD'
Input Arguments
Seq1 — Amino or nucleotide sequence to align
character vector | string scalar | vector of integers | structure
Amino or nucleotide sequence to align, specified as a character vector, string scalar, vector of integers, or structure.
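You can specify:
- A character vector or string of letters representing amino acids or nucleotides, such as the output of int2aa or int2nt
- A vector of integers representing amino acids or nucleotides, such as the output of aa2int or nt2int
- A structure containing a Sequence field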
Tip
For help with letter and integer representations of amino acids and nucleotides, see Amino Acid Lookup or Nucleotide Lookup.
Data Types: char | string | double | struct
Seq2 — Amino or nucleotide sequence to align
character vector | string scalar | vector of integers | structure
Amino or nucleotide sequence to align, specified as a character vector, string scalar, vector of integers, or structure. For details, see Seq1.
Data Types: char | string | double | struct
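The equivalence of the letter and integer input forms can be checked with a short sketch. It assumes Bioinformatics Toolbox is installed, reuses the sequences from the Examples section, and uses aa2int (listed under See Also) to build the integer vectors. The variable names are illustrative only.

```matlab
% Requires Bioinformatics Toolbox (nwalign, aa2int).
seq1 = 'VSPAGMASGYD';
seq2 = 'IPGKASYD';

% Letter representation.
scoreFromChars = nwalign(seq1, seq2);

% Integer representation of the same two sequences.
scoreFromInts = nwalign(aa2int(seq1), aa2int(seq2));

% Both calls describe the same sequences, so the scores are expected to agree.
disp([scoreFromChars scoreFromInts]);
```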
Name-Value Arguments
Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.
Example: [s,a] = nwalign("HEAGAWGHEE","PAWHEAE",GapOpen=5,ShowScore=true) uses a gap opening penalty of 5 and displays the scoring space and winning path.
Before R2021a, use commas to separate each name and value, and enclose Name in quotes.
Example: [s,a] = nwalign("HEAGAWGHEE","PAWHEAE",'GapOpen',5,'ShowScore',true)
Alphabet — Type of sequence
"AA" (default) | "NT"
Type of sequence, specified as "AA" (amino acid) or "NT" (nucleotide).
Data Types: char | string
ScoringMatrix — Scoring matrix for global alignment
"BLOSUM50" (default for amino acid sequences) | character vector | string scalar | numeric matrix
Scoring matrix for the global alignment, specified as a character vector, string scalar, or numeric matrix.
You can specify a scoring matrix name. Valid choices are:
- "BLOSUM50" (default for amino acid sequences)
- "NUC44" (default for nucleotide sequences)
- "BLOSUM62"
- "BLOSUM30" increasing by 5 up to "BLOSUM90"
- "BLOSUM100"
- "PAM10" increasing by 10 up to "PAM500"
- "DAYHOFF"
- "GONNET"
Note
The above scoring matrices, provided with the software, also include a scale factor that converts the units of the output score to bits. You can also use the Scale name-value argument to specify an additional scale factor that converts the output score from bits to another unit.
You can also specify a numeric matrix, such as the one returned by the blosum, pam, dayhoff, gonnet, or nuc44 function.
Note
- If you use a scoring matrix that you created or that was created by one of these scoring matrix functions, the matrix does not include a scale factor. The output score is returned in the same units as the scoring matrix. You can use the Scale name-value argument to specify a scale factor to convert the output score to another unit.
- If you need to compile nwalign into a standalone application or software component using MATLAB® Compiler™, use a numeric matrix instead of the scoring matrix name.
Data Types: double | char | string
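As a hedged illustration of the numeric-matrix case, the sketch below passes the matrix returned by blosum and then rescales the score with Scale. It assumes Bioinformatics Toolbox is installed and that the second output of blosum is a structure with a Scale field holding the matrix's scale factor. That field is not described on this page, so treat it as an assumption.

```matlab
% Requires Bioinformatics Toolbox (nwalign, blosum).
seq1 = "VSPAGMASGYD";
seq2 = "IPGKASYD";

% Numeric BLOSUM50 matrix plus its information structure.
% Assumption: info.Scale holds the scale factor of the matrix.
[M, info] = blosum(50);

% Score in the units of the numeric matrix (no scale factor applied).
rawScore = nwalign(seq1, seq2, ScoringMatrix=M);

% Same alignment, with the score rescaled through the Scale argument.
scaledScore = nwalign(seq1, seq2, ScoringMatrix=M, Scale=info.Scale);

disp([rawScore scaledScore]);
```

Passing the matrix rather than its name is also the form this page recommends for compiled applications.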
Scale — Scale factor applied to output score
1 (default) | numeric scalar | numeric vector
Scale factor applied to the output score, specified as a numeric scalar or vector. If you specify a vector, the function returns Score as a vector of the same length. By default, there is no scaling or change in the units of the output score.
Use this argument to control the units of the output scores. For example, if the output score is initially determined in bits, you can specify Scale=log(2) to return the output score in nats instead.
Note
- If the ScoringMatrix argument also specifies a scale factor, then the function uses it first to scale the output score, then applies the scale factor specified by the Scale argument to rescale the output score.
- Before comparing alignment scores from multiple alignments, ensure that the scores are in the same units.
Data Types: double
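The bits-to-nats conversion can be checked by hand against the numbers in the Examples section. This is plain arithmetic on the reported score and needs no toolbox. The variable names are illustrative.

```matlab
% Score reported in the first example on this page, in bits.
bitsScore = 7.3333;

% Scale=log(2) multiplies the score by log(2), converting bits to nats.
natsScore = bitsScore*log(2);

fprintf('%.4f bits = %.4f nats\n', bitsScore, natsScore);
% prints: 7.3333 bits = 5.0831 nats
```

The result agrees with the Score = 5.0831 returned in the third example.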
GapOpen — Penalty for opening gap
8 (default) | positive scalar
Penalty for opening a gap, specified as a positive scalar.
Data Types: double
ExtendGap — Penalty for extending gap
positive scalar
Penalty for extending a gap using the affine gap penalty scheme, specified as a positive scalar.
If you specify this value, the function uses the affine gap penalty scheme, that is, it scores the first gap using the GapOpen value and scores subsequent gaps using the ExtendGap value. If you do not specify this value, the function scores all gaps equally, using the GapOpen penalty.
Data Types: double
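Read literally, the description above implies that a run of k consecutive gap positions costs GapOpen + (k-1)*ExtendGap under the affine scheme and k*GapOpen otherwise. The following arithmetic sketch assumes that reading. The ExtendGap value of 1 is a hypothetical choice for illustration, not a documented default.

```matlab
gapOpen = 8;        % default GapOpen value from this page
extendGap = 1;      % hypothetical ExtendGap value, for illustration only
gapLength = 1:4;    % lengths of a single run of consecutive gaps

% Affine scheme: the first gap position costs gapOpen, each later one extendGap.
affineCost = gapOpen + (gapLength - 1)*extendGap;

% ExtendGap not specified: every gap position costs gapOpen.
uniformCost = gapLength*gapOpen;

% Rows: gap length, affine cost, uniform cost.
disp([gapLength; affineCost; uniformCost]);
```

With these values a four-position gap costs 11 under the affine scheme and 32 otherwise, which is why specifying ExtendGap tends to favor fewer, longer gaps.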
Glocal — Flag to perform semiglobal alignment
false or 0 (default) | true or 1
Flag to perform a semiglobal alignment, specified as a numeric or logical 1 (true) or 0 (false).
In a semiglobal alignment, gaps at the ends of the sequences carry no penalty.
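A quick way to see the effect is to align the same pair with and without the flag. The sketch below assumes Bioinformatics Toolbox is installed and reuses the sequences from the name-value example on this page. Because end gaps stop being penalized, the semiglobal score should not be lower than the global one.

```matlab
% Requires Bioinformatics Toolbox (nwalign).
seq1 = "HEAGAWGHEE";
seq2 = "PAWHEAE";

% Default behavior: end gaps are penalized like any other gap.
[globalScore, globalAlignment] = nwalign(seq1, seq2);

% Semiglobal alignment: end gaps are free.
[semiglobalScore, semiglobalAlignment] = nwalign(seq1, seq2, Glocal=true);

disp(globalAlignment);
disp(semiglobalAlignment);
```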
Showscore — Flag to display scoring space and winning path of alignment
false or 0 (default) | true or 1
Flag to display the scoring space and winning path of the alignment, specified as a numeric or logical 1 (true) or 0 (false).
The scoring space is a heat map displaying the best scores for all the partial alignments of two sequences. The color of each (n1,n2) coordinate in the scoring space represents the best score for the pairing of subsequences Seq1(1:n1) and Seq2(1:n2), where n1 is a position in Seq1 and n2 is a position in Seq2. The best score for a pairing of specific subsequences is determined by scoring all possible alignments of the subsequences by summing matches and gap penalties.
The winning path is represented by black dots in the scoring space, and it illustrates the pairing of positions in the optimal global alignment. The color of the last point (lower right) of the winning path represents the optimal global alignment score for the two sequences and is the Score output.
Note
The scoring space visually indicates if there are potential alternate winning paths, which is useful when aligning sequences with big gaps. Visual patterns in the scoring space can also indicate a possible sequence rearrangement.
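To make the scoring space concrete, here is a minimal Needleman-Wunsch sketch in plain MATLAB that fills the same kind of table. It is not the nwalign implementation. It assumes a toy scoring scheme (match +1, mismatch -1, linear gap penalty of 1) and two short example sequences, all chosen for illustration only.

```matlab
% Toy inputs and scores (assumptions, not nwalign defaults).
seqA = 'GATTACA';
seqB = 'GCATGCG';
matchScore = 1;
mismatchScore = -1;
gapPenalty = 1;

n1 = numel(seqA);
n2 = numel(seqB);

% F(i+1,j+1) is the best score for aligning seqA(1:i) with seqB(1:j).
F = zeros(n1 + 1, n2 + 1);
F(:,1) = -gapPenalty*(0:n1)';   % seqA prefix aligned against gaps only
F(1,:) = -gapPenalty*(0:n2);    % seqB prefix aligned against gaps only

for i = 1:n1
    for j = 1:n2
        if seqA(i) == seqB(j)
            s = matchScore;
        else
            s = mismatchScore;
        end
        diagScore = F(i,j) + s;               % pair seqA(i) with seqB(j)
        upScore = F(i,j+1) - gapPenalty;      % seqA(i) against a gap
        leftScore = F(i+1,j) - gapPenalty;    % seqB(j) against a gap
        F(i+1,j+1) = max([diagScore, upScore, leftScore]);
    end
end

% Lower-right cell: optimal global alignment score for the full sequences.
globalScore = F(end,end);
disp(globalScore);
```

Calling imagesc(F) on the result gives a rough analogue of the heat map described above, with the lower-right cell holding the overall score.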
Output Arguments
Score — Optimal global alignment score
numeric scalar | numeric vector
Optimal global alignment score, returned as a numeric scalar or vector. It is returned as a vector when you specify a numeric vector for the Scale name-value argument.
Alignment — Aligned sequences
character array
Aligned sequences, returned as a character array. The first and third rows are Seq1 and Seq2, respectively, with gaps shown as hyphens (-). The second row shows symbols representing the optimal global alignment for the two sequences. The symbol | indicates amino acids or nucleotides that match exactly. The symbol : indicates amino acids or nucleotides that are related as defined by the scoring matrix (nonmatches with a zero or positive scoring matrix value).
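Because Alignment is an ordinary character array, its rows can be summarized with basic indexing. The sketch below uses the alignment printed in the first example on this page, so it runs without the toolbox. The summary variables are illustrative names, not outputs of nwalign.

```matlab
% Alignment output copied from the first example on this page.
Alignment = ['VSPAGMASGYD'; ...
             ': | | || ||'; ...
             'I-P-GKAS-YD'];

symbols = Alignment(2,:);

nIdentical = sum(symbols == '|');    % exact matches
nRelated = sum(symbols == ':');      % related residues
nGaps = sum(Alignment(1,:) == '-') + sum(Alignment(3,:) == '-');

% Fraction of alignment columns that are exact matches.
percentIdentity = 100*nIdentical/size(Alignment, 2);

fprintf('%d identical, %d related, %d gaps\n', nIdentical, nRelated, nGaps);
% prints: 6 identical, 1 related, 3 gaps
```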
Start — Starting point in each sequence for alignment
[1;1]
Starting point in each sequence for the alignment, returned as a vector of indices. Because the function performs a global alignment, Start is always returned as [1;1]. The function returns this output to be consistent with the swalign function.
Version History
Introduced before R2006a
See Also
aa2int | aminolookup | baselookup | blosum | dayhoff | gonnet | int2aa | int2nt | localalign | multialign | nt2aa | nt2int | nuc44 | pam | profalign | seqdotplot | swalign