
bioinfo.pipeline.block.BwaMEM

Bioinformatics pipeline block to map sequence reads to a reference genome

Since R2023a


Description

A BwaMEM block enables you to map sequencing reads to a reference genome.

The block requires the BWA Support Package for Bioinformatics Toolbox™. If the support package is not installed, then a download link is provided. For details, see Bioinformatics Toolbox Software Support Packages.

Creation

Description


b = bioinfo.pipeline.block.BwaMEM creates a BwaMEM block.

b = bioinfo.pipeline.block.BwaMEM(options) also specifies additional options.

b = bioinfo.pipeline.block.BwaMEM(OutFilename=fileName) also specifies the output file name.

b = bioinfo.pipeline.block.BwaMEM(Name=Value) specifies additional options as the property names and values of a BWAMEMOptions object. This object is set as the value of the Options property of the block.

Input Arguments


fileName

Output file name, specified as a string or character vector. The block saves the mapping results to this file.

Data Types: char | string

options

BwaMEM options, specified as a BWAMEMOptions object, string, or character vector.

If you specify a string or character vector, it must use the native bwa syntax, with each option prefixed by a dash [1][2].

Data Types: char | string
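The creation syntaxes above can be sketched as follows. This is a minimal sketch that assumes the Bioinformatics Toolbox and the BWA Support Package are installed. The native option string "-t 4" (the bwa mem thread-count flag) and the file name "sample1.sam" are illustrative assumptions, not values taken from this page; the check on the OutFilename property assumes that the property stores the name you pass at creation.

```matlab
% Sketch: three ways to create a BwaMEM block.
import bioinfo.pipeline.block.*

% Default block: default BWAMEMOptions, output file Aligned.sam
b1 = BwaMEM;

% Options in native bwa syntax, each option prefixed by a dash.
% "-t 4" (use four threads) is an assumed example option string.
b2 = BwaMEM("-t 4");

% Custom output file name (hypothetical name)
b3 = BwaMEM(OutFilename="sample1.sam");
```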

Name-Value Arguments

Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Note

The following list of arguments is partial. For the complete list, see the properties of the BWAMEMOptions object.

Threshold for determining which hits receive an XA tag in the output SAM file, specified as a nonnegative integer n or a two-element numeric vector [n m], where n and m must be nonnegative integers.

If a read has fewer than n hits with a score greater than 80% of the best score for that read, all hits receive an XA tag in the output SAM file.

When you also specify m, the software returns up to m hits if the hit list contains a hit to an ALT contig.

Data Types: double
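The XA-tag threshold rule described above can be modeled as a small decision function. This is a simplified sketch for illustration only, not BWA's source code. It assumes that the count includes the best hit itself, that the threshold switches from n to m when any hit lies on an ALT contig, and that you pass m = n when you specify only n. The function names xaLimit and tagAllHits and the example thresholds 5 and 200 are hypothetical.

```matlab
% Simplified, hypothetical model of the XA-tag threshold rule (not BWA code).
% scores : alignment scores of all hits for one read
% isAlt  : logical vector, true where a hit lies on an ALT contig
% n, m   : threshold values; pass m = n when only n is specified
xaLimit = @(isAlt, n, m) n + (m - n) * any(isAlt);   % m if any ALT hit, else n

% True when fewer than the limit of hits score above 80% of the best score,
% in which case every hit of the read receives an XA tag
tagAllHits = @(scores, isAlt, n, m) ...
    sum(scores > 0.8 * max(scores)) < xaLimit(isAlt, n, m);

scores = [60 55 52 50 49 20];   % hypothetical hit scores for one read
isAlt  = false(size(scores));   % no hit on an ALT contig

tagAllHits(scores, isAlt, 5, 200)   % false: 5 hits exceed 48, and 5 is not < 5
isAlt(2) = true;
tagAllHits(scores, isAlt, 5, 200)   % true: the limit becomes 200
```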

Flag to append FASTA or FASTQ comments to the output SAM file, specified as true or false. A comment is the text that follows a space in the header line of a FASTA or FASTQ record.

Data Types: logical

Properties


ErrorHandler

Function to handle errors from the run method of the block, specified as a function handle. The handle specifies the function to call if the run method encounters an error within a pipeline. For the pipeline to continue after a block fails, ErrorHandler must return a structure that is compatible with the output ports of the block. The error handling function is called with the following two inputs:

  • Structure with these fields:

    Field       Description
    identifier  Identifier of the error that occurred
    message     Text of the error message
    index       Linear index indicating which block process failed in the parallel run. By default, the index is 1 because there is only one run per block. For details on how block inputs can be split across different dimensions for multiple run calls, see Bioinformatics Pipeline SplitDimension.

  • Input structure passed to the run method when it fails

Data Types: function_handle

Inputs

This property is read-only.

Input ports of the block, specified as a structure. The field names of the structure are the names of the block input ports, and the field values are bioinfo.pipeline.Input objects. These objects describe the input port behaviors. The input port names are the expected field names of the input structure that you pass to the block run method.

The BwaMEM block Inputs structure has the following fields:

  • IndexBaseName — Base name of the reference index files. The index files are in the AMB, ANN, BWT, PAC, and SA file formats. For example, the base name of an index file Dmel_chr4.bwt is "Dmel_chr4". This input is required.

  • Reads1File — Name of FASTQ file for the first mate reads or single-end reads. For paired-end data, sequences in Reads1File must correspond read-for-read to sequences in Reads2File. This input is required.

  • Reads2File — Name of FASTQ file for the second mate reads for paired-end data. This input is optional.

The default value for each of these inputs is a bioinfo.pipeline.datatypes.Unset object, which means that the input value is not set yet.

Data Types: struct
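The input ports can also be filled directly when you call the block's run method outside a pipeline. This is a minimal sketch under several assumptions: that emptyInputs(BM) returns a structure with one field per input port, that run(BM, in) accepts plain string file names and returns the output structure, that an Unset optional input such as Reads2File is ignored, and that the index files with base name "Dmel_chr4" already exist in the current folder (for example, created earlier by a BwaIndex block). The file names come from the example on this page.

```matlab
% Sketch: run a BwaMEM block on its own, without a Pipeline object.
% Assumes Dmel_chr4.amb/.ann/.bwt/.pac/.sa exist in the current folder.
import bioinfo.pipeline.block.*

BM = BwaMEM;
in = emptyInputs(BM);                 % one field per input port, initially Unset
in.IndexBaseName = "Dmel_chr4";       % base name, without a file extension
in.Reads1File = which("SRR6008575_10k_1.fq");
% in.Reads2File stays Unset, so the reads are treated as single-end

out = run(BM, in);                    % output structure, one field per output port
```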

Outputs

This property is read-only.

Output ports of the block, specified as a structure. The field names of the structure are the names of the block output ports, and the field values are bioinfo.pipeline.Output objects. These objects describe the output port behaviors. The field names of the output structure returned by the block run method are the same as the output port names.

The BwaMEM block Outputs structure has a single field, SAMFile, which refers to the SAM file containing the mapping results.

Tip

To see the actual location of the output file, first get the results of the block, and then call unwrap on the SAMFile field, as shown in the example on this page.

Data Types: struct

Options

BwaMEM options, specified as a BWAMEMOptions object. The default value is a BWAMEMOptions object with default property values.

OutFilename

Output file name, specified as a string. The block saves the mapping results to this file. The default value is "Aligned.sam".

Data Types: string

Object Functions

compile      Perform block-specific additional checks and validations
copy         Copy array of handle objects
emptyInputs  Create input structure for use with run method
eval         Evaluate block object
run          Run block object

Examples


Map reads to the Drosophila chromosome 4 sequence using the BwaMEM block.

import bioinfo.pipeline.block.*
import bioinfo.pipeline.Pipeline

FC1 = FileChooser(which("Dmel_chr4.fa"));
FC2 = FileChooser(which("SRR6008575_10k_1.fq"));
BI = BwaIndex;
BM = BwaMEM;

P = Pipeline;
addBlock(P,[FC1,FC2,BI,BM]);
connect(P,FC1,BI,["Files","ReferenceFASTAFile"]);
connect(P,BI,BM,["IndexBaseName", "IndexBaseName"]);
connect(P,FC2,BM,["Files", "Reads1File"]);

run(P);
R = results(P,BM)
R = 

  struct with fields:

    SAMFile: [1×1 bioinfo.pipeline.datatypes.File]

Call unwrap on the SAMFile field to see the location of the output file.

unwrap(R.SAMFile)
ans = 

    "C:\PipelineResults\BwaMEM_1\1\Aligned.sam"

References

[1] Li, Heng, and Richard Durbin. “Fast and Accurate Short Read Alignment with Burrows-Wheeler Transform.” Bioinformatics 25, no. 14 (July 15, 2009): 1754–60. https://doi.org/10.1093/bioinformatics/btp324.

[2] Li, Heng, and Richard Durbin. “Fast and Accurate Long-Read Alignment with Burrows–Wheeler Transform.” Bioinformatics 26, no. 5 (March 1, 2010): 589–95. https://doi.org/10.1093/bioinformatics/btp698.

Version History

Introduced in R2023a