run
Description
run(pipeline,inputStruct) runs the pipeline using the structure inputStruct as an input. This syntax is one of three ways to satisfy input ports, and works by matching the field names of inputStruct to the names of unconnected input ports in the pipeline.
run(___,Name=Value) uses additional options specified by one or more name-value arguments for any of the above syntaxes.
Examples
Create a Simple Pipeline to Plot Sequence Quality Data
Import the Pipeline and block objects needed for the example.
import bioinfo.pipeline.Pipeline
import bioinfo.pipeline.block.*
Create a pipeline.
qcpipeline = Pipeline;
Select an input FASTQ file using a FileChooser block.
fastqfile = FileChooser(which("SRR005164_1_50.fastq"));
Create a SeqFilter block.
sequencefilter = SeqFilter;
Define the filtering threshold value. Specifically, filter out sequences with a total of more than 10 low-quality bases, where a base is considered a low-quality base if its quality score is less than 20.
sequencefilter.Options.Threshold = [10 20];
Add the blocks to the pipeline.
addBlock(qcpipeline,[fastqfile,sequencefilter]);
Connect the output of the first block to the input of the second block. To do so, first check the input and output port names of the corresponding blocks.
View the Outputs (port of the first block) and Inputs (port of the second block).
fastqfile.Outputs
ans = struct with fields:
Files: [1×1 bioinfo.pipeline.Output]
sequencefilter.Inputs
ans = struct with fields:
FASTQFiles: [1×1 bioinfo.pipeline.Input]
Connect the Files output port of the fastqfile block to the FASTQFiles port of the sequencefilter block.
connect(qcpipeline,fastqfile,sequencefilter,["Files","FASTQFiles"]);
Next, create a UserFunction block that calls the seqqcplot function to plot the quality data of the filtered sequence data. In this case, inputFile is the required argument for the seqqcplot function. The required argument name can be anything as long as it is a valid variable name.
qcplot = UserFunction("seqqcplot",RequiredArguments="inputFile",OutputArguments="figureHandle");
Alternatively, you can use dot notation to set up your UserFunction block.
qcplot = UserFunction;
qcplot.RequiredArguments = "inputFile";
qcplot.Function = "seqqcplot";
qcplot.OutputArguments = "figureHandle";
Add the block.
addBlock(qcpipeline,qcplot);
Check the port names of the sequencefilter block and the qcplot block.
sequencefilter.Outputs
ans = struct with fields:
FilteredFASTQFiles: [1×1 bioinfo.pipeline.Output]
NumFilteredIn: [1×1 bioinfo.pipeline.Output]
NumFilteredOut: [1×1 bioinfo.pipeline.Output]
qcplot.Inputs
ans = struct with fields:
inputFile: [1×1 bioinfo.pipeline.Input]
Connect the FilteredFASTQFiles port of the sequencefilter block to the inputFile port of the qcplot block.
connect(qcpipeline,sequencefilter,qcplot,["FilteredFASTQFiles","inputFile"]);
Run the pipeline to plot the sequence quality data.
run(qcpipeline);
Run Bioinformatics Pipeline Using Input Structure
Import the Pipeline and block objects needed for the example.
import bioinfo.pipeline.Pipeline
import bioinfo.pipeline.block.*
Create a pipeline.
P = Pipeline;
Create a Bowtie2Build block to build index files for the reference genome.
bowtie2build = Bowtie2Build;
Create a Bowtie2 block to map the read sequences to the reference sequence.
bowtie2 = Bowtie2;
Add the blocks to the pipeline.
addBlock(P,[bowtie2build,bowtie2],["bowtie2build","bowtie2"]);
Get the names of all required input ports, across every block in the pipeline, that still need to be set or connected. IndexBaseName is an input port of both the bowtie2build and bowtie2 blocks. Reads1Files is an input port of the bowtie2 block, and ReferenceFASTAFiles is an input port of the bowtie2build block.
portnames = inputNames(P)
portnames = 1×3 string
"IndexBaseName" "Reads1Files" "ReferenceFASTAFiles"
Some blocks have optional input ports. To see the names of these ports, set IncludeOptional=true. For instance, the Bowtie2 block has an optional input port (Reads2Files) that accepts files for the second mate reads when you have paired-end read data.
allportnames = inputNames(P,IncludeOptional=true)
allportnames = 1×4 string
"IndexBaseName" "Reads1Files" "Reads2Files" "ReferenceFASTAFiles"
Create an input structure to set the input port values of the bowtie2 and bowtie2build blocks. Specifically, set IndexBaseName to "Dmel_chr4", which is the base name for the reference index files for the Drosophila genome. Set Reads1Files to "SRR6008575_10k_1.fq" and Reads2Files to "SRR6008575_10k_2.fq". Set ReferenceFASTAFiles to "Dmel_chr4.fa". These read files are already provided with the toolbox.
inputStruct.IndexBaseName = "Dmel_chr4";
inputStruct.Reads1Files = "SRR6008575_10k_1.fq";
inputStruct.Reads2Files = "SRR6008575_10k_2.fq";
inputStruct.ReferenceFASTAFiles = "Dmel_chr4.fa";
Optionally, you can compile and check if the input structure is set up correctly. Note that this compilation also happens automatically when you run the pipeline.
compile(P,inputStruct);
Run the pipeline using the structure as an input.
run(P,inputStruct);
Get the bowtie2 block result after the pipeline finishes running.
wait(P);
mappedFile = results(P,bowtie2)
mappedFile = struct with fields:
SAMFile: [1×1 bioinfo.pipeline.datatype.File]
The Bowtie2 block generates a SAM file that contains the mapped results. To see the location of the file, use unwrap.
unwrap(mappedFile.SAMFile)
Input Arguments
pipeline — Bioinformatics pipeline
bioinfo.pipeline.Pipeline object
Bioinformatics pipeline, specified as a bioinfo.pipeline.Pipeline object.
inputStruct — Input structure to satisfy input ports
structure
Input structure to satisfy unconnected input ports, specified as a structure.
The field names of inputStruct must match the names of unconnected ports in the pipeline.
Tip
Use inputNames to get the list of names for all unconnected input ports and use them as field names in inputStruct.
Data Types: struct
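As a rough sketch of the tip above, the field names of an input structure can be compared against the port names reported by inputNames before running. The pipeline and file names are taken from the Bowtie2 example; the variable names required and missing are illustrative only, and setdiff, fieldnames, and strjoin are standard MATLAB functions.

```matlab
import bioinfo.pipeline.Pipeline
import bioinfo.pipeline.block.*

% Rebuild the two-block pipeline from the Bowtie2 example.
P = Pipeline;
bowtie2build = Bowtie2Build;
bowtie2 = Bowtie2;
addBlock(P,[bowtie2build,bowtie2],["bowtie2build","bowtie2"]);

% One field per required, unconnected input port.
inputStruct = struct;
inputStruct.IndexBaseName = "Dmel_chr4";
inputStruct.Reads1Files = "SRR6008575_10k_1.fq";
inputStruct.ReferenceFASTAFiles = "Dmel_chr4.fa";

% Required port names that have no matching field in the structure.
required = inputNames(P);
missing = setdiff(required,string(fieldnames(inputStruct)));
if ~isempty(missing)
    error("Missing inputs: %s",strjoin(missing,", "));
end
```

This check only compares names. Calling compile(P,inputStruct), as in the example above, remains the way to validate the structure against the pipeline itself.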
Name-Value Arguments
Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.
Example: run(pipeline,UseParallel=true) runs the pipeline in parallel.
ResultsDirectory — Location to store pipeline results
PipelineResults folder in the current working directory (default) | character vector | string scalar
Location to store the pipeline results, specified as a character vector or string scalar. The default location is the PipelineResults folder within the current working directory (pwd). In the PipelineResults folder, results from each block of the pipeline are stored separately in a subfolder that is named after the block name.
If you rerun the pipeline with the same results directory, what happens to the existing results depends on RunMode:
- When RunMode=Minimal (default), the existing results are reused unless the block has become stale.
- When RunMode=Full, the existing results are always overwritten.
Data Types: char | string
DisplayLevel — Information to print to command line
"Warn" or 2 (default) | "Off" or 0 | "Error" or 1 | ...
Information to print to the MATLAB® command line while the pipeline is running, specified as one of the following:
- "Off" or 0 — Display no messages.
- "Error" or 1 — Display only error messages.
- "Warn" or 2 — Display warnings and errors.
- "Info" or 3 — Display warnings, errors, and pipeline run progress information.
- "Debug" or 4 — Display more detailed debugging information.
Data Types: double | char | string
UseParallel — Flag to run pipeline in parallel
false or 0 (default) | true or 1
Flag to run the pipeline in parallel, specified as a numeric or logical 1 (true) or 0 (false). Parallel Computing Toolbox™ is required to run in parallel.
Note
Only process-based pools are supported. Thread-based pools are not.
Data Types: double | logical
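The following is a minimal sketch of a guarded parallel run, not a prescribed workflow. It reuses the FileChooser and SeqFilter blocks from the first example. The use of canUseParallelPool, gcp, and parpool("Processes") to prepare a process-based pool is an assumption about how you might set things up; the reference above only states that Parallel Computing Toolbox is required and that thread-based pools are unsupported.

```matlab
import bioinfo.pipeline.Pipeline
import bioinfo.pipeline.block.*

% Two-block pipeline: choose a FASTQ file, then filter it.
P = Pipeline;
fastqfile = FileChooser(which("SRR005164_1_50.fastq"));
sequencefilter = SeqFilter;
addBlock(P,[fastqfile,sequencefilter]);
connect(P,fastqfile,sequencefilter,["Files","FASTQFiles"]);

% Fall back to a serial run when no parallel pool can be used.
useParallel = canUseParallelPool;

% If no pool is open yet, open a process-based one explicitly so that
% a thread-based pool is not created by default settings.
if useParallel && isempty(gcp("nocreate"))
    parpool("Processes");
end

run(P,UseParallel=useParallel);
wait(P);
```

If a thread-based pool is already open, this sketch does not replace it. Delete that pool first and open a process-based one.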
RunMode — Run mode of pipeline
"Minimal" (default) | "Full"
Run mode of the pipeline, specified as one of the following:
- "Minimal" — The pipeline runs only the blocks for which one of the following statements is true:
  - The block has not been run before or its results have been deleted.
  - You have modified the block since the last time it ran.
  - Input data, including new runtime inputs, to the block has changed since the last run.
  - The block has one or more upstream blocks which have run since the last time the block was run.
  Tip
  If you specify a subset of blocks to run using the To, From, and Only name-value arguments, these rules are applied only to those selected blocks. It is recommended that you use the default run mode "Minimal" because skipping up-to-date blocks can save significant time running the pipeline, especially when the pipeline has long-running blocks that do not need to rerun.
- "Full" — The pipeline runs all blocks even if they have previously computed results.
Data Types: char | string
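To illustrate how RunMode interacts with ResultsDirectory, here is a small sketch that runs the same pipeline three times against one results folder. The blocks come from the first example; the folder name qc_results is hypothetical, and any writable location should serve.

```matlab
import bioinfo.pipeline.Pipeline
import bioinfo.pipeline.block.*

% Two-block pipeline: choose a FASTQ file, then filter it.
P = Pipeline;
fastqfile = FileChooser(which("SRR005164_1_50.fastq"));
sequencefilter = SeqFilter;
addBlock(P,[fastqfile,sequencefilter]);
connect(P,fastqfile,sequencefilter,["Files","FASTQFiles"]);

% Hypothetical results folder under the system temporary directory.
resultsDir = fullfile(tempdir,"qc_results");

% First run: every block runs and stores its results under resultsDir.
run(P,ResultsDirectory=resultsDir);
wait(P);

% Same folder, RunMode="Minimal" (the default): up-to-date blocks are skipped.
run(P,ResultsDirectory=resultsDir,RunMode="Minimal");
wait(P);

% Same folder, RunMode="Full": every block reruns and overwrites its results.
run(P,ResultsDirectory=resultsDir,RunMode="Full");
wait(P);
```

Adding DisplayLevel="Info" to each call prints run progress information, which is one way to observe the difference between the second and third runs.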
From — Starting blocks
bioinfo.pipeline.Block object | vector of objects | character vector | string scalar | string vector | cell array of character vectors
Starting blocks when you run the pipeline, specified as a bioinfo.pipeline.Block object or vector of block objects. You can also specify a character vector, string scalar, string vector, or cell array of character vectors representing block names. By default, the pipeline runs every block that needs to be run as defined by the Minimal run mode.
If you specify this argument, the pipeline runs the specified blocks and all of their downstream blocks.
If you specify both To and From blocks, there must exist one block between the blocks specified by To and From.
You cannot use this argument together with the Only name-value argument.
To — Ending blocks
bioinfo.pipeline.Block object | vector of objects | character vector | string scalar | string vector | cell array of character vectors
Ending blocks when you run the pipeline, specified as a bioinfo.pipeline.Block object, vector of block objects, character vector, string scalar, string vector, or cell array of character vectors representing block names. By default, the pipeline runs every block that needs to be run as defined by the Minimal run mode.
If you specify this argument, the pipeline runs all the upstream blocks and stops at the specified blocks.
If you specify both To and From blocks, there must exist one block between the blocks specified by To and From.
You cannot use this argument together with the Only name-value argument.
Only — Only blocks to run
bioinfo.pipeline.Block object | vector of objects | string scalar | ...
Only blocks to run, specified as a bioinfo.pipeline.Block object, vector of block objects, character vector, string scalar, string vector, or cell array of character vectors representing block names. By default, the pipeline runs every block that needs to be run as defined by the Minimal run mode.
If you specify this argument, the pipeline runs only the specified blocks.
You cannot use this argument together with the To or From name-value arguments.
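The block-selection arguments can be sketched on the two-block pipeline from the first example. Blocks are passed here as objects rather than names because that example adds them without explicit names. The comments describe what the reference text above implies for each call; the exact set of blocks that run also depends on RunMode.

```matlab
import bioinfo.pipeline.Pipeline
import bioinfo.pipeline.block.*

% Two-block pipeline: choose a FASTQ file, then filter it.
P = Pipeline;
fastqfile = FileChooser(which("SRR005164_1_50.fastq"));
sequencefilter = SeqFilter;
addBlock(P,[fastqfile,sequencefilter]);
connect(P,fastqfile,sequencefilter,["Files","FASTQFiles"]);

% To: run the upstream blocks and stop at the file chooser.
run(P,To=fastqfile);
wait(P);

% From: start at the filter block and run everything downstream of it.
run(P,From=sequencefilter);
wait(P);

% Only: run just the filter block. RunMode="Full" forces a rerun even
% though the block is already up to date after the previous call.
run(P,Only=sequencefilter,RunMode="Full");
wait(P);

filtered = results(P,sequencefilter);
```

Only cannot be combined with To or From in the same call, so the three selections are issued as separate runs.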
SaveResults — Blocks with results that are saved to MAT-files
"-all" (default) | bioinfo.pipeline.Block object | vector of objects | string scalar | ...
Blocks with results that are saved to MAT-files, specified as a bioinfo.pipeline.Block object, vector of block objects, character vector, string scalar, string vector, or cell array of character vectors representing block names.
By default (SaveResults = "-all"), results from each block are saved in the corresponding MAT-file in the block folder.
More About
Satisfy Input Ports
All required input ports of every block in a pipeline must be satisfied before you can run the pipeline.
To satisfy an input port, you must do one of the following:
- Connect to another port.
- Set the value of the input port, that is, myBlock.Inputs.PropertyName.Value. For example, consider a BamSort block. To specify the name of a BAM file as the block input value, set the value as bamsortBlock.Inputs.BAMFile.Value = "ex1.bam".
- Pass in an input structure by calling run(pipeline,inputStruct), where inputStruct has a field whose name matches the input port name and whose value is the input port value.
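The three options can be sketched side by side for the FASTQFiles port of a SeqFilter block. This is an illustration under assumptions: it presumes that the port accepts a file path directly (by analogy with the BamSort example above), and the pipeline names P1 to P3 and the results folder names are hypothetical. Separate results folders are used so that the three runs do not share stored results.

```matlab
import bioinfo.pipeline.Pipeline
import bioinfo.pipeline.block.*

fastqPath = which("SRR005164_1_50.fastq");

% Option 1: connect the port to an output port of another block.
P1 = Pipeline;
chooser = FileChooser(fastqPath);
filter1 = SeqFilter;
addBlock(P1,[chooser,filter1]);
connect(P1,chooser,filter1,["Files","FASTQFiles"]);
run(P1,ResultsDirectory=fullfile(tempdir,"satisfy_connect"));
wait(P1);

% Option 2: set the value of the input port on the block itself.
P2 = Pipeline;
filter2 = SeqFilter;
filter2.Inputs.FASTQFiles.Value = fastqPath;
addBlock(P2,filter2);
run(P2,ResultsDirectory=fullfile(tempdir,"satisfy_value"));
wait(P2);

% Option 3: pass a structure whose field name matches the port name.
P3 = Pipeline;
filter3 = SeqFilter;
addBlock(P3,filter3);
inputStruct = struct;
inputStruct.FASTQFiles = fastqPath;
run(P3,inputStruct,ResultsDirectory=fullfile(tempdir,"satisfy_struct"));
wait(P3);
```

The input structure keeps runtime data out of the pipeline definition, so the same pipeline can be rerun on different files without editing any block.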
Version History
Introduced in R2023a