getGenes

Return table of unique genes in GTFAnnotation object

Description

genes = getGenes(AnnotObj) returns genes, a table of genes referenced by exons in AnnotObj.

genes = getGenes(AnnotObj,"Reference",R) returns one or more genes that belong to one or more references specified by R.

genes = getGenes(AnnotObj,"Gene",G) returns one or more genes specified by G.

genes = getGenes(AnnotObj,"Transcript",T) returns one or more genes that contain one or more transcripts specified by T.

Examples

Create a GTFAnnotation object from a GTF-formatted file.

obj = GTFAnnotation("hum37_2_1M.gtf");

Retrieve unique reference names. In this case, there is only one reference sequence, which is chromosome 2 (chr2).

ref = getReferenceNames(obj)
ref = 1x1 cell array
    {'chr2'}

Get a table of all genes that belong to chr2.

genes = getGenes(obj,"Reference",ref)
genes=28×7 table
        GeneID         GeneName     Reference    Start      Stop     Strand    NumTranscripts
    ______________    __________    _________    ______    ______    ______    ______________

    {'uc010yim.1'}    {0x0 char}      chr2        41609     46385      -             1       
    {'uc002qvu.2'}    {0x0 char}      chr2       218138    249852      -             1       
    {'uc002qvv.2'}    {0x0 char}      chr2       218138    256690      -             1       
    {'uc002qvw.2'}    {0x0 char}      chr2       218138    260702      -             1       
    {'uc002qvx.2'}    {0x0 char}      chr2       218138    264068      -             1       
    {'uc002qvy.2'}    {0x0 char}      chr2       218138    264068      -             1       
    {'uc002qvz.2'}    {0x0 char}      chr2       218138    264392      -             1       
    {'uc002qwa.2'}    {0x0 char}      chr2       218138    264743      -             1       
    {'uc010ewe.2'}    {0x0 char}      chr2       218138    264810      -             1       
    {'uc002qwb.2'}    {0x0 char}      chr2       239563    242178      -             1       
    {'uc002qwc.1'}    {0x0 char}      chr2       243503    262786      -             1       
    {'uc002qwd.2'}    {0x0 char}      chr2       264869    272481      +             1       
    {'uc002qwe.3'}    {0x0 char}      chr2       264869    273148      +             1       
    {'uc002qwg.2'}    {0x0 char}      chr2       264869    278280      +             1       
    {'uc002qwh.2'}    {0x0 char}      chr2       264869    278280      +             1       
    {'uc002qwf.2'}    {0x0 char}      chr2       264869    278280      +             1       
      ⋮
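
The Gene and Transcript syntaxes from the Description can be sketched in the same way. This is a minimal illustration, not output captured from a session: the two gene IDs are copied from the GeneID column above, and the transcript name is taken from getTranscriptNames, another GTFAnnotation method, rather than assumed.

```matlab
% Sketch: query the same annotation object by gene ID and by transcript name.
obj = GTFAnnotation("hum37_2_1M.gtf");

% Gene IDs copied from the GeneID column of the table above.
byGene = getGenes(obj,"Gene",["uc002qvu.2","uc002qwd.2"]);

% Use the first transcript name stored in the object and look up
% the gene that contains it.
tNames = getTranscriptNames(obj);
byTranscript = getGenes(obj,"Transcript",tNames(1));
```

Both calls return a table with the same seven variables as the Reference query, restricted to the matching genes.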

Input Arguments

AnnotObj: GTF annotation, specified as a GTFAnnotation object.

R: Names of reference sequences, specified as a character vector, string, string vector, cell array of character vectors, or categorical array.

The names must come from the Reference field of AnnotObj. If a name does not exist, the function provides a warning and ignores the name.

Data Types: char | string | cell | categorical

G: Names of genes, specified as a character vector, string, string vector, cell array of character vectors, or categorical array.

The names must come from the Gene field of AnnotObj. If a name does not exist, the function provides a warning and ignores the name.

Data Types: char | string | cell | categorical

T: Names of transcripts, specified as a character vector, string, string vector, cell array of character vectors, or categorical array.

The names must come from the Transcript field of AnnotObj. If a name does not exist, the function provides a warning and ignores the name.

Data Types: char | string | cell | categorical
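
The warn-and-ignore behavior described for names that do not exist can be checked after the call by comparing the requested names with what came back. In this sketch, 'notAGene' is a made-up ID assumed to be absent from the file, and the exact warning text is not shown because it is not given here.

```matlab
% Sketch: request one known gene ID and one made-up ID.
obj = GTFAnnotation("hum37_2_1M.gtf");

% 'notAGene' is hypothetical and assumed not to exist in the annotation.
ids = {'uc002qvu.2','notAGene'};
genes = getGenes(obj,"Gene",ids);   % expected to warn about 'notAGene'

% Logical vector marking which requested IDs are present in the result.
found = ismember(ids, genes.GeneID);
```

Checking found is useful in scripts where a warning might scroll past unnoticed.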

Output Arguments

Genes referenced by exons in AnnotObj, returned as a table. The table contains the following variables for each gene.

Variable Name     Description
______________    ___________

GeneID            Cell array of character vectors containing gene IDs as listed in AnnotObj, obtained from the Gene field of AnnotObj.
GeneName          Cell array of character vectors containing gene names, obtained from the Attributes field of AnnotObj. This cell array can contain empty character vectors if the corresponding gene names are not found in Attributes.
Reference         Categorical array representing the names of reference sequences to which the genes belong, obtained from the Reference field of AnnotObj.
Start             Start location of the first exon in each gene.
Stop              Stop location of the last exon in each gene.
Strand            Categorical array containing the strand of each gene.
NumTranscripts    Integer array listing the number of transcripts in each gene.
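
Because the output is an ordinary MATLAB table, its variables can be used directly for filtering and derived columns. The following is a sketch only: the Span variable is a name introduced here for illustration, and the span calculation assumes the inclusive coordinates that the GTF format uses.

```matlab
% Sketch: filter the gene table by strand and add a derived column.
obj = GTFAnnotation("hum37_2_1M.gtf");
genes = getGenes(obj);

% Keep only forward-strand genes; Strand is a categorical array.
plusGenes = genes(genes.Strand == '+', :);

% Distance from the first exon start to the last exon stop,
% assuming inclusive coordinates. Span is a hypothetical variable name.
plusGenes.Span = plusGenes.Stop - plusGenes.Start + 1;

% Sort so that the longest spans come first.
plusGenes = sortrows(plusGenes,"Span","descend");
```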

Version History

Introduced in R2014b