
Data Formats and Databases

Bioinformatics Toolbox™ gives you access to many Web-based databases and other online data repositories. You can copy data into the MATLAB® workspace and read and write files in standard bioinformatics formats. The toolbox also reads many common genome file formats, so you do not have to write and maintain your own file readers.

Web-based databases — You can directly access public databases on the Web and copy sequence and gene expression information into the MATLAB environment.

The sequence databases currently supported are GenBank® (getgenbank), GenPept (getgenpept), European Molecular Biology Laboratory (EMBL) (getembl), and Protein Data Bank (PDB) (getpdb). You can also access data from the NCBI Gene Expression Omnibus (GEO) Web site by using a single function (getgeodata).
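As a minimal sketch of the retrieval functions named above, the following calls fetch one GenBank record. It assumes Bioinformatics Toolbox is installed and an internet connection is available. The accession number M10051 is only an illustrative choice, and the field names shown are those of the structure that getgenbank returns.

```matlab
% Sketch only: needs Bioinformatics Toolbox and network access.
% 'M10051' is an illustrative accession number, not one taken from this page.
gb = getgenbank('M10051');                    % structure describing the record
fprintf('%s: %d bases\n', gb.LocusName, length(gb.Sequence));

% Return just the sequence as a character vector instead of a structure
seq = getgenbank('M10051', 'SequenceOnly', true);
```

The other retrieval functions (getgenpept, getembl, getpdb, getgeodata) follow the same pattern of passing an accession or identifier and receiving a structure.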

From the Pfam database, you can retrieve multiple sequence alignments (gethmmalignment), hidden Markov model profiles (gethmmprof), and phylogenetic tree data (gethmmtree).
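A short sketch of the three Pfam functions, assuming network access and that the Pfam service is reachable from your MATLAB release. Family number 2 (PF00002) is an illustrative choice.

```matlab
% Sketch only: needs Bioinformatics Toolbox and network access.
% Pfam family 2 (PF00002) is used purely as an example.
hmm  = gethmmprof(2);                        % HMM profile as a structure
aln  = gethmmalignment(2, 'Type', 'seed');   % seed alignment (Header, Sequence)
tree = gethmmtree(2, 'Type', 'seed');        % phylogenetic tree object
```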

Gene Ontology database — Load the database from the Web into a gene ontology object (geneont). Select sections of the ontology with methods for the geneont object (getancestors, getdescendants, getmatrix, getrelatives), and manipulate data with utility functions (goannotread, num2goid).
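The workflow above might look like the following sketch. It assumes network access for the live download, and the term number 46680 is an arbitrary example ID.

```matlab
% Sketch only: needs Bioinformatics Toolbox and network access.
GO  = geneont('LIVE', true);          % load the current ontology from the Web
anc = getancestors(GO, 46680);        % numeric IDs of the term and its ancestors
ids = num2goid(anc);                  % convert numbers to 'GO:xxxxxxx' strings

% Index the object with the IDs to get a smaller ontology for that branch
subGO = GO(anc);
```

Working on a sub-ontology keeps later calls such as getmatrix small, because they operate only on the selected terms.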

Read data from instruments — Read data generated from gene sequencing instruments (scfread, joinseq, traceplot), mass spectrometers (jcampread), and Agilent® microarray scanners (agferead).
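For sequencing traces, a minimal sketch is shown below. It assumes the sample file sample.scf that ships with the toolbox is on the MATLAB path; substitute your own SCF file otherwise.

```matlab
% Sketch only: 'sample.scf' is assumed to be available on the MATLAB path.
trace = scfread('sample.scf');   % structure with fields A, C, G, and T
traceplot(trace);                % plot the four nucleotide trace channels
```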

Reading data formats — The toolbox provides functions for reading data from common bioinformatics file formats, including FASTA (fastaread), GenBank (genbankread), EMBL (emblread), and PDB (pdbread).
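As one illustration of a format reader, the sketch below creates a small two-record FASTA file and parses it with fastaread. The file contents and record names are made up for the example.

```matlab
% Create a small FASTA file with two hypothetical records
fname = [tempname '.fasta'];
fid = fopen(fname, 'w');
fprintf(fid, '>seq1 first example record\nACGTACGTAC\n');
fprintf(fid, '>seq2 second example record\nTTGACA\n');
fclose(fid);

% fastaread returns a structure array with Header and Sequence fields
records = fastaread(fname);
for k = 1:numel(records)
    fprintf('%s (%d nt)\n', records(k).Header, length(records(k).Sequence));
end
delete(fname);                   % remove the temporary file
```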

Writing data formats — The functions for getting data from the Web include an option to save the data to a file. In addition, a dedicated function writes sequence data to a file in the FASTA format (fastawrite).
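A minimal sketch of fastawrite, using a made-up header and sequence and a temporary file name so that no existing file is touched:

```matlab
% Write one hypothetical sequence to a new FASTA file
fname = [tempname '.fasta'];     % new file in the temporary folder
fastawrite(fname, 'example_seq hypothetical record', 'ATGGCCATTGTAATGGGCCGC');

contents = fileread(fname);      % inspect what was written
disp(contents);
delete(fname);                   % remove the temporary file
```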

BLAST searches — Request Web-based BLAST searches (blastncbi), get the results from a search (getblast), and read results from a previously saved BLAST-formatted report file (blastread).
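The three steps could be chained as in the sketch below. It assumes network access to NCBI. The query accession AAA59174 and the report file name are illustrative, and the saved report format depends on your toolbox release.

```matlab
% Sketch only: needs Bioinformatics Toolbox and network access to NCBI.
% Submit a protein BLAST request; rid is the request ID and rtoe is the
% estimated time (in minutes) until the results are ready.
[rid, rtoe] = blastncbi('AAA59174', 'blastp', 'Expect', 1e-10);

% Wait for the search to finish, then fetch the report and save a copy
report = getblast(rid, 'WaitTime', rtoe, 'ToFile', 'AAA59174_blast.rpt');

% Later, reload the saved report without contacting NCBI again
saved = blastread('AAA59174_blast.rpt');
```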

The MATLAB environment has built-in support for other industry-standard file formats including Microsoft® Excel® and comma-separated-value (CSV) files. Additional functions perform ASCII and low-level binary I/O, allowing you to develop custom functions for working with any data format.
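The sketch below illustrates both kinds of built-in I/O mentioned above, using core MATLAB functions only. The table contents and file names are invented for the example; readtable and writetable also accept Excel files when given an .xlsx extension.

```matlab
% Tabular data: write a small table to CSV and read it back
T = table({'geneA'; 'geneB'}, [1.5; 2.25], ...
          'VariableNames', {'Gene', 'Expression'});
csvName = [tempname '.csv'];
writetable(T, csvName);
T2 = readtable(csvName);

% Low-level binary I/O: write three 16-bit integers and read them back
binName = [tempname '.bin'];
fid = fopen(binName, 'w');
fwrite(fid, uint16([10 20 30]), 'uint16');
fclose(fid);

fid = fopen(binName, 'r');
vals = fread(fid, 3, 'uint16=>uint16')';   % transpose to a row vector
fclose(fid);

delete(csvName);                 % remove the temporary files
delete(binName);
```

The low-level functions (fopen, fread, fwrite, fclose) are the usual starting point for a custom reader when no toolbox function covers a format.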
