What Is Text Analytics Toolbox?

Text Analytics Toolbox provides tools for extracting text from documents, preprocessing raw text, visualizing text, and performing machine learning on text data.  

You can use Text Analytics Toolbox to analyze data from sources such as maintenance reports, operations logs, financial documents, and web and social media content.

You can extract raw text from a variety of sources, including Microsoft Word, Microsoft Excel, and PDF files. Word clouds show the relative frequency of words, and interactive scatter plots help you understand the numeric relationships between words.

Text Analytics Toolbox provides functions for preprocessing raw text, such as removing common words and punctuation and tokenizing documents into individual words or phrases.
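To make these preprocessing steps concrete, here is a minimal sketch in plain Python. It does not use the toolbox's own functions; the `preprocess` helper, the short stop-word list, and the sample report sentence are all assumptions made up for illustration.

```python
import string

# Assumption: a tiny illustrative stop-word list, not the list the toolbox uses.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "was", "on"}


def preprocess(text):
    """Lowercase the text, strip punctuation, tokenize, and drop stop words."""
    # Remove ASCII punctuation characters.
    cleaned = text.lower().translate(str.maketrans("", "", string.punctuation))
    # Split on whitespace to get individual word tokens.
    tokens = cleaned.split()
    # Keep only the tokens that are not common (stop) words.
    return [token for token in tokens if token not in STOP_WORDS]


# Hypothetical maintenance-report sentence.
report = "The pump was leaking, and the seal is worn."
tokens = preprocess(report)
print(tokens)
```

Whitespace splitting is the simplest possible tokenizer. Real text usually needs more care with hyphens, numbers, URLs, and languages that do not separate words with spaces.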

Once text is preprocessed, converting it to numeric representations lets you perform further analysis and visualization to understand word frequencies, including:

  • Histograms to compare word counts
  • Bag-of-words and n-gram models to enable efficient visualization and computation
  • TF-IDF models for text mining and machine learning
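The numeric representations listed above can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the toolbox's implementation: the helper names `bag_of_words`, `bigrams`, and `tfidf` are hypothetical, the sample documents are made up, and the weighting shown (count times log(N / document frequency)) is only one common TF-IDF variant.

```python
import math
from collections import Counter


def bag_of_words(docs):
    """Return a sorted vocabulary and one row of word counts per document."""
    vocab = sorted({word for doc in docs for word in doc})
    counts = [Counter(doc) for doc in docs]
    matrix = [[c[word] for word in vocab] for c in counts]
    return vocab, matrix


def bigrams(tokens):
    """Return consecutive token pairs (n-grams with n = 2)."""
    return list(zip(tokens, tokens[1:]))


def tfidf(matrix):
    """Weight each count by log(N / document frequency) of its term."""
    n_docs = len(matrix)
    n_terms = len(matrix[0]) if matrix else 0
    # Document frequency: number of documents that contain each term.
    df = [sum(1 for row in matrix if row[j] > 0) for j in range(n_terms)]
    idf = [math.log(n_docs / d) if d else 0.0 for d in df]
    return [[row[j] * idf[j] for j in range(n_terms)] for row in matrix]


# Hypothetical, already tokenized documents.
docs = [
    ["pump", "leaking", "seal", "worn"],
    ["pump", "motor", "overheated"],
    ["seal", "replaced", "pump", "restarted"],
]
vocab, counts = bag_of_words(docs)
weights = tfidf(counts)
print(vocab)
print(counts)
print(bigrams(docs[0]))
```

With this variant, a word that appears in every document (here, "pump") receives a weight of zero, while words that are specific to one document are weighted more heavily. That is why TF-IDF is often preferred over raw counts as input to machine learning.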

You can use statistics and machine learning algorithms with text analytics to perform topic modeling, which identifies themes in documents, and to classify documents and make predictions.

You can train machine learning models or use pretrained word embedding models such as word2vec, fastText, and GloVe.

In this example, the latent Dirichlet allocation (LDA) algorithm builds a topic model with 60 topics from storm reports to identify damage and weather patterns.

You can also use deep learning algorithms to build accurate classifiers when you have large sets of documents, and you can use parallel computing to speed up text processing and training.

For more information about Text Analytics Toolbox, see the product page, or choose a link below.
