
removeShortWords

Remove short words from documents or bag-of-words model


Syntax

newDocuments = removeShortWords(documents,len)
newBag = removeShortWords(bag,len)



Description

newDocuments = removeShortWords(documents,len) removes words with len or fewer characters from documents.


newBag = removeShortWords(bag,len) removes words with len or fewer characters from the bagOfWords object bag.
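The length rule can be sketched on a plain string array using only base MATLAB. This is an illustration of the rule, not the toolbox implementation; the variable names words, keep, and newWords are assumptions made for this sketch.

```matlab
% Plain string array standing in for the words of one document
% (illustrative data, not a tokenizedDocument).
words = ["An" "example" "of" "a" "short" "sentence"];
len = 2;

% A word is kept only if it has strictly more than len characters,
% so words with exactly len characters are removed as well.
keep = strlength(words) > len;
newWords = words(keep);

% newWords now holds "example", "short", and "sentence".
disp(newWords)
```

The comparison is strict (>), which is what "len or fewer characters" implies: a word of exactly len characters does not survive.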


Examples

Remove the words with two or fewer characters from a document.

document = tokenizedDocument("An example of a short sentence");
newDocument = removeShortWords(document,2)
newDocument = 

   3 tokens: example short sentence

Remove the words with two or fewer characters from a bag-of-words model.

documents = tokenizedDocument([ ...
    "an example of a short sentence"
    "a second short sentence"]);
bag = bagOfWords(documents);
newBag = removeShortWords(bag,2)
newBag = 
  bagOfWords with properties:

          Counts: [2x4 double]
      Vocabulary: ["example"    "short"    "sentence"    "second"]
        NumWords: 4
    NumDocuments: 2

Input Arguments


documents: Input documents, specified as a tokenizedDocument array.

bag: Input bag-of-words model, specified as a bagOfWords object.

len: Maximum length of words to remove, specified as a positive integer. The function removes words with len or fewer characters.
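Because the threshold is inclusive, a word with exactly len characters is removed. The following sketch, which assumes Text Analytics Toolbox is installed, checks this on the five-character word "short"; the names doc4 and doc5 are chosen only for this illustration.

```matlab
document = tokenizedDocument("An example of a short sentence");

% len = 4 keeps "short" (five characters); len = 5 removes it.
doc4 = removeShortWords(document,4);
doc5 = removeShortWords(document,5);

% Compare the number of tokens left in each result.
n4 = doclength(doc4);   % "example", "short", "sentence" remain
n5 = doclength(doc5);   % "example", "sentence" remain
```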

Output Arguments


newDocuments: Output documents, returned as a tokenizedDocument array.

newBag: Output bag-of-words model, returned as a bagOfWords object.

Introduced in R2017b