removeInfrequentWords

Remove words with low counts from bag-of-words model

Syntax

newBag = removeInfrequentWords(bag,count)

Description

newBag = removeInfrequentWords(bag,count) removes from the bag-of-words model bag the words that appear count times or fewer in total.
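
The following is a minimal sketch of the same filtering done by hand, to make the meaning of "in total" concrete. It is not the function's internal implementation. It assumes that the Counts property holds one row per document and one column per vocabulary word, and the variable names totals, infrequentWords, and manualBag are chosen here for illustration.

```matlab
% Illustrative manual equivalent of removeInfrequentWords(bag,count).
documents = tokenizedDocument([
    "an example of a short sentence"
    "a second short sentence"
    "another example"
    "a short example"]);
bag = bagOfWords(documents);
count = 2;

% Total occurrences of each vocabulary word, summed over all documents.
totals = full(sum(bag.Counts,1));

% Words whose total is at or below the threshold.
infrequentWords = bag.Vocabulary(totals <= count);

% removeWords drops the listed words from the model.
manualBag = removeWords(bag,infrequentWords);
```

Under these assumptions, manualBag should contain the same vocabulary as the model returned by removeInfrequentWords(bag,count).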

Examples

Remove the words that appear two times or fewer from a bag-of-words model.

Create a bag-of-words model from an array of tokenized documents.

documents = tokenizedDocument([
    "an example of a short sentence"
    "a second short sentence"
    "another example"
    "a short example"]);
bag = bagOfWords(documents)
bag = 
  bagOfWords with properties:

          Counts: [4x8 double]
      Vocabulary: [1x8 string]
        NumWords: 8
    NumDocuments: 4

Remove the words that appear two times or fewer from the bag-of-words model.

count = 2;
newBag = removeInfrequentWords(bag,count)
newBag = 
  bagOfWords with properties:

          Counts: [4x3 double]
      Vocabulary: ["example"    "a"    "short"]
        NumWords: 3
    NumDocuments: 4
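
To see why only three words remain, it can help to list the total count of every word in the original model. The sketch below uses topkwords for this and requests as many words as the vocabulary holds. The variable name tbl is an illustrative choice.

```matlab
% List every word of the original model with its total count.
documents = tokenizedDocument([
    "an example of a short sentence"
    "a second short sentence"
    "another example"
    "a short example"]);
bag = bagOfWords(documents);

% topkwords returns a table with the variables Word and Count.
tbl = topkwords(bag,bag.NumWords)
```

In this data, "example", "a", and "short" each appear three times, "sentence" appears twice, and the remaining words appear once, so a threshold of 2 leaves only the first three.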

Input Arguments

bag - Input bag-of-words model, specified as a bagOfWords object.

count - Count threshold for removing words, specified as a positive integer. The function removes the words that appear count times or fewer in total.
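
Because the threshold is inclusive and applies to totals rather than per-document counts, a word that appears exactly count times is removed. The sketch below illustrates this with the word "sentence", which appears once in each of two documents. The variable names bagAtOne and bagAtTwo are illustrative.

```matlab
% Show that the threshold is inclusive and applies to total counts.
documents = tokenizedDocument([
    "an example of a short sentence"
    "a second short sentence"
    "another example"
    "a short example"]);
bag = bagOfWords(documents);

% "sentence" has a total count of 2 across the four documents.
bagAtOne = removeInfrequentWords(bag,1);
bagAtTwo = removeInfrequentWords(bag,2);

keptAtOne = ismember("sentence",bagAtOne.Vocabulary)   % true: 2 is above the threshold 1
keptAtTwo = ismember("sentence",bagAtTwo.Vocabulary)   % false: 2 is at the threshold 2
```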

Introduced in R2017b