removeDocument

Remove documents from bag-of-words or bag-of-n-grams model

Syntax

newBag = removeDocument(bag,idx)

Description

newBag = removeDocument(bag,idx) removes the documents with indices specified by idx from the bag-of-words or bag-of-n-grams model bag. If the removed documents contain words or n-grams that do not appear in the remaining documents, then the function also removes these words or n-grams from bag.

Examples

Remove selected documents from a bag-of-words model.

documents = tokenizedDocument([ ...
    "an example of a short sentence" 
    "a second short sentence"
    "a third example"
    "a final sentence"]);
bag = bagOfWords(documents)
bag = 
  bagOfWords with properties:

          Counts: [4x9 double]
      Vocabulary: [1x9 string]
        NumWords: 9
    NumDocuments: 4

Remove the first and third documents from bag.

idx = [1 3];
newBag = removeDocument(bag,idx)
newBag = 
  bagOfWords with properties:

          Counts: [2x5 double]
      Vocabulary: ["a"    "short"    "sentence"    "second"    "final"]
        NumWords: 5
    NumDocuments: 2

Remove the same documents using logical indices.

idx = logical([1 0 1 0]);
newBag = removeDocument(bag,idx)
newBag = 
  bagOfWords with properties:

          Counts: [2x5 double]
      Vocabulary: ["a"    "short"    "sentence"    "second"    "final"]
        NumWords: 5
    NumDocuments: 2
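
The same call should work on a bag-of-n-grams model. The following is a minimal sketch, not taken from the original page: it assumes that bagOfNgrams is available (it was introduced in a later release than removeDocument) and reuses the documents from the example above. With default options bagOfNgrams counts bigrams.

```matlab
% Build a small bag-of-n-grams model (bigrams by default).
documents = tokenizedDocument([ ...
    "an example of a short sentence"
    "a second short sentence"
    "a third example"
    "a final sentence"]);
bag = bagOfNgrams(documents);

% Remove the first and third documents, as in the bag-of-words example.
idx = [1 3];
newBag = removeDocument(bag,idx);

% The output has the same type as the input. N-grams that occurred only
% in the removed documents are no longer part of the model.
class(newBag)
newBag.NumDocuments
newBag.Ngrams
```

Inspecting newBag.Ngrams is a quick way to confirm that n-grams unique to the removed documents, such as "a third", are gone.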

Input Arguments

bag: Input bag-of-words or bag-of-n-grams model, specified as a bagOfWords object or a bagOfNgrams object.

idx: Indices of documents to remove, specified as a vector of numeric indices or a vector of logical indices.

Example: [2 4 6]

Example: logical([0 1 0 1 0 1])

Output Arguments

newBag: Output model, returned as a bagOfWords object or a bagOfNgrams object. The type of newBag is the same as the type of bag.

Introduced in R2017b