removeShortWords

Remove short words from documents or bag-of-words model

Syntax

newDocuments = removeShortWords(documents,len)

newBag = removeShortWords(bag,len)

Description

newDocuments = removeShortWords(documents,len) removes words with len or fewer characters from documents.

newBag = removeShortWords(bag,len) removes words with len or fewer characters from the bagOfWords object bag.

Examples

Remove Short Words from Document

Remove the words with two or fewer characters from a document.

document = tokenizedDocument("An example of a short sentence");
newDocument = removeShortWords(document,2)

newDocument = 
  tokenizedDocument:

   3 tokens: example short sentence

Remove Short Words from Bag-of-Words Model

Remove the words with two or fewer characters from a bag-of-words model.

documents = tokenizedDocument([ ...
    "an example of a short sentence"
    "a second short sentence"]);
bag = bagOfWords(documents);
newBag = removeShortWords(bag,2)

newBag = 
  bagOfWords with properties:

        NumWords: 4
          Counts: [2×4 double]
      Vocabulary: ["example"    "short"    "sentence"    "second"]
    NumDocuments: 2

Input Arguments

`documents` — Input documents
`tokenizedDocument` array

Input documents, specified as a tokenizedDocument array.

`bag` — Input bag-of-words model
`bagOfWords` object

Input bag-of-words model, specified as a bagOfWords object.

`len` — Maximum length of words to remove
positive integer

Maximum length of words to remove, specified as a positive integer. The function removes words with len or fewer characters.
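
The inclusive threshold can be illustrated with a minimal sketch on a plain string array. This is not how removeShortWords is implemented; it only mimics the filtering rule with the base MATLAB function strlength. The variable names words, keep, and longWords are illustrative assumptions, not part of the function's interface.

```matlab
% Plain string array standing in for the tokens of one document.
words = ["An" "example" "of" "a" "short" "sentence"];
len = 2;

% A word is removed when it has len or fewer characters,
% so keep only the words with strictly more than len characters.
keep = strlength(words) > len;
longWords = words(keep);
```

With len set to 2, the two-character words "An" and "of" are dropped along with "a", which leaves the same three words as the document example above.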

Output Arguments

`newDocuments` — Output documents
`tokenizedDocument` array

Output documents, returned as a tokenizedDocument array.

`newBag` — Output bag-of-words model
`bagOfWords` object

Output bag-of-words model, returned as a bagOfWords object.

Version History

Introduced in R2017b
