splitParagraphs

Split text into paragraphs

Since R2023a

Syntax

newStr = splitParagraphs(str)

newDocuments = splitParagraphs(document)

Description

newStr = splitParagraphs(str) splits str into an array of paragraphs.

newDocuments = splitParagraphs(document) splits a single tokenizedDocument object into a tokenizedDocument array of paragraphs.

Examples

Split String into Paragraphs

Extract the text from the file exampleParagraphs.txt.

str = extractFileText("exampleParagraphs.txt")

str = 
    "This example file contains three paragraphs. The first paragraph contains three sentences. The third sentence is short.
     
     The second paragraph contains one sentence only.
     
     The third (and final) paragraph has seventeen words in total. The final sentence concludes the example file.
     "

Split the text into paragraphs.

paragraphs = splitParagraphs(str)

paragraphs = 3×1 string
    "This example file contains three paragraphs. The first paragraph contains three sentences. The third sentence is short."
    "The second paragraph contains one sentence only."
    "The third (and final) paragraph has seventeen words in total. The final sentence concludes the example file.↵"

Split Document into Paragraphs

Extract the text from the file exampleParagraphs.txt and tokenize it.

str = extractFileText("exampleParagraphs.txt");
document = tokenizedDocument(str)

document = 
  tokenizedDocument:

   49 tokens: This example file contains three paragraphs . The first paragraph contains three sentences . The third sentence is short . The second paragraph contains one sentence only . The third ( and final ) paragraph has seventeen words in total . The final sentence concludes the example file .

Split the document into paragraphs.

paragraphs = splitParagraphs(document)

paragraphs = 
  3×1 tokenizedDocument:

    20 tokens: This example file contains three paragraphs . The first paragraph contains three sentences . The third sentence is short .
     8 tokens: The second paragraph contains one sentence only .
    21 tokens: The third ( and final ) paragraph has seventeen words in total . The final sentence concludes the example file .
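The token counts shown in the display can also be obtained programmatically. The following is a minimal sketch, not part of the original example: it builds a hypothetical two-paragraph string inline instead of reading exampleParagraphs.txt, assumes that a blank line marks the paragraph break as in that file, and uses doclength to count the tokens in each resulting document.

```matlab
% Hypothetical text; a blank line is assumed to mark the paragraph break.
str = "The first paragraph has two sentences. This is the second one." ...
    + newline + newline + "The second paragraph has one sentence.";

% Tokenize the text, then split the single document into paragraphs.
document = tokenizedDocument(str);
paragraphs = splitParagraphs(document);

% Count the tokens in each paragraph document.
tokensPerParagraph = doclength(paragraphs);
```

Each element of tokensPerParagraph corresponds to one element of paragraphs.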

Input Arguments

`str` — Input text
string scalar | character vector | scalar cell array containing a character vector

Input text, specified as a string scalar, a character vector, or a scalar cell array containing a character vector.

Data Types: string | char | cell
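
Because str must be a scalar, text stored in a string array needs one call per element. The following is a minimal sketch of one way to do that, not a documented workflow: the variable names texts, parts, and allParagraphs are hypothetical, and the sketch assumes that a blank line marks a paragraph break and that each call returns a column array, as in the example output.

```matlab
% Hypothetical input: two texts; blank lines are assumed to separate paragraphs.
texts = [
    "Text A, paragraph one." + newline + newline + "Text A, paragraph two."
    "Text B has a single paragraph."];

% Call splitParagraphs once per element; each cell holds that text's paragraphs.
parts = arrayfun(@splitParagraphs, texts, UniformOutput=false);

% Stack all paragraphs into one string column (assumes each call returns a column).
allParagraphs = vertcat(parts{:});
```

Keeping the intermediate cell array parts preserves which paragraphs came from which input text.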

`document` — Input document
scalar `tokenizedDocument` object

Input document, specified as a scalar tokenizedDocument object.

Output Arguments

`newStr` — Output text
string array | cell array of character vectors

Output text, returned as a string array or cell array of character vectors.

If str is a string scalar, then newStr is a string array. Otherwise, newStr is a cell array of character vectors.

Data Types: string | cell
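
The dependence of the output type on the input type can be checked directly. The following is a minimal sketch using a hypothetical two-paragraph text, again assuming that a blank line marks the paragraph break; the variable names are illustrative only.

```matlab
% Hypothetical text; a blank line is assumed to mark the paragraph break.
txt = "First paragraph." + newline + newline + "Second paragraph.";

% String scalar input: per the description above, the output is a string array.
outString = splitParagraphs(txt);

% Character vector input: per the description above, the output is a cell array.
outCell = splitParagraphs(char(txt));

% Inspect the output types.
classOfStringResult = class(outString);
classOfCharResult = class(outCell);
```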

`newDocuments` — Output documents
`tokenizedDocument` array

Output documents, returned as a tokenizedDocument array.

Version History

Introduced in R2023a
