How to find the most common words in text with MATLAB

How can I tag parts of speech (POS) for nouns and verbs in MATLAB? Is this related to regular expressions? I know that regular expressions find patterns in text, but I want to find the most common words in a set of texts, tag them with their POS (that is, whether each word is a noun or a verb), and then exchange words by POS to make unfamiliar word pairs. How can I find the most common words in texts with MATLAB? Is there a solution for this, or should I use other software?

Accepted Answer

Christopher Creutzig on 2 Nov 2017
Edited: Christopher Creutzig on 26 Nov 2018
Finding the most common words is easy with Text Analytics Toolbox:
>> sonnets = extractFileText("sonnets.txt");
>> sonnets = erasePunctuation(sonnets);
>> tokenizedSonnets = tokenizedDocument(lower(sonnets));
>> bag = bagOfWords(tokenizedSonnets);
>> topkwords(bag, 10)
ans =
  10×2 table
     Word     Count
    ______    _____
    "and"      490
    "the"      436
    "to"       409
    "my"       371
    "of"       370
    "i"        344
    "in"       321
    "that"     320
    "thy"      281
    "thou"     234
You will probably want to remove some words before counting (check out removeWords and stopWords). POS tagging is supported in release R2018b and later; see addPartOfSpeechDetails.
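As a rough sketch of how those two suggestions might fit together, assuming Text Analytics Toolbox in R2018b or later: the sample sentences and variable names below are made up for illustration, and removeStopWords is used here as a shortcut for removeWords with the stopWords list. Tagging is done before lowercasing and stop-word removal because the tagger relies on sentence context.

```matlab
% Hypothetical sample text; in practice use extractFileText("sonnets.txt").
str = [
    "The quick brown fox jumps over the lazy dog."
    "The dog barks and the fox runs away."];

% Tag parts of speech on the unmodified sentences first.
documents = tokenizedDocument(str);
documents = addPartOfSpeechDetails(documents);
details = tokenDetails(documents);   % table including Token and PartOfSpeech

% Keep only the tokens tagged as nouns or verbs.
isNounOrVerb = details.PartOfSpeech == "noun" | details.PartOfSpeech == "verb";
nounsAndVerbs = details(isNounOrVerb, ["Token" "PartOfSpeech"]);

% Count words after lowercasing and dropping punctuation and stop words.
cleaned = removeStopWords(erasePunctuation(lower(documents)));
bag = bagOfWords(cleaned);
topkwords(bag, 5)
```

The nounsAndVerbs table can then serve as the pool from which to pick words for swapping; which tag each token receives depends on the toolbox's tagger, so it is worth inspecting the table before relying on it.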
  2 Comments
Christopher Creutzig on 2 May 2018
What command(s) did you try to read that file? The error message looks like you tried to read it as a table; try using the commands listed above instead.


More Answers (2)

Sarah Palfreyman on 30 Apr 2018
Edited: Sarah Palfreyman on 30 Apr 2018

Charmaine Tan on 26 Nov 2018
Hi, after finding my topkwords (most frequent words), how can I plot a histogram of these?
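One possible approach, sketched under the assumption that the output of topkwords is a table with Word and Count columns (as in the accepted answer): pass the counts to bar and use the words as tick labels. The sample text and variable names here are only illustrative.

```matlab
% Hypothetical sample text; any bagOfWords object would work here.
documents = tokenizedDocument(lower("to be or not to be that is the question"));
bag = bagOfWords(documents);
tbl = topkwords(bag, 5);      % table with Word and Count columns

% Bar chart of the counts, labelled with the corresponding words.
figure
bar(tbl.Count)
xticks(1:height(tbl))
xticklabels(tbl.Word)
xlabel("Word")
ylabel("Count")
title("Most frequent words")
```

A bar chart is used rather than histogram because the counts are already aggregated per word. If a less quantitative view is enough, wordcloud(bag) is another option.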
