Create Co-occurrence Network

This example shows how to create a co-occurrence network using a bag-of-words model.

Given a corpus of documents, a co-occurrence network is an undirected graph, with nodes corresponding to unique words in a vocabulary and edges weighted by how often the two words co-occur in a document. Use co-occurrence networks to visualize and extract information about the relationships between words in a corpus of documents. For example, you can use a co-occurrence network to discover which words commonly appear with a specified word.

Import Text Data

The file weekendUpdates.xlsx contains status updates containing the hashtags "#weekend" and "#vacation". Read the data using the readtable function and extract the text data from the TextData column.

filename = "weekendUpdates.xlsx";
tbl = readtable(filename,'TextType','string');
textData = tbl.TextData;

View the first few observations.

textData(1:5)
ans = 5x1 string
    "Happy anniversary! ❤ Next stop: Paris! ✈ #vacation"
    "Haha, BBQ on the beach, engage smug mode!   ❤  #vacation"
    "getting ready for Saturday night  #yum #weekend "
    "Say it with me - I NEED A #VACATION!!! ☹"
    " Chilling  at home for the first time in ages…This is the life!  #weekend"

Preprocess Text Data

Tokenize the text, convert it to lowercase, and remove the stop words.

documents = tokenizedDocument(textData);

documents = lower(documents);
documents = removeStopWords(documents);

Create a matrix of word counts using a bag-of-words model.

bag = bagOfWords(documents);
counts = bag.Counts;

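To compute the word co-occurrences, multiply the transpose of the word-count matrix by the matrix itself. The word-count matrix has one row per document and one column per word, so the product is a square matrix with one row and one column per word in the vocabulary.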

cooccurrence = counts.'*counts;
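
As a minimal sketch of what this product computes, consider a small hand-made word-count matrix. The variable names toyCounts and toyCooccurrence and the values are illustrative assumptions, not taken from weekendUpdates.xlsx.

```matlab
% Toy word-count matrix (illustrative values, not from the example data).
% Rows are documents, columns are words.
toyCounts = [1 1 0
             0 1 1
             1 1 1];

% Entry (i,j) sums, over all documents, the product of the counts of
% word i and word j, so it is nonzero only if both words appear in at
% least one common document.
toyCooccurrence = toyCounts.'*toyCounts
% toyCooccurrence =
%      2     2     1
%      2     3     2
%      1     2     2
```

The result is symmetric, which is what an undirected graph requires. The diagonal holds each word's co-occurrence with itself (the sum of its squared counts), which would otherwise appear as self-loops in the network.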

Convert the co-occurrence matrix to a network using the graph function.

G = graph(cooccurrence,bag.Vocabulary,'omitselfloops');
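
The following sketch shows how the graph function treats a small co-occurrence matrix. The matrix values and the names toyCooccurrence, toyVocabulary, and toyG are hypothetical and chosen only for illustration.

```matlab
% Toy symmetric co-occurrence matrix and vocabulary (illustrative only).
toyCooccurrence = [2 2 1
                   2 3 2
                   1 2 2];
toyVocabulary = ["beach" "sun" "flight"];

% Off-diagonal entries become edge weights. The 'omitselfloops' option
% ignores the diagonal, so no word is connected to itself.
toyG = graph(toyCooccurrence,toyVocabulary,'omitselfloops');

% Inspect the edge table: three edges with weights 2, 1, and 2.
toyG.Edges
```

Without 'omitselfloops', each nonzero diagonal entry would add an edge from a word to itself, which clutters the plot and carries no co-occurrence information.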

Visualize the network using the plot function. Set the line thickness to a multiple of the edge weight.

LWidths = 5*G.Edges.Weight/max(G.Edges.Weight);

plot(G,'LineWidth',LWidths)
title("Co-occurrence Network")

Find neighbors of the word "great" using the neighbors function.

word = "great"
word = 
"great"
idx = find(bag.Vocabulary == word);
nbrs = neighbors(G,idx);
bag.Vocabulary(nbrs)'
ans = 18x1 string
    "next"
    "#vacation"
    ""
    "#weekend"
    "☹"
    "excited"
    "flight"
    "delayed"
    "stuck"
    "airport"
    "way"
    "spend"
    ""
    "lovely"
    "friends"
    "-"
    "mini"
    "everybody"
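
The neighbors function returns every connected word regardless of how often the words co-occur. One possible way to rank the neighbors by co-occurrence count is sketched below on a toy graph. The matrix, the word list, and the variable names are assumptions made for illustration. With the variables from this example, the corresponding weights would presumably come from cooccurrence(idx,nbrs).

```matlab
% Toy co-occurrence matrix with a zero diagonal (illustrative values).
toyA = [0 2 1 0
        2 0 2 0
        1 2 0 3
        0 0 3 0];
toyWords = ["beach" "sun" "flight" "delayed"];
toyG = graph(toyA,toyWords);

% Locate the query word and its neighbors (numeric node indices).
toyIdx = findnode(toyG,"flight");
toyNbrs = neighbors(toyG,toyIdx);

% Look up the co-occurrence count for each neighbor and sort the
% neighbors from most to least frequent.
toyWeights = toyA(toyIdx,toyNbrs);
[~,order] = sort(toyWeights,'descend');
rankedWords = toyWords(toyNbrs(order))
% rankedWords is ["delayed" "sun" "beach"]
```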

Visualize the co-occurrences of the word "great" by extracting a subgraph of this word and its neighbors.

H = subgraph(G,[idx; nbrs]);

LWidths = 5*H.Edges.Weight/max(H.Edges.Weight);
plot(H,'LineWidth',LWidths)
title("Co-occurrence Network - Word: """ + word + """");

For more information about graphs and network analysis, see Graph and Network Algorithms.
