Create Co-occurrence Network

This example shows how to create a co-occurrence network using a bag-of-words model.

Given a corpus of documents, a co-occurrence network is an undirected graph in which nodes correspond to the unique words in a vocabulary and edge weights correspond to how often two words co-occur in a document. Use co-occurrence networks to visualize and extract information about the relationships between words in a corpus of documents. For example, you can use a co-occurrence network to discover which words commonly appear with a specified word.

Import Text Data

The file weekendUpdates.xlsx contains status updates containing the hashtags "#weekend" and "#vacation". Read the data using the readtable function and extract the text data from the TextData column.

filename = "weekendUpdates.xlsx";
tbl = readtable(filename,'TextType','string');
textData = tbl.TextData;

View the first few observations.

textData(1:5)

ans = 5x1 string
    "Happy anniversary! ❤ Next stop: Paris! ✈ #vacation"
    "Haha, BBQ on the beach, engage smug mode! 😍 😎 ❤ 🎉 #vacation"
    "getting ready for Saturday night 🍕 #yum #weekend 😎"
    "Say it with me - I NEED A #VACATION!!! ☹"
    "😎 Chilling 😎 at home for the first time in ages…This is the life! 👍 #weekend"

Preprocess Text Data

Tokenize the text, convert it to lowercase, and remove the stop words.

documents = tokenizedDocument(textData);

documents = lower(documents);
documents = removeStopWords(documents);
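
To see what these preprocessing steps do, here is a minimal sketch on two made-up sentences (the sentences are illustrative assumptions, not taken from weekendUpdates.xlsx). It uses joinWords only to display the remaining tokens as one string per document.

```matlab
% Two made-up sentences, used only for illustration.
str = ["The beach is GREAT"; "A great weekend at the beach"];

% Same pipeline as above: tokenize, lowercase, remove stop words.
docs = tokenizedDocument(str);
docs = lower(docs);
docs = removeStopWords(docs);

% Join the remaining tokens back into one string per document for display.
cleaned = joinWords(docs);
disp(cleaned)
```

Words such as "the", "is", "a", and "at" are on the default stop word list, so only the content words remain.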

Create a matrix of word counts using a bag-of-words model.

bag = bagOfWords(documents);
counts = bag.Counts;
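
As a rough illustration of the shape of the word-count matrix, the following sketch builds a bag-of-words model from two tiny made-up documents (the text is an assumption for illustration). Each row of Counts corresponds to a document and each column to a word in Vocabulary.

```matlab
% Two tiny made-up documents, already lowercase and free of stop words.
docs = tokenizedDocument(["beach great"; "great weekend beach"]);

bag = bagOfWords(docs);
counts = bag.Counts;        % sparse matrix: documents-by-words

disp(bag.Vocabulary)        % the unique words
disp(full(counts))          % the counts as a dense matrix
```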

To compute the word co-occurrences, multiply the word-count matrix by its transpose.

cooccurrence = counts.'*counts;
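
The product can be checked by hand on a small example. The sketch below uses a made-up 3-by-3 count matrix (the vocabulary and counts are assumptions for illustration). Entry (i,j) of the result sums, over all documents, the product of the counts of word i and word j, so the matrix is symmetric and its diagonal holds each word's co-occurrence with itself.

```matlab
% Made-up word counts: rows are documents, columns are words.
vocabulary = ["beach" "great" "weekend"];
counts = [1 1 0;    % document 1: "beach", "great"
          0 1 1;    % document 2: "great", "weekend"
          1 1 1];   % document 3: all three words

% Word-by-word co-occurrence matrix.
cooccurrence = counts.'*counts;
disp(cooccurrence)
```

Here "beach" and "great" appear together in two documents, so the corresponding off-diagonal entry is 2.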

Convert the co-occurrence matrix to a network using the graph function.

G = graph(cooccurrence,bag.Vocabulary,'omitselfloops');
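
The diagonal of the co-occurrence matrix would otherwise become self-loops, which is why the 'omitselfloops' option is used. A minimal sketch on a made-up 3-word co-occurrence matrix (values assumed for illustration):

```matlab
% Made-up co-occurrence matrix with a nonzero diagonal.
vocabulary = ["beach" "great" "weekend"];
cooccurrence = [2 2 1;
                2 3 2;
                1 2 2];

% Off-diagonal entries become weighted edges; the diagonal is ignored.
G = graph(cooccurrence,vocabulary,'omitselfloops');
disp(G.Edges)
```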

Visualize the network using the plot function. Set the line thickness to a multiple of the edge weight.

LWidths = 5*G.Edges.Weight/max(G.Edges.Weight);
plot(G,'LineWidth',LWidths)
title("Co-occurrence Network")

Figure: graph plot titled "Co-occurrence Network".

Find neighbors of the word "great" using the neighbors function.

word = "great";
idx = find(bag.Vocabulary == word);
nbrs = neighbors(G,idx);
bag.Vocabulary(nbrs)'
ans = 18x1 string
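
The same lookup can be traced on a small graph. The sketch below uses a made-up 4-word vocabulary and co-occurrence matrix (both assumptions for illustration) and, as an optional extra step not in the original example, ranks the neighbors by co-occurrence weight.

```matlab
% Made-up vocabulary and symmetric co-occurrence matrix (zero diagonal).
vocabulary = ["beach" "great" "weekend" "work"];
cooccurrence = [0 2 1 0;
                2 0 2 0;
                1 2 0 0;
                0 0 0 0];   % "work" co-occurs with nothing
G = graph(cooccurrence,vocabulary);

% Look up the node index of the word, then its neighbors.
idx = find(vocabulary == "beach");
nbrs = neighbors(G,idx);
neighborWords = vocabulary(nbrs)';

% Optional: order the neighbors by how often they co-occur with the word.
weights = cooccurrence(idx,nbrs)';
[~,order] = sort(weights,'descend');
ranked = neighborWords(order);
disp(ranked)
```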

Visualize the co-occurrences of the word "great" by extracting a subgraph of this word and its neighbors.

H = subgraph(G,[idx; nbrs]);

LWidths = 5*H.Edges.Weight/max(H.Edges.Weight);
plot(H,'LineWidth',LWidths)
title("Co-occurrence Network - Word: """ + word + """");

Figure: graph plot titled "Co-occurrence Network - Word: "great"".
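
To see which nodes and edges survive the subgraph step, here is a minimal sketch on a made-up 5-word network (vocabulary and weights are assumptions for illustration). The subgraph keeps the chosen word, its neighbors, and any edges among those neighbors, and drops everything else.

```matlab
% Made-up co-occurrence weights between five words.
vocabulary = ["beach" "great" "weekend" "work" "monday"];
A = zeros(5);
A(1,2) = 2;   % beach - great
A(2,3) = 3;   % great - weekend
A(1,3) = 1;   % beach - weekend
A(4,5) = 4;   % work - monday (unrelated to "great")
A = A + A.';  % make the matrix symmetric
G = graph(A,vocabulary);

% Extract the word "great" together with its neighbors.
idx = find(vocabulary == "great");
nbrs = neighbors(G,idx);
H = subgraph(G,[idx; nbrs]);

% Scale line widths so that the heaviest edge has width 5.
LWidths = 5*H.Edges.Weight/max(H.Edges.Weight);
plot(H,'LineWidth',LWidths)
title("Co-occurrence Network - Word: ""great""")
```

Because the widths are normalized by the largest weight inside the subgraph, they are not directly comparable with the widths in the plot of the full network.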

For more information about graphs and network analysis, see Graph and Network Algorithms.
