demonstrate text mining, TermDocumentMatrix and clustering in R

hagai-lvi/twitter_term_document_matrix

Twitter networks

This is the second assignment in the data scientist course at BGU. The previous assignment is at yelp-api-exploration.

The main idea of this assignment is to experiment with network analysis. We demonstrate clustering and visualize the resulting networks and clusters.

In this assignment we gather tweets by Barack Obama. We then create a document-term matrix that shows which terms appear in each document (a document is a tweet in our case).
Next, we construct a term-term matrix that shows which terms each term is connected to, where two terms are connected if they appear together in a tweet. This is an adjacency matrix representation of a graph of the connections between terms.
There are many terms, so we keep only the most frequent ones and show them in a graph with clustering.
In addition, we show some metrics for the vertices: betweenness, closeness, and eigenvector centrality.
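
The pipeline above can be sketched in a few lines of base R. This is a minimal illustration, not the assignment's actual code: the sample tweets, the variable names (`tweets`, `tdm`, `term_term`, `frequent`), and the frequency threshold are all hypothetical, and the real assignment presumably builds the matrix with a text-mining package rather than by hand.

```r
# Hypothetical sample documents standing in for the gathered tweets.
tweets <- c(
  "health care reform now",
  "climate change action now",
  "health care for all",
  "climate action for all"
)

# Tokenize: lower-case each tweet and split it on whitespace.
tokens <- strsplit(tolower(tweets), "\\s+")
terms <- sort(unique(unlist(tokens)))

# Term-document matrix: rows are terms, columns are tweets, entries are counts.
tdm <- vapply(
  tokens,
  function(doc) tabulate(match(doc, terms), nbins = length(terms)),
  integer(length(terms))
)
rownames(tdm) <- terms
colnames(tdm) <- paste0("tweet", seq_along(tweets))

# Reduce counts to presence/absence so each tweet contributes at most 1 per term pair.
tdm[tdm > 0] <- 1L

# Keep only the most frequent terms (assumed threshold: present in at least 2 tweets).
frequent <- tdm[rowSums(tdm) >= 2, , drop = FALSE]

# Term-term matrix: entry [i, j] is the number of tweets containing both terms.
term_term <- frequent %*% t(frequent)

# Zero the diagonal so the matrix describes a graph without self-loops.
diag(term_term) <- 0

print(term_term)
```

With a graph package such as igraph, a matrix like `term_term` could be passed to `graph_from_adjacency_matrix()`, after which `betweenness()`, `closeness()`, and `eigen_centrality()` report the vertex metrics mentioned above. Whether the assignment uses exactly these calls is an assumption.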

Check the results

View directly on GitHub here
