Skip to content

Scripts to explore and visualize distributional semantic models using graphs.

License

Notifications You must be signed in to change notification settings

evanmiltenburg/dm-graphs

Repository files navigation

dm-graphs

The scripts in dm_graphs.py provide a way to explore the relationships between words in a distributional model. It relies on the NetworkX package. Please see the paper for more context. For an interactive demo, go here.

This work was presented as a poster at the ADS workshop. Bibtex:

  @unpublished{miltenburg2015exploring,
  	Author = {Emiel van Miltenburg},
  	Date-Added = {2016-07-17 09:22:11 +0000},
  	Date-Modified = {2016-07-17 16:43:09 +0000},
  	Note = {Presented as a poster at the Advances in Distributional Semantics workshop, collocated with IWCS. GitHub page: \url{https://github.com/evanmiltenburg/dm-graphs}},
  	Title = {Exploring and visualizing distributional models using graphs},
  	Year = {2015}}

Basic usage

The file googlenews.py shows how to use the dm_graphs module. Just create an iterable that contains tuples (u,v,w) corresponding to edges between u and v with weight w. The weight in this case is the cosine similarity between u and v. Then create a graph, and fill it with the data. The rest of the code shows how to make the network easier to visualize by using dm_graphs.graph_reduce(G) and dm_graphs.MST_pathfinder(G). These return a sparser version of the network.

The folder googlenews-demo contains some output files (.gexf) that you can open with Gephi. The pdf files contain visualizations of this data. For these, I used YifanHu's MultiLevel layout algorithm, followed by ForceAtlas2. The colors were randomly generated by Gephi. I used the modularity detection function to detect clusters, and then applied partition coloring by modularity class.

Features

Main functions

  • MST_pathfinder() is an implementation of MST-pathfinder algorithm (Quirin et al. 2008).
  • graph_reduce() reduces the graph by only including the edges that link each node to its top-n similar neighbors. It also has an optional restriction such that every edge should have a weight above a particular threshold.
  • maxmax_transform() is an implementation of phase 1 of the MaxMax algorithm (Hope & Keller 2013). It transforms a weighted undirected graph into a directed graph.
  • maxmax_clusters() is a function inspired by the MaxMax algorithm that produces soft clusters of words. Clusters should correspond to word senses.

Graph analysis

  • main_graph() returns the largest connected component.
  • graph_analysis() returns some statistics about the graph, including a suggested partition based on the Louvain method.

Utilities

  • add_partition_data() adds partition data from the analysis to the graph.
  • invert_weights() changes weights on all edges to 1-weight.
  • rank_reweight() reweights the edges based on the similarity ranks of the nodes.
  • remove_weights() removes the weights from the graph. Useful if you don't want Gephi to make the edges thicker.
  • write_functions() displays the available methods to write out the graphs.

About

Scripts to explore and visualize distributional semantic models using graphs.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages