Welcome to the documentation of the pathcensus package.
It is a Python (3.8+) implementation of structural similarity and
complementarity coefficients for undirected (un)weighted networks based
on efficient counting of 2- and 3-paths (triples and quadruples)
and 3- and 4-cycles (triangles and quadrangles).
Structural coefficients are graph-theoretic measures of the extent to which relations at different levels (of edges, nodes or entire networks) are driven by similarity or complementarity between different nodes. Even though they are defined in a purely combinatorial manner, they are motivated by geometric arguments which link them to the family of latent space/random geometric graph models. In particular, the geometric view allows the identification of network motifs characteristic of similarity (triangles) and complementarity (quadrangles). They can be seen as a generalization of the well-known local and global clustering coefficients, which summarize the structure of a network in terms of the density of ego subgraph(s).
Even though it is a Python package, pathcensus is performant, as its main workhorse functions are just-in-time (JIT) compiled to efficient machine code thanks to the numba library. It is compatible with numpy arrays and scipy sparse matrices, making it easy to use in practice. Moreover, it allows registering graph classes implemented by different third-party packages such as networkx so they can be converted automatically to sparse matrices. Conversion methods for networkx, igraph and graph-tool are registered automatically, provided the packages are installed.
NOTE
pathcensus uses the $A_{ij} = 1$ convention to indicate that a node $i$ sends a tie to a node $j$. Functions converting graph-like objects to arrays / sparse matrices need to be aware of that.
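To make the $A_{ij} = 1$ convention concrete, here is a minimal numpy sketch that turns an edge list into a dense adjacency matrix in which `A[i, j] = 1` means that node `i` sends a tie to node `j`; for undirected graphs both directions are set, so the matrix is symmetric. The helper `edges_to_adjacency` is hypothetical and not part of the pathcensus API; for large graphs a scipy sparse matrix would be the more natural container.

```python
import numpy as np

def edges_to_adjacency(edges, n, directed=False):
    """Hypothetical helper: 0/1 adjacency matrix with A[i, j] = 1 for a tie i -> j."""
    A = np.zeros((n, n), dtype=np.int64)
    for i, j in edges:
        A[i, j] = 1  # node i sends a tie to node j
        if not directed:
            A[j, i] = 1  # undirected ties are mirrored, so A is symmetric
    return A

A = edges_to_adjacency([(0, 1), (1, 2), (2, 0), (2, 3)], n=4)
print(A.sum(axis=1))  # node degrees: [2 2 3 1]
```

Row sums give out-degrees under this convention, which for an undirected graph are simply the node degrees.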
NOTE
pathcensus is compatible only with Python versions supported by numba. In practice, this means that it is compatible with all versions (starting from 3.8) except for the latest one, which usually starts to be supported by numba with some (often significant) delay.
For the sake of convenience, pathcensus also provides implementations of the most appropriate null models for statistical calibration of structural coefficients, which are simple wrappers around the excellent NEMtropy package. It also defines the pathcensus.inference submodule with a utility class for facilitating approximate statistical inference based on sampling from null models.
See the examples subfolder and the main documentation for more details.
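Approximate inference based on sampling from null models generally ends with comparing an observed coefficient against its distribution over graphs sampled from a null model. The sketch below shows only that final step as a generic Monte Carlo (empirical) p-value in plain Python. It does not use the pathcensus.inference API, and the Gaussian "null samples" are a hypothetical stand-in for coefficients computed on sampled graphs.

```python
import random

def empirical_pvalue(observed, null_samples):
    """Upper-tail Monte Carlo p-value with the usual +1 correction."""
    exceed = sum(1 for x in null_samples if x >= observed)
    return (exceed + 1) / (len(null_samples) + 1)

# Hypothetical stand-in for a coefficient computed on 999 null-model samples.
rng = random.Random(303)
null = [rng.gauss(0.10, 0.02) for _ in range(999)]
p = empirical_pvalue(0.25, null)  # observed value far in the upper tail
```

The +1 in the numerator and denominator counts the observed graph as one draw from the null, which keeps the p-value strictly positive for a finite number of samples.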
At the command line via pip:
# Install from PyPI
pip install pathcensus
The current development version (not guaranteed to be stable) can be installed directly from the GitHub repo:
pip install git+ssh://git@github.com/sztal/pathcensus.git
If you find the package useful, please cite our work.
Main theory paper
Talaga, S., & Nowak, A. (2022). Structural measures of similarity and complementarity in complex networks. Scientific Reports, 12(1), 16580. https://doi.org/10.1038/s41598-022-20710-w
NOTE
The main internal functions for calculating the path census are JIT-compiled when used for the first time. Thus, the first initialization of a PathCensus object may be quite slow, as its execution time will include the time required for compilation. However, this happens only once.
We will use igraph to generate the graphs used in the examples. However, even though it is automatically integrated with pathcensus, igraph is not a dependency and needs to be installed separately.
# Main imports used in the examples below
import random
import numpy as np
import igraph as ig
from pathcensus import PathCensus
# Set random and numpy rng seeds
random.seed(303)
np.random.seed(101)
More detailed examples can be found in the official documentation.
Path census is a set of counts of different paths and cycles per edge, node or in the entire graph. The counts are subsequently used to calculate different kinds of structural coefficients.
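As an illustration of what such counts look like, the plain-Python sketch below computes, for every node of a small graph, the number of 2-paths centred at the node (wedges) and the triangles and 4-cycles (quadrangles) passing through it. It is a naive enumeration for intuition only, not the edge-based, JIT-compiled algorithm used by pathcensus, and it counts every 4-cycle including those with chords; the exact pathcensus definitions may differ, so consult its column definitions (`P.definitions`). The function `node_census` is hypothetical.

```python
from itertools import combinations

def node_census(adj):
    """Naive per-node path/cycle counts for a simple undirected graph.

    `adj` maps each node to the set of its neighbours.
    """
    census = {}
    for i, nbrs in adj.items():
        d = len(nbrs)
        # 2-paths j - i - k with i in the middle (unordered pairs of neighbours).
        wedges = d * (d - 1) // 2
        # Triangles through i: pairs of neighbours that are linked to each other.
        triangles = sum(1 for j, k in combinations(nbrs, 2) if k in adj[j])
        # 4-cycles i - j - m - k - i: for each pair of neighbours (j, k),
        # count the common neighbours m of j and k other than i itself.
        quadrangles = sum(
            len((adj[j] & adj[k]) - {i}) for j, k in combinations(nbrs, 2)
        )
        census[i] = {"wedges": wedges, "triangles": triangles, "quadrangles": quadrangles}
    return census

# A square 0-1-2-3 with the diagonal 0-2.
adj = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1, 3}, 3: {0, 2}}
census = node_census(adj)
print(census[0])  # {'wedges': 3, 'triangles': 2, 'quadrangles': 1}
```

Summing the node-level triangle counts gives 6 (each of the 2 triangles is seen from its 3 nodes) and summing quadrangles gives 4 (the single 4-cycle is seen from its 4 nodes), which is how global counts follow from local ones.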
# Generate simple undirected ER random graph
G = ig.Graph.Erdos_Renyi(100, p=.05, directed=False)
# Initialize path census object.
# It precomputes path/cycle counts at the level of edges.
# Other counts are derived from them.
P = PathCensus(G)
# Get edge-level census
P.census("edges")
# Get node-level census
P.census("nodes") # or just P.census()
# Get global census
P.census("global")
# Column definitions (`?` is IPython/Jupyter help syntax)
?P.definitions
Once path census is computed it can be used to calculate structural coefficients.
# Similarity coefficients
P.tclust() # triangle-clustering equivalent to local clustering coefficient
P.tclosure() # triangle-closure equivalent to local closure coefficient
P.similarity() # structural similarity (weighted average of clustering and closure)
# Edge-wise similarity
P.similarity("edges")
# Global similarity (equivalent to global clustering coefficient)
P.similarity("global")
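The triangle-based coefficients can be spelled out on a toy graph. The sketch below uses the textbook definitions: local clustering is the fraction of 2-paths centred at node i that are closed by an edge, and local closure is the fraction of 2-paths starting at i that close into a triangle. The combined similarity is written as a weighted average with weights proportional to the two wedge counts; that weighting is an assumption made for illustration, so compare with P.similarity() before relying on it. The function `similarity_coefs` is hypothetical.

```python
def similarity_coefs(adj):
    """Node-wise triangle clustering, triangle closure and their weighted average.

    `adj` maps each node to the set of its neighbours (simple undirected graph).
    """
    out = {}
    for i, nbrs in adj.items():
        d = len(nbrs)
        # Ordered pairs (j, k) of neighbours of i that are linked: 2 per triangle.
        closed = sum(1 for j in nbrs for k in nbrs if j != k and k in adj[j])
        # Ordered 2-paths j - i - k with i in the middle (clustering denominator).
        tw = d * (d - 1)
        # 2-paths i - j - k starting at i (closure denominator).
        hw = sum(len(adj[j]) - 1 for j in nbrs)
        tclust = closed / tw if tw else 0.0
        tclosure = closed / hw if hw else 0.0
        # ASSUMPTION: weights proportional to tw and hw; pathcensus may weight differently.
        sim = (tw * tclust + hw * tclosure) / (tw + hw) if tw + hw else 0.0
        out[i] = (tclust, tclosure, sim)
    return out

# A square 0-1-2-3 with the diagonal 0-2.
adj = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1, 3}, 3: {0, 2}}
coefs = similarity_coefs(adj)
```

For node 1 both of its neighbours are linked (clustering 1.0), but only half of the 2-paths leaving it close into triangles (closure 0.5), which shows why the two measures can diverge.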
The figure below sums up the design of structural similarity coefficients, their geometric motivation and some of the main properties.
# Complementarity coefficients
P.qclust() # quadrangle-based clustering
P.qclosure() # quadrangle-based closure
P.complementarity() # structural complementarity (weighted average of clustering and closure)
# Edge-wise complementarity
P.complementarity("edges")
# Global complementarity
P.complementarity("global")
The figure below sums up the design and the geometric motivation of complementarity coefficients as well as their main properties.
Similarity and/or complementarity coefficients may be calculated in one go using appropriate methods as shown below.
# Similarity + corresponding clustering and closure coefs
P.simcoefs() # node-wise
P.simcoefs("global") # global
# Complementarity + corresponding clustering and closure coefs
P.compcoefs() # node-wise
P.compcoefs("global") # global
# All coefficients
P.coefs()
# All coefficients + full path census
P.coefs(census=True)
Below we create an ER random graph with random integer edge weights between 1 and 10. As long as edge weights are assigned to an edge property with the standard name ("weight"), they should be detected automatically and pathcensus will calculate a weighted census. However, an unweighted census may be enforced by using weighted=False.
G = ig.Graph.Erdos_Renyi(100, p=0.05, directed=False)
G.es["weight"] = np.random.randint(1, 11, G.ecount())
P = PathCensus(G)
P.weighted # True
# Get all coefficients and full path census
P.coefs(census=True)
# Use unweighted census
P = PathCensus(G, weighted=False)
P.weighted # False
P.coefs(census=True)
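The upper bound of np.random.randint is exclusive, which is why randint(1, 11, ...) in the example above yields weights from 1 to 10. The short check below confirms this for both the legacy function and numpy's newer Generator.integers method; the edge count of 250 is arbitrary.

```python
import numpy as np

n_edges = 250
# Legacy API used in the example above; the upper bound 11 is exclusive.
legacy = np.random.randint(1, 11, n_edges)
# Equivalent draw with the newer Generator API (also exclusive by default).
rng = np.random.default_rng(101)
weights = rng.integers(1, 11, size=n_edges)
print(legacy.min(), legacy.max(), weights.min(), weights.max())
```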
Below is the summary of the construction of weighted coefficients.
PathCensus objects may be initialized using parallelized algorithms by using parallel=True.
NOTE
Parallel algorithms require an extra compilation step, so the first time parallel=True is used there will be a significant extra overhead.
NOTE
The parallel=True argument may not work and may lead to segmentation faults on some macOS machines.
# By default all available threads are used
P = PathCensus(G, parallel=True)
# Use specific number of threads
P = PathCensus(G, parallel=True, num_threads=2)
Other main features of pathcensus are:
- Null models based on the ERGM family.
- Utilities for conducting statistical inference based on null models.
- Integration with arbitrary classes of graph-like objects.
All these features are documented in the official documentation.
The repository with the package source code can be cloned easily from the GitHub repo.
git clone git@github.com:sztal/pathcensus.git
It is recommended to work within an isolated virtual environment. This can be done easily, for instance, using conda. Remember to use a supported Python version (i.e. 3.8+).
conda create --name my-env python=3.8
conda activate my-env
After entering the directory into which the pathcensus repository was cloned, it is enough to install the package locally.
pip install .
# Or in developer/editable mode
pip install --editable .
To run the tests, the test dependencies must be installed as well.
pip install -r ./requirements-tests.txt
# Now tests can be run
pytest
# Or alternatively
make test
# And to run linter
make lint
The same applies to building the documentation from source.
pip install -r ./requirements-docs.txt
# Now documentation can be built
make docs
Tests targeting different Python versions can be run using the tox test automation framework. You may first need to install tox (e.g. pip install tox).
make test-all
# Or alternatively
tox
Unit test coverage report can be generated easily.
make coverage
# Report can be displayed again after running coverage
make cov-report
If you have any suggestions or questions about pathcensus, feel free to email me at [email protected].
If you encounter any errors or problems with pathcensus, please let me know! Open an issue in the main GitHub repository at http://github.com/sztal/pathcensus.
- Szymon Talaga <[email protected]>