Skip to content

Commit

Permalink
Fix eigen decomposition; new feature to store all optimization tries;…
Browse files Browse the repository at this point in the history
… and updates to documentation (#95)
  • Loading branch information
d-schindler authored Nov 14, 2024
1 parent 6c993a3 commit 6614d19
Show file tree
Hide file tree
Showing 25 changed files with 3,810 additions and 1,589 deletions.
5 changes: 3 additions & 2 deletions .pylintrc
Original file line number Diff line number Diff line change
Expand Up @@ -2,15 +2,16 @@
# for some of the options that are available

[MESSAGES CONTROL]
disable=C0103,R0904,R0903,W0511,R0801,R0401,I0013,W0622,C0325,R0205,W1201,W0621,R0913,R0914,C0415,W0719
disable=C0103,R0904,R0903,W0511,R0801,R0401,I0013,W0622,C0325,R0205,W1201,W0621,R0913,R0914,C0415,W0719,E0606

[FORMAT]
# Maximum number of characters on a single line.
max-line-length=100

[DESIGN]
# Maximum number of arguments for function / method
max-args=8
max-args=20
max-positional-arguments=30
# Argument names that match this expression will be ignored. Default to name
# with leading underscore
ignored-argument-names=_.*
Expand Down
20 changes: 11 additions & 9 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -103,7 +103,7 @@ There are also additional post-processing and analysis functions, including:
Optimal scale selection [6] is performed by default with the run function but can be repeated with different parameters if needed, see `pygenstability/optimal_scales.py`. To reduce noise, e.g., one can increase the parameter values for `block_size` and `window_size`. The optimal network partitions can then be plotted given a NetworkX nx_graph.

```python
results = pgs.identify_optimal_scales(results, block_size=10, window_size=5)
results = pgs.identify_optimal_scales(results, kernel_size=10, window_size=5)
pgs.plot_optimal_partitions(nx_graph, results)
```

Expand Down Expand Up @@ -134,7 +134,7 @@ We provide an easy-to-use interface in our `pygenstability.data_clustering.py` m

```python
clustering = pgs.DataClustering(
graph_method="cknn",
graph_method="cknn-mst",
k=5,
constructor="continuous_normalized")

Expand All @@ -146,7 +146,7 @@ clustering.scale_selection(kernel_size=0.2)
clustering.plot_scan()
```

We currently support $k$-Nearest Neighbor (kNN) and Continuous $k$-Nearest Neighbor (CkNN) [10] graph constructions (specified by `graph_method`) and `k` refers to the number of neighbours considered in the construction. See documentation for a list of all parameters. All functionalities of PyGenStability including plotting and scale selection are also available for data clustering. For example, given two-dimensional coordinates of the data points one can plot the optimal partitions directly:
We currently support $k$-Nearest Neighbor (kNN) and Continuous $k$-Nearest Neighbor (CkNN) [10] graph constructions (specified by `graph_method`) augmented with the minimum spanning tree to guarentee connectivity, where `k` refers to the number of neighbours considered in the construction. See documentation for a list of all parameters. All functionalities of PyGenStability including plotting and scale selection are also available for data clustering. For example, given two-dimensional coordinates of the data points one can plot the optimal partitions directly:

```python
# plot robust partitions
Expand All @@ -168,11 +168,13 @@ Please cite our paper if you use this code in your own work:
```
@article{pygenstability,
author = {Arnaudon, Alexis and Schindler, Dominik J. and Peach, Robert L. and Gosztolai, Adam and Hodges, Maxwell and Schaub, Michael T. and Barahona, Mauricio},
title = {PyGenStability: Multiscale community detection with generalized Markov Stability},
publisher = {arXiv},
year = {2023},
doi = {10.48550/ARXIV.2303.05385},
url = {https://arxiv.org/abs/2303.05385}
title = {Algorithm 1044: PyGenStability, a Multiscale Community Detection Framework with Generalized Markov Stability},
journal = {ACM Trans. Math. Softw.},
volume = {50},
number = {2},
pages = {15:1–15:8}
year = {2024},
doi = {10.1145/3651225}
}
```

Expand Down Expand Up @@ -248,7 +250,7 @@ If you are interested in trying our other packages, see the below list:

[6] D. J. Schindler, J. Clarke, and M. Barahona, 'Multiscale Mobility Patterns and the Restriction of Human Movement', *Royal Society Open Science*, vol. 10, no. 10, p. 230405, Oct. 2023, doi: 10.1098/rsos.230405.

[7] A. Arnaudon, D. J. Schindler, R. L. Peach, A. Gosztolai, M. Hodges, M. T. Schaub, and M. Barahona, 'PyGenStability: Multiscale community detection with generalized Markov Stability', *arXiv pre-print*, Mar. 2023, doi: 10.48550/arXiv.2303.05385.
[7] A. Arnaudon, D. J. Schindler, R. L. Peach, A. Gosztolai, M. Hodges, M. T. Schaub, and M. Barahona, 'Algorithm 1044: PyGenStability, a Multiscale Community Detection Framework with Generalized Markov Stability', *ACM Trans. Math. Softw.*, vol. 50, no. 2, p. 15:1–15:8, Jun. 2024, doi: 10.1145/3651225.

[8] S. Gomez, P. Jensen, and A. Arenas, 'Analysis of community structure in networks of correlated data'. *Physical Review E*, vol. 80, no. 1, p. 016114, Jul. 2009, doi: 10.1103/PhysRevE.80.016114.

Expand Down
21 changes: 12 additions & 9 deletions docs/index_readme.md
Original file line number Diff line number Diff line change
Expand Up @@ -88,7 +88,7 @@ There are also additional post-processing and analysis functions, including:
Optimal scale selection [6] is performed by default with the run function but can be repeated with different parameters if needed, see `pygenstability/optimal_scales.py`. To reduce noise, e.g., one can increase the parameter values for `block_size` and `window_size`. The optimal network partitions can then be plotted given a NetworkX nx_graph.

```python
results = pgs.identify_optimal_scales(results, block_size=10, window_size=5)
results = pgs.identify_optimal_scales(results, kernel_size=10, window_size=5)
pgs.plot_optimal_partitions(nx_graph, results)
```

Expand Down Expand Up @@ -119,7 +119,7 @@ We provide an easy-to-use interface in our `pygenstability.data_clustering.py` m

```python
clustering = pgs.DataClustering(
graph_method="cknn",
graph_method="cknn-mst",
k=5,
constructor="continuous_normalized")

Expand All @@ -131,7 +131,7 @@ clustering.scale_selection(kernel_size=0.2)
clustering.plot_scan()
```

We currently support $k$-Nearest Neighbor (kNN) and Continuous $k$-Nearest Neighbor (CkNN) [10] graph constructions (specified by `graph_method`) and `k` refers to the number of neighbours considered in the construction. See documentation for a list of all parameters. All functionalities of PyGenStability including plotting and scale selection are also available for data clustering. For example, given two-dimensional coordinates of the data points one can plot the optimal partitions directly:
We currently support $k$-Nearest Neighbor (kNN) and Continuous $k$-Nearest Neighbor (CkNN) [10] graph constructions (specified by `graph_method`) augmented with the minimum spanning tree to guarentee connectivity, where `k` refers to the number of neighbours considered in the construction. See documentation for a list of all parameters. All functionalities of PyGenStability including plotting and scale selection are also available for data clustering. For example, given two-dimensional coordinates of the data points one can plot the optimal partitions directly:

```python
# plot robust partitions
Expand All @@ -153,11 +153,13 @@ Please cite our paper if you use this code in your own work:
```
@article{pygenstability,
author = {Arnaudon, Alexis and Schindler, Dominik J. and Peach, Robert L. and Gosztolai, Adam and Hodges, Maxwell and Schaub, Michael T. and Barahona, Mauricio},
title = {PyGenStability: Multiscale community detection with generalized Markov Stability},
publisher = {arXiv},
year = {2023},
doi = {10.48550/ARXIV.2303.05385},
url = {https://arxiv.org/abs/2303.05385}
title = {Algorithm 1044: PyGenStability, a Multiscale Community Detection Framework with Generalized Markov Stability},
journal = {ACM Trans. Math. Softw.},
volume = {50},
number = {2},
pages = {15:1–15:8}
year = {2024},
doi = {10.1145/3651225}
}
```

Expand Down Expand Up @@ -233,7 +235,7 @@ If you are interested in trying our other packages, see the below list:

[6] D. J. Schindler, J. Clarke, and M. Barahona, 'Multiscale Mobility Patterns and the Restriction of Human Movement', *Royal Society Open Science*, vol. 10, no. 10, p. 230405, Oct. 2023, doi: 10.1098/rsos.230405.

[7] A. Arnaudon, D. J. Schindler, R. L. Peach, A. Gosztolai, M. Hodges, M. T. Schaub, and M. Barahona, 'PyGenStability: Multiscale community detection with generalized Markov Stability', *arXiv pre-print*, Mar. 2023, doi: 10.48550/arXiv.2303.05385.
[7] A. Arnaudon, D. J. Schindler, R. L. Peach, A. Gosztolai, M. Hodges, M. T. Schaub, and M. Barahona, 'Algorithm 1044: PyGenStability, a Multiscale Community Detection Framework with Generalized Markov Stability', *ACM Trans. Math. Softw.*, vol. 50, no. 2, p. 15:1–15:8, Jun. 2024, doi: 10.1145/3651225.

[8] S. Gomez, P. Jensen, and A. Arenas, 'Analysis of community structure in networks of correlated data'. *Physical Review E*, vol. 80, no. 1, p. 016114, Jul. 2009, doi: 10.1103/PhysRevE.80.016114.

Expand All @@ -248,3 +250,4 @@ This program is free software: you can redistribute it and/or modify it under th
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program. If not, see http://www.gnu.org/licenses/.

39 changes: 26 additions & 13 deletions src/pygenstability/constructors.py
Original file line number Diff line number Diff line change
Expand Up @@ -53,8 +53,8 @@ def limit(*args, **kwargs):


def _compute_spectral_decomp(matrix):
"""Solve eigenalue problem."""
lambdas, v = la.eig(matrix.toarray())
"""Solve eigenalue problem for symmetric matrix."""
lambdas, v = la.eigh(matrix.toarray())
vinv = la.inv(v.real)
return lambdas.real, v.real, vinv

Expand Down Expand Up @@ -168,14 +168,20 @@ def _get_data(self, scale):
class constructor_continuous_combinatorial(Constructor):
r"""Constructor for continuous combinatorial Markov Stability.
The quality matrix is:
This implementation follows equation (16) in [1]_. The quality matrix is:
.. math::
F(t) = \Pi\exp(-Lt)
F(t) = \Pi\exp(-tL/<d>)
where :math:`L=D-A` is the combinatorial Laplacian and :math:`\Pi=\mathrm{diag}(\pi)`,
where :math:`<d>=(\boldsymbol{1}^T D \boldsymbol{1})/N` is the average degree,
:math:`L=D-A` is the combinatorial Laplacian and :math:`\Pi=\mathrm{diag}(\pi)`,
with null model :math:`v_1=v_2=\pi=\frac{\boldsymbol{1}}{N}`.
References:
.. [1] Lambiotte, R., Delvenne, J.-C., & Barahona, M. (2019). Random Walks, Markov
Processes and the Multiscale Modular Organization of Complex Networks.
IEEE Trans. Netw. Sci. Eng., 1(2), p. 76-90.
"""

@_limit_numpy
Expand Down Expand Up @@ -208,11 +214,11 @@ def _get_data(self, scale):
class constructor_continuous_normalized(Constructor):
r"""Constructor for continuous normalized Markov Stability.
The quality matrix is:
This implementation follows equation (10) in [1]_. The quality matrix is:
.. math::
F(t) = \Pi\exp(-Lt)
F(t) = \Pi\exp(-tL)
where :math:`L=D^{-1}(D-A)` is the random-walk normalized Laplacian and
:math:`\Pi=\mathrm{diag}(\pi)` with null model :math:`v_1=v_2=\pi=\frac{d}{2M}`.
Expand Down Expand Up @@ -250,12 +256,12 @@ def _get_data(self, scale):
class constructor_signed_modularity(Constructor):
"""Constructor of signed modularity.
This implementation is equation (18) of [1]_, where quality is the adjacency matrix and
This implementation is equation (18) of [2]_, where quality is the adjacency matrix and
the null model is the difference between the standard modularity null models based on
positive and negative degree vectors.
References:
.. [1] Gomez, S., Jensen, P., & Arenas, A. (2009). Analysis of community structure in
.. [2] Gomez, S., Jensen, P., & Arenas, A. (2009). Analysis of community structure in
networks of correlated data. Physical Review E, 80(1), 016114.
"""

Expand Down Expand Up @@ -293,16 +299,20 @@ def _get_data(self, scale):
class constructor_signed_combinatorial(Constructor):
r"""Constructor for continuous signed combinatorial Markov Stability.
The quality matrix is:
This implementation follows equation (19) in [3]_. The quality matrix is:
.. math::
F(t) = \exp(-Lt)^T\exp(-Lt)
F(t) = \exp(-tL)^T\exp(-tL)
where :math:`L=D_{\mathrm{abs}}-A` is the signed combinatorial Laplacian,
:math:`D_{\mathrm{abs}}=\mathrm{diag}(d_\mathrm{abs})` the diagonal matrix of absolute node
strengths :math:`d_\mathrm{abs}`, and the associated null model is given by
:math:`v_1=v_2=\boldsymbol{0}`, where :math:`\boldsymbol{0}` is the vector of zeros.
References:
.. [3] Schaub, M., Delvenne, J.-C., Lambiotte, R., & Barahona, M. (2019). Multiscale
dynamical embeddings of complex networks. Physical Review E, 99(6), 062308.
"""

def prepare(self, **kwargs):
Expand Down Expand Up @@ -337,7 +347,8 @@ class constructor_directed(Constructor):
where :math:`I` denotes the identity matrix, :math:`M(\alpha)` is the transition matrix of a
random walk with teleportation and damping factor :math:`0\le \alpha < 1`, and
:math:`\Pi=\mathrm{diag}(\pi)` for the associated null model :math:`v_1=v_2=\pi` given by the
eigenvector solving :math:`\pi M(\alpha) = \pi`, which is related to PageRank.
eigenvector solving :math:`\pi M(\alpha) = \pi`, which is related to PageRank. See [1]_ for
details.
The transition matrix :math:`M(\alpha)` is given by
Expand All @@ -354,7 +365,9 @@ class constructor_directed(Constructor):
@_limit_numpy
def prepare(self, **kwargs):
"""Prepare the constructor with non-scale dependent computations."""
assert self.exp_comp_mode == "expm"
assert (
self.exp_comp_mode == "expm"
), 'exp_comp_mode="expm" is required for "constructor_directed"'

alpha = kwargs.get("alpha", 0.8)
n_nodes = self.graph.shape[0]
Expand Down
25 changes: 13 additions & 12 deletions src/pygenstability/optimal_scales.py
Original file line number Diff line number Diff line change
Expand Up @@ -33,7 +33,12 @@ def _pool2d_nvi(A, kernel_size, stride, padding=0):
)
shape_w = (output_shape[0], output_shape[1], kernel_size, kernel_size)
# pylint: disable=unsubscriptable-object
strides_w = (stride * A.strides[0], stride * A.strides[1], A.strides[0], A.strides[1])
strides_w = (
stride * A.strides[0],
stride * A.strides[1],
A.strides[0],
A.strides[1],
)
A_w = as_strided(A, shape_w, strides_w)

# Return the result of pooling
Expand All @@ -57,7 +62,7 @@ def identify_optimal_scales(results, kernel_size=3, window_size=3, max_nvi=1, ba
basin_radius (int): radius of basin around local minima of the pooled diagonal
Returns:
result dictionary with two new keys: 'selected_partitions' and 'block_detection_curve'
result dictionary with two new keys: 'selected_partitions' and 'block_nvi'
References:
.. [1] D. J. Schindler, J. Clarke, and M. Barahona, 'Multiscale Mobility Patterns and
Expand All @@ -74,33 +79,29 @@ def identify_optimal_scales(results, kernel_size=3, window_size=3, max_nvi=1, ba
diagonal = np.diag(nvi_tt_pooled)[: len(nvi_t)]

# smooth diagonal with moving window
block_detection_curve = np.roll(
block_nvi = np.roll(
np.asarray(pd.Series(diagonal).rolling(window=window_size, win_type="triang").mean()),
-int(window_size / 2),
)
results["block_detection_curve"] = block_detection_curve
results["block_nvi"] = block_nvi

# find local minima on diagonal of pooled NVI(s,s')
basin_centers, _ = find_peaks(-block_detection_curve, height=-max_nvi)
basin_centers, _ = find_peaks(-block_nvi, height=-max_nvi)

# add robust scales located in large 0 margins
not_nan_ind = np.argwhere(~np.isnan(block_detection_curve)).flatten()
not_nan_ind = np.argwhere(~np.isnan(block_nvi)).flatten()

if (
np.count_nonzero(
np.around(
block_detection_curve[not_nan_ind[0] : not_nan_ind[0] + 2 * basin_radius + 1], 5
)
np.around(block_nvi[not_nan_ind[0] : not_nan_ind[0] + 2 * basin_radius + 1], 5)
)
== 0
):
basin_centers = np.insert(basin_centers, 0, not_nan_ind[0] + basin_radius)

if (
np.count_nonzero(
np.around(
block_detection_curve[not_nan_ind[-1] - 2 * basin_radius : not_nan_ind[-1] + 1], 5
)
np.around(block_nvi[not_nan_ind[-1] - 2 * basin_radius : not_nan_ind[-1] + 1], 5)
)
== 0
):
Expand Down
28 changes: 21 additions & 7 deletions src/pygenstability/plotting.py
Original file line number Diff line number Diff line change
Expand Up @@ -31,6 +31,7 @@

def plot_scan(
all_results,
figsize=(6, 5),
scale_axis=True,
figure_name="scan_results.pdf",
use_plotly=False,
Expand All @@ -41,6 +42,7 @@ def plot_scan(
Args:
all_results (dict): results of pygenstability scan
figsize (tuple): matplotlib figure size
scale_axis (bool): display scale of scale index on scale axis
figure_name (str): name of matplotlib figure
use_plotly (bool): use matplotlib or plotly backend
Expand All @@ -54,7 +56,9 @@ def plot_scan(

if use_plotly:
return plot_scan_plotly(all_results, live=live, filename=plotly_filename)
return plot_scan_plt(all_results, scale_axis=scale_axis, figure_name=figure_name)
return plot_scan_plt(
all_results, figsize=figsize, scale_axis=scale_axis, figure_name=figure_name
)


def plot_scan_plotly( # pylint: disable=too-many-branches,too-many-statements,too-many-locals
Expand Down Expand Up @@ -263,15 +267,24 @@ def plot_optimal_partitions(

for optimal_scale_id in selected_scales:
plot_single_partition(
graph, all_results, optimal_scale_id, edge_color=edge_color, edge_width=edge_width
graph,
all_results,
optimal_scale_id,
edge_color=edge_color,
edge_width=edge_width,
)
plt.savefig(f"{folder}/scale_{optimal_scale_id}{ext}", bbox_inches="tight")
if show: # pragma: no cover
plt.show()


def plot_communities(
graph, all_results, folder="communities", edge_color="0.5", edge_width=0.5, ext=".pdf"
graph,
all_results,
folder="communities",
edge_color="0.5",
edge_width=0.5,
ext=".pdf",
):
"""Plot the community structures at each scale in a folder.
Expand Down Expand Up @@ -394,15 +407,15 @@ def _plot_optimal_scales(all_results, ax, scales, ax1, ax2):
"""Plot stability."""
ax.plot(
scales,
all_results["block_detection_curve"],
all_results["block_nvi"],
"-",
lw=2.0,
c="C4",
label="Block NVI",
)
ax.plot(
scales[all_results["selected_partitions"]],
all_results["block_detection_curve"][all_results["selected_partitions"]],
all_results["block_nvi"][all_results["selected_partitions"]],
"o",
lw=2.0,
c="C4",
Expand All @@ -420,9 +433,10 @@ def _plot_optimal_scales(all_results, ax, scales, ax1, ax2):
ax2.axvline(scale, ls="--", color="C4")


def plot_scan_plt(all_results, scale_axis=True, figure_name="scan_results.svg"):
def plot_scan_plt(all_results, figsize=(6, 5), scale_axis=True, figure_name="scan_results.svg"):
"""Plot results of pygenstability with matplotlib."""
scales = _get_scales(all_results, scale_axis=scale_axis)
plt.figure(figsize=figsize)
gs = gridspec.GridSpec(3, 1, height_ratios=[0.5, 1.0, 0.5])
gs.update(hspace=0)
axes = []
Expand Down Expand Up @@ -456,7 +470,7 @@ def plot_scan_plt(all_results, scale_axis=True, figure_name="scan_results.svg"):
_plot_number_comm(all_results, ax=ax3, scales=scales)
axes.append(ax3)

if "block_detection_curve" in all_results:
if "block_nvi" in all_results:
ax4 = plt.subplot(gs[2, 0])
_plot_optimal_scales(all_results, ax=ax4, scales=scales, ax1=ax1, ax2=ax2)
axes.append(ax4)
Expand Down
Loading

0 comments on commit 6614d19

Please sign in to comment.