-
Notifications
You must be signed in to change notification settings - Fork 12
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Memory issue for large directed graphs #69
Comments
Hello! But, @d-schindler just implemented (#68) another variant for computing the matrix exponential with spectral decomposition. As it is done now, it will not work either, but we could try something with sparse matrices and only compute the largest eigenvalues to approximate the matrix exponential. You really want to use matrix exponentials for such a large graph? Did you try 'simple' louvain our leiden with modularity? It will not be so easy to scale the code to such large networks, so we need to see if it is worth before jumping into coding this. |
Hello @arnaudon , Thanks a lot for your answer. I could try the simplest version of the model. Basically you are suggesting to use the default run option:
Which use a linearized constructor and louvain method. I'am basically try to run the following command now:
Which i believe correspond to your suggestion. |
I'm not expert in directed networks, so I'm not sure how modularity works in this case. For weights, it's perfectly ok. Probably there are some extensions/adaptaions of modularity for directed networks, we can implement one if you want to use it. Also, constructor='linearized' corresponds to modularity with a scaling factor in the null model. |
I understand. You use the genealied Modularity. So the only point is whether it also works for directed network or needs some modifications. |
Hello @MatteoSerafino, our current implementation of linearized Markov Stability is aimed for undirected graphs. It can be extended to directed graphs though and we will try and implement a "linearized directed" constructor ASAP. |
Hello @d-schindler, |
@MatteoSerafino could you try the code dominik just implemented, to see if it does something reasonable for you? |
Hello @arnaudon, As graphs, I generated some modular directed graph with I made different tests with different combinations of However, for large network still give an memory error, even with the linearized directed version. The problem is here:
|
Ok, I see. If we use Google teleportation, we necessarily obtain a dense matrix and this leads to memory issues in large graphs. So probably we should turn off teleportation (i.e. set alpha=1) for linearized_directed. |
Could you try again now? I set alpha=1 as default. It means that the constructor only works for strongly connected graphs. |
I got the latest version, where you also fixed the following:
Which was causing a memory error. It seems is working fine. I will let it run as follows:
and see it reaches the end. |
Hello @MatteoSerafino, great to hear it's running now. Yes, I had to use scipy.sparse consistently to make it work. It would be great if you could let us know whether it works fine, once the run is complete. |
Hello @d-schindler,
I got four optional partitions, and all of them have at least 49749 communities. So it seems that if we input a weakly connected graph, the alghortims fails. What do you think? |
Great to hear! If this is an issue with the optimal scale selection, please try to change the parameters for scale selection, e.g. change kernel size or window size.
Also, you will probably need to increase the resolution of the MS analyis to get decent results, i.e. increase n_scales. If you're partitions are too large, you need to decrease min_scale. It's a bit of trial and error. |
Hello @d-schindler, |
With We will merge #70 soon, do you think this suffices to closes this issue here @MatteoSerafino ? |
@d-schindler do you think we can try to implement sparse with alpha<1? |
@d-schindler, Yes, I think it does. I believe that if you could implement the linearized_directed constructor with |
yes, let's try to make it work with sparse matrices all the way, I'll have a look at it tonight |
@arnaudon , I actually don't think it is theoretically possible to use One solution might be to apply teleportation only to those nodes that are outside of the LSCC, but this has never been tested before I think. And @MatteoSerafino , |
ah yes, you are right, it will be hard to do, but maybe there is some linear algebra trick |
Perhaps I am missing something here, but I don't think there should be a problem with having a "directed linearized" version that works for large graphs. In terms of the constructor: Such a linear operator can be used to compute eigenvectors as well, and thus you can get the stationary distribution / null model of the Markov stability. The second aspect is that Louvain/Leiden might not like to be given a LinearOperator as input; however as teleportation allows one to connect to any neighbor, this information does not really have to be encoded as a graph, but can also be passed on in other terms. Due to the low-rank structure some parts of this can probably also be pulled into the null model term (though I have not done the calculations here), as the dominant updates of community structure should follow the "actual links" most of the time. |
Ah, here is the linear algebra trick, thanks a lot, Michael! This definitely sounds like a possibility. I was also wondering why we cannot do it, as this is basically pagerank, that works on pretty large graphs. Let's give it a try asap! |
Hi all, I have been following this discussion because I was the one that suggested using PyGenStabillity to @MatteoSerafino and I am very interested in this code. I had to deal with a similar issue in my code for flow stability. I did what @michaelschaub mentioned. My issue was not because of page rank teleportation but for dealing with covariance matrices that have a sparse part and a dense low-rank part (the null model). As @michaelschaub explained, the covariance for the PageRank teleportation can probably be also written like that. Maybe this is useful for you to have a look. I created a new class for covariance matrices that stores things in sparse matrices internally and implemented the methods I needed in Cython (for the Louvain algo). Anyway, I'm glad that there is an effort to implement Markov Stability and all its variants in Python 👍 |
Really cool! Yes, implementing this is on the table, but we need to find the time to have a proper try at it. Hopefully soon! |
Hello,
I'am trying to use the package on a big network (around 170000 nodes).
However, when I call the contractor:
defining the constructor externally
directed_constructor = constructors.load_constructor('directed', adjacency, alpha=0.85)
I get the following error:
Unable to allocate 241. GiB for an array with shape (179783, 179783) and data type float6
Is there a way to avoid it?
Moreover, could i give as input a weighted adj matrix?
Thanks a lot
The text was updated successfully, but these errors were encountered: