GDS - RandomWalk - Unable to load NODE #337

Mintactus · 2024-11-20T05:41:03Z

Neo4j 5.25.1
GDS 2.12
GDS Python Client 1.12

The randomWalk algo doesn't load my sourceNode, details below:

My in-memory GDS graph was built from a pandas DataFrame using the construct method of gds, so it does not exist on disk and never will; it is intended for in-memory analysis only.

Here is the content of the in-memory graph, extracted with gds.graph.nodeProperty.stream:

                 nodeId  propertyValue nodeLabels
 0  6335695024714629015       -0.00003
 1   531768015437695177        0.00009
 2  3558886278460545694       -0.00012
 3  7960371801618416072       -0.00006
 4   688712822280937494        0.00009
 5  6445645390101772454        0.00000
 6  4640442843099832304       -0.00006
 7  6026970582286088324        0.00006
 8  5356341080109221825        0.00003
 9  1843909622001289035        0.00006
10  5984421542275516993       -0.00009
11  1113611838033320553       -0.00003
12  4162479979561917907        0.00003

When trying to run randomWalk

    sourceNode = self.markov_chain_nodes['nodeId'].last()  # this yields a signed int64
    random_walk_config = {
        'sourceNodes': [sourceNode],
        'walkLength': FUTURE_SIZE,
        'walksPerNode': 1,
        'relationshipWeightProperty': 'transition probability',
        'concurrency': 4
    }
    future = self.gds.randomWalk.stream(self.graph, **random_walk_config)

I got this error: {message: Failed to invoke procedure gds.randomWalk.stream: Caused by: org.neo4j.internal.kernel.api.exceptions.EntityNotFoundException: Unable to load NODE 4162479979561917907.}.

But node id 4162479979561917907 clearly exists in the in-memory graph.

I read that I'm supposed to use gds.find_node_id to match the sourceNode, but this is an in-memory-only graph and doesn't need to become an on-disk graph. Having to create an on-disk graph just to make this work doesn't make any sense to me.

This might also be considered as a feature request then...

Thanks for your support :)


IoannisPanagiotas · 2024-11-21T12:51:28Z

Hi @Mintactus ,

I have looked into your issue and can confirm there is a bug in randomWalk when working with graphs that are not backed by a database. We have applied a fix, which should be out in the next GDS release, but I am not sure when that is going to be.

In the meantime, as a workaround, I would suggest the following:

Instead of running randomWalk through the GDS Python client, you can use the Neo4j Python driver and run a Cypher query directly. There are instructions at https://neo4j.com/docs/python-manual/current/ for how to do this.

The Cypher query that you need is the following, where X is the value of
sourceNode = self.markov_chain_nodes['nodeId'].last()

CALL gds.randomWalk.stream(
  'myGraph',
  {
    sourceNodes: X,
    walkLength: 3,
    walksPerNode: 1,
    randomSeed: 42,
    concurrency: 1
  }
)
YIELD nodeIds
RETURN nodeIds

I believe that execute_query in the page I shared should work.

This should work as it avoids doing the faulty computation. Let us know if you need any help in running that query.
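
The driver-based workaround described above could be sketched roughly as follows. This is a minimal sketch, not a confirmed fix: the helper names `random_walk_params` and `stream_random_walks` are hypothetical, the query passes the graph name and source node as Cypher parameters rather than pasting X into the text, and it yields only nodeIds.

```python
# Sketch only: the graph name, walk length and helper names are placeholders.
RANDOM_WALK_QUERY = """
CALL gds.randomWalk.stream(
  $graphName,
  {
    sourceNodes: $sourceNodes,
    walkLength: $walkLength,
    walksPerNode: 1,
    randomSeed: 42,
    concurrency: 1
  }
)
YIELD nodeIds
RETURN nodeIds
"""


def random_walk_params(graph_name, source_node, walk_length):
    """Build the parameter map for RANDOM_WALK_QUERY."""
    # int() converts a numpy.int64 (as pandas returns) to a plain Python int,
    # so the sketch does not rely on the driver's handling of numpy types.
    return {
        "graphName": graph_name,
        "sourceNodes": [int(source_node)],
        "walkLength": int(walk_length),
    }


def stream_random_walks(driver, graph_name, source_node, walk_length, database=None):
    """Run the query through a neo4j Driver; return each walk as a list of node ids."""
    records, _summary, _keys = driver.execute_query(
        RANDOM_WALK_QUERY,
        random_walk_params(graph_name, source_node, walk_length),
        database_=database,
    )
    return [list(record["nodeIds"]) for record in records]
```

With a real connection, `driver` would come from `neo4j.GraphDatabase.driver(uri, auth=(user, password))`, and the call would look like `stream_random_walks(driver, 'myGraph', sourceNode, 3)`.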

FlorentinD · 2024-11-21T13:06:00Z

You can also still use the GDS client:

gds.run_cypher("""CALL gds.randomWalk.stream(
  'myGraph',
  {
    sourceNodes: X,
    walkLength: 3,
    walksPerNode: 1,
    randomSeed: 42,
    concurrency: 1
  }
)
YIELD nodeIds
RETURN nodeIds
""")

Mintactus · 2024-11-22T05:48:44Z

Thank you guys,

@IoannisPanagiotas
@FlorentinD

I'm glad to know I wasn't crazy. I have used it for a while, and on this one I couldn't explain what I was doing wrong.

Amazing support

Mintactus · 2024-11-22T18:52:34Z

I did some deeper testing and investigation.

If I'm right, a graph created using the construct method (a graph that does not exist on disk) uses the nodeIds provided in the DataFrame as the actual nodeIds usable as sourceNodes inside an algorithm. That seems to be right based on the picture provided.

As suggested, I tried the above using only the Cypher statement inside the browser instead of the GDS Python Client randomWalk method, but GDS is still not able to locate the nodeId. So it seems the problem does not come from the GDS Python Client but rather from GDS itself, which cannot locate a nodeId in a graph that does not exist on disk.

To reproduce the issue, build an in-memory graph from a DataFrame using the construct method, then try to run the randomWalk algorithm using Cypher with any sourceNode in it. It fails.

Unless I missed something in the docs, this behavior obliges the developer to:

- Export the in-memory graph into a new database (it has to be a new one; you can't use the one the gds object initiated its connection with)
- Create a new gds object linked to this new database
- Create a new native in-memory projection from this new database
- Then run the algorithm from this new projection

That is a huge workaround, and it makes in-memory graphs drastically less exciting to use.
But thanks for your support, hopefully a patched version will come out soon :)

IoannisPanagiotas · 2024-11-23T20:31:55Z

@Mintactus

Please remove the 'path' from the yields as in the query we shared above!
The bug is contained in that part because it relies on having a neo4j graph. It should run normally after that.

Best.

Mintactus · 2024-11-25T19:45:16Z

Thanks for your support

I will match the ids returned by randomWalk against gds.graph.nodeProperty.stream to find the nodes and their properties involved in the walk. I will still be in line to test the new patch, since essential path information cannot be retrieved once the path is removed from the algorithm's output. So for now, a complete workaround still means recreating a GDS object and creating a new database and projection.

Unless there is something else you want to add, I might close the ticket soon.

Thanks again, I will test the patched version when it's out
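
The id matching mentioned above could look roughly like the sketch below. It is an illustration under stated assumptions: `annotate_walk` is a hypothetical helper, the node ids and property values are the first three rows of the nodeProperty.stream output quoted at the top of the issue, and the walk itself is made up, since a real one would come from the nodeIds column of randomWalk.stream.

```python
def annotate_walk(walk_node_ids, node_ids, property_values):
    """Pair each node id in a walk with its property value, keeping walk order."""
    # Lookup table from node id to property value.
    property_by_id = dict(zip(node_ids, property_values))
    return [(node_id, property_by_id[node_id]) for node_id in walk_node_ids]


# First three rows of the nodeProperty.stream output quoted in the issue.
node_ids = [6335695024714629015, 531768015437695177, 3558886278460545694]
property_values = [-0.00003, 0.00009, -0.00012]

# Hypothetical walk, used only to show the lookup.
walk = [531768015437695177, 3558886278460545694, 531768015437695177]

annotated = annotate_walk(walk, node_ids, property_values)
```

With the DataFrame returned by gds.graph.nodeProperty.stream, the `nodeId` and `propertyValue` columns could be passed in place of the two lists.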

Mintactus added the BUG Something isn't working label Nov 20, 2024

IoannisPanagiotas mentioned this issue Nov 21, 2024

GDS 2.12 - Graph Object has no attribute 'name' #338

Closed
