concurrent async calls of get_or_create lead to duplicate nodes #807

Open
AnikinNN opened this issue May 30, 2024 · 1 comment

Comments

@AnikinNN

Expected Behavior (Mandatory)

When get_or_create is called concurrently from different parts of a program, every call returns the same node, which is created only once.

Actual Behavior (Mandatory)

When get_or_create is called concurrently from different parts of a program, the calls can return different nodes, which we consider duplicates.

How to Reproduce the Problem

Run the code in the example. You may need to increase the number of created nodes if your hardware is faster than mine. I wrapped the code in a for loop because the bug is sometimes not caught on the first attempt. It looks like a race condition.

Simple Example

As a simple stand-in for concurrent calls from different parts of a whole program, we can use asyncio.gather:

import asyncio

import neomodel.config
from neomodel import AsyncStructuredNode, StringProperty

neomodel.config.DATABASE_URL = 'bolt://neo4j:12345678@neo4j:7687'
name = 'some_name'


class SomeNode(AsyncStructuredNode):
    name = StringProperty(required=True)


async def main():
    for counter in range(100):
        print(f'iteration {counter}')

        # cleanup
        await neomodel.adb.cypher_query("MATCH (n) DETACH DELETE n")

        # request the same node 1000 times concurrently
        # every call should return an instance with the same element id
        created_nodes = await asyncio.gather(*(
            SomeNode.get_or_create(dict(name=name))
            for _ in range(1000)
        ))

        # check that all instances have the same id
        ids = set(i[0].element_id_property for i in created_nodes)
        assert len(ids) == 1

        # check that there is only one node in the neo4j
        assert len(await SomeNode.nodes) == 1


if __name__ == '__main__':
    asyncio.run(main())
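The failure mode is the classic check-then-act race. The sketch below does not model neomodel's internals (they are not shown in this issue); FakeStore, find, create and naive_get_or_create are hypothetical names, and an in-memory dict stands in for Neo4j. It only illustrates how several concurrent lookups can all see "no node yet" before any of them creates one, so each creates its own node.

```python
import asyncio
import itertools


class FakeStore:
    """Hypothetical in-memory stand-in for a database (not neomodel code)."""

    def __init__(self):
        self.nodes = {}                 # node id -> properties
        self._next_id = itertools.count()

    async def find(self, props):
        await asyncio.sleep(0)          # simulate a round trip: yield to other tasks
        for node_id, node_props in self.nodes.items():
            if node_props == props:
                return node_id
        return None

    async def create(self, props):
        await asyncio.sleep(0)          # simulate a round trip: yield to other tasks
        node_id = next(self._next_id)
        self.nodes[node_id] = dict(props)
        return node_id


async def naive_get_or_create(store, props):
    # Check-then-act: other tasks can run between find() and create(),
    # so several of them may all see "not found" and each create a node.
    node_id = await store.find(props)
    if node_id is None:
        node_id = await store.create(props)
    return node_id


async def demo(calls=100):
    store = FakeStore()
    ids = await asyncio.gather(*(
        naive_get_or_create(store, {"name": "some_name"})
        for _ in range(calls)
    ))
    # (distinct ids returned, nodes actually stored)
    return len(set(ids)), len(store.nodes)


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

In this toy store the interleaving is deterministic, so the duplicates show up on every run; against a real database the timing varies, which is consistent with the reproduction above sometimes needing several iterations.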

Specifications (Mandatory)

Currently used versions

Versions

- OS: Linux, 3.10.14-slim
- Library: 5.3.0 [extras]
- Neo4j: 5.19.0 community

AnikinNN changed the title from "concurrent async calls of get_or_create lead to duplikate nodes" to "concurrent async calls of get_or_create lead to duplicate nodes" on May 30, 2024
@mariusconjeaud
Collaborator

Hmmmm... Yes, it makes sense that this would happen, as it is a kind of race condition. The thing is, get_or_create is intended for batch operations, which are not expected to avoid such collisions. Batch operations can run in parallel, but then the user has to ensure that a given object is not shared across different parallel batches.

So... I don't see a way around this. Do you have an idea?
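As the comment says, the caller has to avoid racing on the same object. One way a single asyncio program could do that is to route every get-or-create for the same property set through a per-key asyncio.Lock. This is a minimal sketch, not a neomodel feature: FakeStore and KeyedGetOrCreate are hypothetical, and an in-memory store stands in for Neo4j so the example runs on its own.

```python
import asyncio
import itertools
from collections import defaultdict


class FakeStore:
    """Hypothetical in-memory stand-in for a database (not neomodel code)."""

    def __init__(self):
        self.nodes = {}                 # node id -> properties
        self._next_id = itertools.count()

    async def find(self, props):
        await asyncio.sleep(0)          # yield to other tasks, as a real query would
        for node_id, node_props in self.nodes.items():
            if node_props == props:
                return node_id
        return None

    async def create(self, props):
        await asyncio.sleep(0)          # yield to other tasks, as a real query would
        node_id = next(self._next_id)
        self.nodes[node_id] = dict(props)
        return node_id


class KeyedGetOrCreate:
    """Hypothetical helper: serializes get-or-create per property set.

    It only coordinates tasks in ONE process / event loop; it does nothing
    for other processes writing to the same database.
    """

    def __init__(self, store):
        self.store = store
        self._locks = defaultdict(asyncio.Lock)   # one lock per key

    async def __call__(self, props):
        key = tuple(sorted(props.items()))        # hashable key for the lock dict
        async with self._locks[key]:
            # Lookup and create now happen as one step for this key.
            node_id = await self.store.find(props)
            if node_id is None:
                node_id = await self.store.create(props)
            return node_id


async def demo(calls=100):
    store = FakeStore()
    get_or_create = KeyedGetOrCreate(store)
    ids = await asyncio.gather(*(
        get_or_create({"name": "some_name"}) for _ in range(calls)
    ))
    # (distinct ids returned, nodes actually stored)
    return len(set(ids)), len(store.nodes)


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

A lock per key, rather than one global lock, lets calls for different names still run concurrently. The lock dictionary grows with every distinct key, and the approach does not help across processes or machines; for that case, a uniqueness constraint on the property in Neo4j (which neomodel can create from `unique_index=True` on a property) is the usual database-side safeguard.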
