Data stored in the Neo4j database #23
Comments
Hi Petr,
First of all, thanks for using SylvaDB and for the kind words about it. This kind of message is why we write software :D
You are completely right. The main reason for using numeric ids instead of Neo4j's labels or types is that SylvaDB was around well before Neo4j 2.0, which introduced labels and made Cypher usable. So we needed to figure out a good way to make SylvaDB feature-complete and future-proof. Back then, we felt that node types were a feature we needed, so we decided to store the SylvaDB schema type ids as a reserved property on the nodes. To be consistent, we did the same for relationships and ignored the Neo4j relationship types.
On the bright side, this design allows us to plug in any other graph backend, since all we need is a property-value store on the nodes and relationships, so we avoid being tied to Neo4j; the industry changes faster than we can code. Furthermore, it allows node and relationship types to be changed at no cost, a feature that still does not exist in Neo4j. Unfortunately, as a side effect, this makes it practically impossible to use the Neo4j backend without the SylvaDB frontend.
That being said, we understand that the expressive power of our query builder is still pretty limited. We are thinking of ways to integrate algorithms, like
In summary, I see three choices:
Well, as you can see, every option has advantages and drawbacks, and it is hard to make decisions that we will not regret in the future :) We believe that option 2 is the most reasonable one in terms of implementation time. But option 3 would be a nice feature to have, especially given the future SylvaDB API that we are planning. Please let me know your thoughts.
Thank you for the insight, Javier.
Let me suggest another option: Summary of what I think:
It is my pleasure to discuss this with you, Javier. Petr. PS: Just a note regarding the Neo4j version: now I see why you are still on 1.x... you don't need node labels, so why migrate, right?
Sure. Thank you, Petr, for the healthy discussion 👍
So I think that our approach will be 2 first, and then 3. I will keep thinking of ways to improve this. Maybe we could add an advanced option to synchronize SylvaDB types and allowed relationships with Neo4j labels and types. That will take some time, but if the user is the one who initiates the action, they know what they are doing. This synchronization process could be executed at any time, but there would be a warning about how long it could take. It would run as a task and make the specific graph unavailable while the transaction is being executed. How does that sound?
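As a rough illustration of the proposed on-demand synchronization, the sketch below generates one Cypher statement per node type that adds a Neo4j 2.x label to every node of that type in a given graph. It assumes the reserved `_graph` and `_label` properties shown in the original report; the function name `label_sync_statements` and the id-to-name dictionary are hypothetical and are not part of SylvaDB. Relationship types are left out, since Neo4j cannot change a relationship's type in place and those relationships would have to be recreated.

```python
def label_sync_statements(graph_id, node_types):
    """Build one Cypher statement per SylvaDB node type.

    graph_id   -- value of the reserved `_graph` property (e.g. "2")
    node_types -- hypothetical dict of internal type id -> type name
    """
    statements = []
    for type_id, name in sorted(node_types.items()):
        # Labels are wrapped in backticks, so a backtick inside a name
        # would break the statement; reject it instead of guessing.
        if "`" in name:
            raise ValueError("type name cannot contain a backtick: %r" % name)
        statements.append(
            "MATCH (n) WHERE n._graph = '{g}' AND n._label = '{t}' "
            "SET n:`{name}`".format(g=graph_id, t=type_id, name=name)
        )
    return statements


# Assumed mapping for the example graph: "8" is Person, "9" is Movie.
for statement in label_sync_statements("2", {"8": "Person", "9": "Movie"}):
    print(statement)
```

Each statement could then be run inside the long-running task described above, with the graph marked unavailable until it finishes.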
Just a quick comment on point 4: But you are now proposing something much better: this 'user-initiated on-demand syncing' of labels and relationship types is actually one of the best solutions to the problem (at least from my point of view). It will provide better query performance and make Cypher queries easier to build. A precondition is a 2.x Neo4j version, of course. As a matter of fact, I just wanted to make my problem clear, and I feel you grasped my thoughts entirely and immediately. All my subsequent posts are about how to do it rather than what I need. This is your home and I am sure you will come up with an optimal solution. Should you need me for testing or anything else, let me know. I'm ready to help. Thank you for listening.
Thanks Petr. I'm still thinking of ways to implement this. While my last proposal sounds like the best approach in your case, unfortunately there are some complications. We can sync the current SylvaDB types and allowed relationships to Neo4j labels and types. But the problem is what happens when two or more graphs, from the same or different users, use the same name for a type. We keep track of those cases in the schema by creating a unique slug per type. Let's say then that in graph A the type
However, if you are still OK with using slugs, although they are a bit more verbose, we can proceed and plan the feature :)
Oh, I see this obstacle... Well, considering all of this, it seems to me that point No. 4 (using specially named properties) is the easiest solution to implement now. No. 5 (syncing labels and types into the running database) is also perfectly valid. But the cost is higher:
There is another point to mention: query performance will probably be the same for both No. 4 and No. 5, because the '_graph' property has to be consulted for every node and every relationship, which significantly slows the process. I'm not sure whether indexing can help.
I still hold the view that No. 4 will be sufficient and easy. And if you combine it with an option to dump the database in the 2.x format (i.e. generating labels for nodes and relationships based on real user data), it is perfect. No. 5 is OK, too. But uniqueness should be achieved as mentioned above to allow for stripping.
Thank you, Javier, for paying so much attention to this issue. I'm just afraid that my highly demanding comments and opinions may put you off it. At the same time, I don't want to push you somewhere you don't want to go. Should you feel uncomfortable with any proposal, just go your own way. I know you will come up with a great solution based on the knowledge of my needs (as you always have with SylvaDB).
You made a good point. Maybe instead of using slugs, just suffixing the type with the internal schema id is enough. I'll think about it. Before labels existed in Neo4j, what are now legacy indices were the only way to speed up queries. Therefore adding that information, in a
I will leave this thread open and discuss it with the team. An implementation of 4 would probably be our first approach. But you know, it is not one of our priorities right now, so sorry in advance for delays in delivering the feature. And thank you very much for your insights. It is only with real users' input that we can build a cool platform.
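The idea of suffixing a type name with its internal schema id, instead of using a slug, could look roughly like the sketch below. The helper names `unique_label` and `strip_suffix`, the underscore separator, and the id 12 for the second graph are all assumptions made for illustration; nothing in the thread fixes the actual format.

```python
def unique_label(type_name, schema_id):
    """Return a label that stays unique across graphs, e.g. 'Person_8'."""
    return "{}_{}".format(type_name, schema_id)


def strip_suffix(label):
    """Recover the plain type name by dropping a trailing numeric id.

    Labels without a numeric suffix (e.g. 'ACTS_IN') are returned unchanged.
    """
    name, _, suffix = label.rpartition("_")
    return name if name and suffix.isdigit() else label


# Two graphs that both define a 'Person' type get distinct labels.
label_a = unique_label("Person", 8)    # schema id 8 in graph A (from the report)
label_b = unique_label("Person", 12)   # schema id 12 in graph B (hypothetical)
print(label_a, label_b)
print(strip_suffix(label_a), strip_suffix(label_b))
```

One caveat of this sketch: a type whose own name already ends in an underscore followed by digits would be stripped too aggressively, so a real implementation would need a less ambiguous separator.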
Hi,
SylvaDB is a great system. It fulfills my needs almost perfectly. Nevertheless, I have some issues...
You describe SylvaDB on GitHub as "... a Relaxed-Schema Graph Database Management System." The main problem with this definition is that there is a difference between data stored inside the SylvaDB system and data stored in Neo4j database. I'll try to explain my issue using an example:
In the 'Schema' I create a new 'Type' with the name 'Person' and in the 'Properties' section I fill in a 'Key' called 'Full name'. In the same way I add a new type 'Movie' with a key 'Title'. Next I create a new 'Allowed Relationship' using 'Person' as 'Source', 'Movie' as 'Destination' and 'ACTS_IN' as the 'Name' of the relationship.
Then I put some data into the SylvaDB. First I add 'New Person' with a key 'Full name' holding the value 'Keanu Reeves'. Then I use 'New Movie' to create a node with the key 'Title' and the value 'Matrix'. At the same time I fill in the field '<- ACTS_IN' by searching for the value 'Keanu Reeves'.
I made this detailed description just to show you what I would expect to be created in the Neo4j database (following your statement that SylvaDB is a GraphDB Management System):
(even if your Neo4j version does not support labels, you are using the '_label' key to mimic this function, which is fine).
If I look into the Neo4j database with Cypher, I get the following data:
Node[3]{_id:3,_label:"8",_graph:"2",Full name:"Keanu Reeves"}
Node[4]{_id:4,_label:"9",_graph:"2",Title:"Matrix"}
:8[1] {_id:1,_label:"8",_graph:"2"}
The first problem here is the substitution of node '_label' value by some SylvaDB internal IDs. The second problem is the substitution of relationship type by another internal ID. I'm afraid I don't know the logic of assigning those IDs and even if I knew it, it would be really hard for a human to translate those IDs into some meaningful queries and results.
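To make the difficulty concrete, here is a minimal sketch of the translation step a person currently has to do by hand before the stored data can be read. The records are copied from the Cypher output above; the two lookup tables are assumptions inferred from the example (the node with 'Full name' is taken to be a Person, the node with 'Title' a Movie, and relationship id "8" to be ACTS_IN) and do not describe SylvaDB's real id assignment.

```python
# Assumed id -> name tables. Node and relationship ids appear to live in
# separate namespaces, since "8" is used for both in the output above.
NODE_TYPES = {"8": "Person", "9": "Movie"}
REL_TYPES = {"8": "ACTS_IN"}


def readable(record, names):
    """Return a copy of the record with `_label` replaced by its name.

    Unknown ids are left as they are rather than guessed.
    """
    out = dict(record)
    out["_label"] = names.get(record["_label"], record["_label"])
    return out


nodes = [
    {"_id": 3, "_label": "8", "_graph": "2", "Full name": "Keanu Reeves"},
    {"_id": 4, "_label": "9", "_graph": "2", "Title": "Matrix"},
]
relationship = {"_id": 1, "_label": "8", "_graph": "2"}

readable_nodes = [readable(node, NODE_TYPES) for node in nodes]
readable_rel = readable(relationship, REL_TYPES)

for item in readable_nodes + [readable_rel]:
    print(item)
```

Without access to tables like these, which only the SylvaDB schema holds, the `_label` values in Neo4j carry no meaning on their own.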
The need for internal IDs within the Neo4j database is obvious: SylvaDB works with objects, and those objects have to be identified somehow. But why do those IDs replace important values and/or types? From my point of view, internal IDs should be an addition to the data, not a replacement. If your intention was to make user changes simple (like changing the name of a relationship type, which is quite complicated within Neo4j, but not impossible), the cost is too high. Data stored in Neo4j is useless without the SylvaDB frontend. This surfaces as soon as the 'Queries' interface is not sufficient for certain types of queries (like getting the shortest path between nodes).
I can imagine how complicated it would be to rework the system to allow for real values and types. Nevertheless, will you consider changing the system accordingly? There is no better system for working with the Neo4j graph database than SylvaDB. It is simple, intuitive, flexible, powerful and user-friendly. And it can be even better...
Thanks for your opinion.
Petr