Performance #216
I notice the performance is what I expect, if I don't use the Without With the That's 158x longer.
Edit: Experimenting further, I think the issue is really more about the order of the columns in the relation. Changing the order to put the attributes first (which tend to be fixed) seems to help some of the queries, even with the
Edit2: I see the bit about indices here: https://docs.cozodb.org/en/latest/stored.html#indices. It does still double the storage requirements, which means this isn't really an "index" as much as it is a re-arranged copy of the entire dataset (a book's index isn't as long as the whole book, it's just a list of pointers to the original data). Still, a handy feature.
I also think I was mistaken in extrapolating that increasing my db size by 100x might make query times too long. The attribute names in my test db are far less diverse than they will be in the prod db, so specifying an attribute in the prod db will narrow the candidates a lot more. So I will proceed with more involved tests!
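To illustrate the column-order point above, here is a minimal Python sketch. It is a toy model, not Cozo's implementation: it assumes rows are kept sorted by their columns in declaration order, so a lookup that fixes the leading columns becomes a short range scan, while a lookup on trailing columns has to visit every row. The sample rows and the helper names `prefix_scan` and `full_scan` are made up for illustration.

```python
from bisect import bisect_left

# Hypothetical rows in the thread's [a_attr, a_val, b_attr, b_val] layout,
# kept sorted to imitate a key-ordered store (an assumption, see lead-in).
rows = sorted([
    ("title", "Jaws", "director", "Steven Spielberg"),
    ("title", "Jaws", "release_year", "1975"),
    ("title", "Alien", "director", "Ridley Scott"),
    ("title", "E.T.", "director", "Steven Spielberg"),
    ("title", "E.T.", "cast", "Drew Barrymore"),
])

def full_scan(all_rows, b_attr, b_val):
    """Filter on the trailing columns: every row has to be visited."""
    matches = [r for r in all_rows if r[2] == b_attr and r[3] == b_val]
    return matches, len(all_rows)

def prefix_scan(sorted_rows, prefix):
    """Binary-search to the first row >= prefix, then walk until it stops matching."""
    start = bisect_left(sorted_rows, prefix)
    matches, touched = [], 0
    for row in sorted_rows[start:]:
        touched += 1
        if row[:len(prefix)] != prefix:
            break
        matches.append(row)
    return matches, touched

# A re-arranged copy with the queried columns first. It has exactly as many
# rows as the original, which is the storage doubling noted in Edit2.
reordered = sorted((b_attr, b_val, a_attr, a_val)
                   for a_attr, a_val, b_attr, b_val in rows)

slow, slow_touched = full_scan(rows, "director", "Steven Spielberg")
fast, fast_touched = prefix_scan(reordered, ("director", "Steven Spielberg"))

titles_slow = sorted(r[1] for r in slow)   # title is a_val in the original order
titles_fast = sorted(r[3] for r in fast)   # title is the last column in the copy
print(titles_slow, slow_touched)
print(titles_fast, fast_touched)
```

Both lookups return the same titles, but the prefix scan touches only the matching rows plus one, and that gap grows with the size of the relation.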
Hi All,
What an awesome project! I'm considering using it in my application so I'm testing it out.
What I'm considering is a graph database of attributes where there are no real "entities", just shared attributes. So, to that end the attributes relation is just
[a_attr, a_val, b_attr, b_val]
and that forms the edge between two attribute/value pairs.
I downloaded a freely available db of movies to test with. It contains about 70k records, each with about 10 attributes, so the relation has about 700k rows. I modeled this particular data with all the other attributes (cast, director, release_year, etc.) linked to the title of the movie. I cleaned the data a bit, imported it into cozo and started querying. It works great, but it's quite a bit slower than I expected. Here's an example query:
Movies directed by Steven Spielberg
The query takes about 300ms, which is disappointing because the database I'm planning to use with my app is around 100x larger and presumably would take way too long to query.
This is on a new laptop with a good CPU and SSD, using the SQLite backend.
I was expecting that there's an index of all the rows by attribute, and another by value, which should make lookups like this quite fast (look up v=Spielberg and ov=Spielberg; there are only 11 of these, and that's the final result). Of course it needs to be filtered to only return those rows where Spielberg is the director (and not some other attribute), but that should be very fast.
What can I do to optimize the performance? Is there a way to find out what the bottleneck is here?
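For clarity, the lookup I had in mind can be sketched in Python as below. This is only an illustration of the expectation, not how cozo works internally: a value index maps each value to the rows that mention it on either side, and the small candidate list is then filtered by attribute. The sample rows (including the placeholder cameo film) and the names `by_value` and `movies_directed_by` are hypothetical.

```python
from collections import defaultdict

# Hypothetical sample rows in the [a_attr, a_val, b_attr, b_val] layout.
rows = [
    ("title", "Jaws", "director", "Steven Spielberg"),
    ("title", "E.T.", "director", "Steven Spielberg"),
    ("title", "Some Cameo Film", "cast", "Steven Spielberg"),  # placeholder row
    ("title", "Alien", "director", "Ridley Scott"),
    ("title", "Jaws", "release_year", "1975"),
]

# Value index: every value, on either side of the edge, points at its rows.
by_value = defaultdict(list)
for row in rows:
    _, a_val, _, b_val = row
    by_value[a_val].append(row)
    if b_val != a_val:
        by_value[b_val].append(row)

def movies_directed_by(name):
    """Look up the few rows mentioning `name`, then keep only director edges."""
    titles = set()
    for a_attr, a_val, b_attr, b_val in by_value.get(name, []):
        if a_attr == "title" and b_attr == "director" and b_val == name:
            titles.add(a_val)
        elif b_attr == "title" and a_attr == "director" and a_val == name:
            titles.add(b_val)
    return sorted(titles)

print(movies_directed_by("Steven Spielberg"))
```

The attribute filter only runs over the handful of rows that mention the value, which is why I expected the whole query to be fast regardless of the total row count.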