
Standoff property value retrieving performance #1578

Open
gfoo opened this issue Jan 21, 2020 · 12 comments

Comments

@gfoo

gfoo commented Jan 21, 2020

Our project contains huge transcriptions based on our own standoff mapping, and we get poor performance when retrieving the value of this property. I mainly use Gravsearch to retrieve data, but even with the v2/resources endpoint the performance is poor.
We are talking about 20 or 30 seconds to retrieve this resource.

If needed, @loicjaouen will provide our memory/CPU stack configuration.

I'm going to prepare a test case so that you can try to reproduce this performance problem on your side.

@gfoo gfoo changed the title Standoff property retrieving performance Standoff property value retrieving performance Jan 21, 2020
@benjamingeer

Have you read https://discuss.dasch.swiss/t/large-texts-and-xml-databases/134 ?

You have two options:

  1. Break your text into smaller pieces, instead of storing a huge text in a single TextValue.
  2. Wait until Knora supports storing text in an XML database.
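As a rough illustration of option 1, here is a minimal client-side sketch (Python) of splitting a long transcription into pieces of about 1000 words each, so that each piece can go into its own TextValue. The `chunk_text` helper, the file name, and the chunk size are assumptions for illustration, not part of the Knora API:

```python
def chunk_text(text: str, max_words: int = 1000) -> list[str]:
    """Split a text into chunks of at most max_words words."""
    words = text.split()
    return [
        " ".join(words[i:i + max_words])
        for i in range(0, len(words), max_words)
    ]

with open("transcription.txt", encoding="utf-8") as f:
    chunks = chunk_text(f.read())

# Each chunk would then be stored as its own resource/TextValue,
# linked in reading order. Note: a real text with XML markup would
# have to be split at element boundaries, not on raw whitespace.
for n, chunk in enumerate(chunks, start=1):
    print(f"part {n}: {len(chunk.split())} words")
```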

@benjamingeer

I suggested the same thing to you last April:

#1293 (comment)

@gfoo
Author

gfoo commented Jan 22, 2020

Have you read https://discuss.dasch.swiss/t/large-texts-and-xml-databases/134 ?

No, sorry, I no longer have enough motivation or time to follow your upcoming developments; I'm just trying to find solutions with the existing Knora :)

I suggested the same thing to you last April:

Yep, I remember. With @mrivoal we thought about that, but it's not so easy for us to automatically split our users' data during the migration process from their MySQL db into Knora. And anyway, at the end of the day, they probably won't want to split their data :|

Just have a look at their work: http://lumieres.unil.ch/fiches/trans/1088/ (in edit mode; you need an account for that). They use CKEditor, which produces a kind of pseudo-HTML; we provided a standoff mapping and it works very well. It's a shame that, probably just for a few transcriptions, we get this kind of poor performance :(

@gfoo
Author

gfoo commented Jan 22, 2020

The test case, if you want to reproduce it: PerfTrans.zip

@gfoo
Author

gfoo commented Jan 22, 2020

@mrivoal The only solution I see right now is to ask them to split their existing transcriptions in their database before our final migration.

@benjamingeer the save process is also very slow. It is not a problem for our migration process, but it will probably be a problem in our web app client if the end user has to wait more than 30 seconds to save something... They haven't given us feedback about that yet, but they probably will in the near future!

@benjamingeer

the save process is also very slow

If you can split the text into smaller pieces, both saving and loading will be faster.

@mrivoal

mrivoal commented Jan 22, 2020

Yes, the modeling solution, as usual.
However, artificially splitting long editions that users can easily deal with in other tools (eXist-db) is not an acceptable solution (this is already the feedback we have from another of our edition projects).

Then I guess that, in the long run, Knora will have to store long texts in XML databases.

@benjamingeer

However, artificially splitting long editions that users can easily deal with in other tools (eXist-db) is not an acceptable solution (this is already the feedback we have from another of our edition projects).

It's a trade-off. If you can store texts in small enough pieces (1000 words is a good size if you have a lot of markup), you can store them as RDF, and get functionality that you wouldn't get by storing the text in eXist-db, like "find me a text that mentions a person who was born after 1720 and who was a student of Euler". (Maybe you could do that in eXist-db if you were willing to store all your data as XML.)

Otherwise, you can store the text in eXist-db: storage and retrieval will be faster, and some queries will be faster, but you will lose some search capabilities.

I think the best we can do is offer both options, and let each project decide which is best for them.
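To make that trade-off concrete, here is a hedged sketch of the kind of cross-metadata query mentioned above, submitted as a Gravsearch query from Python. The project ontology prefix and the property names (`mentionsPerson`, `wasStudentOf`, `hasName`, `hasBirthDate`) are hypothetical, the date-filter syntax is schematic, and the endpoint path is an assumption; only `knora-api:isMainResource` and the general CONSTRUCT shape come from Gravsearch itself. Check the Knora API docs for the exact route and date-comparison syntax.

```python
import requests

# Hypothetical project ontology and properties; illustrative only.
gravsearch = """
PREFIX knora-api: <http://api.knora.org/ontology/knora-api/simple/v2#>
PREFIX ex: <http://0.0.0.0:3333/ontology/0001/example/simple/v2#>

CONSTRUCT {
    ?text knora-api:isMainResource true .
} WHERE {
    ?text a ex:Transcription .
    ?text ex:mentionsPerson ?person .
    ?person ex:wasStudentOf ?teacher .
    ?teacher ex:hasName "Euler" .
    ?person ex:hasBirthDate ?birthDate .
    # Schematic date filter; see the Gravsearch docs for the
    # actual Knora date-comparison syntax.
    FILTER(?birthDate > "GREGORIAN:1720"^^knora-api:Date)
}
"""

# Assumed endpoint path for Gravsearch queries.
resp = requests.post(
    "http://0.0.0.0:3333/v2/searchextended",
    data=gravsearch.encode("utf-8"),
    headers={"Content-Type": "application/sparql-query"},
)
print(resp.status_code)
```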

@mrivoal

mrivoal commented Jan 22, 2020

What would you consider "a lot of markup"?

@benjamingeer

What would you consider "a lot of markup"?

In the test I did, nearly every word had a tag. The more markup you have, the more triples have to be retrieved, and the slower it's going to be. If you have a big text with very little markup, GraphDB can still retrieve it pretty quickly.
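A back-of-envelope calculation (not a measurement) of why markup density dominates: each standoff tag is stored as its own set of triples (start/end position, tag class, index, and so on), so the number of triples to retrieve scales with the number of tags. The triples-per-tag figure below is an assumed order of magnitude, not taken from the Knora source:

```python
words = 50_000            # a large transcription
triples_per_tag = 8       # assumption: order of magnitude per standoff tag

# "Nearly every word had a tag":
dense = int(words * 1.0 * triples_per_tag)
print(f"dense markup:  ~{dense:,} standoff triples")   # ~400,000

# The same text with one tag per 100 words:
sparse = int(words * 0.01 * triples_per_tag)
print(f"sparse markup: ~{sparse:,} standoff triples")  # ~4,000
```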

@mrivoal

mrivoal commented Jan 22, 2020

Ok, thanks.

@benjamingeer

Just have a look at their work: http://lumieres.unil.ch/fiches/trans/1088/

That text has chapters. Why not store one chapter per resource? That would also make navigation and editing a lot easier. Do you really want to scroll through that much text on one HTML page?
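For a chapter-per-resource split, something like the following could work, assuming the transcription is available as XML with `<chapter>` elements (the element name and file layout are assumptions about this project's markup):

```python
import xml.etree.ElementTree as ET

tree = ET.parse("transcription.xml")
root = tree.getroot()

# Write each chapter to its own file, so each can be imported
# as a separate resource with its own TextValue.
for n, chapter in enumerate(root.iter("chapter"), start=1):
    ET.ElementTree(chapter).write(f"chapter_{n:03d}.xml", encoding="utf-8")
```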

@subotic subotic added this to the Backlog milestone Feb 7, 2020
@irinaschubert irinaschubert removed this from the Backlog milestone Dec 9, 2021