Allow using sqlite's OPFS_SAH backend #39
Comments
SQLocal does not use that backend because of the drawbacks it has (summarized here). However, I think for large batches of inserts, the bigger bottleneck is serializing all of the data between the main thread and the web worker that the database runs in. SQLocal is mainly for interacting with the database from the main thread and aims to abstract away the worker, but I would be open to making a utility of some sort that makes it easier to run bulk inserts directly in the worker to avoid the serialization overhead. Do you think that would help in your use case?
I looked at the timings using a profiler and 99% of the time is spent in the sqlite wasm. So I don't think the serialisation overhead is the issue here. Note that I do agree that the default should probably remain the regular OPFS VFS, but for applications where multiple tabs/downloading the sqlite file is not a requirement, it could be nice to have an option to use the SAH backend.
Interesting. Can you give some more details on what your use case is? How much data are you inserting? Is it a one-time setup or a recurring process? Why is multi-tab support not needed? Mind sharing what your app is used for?
It's a music app where a lot of metadata is loaded beforehand (to allow offline browsing). Since it also supports playback, it doesn't really make sense to have multiple tabs open. About 100MB of data is synced, one or two times per day usually.
What kind of performance are you seeing when you do that sync? How long do the inserts take in total? I'm looking at the benchmarks for the SAH VFS versus the original OPFS VFS, and it appears that INSERT performance is virtually identical between them. The main area where they differ is actually that SELECTs are much faster on the SAH VFS, so I'm not sure switching to SAH would help in your use case.
I use upserts to add the data, which does make it take a long time, since it includes selections under the hood. It goes from a few hundred milliseconds on the SAH backend to almost a minute on the regular one.
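For readers unfamiliar with upserts, here is a minimal sketch of how such a statement might be built, assuming SQLite's `INSERT ... ON CONFLICT ... DO UPDATE` syntax. The `buildUpsert` helper and the `tracks` table are hypothetical and are not taken from the app described above. Every row needs a lookup on the conflict key before SQLite can decide whether to insert or update, which is one plausible reason read speed matters for this workload.

```typescript
// Hypothetical helper: builds a parameterised SQLite upsert statement.
// Table and column names are interpolated directly, so they must come from
// trusted code, never from user input.
function buildUpsert(table: string, columns: string[], keyColumn: string): string {
  const placeholders = columns.map(() => "?").join(", ");
  // Every non-key column is overwritten with the incoming ("excluded") value.
  const updates = columns
    .filter((column) => column !== keyColumn)
    .map((column) => `${column} = excluded.${column}`)
    .join(", ");
  // With no non-key columns there is nothing to update, so skip the row instead.
  const conflictAction = updates.length > 0 ? `DO UPDATE SET ${updates}` : "DO NOTHING";
  return (
    `INSERT INTO ${table} (${columns.join(", ")}) VALUES (${placeholders}) ` +
    `ON CONFLICT(${keyColumn}) ${conflictAction}`
  );
}

// Illustrative table: the schema is an assumption made for this example.
const upsertTrack = buildUpsert("tracks", ["id", "title", "artist"], "id");
```

The resulting string can be passed to whatever query function the application uses, together with one bound value per `?` placeholder.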
Thanks, that's useful to know. I'll have to investigate the SAH VFS a bit more to see how viable it is. I did some experimentation with it, but I kept running into it throwing disk I/O errors intermittently. In the meantime, any further insights you have would be helpful, and PRs are of course welcome too.
If I remember correctly, when using the I'm guessing that the The project I'm working on is using sqlite-wasm with the
We did run into an issue on initialization of the sqlite database where the browser would report that the file was locked even though it wasn't, but in that case the solution was to retry the request. More info in this issue (sqlite/sqlite-wasm#79).
This all being said, while I/O issues haven't been a problem there have been other challenges (basically entirely with Safari--god I hate Safari 😡). E.g. when using the SAH approach you need to elect a leader tab to house the sqlite instance. We use the Web Locks API to elect a leader tab. Occasionally we've been hit with a browser bug (basically, a Safari-only bug though technically it's happened elsewhere) where the browser fails to release a web lock when the tab is destroyed. This can cause a new leader to fail to be elected. Safari seems to have gotten better about this behavior though as I haven't run into it recently. A bigger problem is that, on iOS, Safari will aggressively suspend background tabs. When the background tab is the leader, that's a problem. Sending the leader a
The reason we chose the SAH approach is because, from what I've read, the non-SAH approach basically becomes unusable because of transaction contention when there are too many tabs open. We figured, if we need to control concurrent access to the database because of transaction contention, we might as well go with the SAH approach since it's the most performant. Has transaction contention not been a problem for SQLocal @DallasHoff?
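The two techniques mentioned in that comment, electing a leader with the Web Locks API and retrying when the file is reported as locked, could look roughly like the sketch below. This is not the commenter's code: `electLeader`, `withRetry`, and the `LockManagerLike` interface are hypothetical names, and the lock manager is passed in as a parameter (in a browser it would be `navigator.locks`) so that the logic can be exercised outside a browser.

```typescript
// Minimal shape of the Web Locks API used here; in a browser, pass `navigator.locks`.
interface LockManagerLike {
  request(name: string, callback: () => Promise<unknown>): Promise<unknown>;
}

// Hypothetical helper: the tab whose callback runs holds the lock and acts as leader.
function electLeader(locks: LockManagerLike, lockName: string, onElected: () => void): void {
  void locks.request(lockName, () => {
    onElected();
    // Never settle, so the lock stays held until the tab (and with it the lock) goes away.
    return new Promise<never>(() => {});
  });
}

// Hypothetical helper: retries an async operation, e.g. opening the database
// when the browser wrongly reports the file as locked.
async function withRetry<T>(
  operation: () => Promise<T>,
  maxAttempts: number,
  delayMs: number,
): Promise<T> {
  let lastError: unknown = new Error("maxAttempts must be at least 1");
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await operation();
    } catch (error) {
      lastError = error;
      if (attempt < maxAttempts) {
        // Wait before the next attempt; no wait after the final failure.
        await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
      }
    }
  }
  throw lastError;
}
```

Tabs that call `electLeader` while another tab holds the lock simply wait; the browser grants the lock to one of them once the holder releases it, which is exactly the step that fails when a lock is not released after a tab is destroyed.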
That's a lot of good insights. Thank you, @jorroll! I just did the same test I did before with the SAH VFS, and I'm not seeing the I/O errors anymore. That could be due to a fix made between then and now to sqlite-wasm or the changes to SQLocal itself, which now has its own locking mechanism so that if a transaction is attempted while another transaction is in progress on the same database, the second transaction will wait until the first is done. I've heard of that approach to using the SAH VFS of electing a leader tab using Web Locks. My initial idea for using the SAH VFS was to have sqlite-wasm run in a
All of these issues are fixable by the respective parties, but I do not expect that any time soon, so it seems like the "leader tab" approach is the only viable one, even though it's not as robust as any of us would prefer. An idea I had to mitigate the issues with leader tabs being put to sleep or Web Locks not getting released is to have the leader tab periodically send out a "heartbeat" message on a
I'd like to work on supporting more VFSs with SQLocal soon. Having every SQLocal function work the same no matter which VFS you use makes it really easy for users of the library to switch between VFSs or fall back as needed. The first step of this was making in-memory databases support SQLocal's full feature set, and I'll be releasing those changes very soon. After that, I'll come back to investigating the other VFSs again, especially the SAH VFS.
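The follower side of the heartbeat idea can be sketched as a small timeout check. The comment does not spell out the transport or any intervals, so the sketch leaves the transport out entirely and treats the timeout as a caller-supplied value; `HeartbeatMonitor` is a hypothetical name, and timestamps are passed in explicitly to keep the logic easy to test.

```typescript
// Hypothetical follower-side monitor: tracks when the leader was last heard from.
class HeartbeatMonitor {
  private lastBeatAt: number;
  private readonly timeoutMs: number;

  constructor(timeoutMs: number, startedAt: number) {
    this.timeoutMs = timeoutMs;
    this.lastBeatAt = startedAt;
  }

  // Call this whenever a heartbeat message from the leader arrives.
  recordBeat(receivedAt: number): void {
    this.lastBeatAt = receivedAt;
  }

  // The leader counts as alive while the silence is no longer than the timeout.
  isLeaderAlive(now: number): boolean {
    return now - this.lastBeatAt <= this.timeoutMs;
  }
}

// Illustrative values only: a 3 second timeout, times in milliseconds.
const monitor = new HeartbeatMonitor(3000, 0);
monitor.recordBeat(2500);
const aliveAtFiveSeconds = monitor.isLeaderAlive(5000);
const aliveAtSixSeconds = monitor.isLeaderAlive(6000);
```

In a real tab the timestamps would come from something like `Date.now()`, and a follower would poll `isLeaderAlive` on a timer. Detecting a silent leader is only the first half of the problem; whether another tab can then take over depends on the old leader's locks actually being released.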
Hello! I put together a very basic PR to just allow the configuration to use SAH VFS, since I have similar needs as @jorroll, and am going to implement a leader tab/locking mechanism as in https://www.notion.com/blog/how-we-sped-up-notion-in-the-browser-with-wasm-sqlite. Currently I'm yarn patching this library to achieve this functionality but I'd love the option to just use the main branch here.
@DallasHoff the heartbeat idea is very interesting and makes a lot of sense to me also!
While I'm not sure, I suspect this will not work @DallasHoff. If the leader still exists but is unresponsive, it might (probably does) still have a lock on the database file. I'm not sure if there's currently any way to "steal" an OPFS file lock. I.e. you might find that, while you can detect that the leader is unresponsive, you still cannot elect a new leader because the old leader still has a lock on the database file. Mind, I don't know if this is the case, but it's a potential problem. It's also worth stating that, in practice, the problem of the leader being suspended is only a problem we've seen in Safari and (to my memory) only a problem on iOS. Our solution to this problem is to only use a persisted sqlite database on mobile if our app is running as a progressive web app (PWA). If our app is running as a PWA, then we know there is only ever a single tab and we know that the current tab is the leader tab. It allows us to avoid electing a leader and avoid using a shared worker. Because SharedWorkers aren't supported on Android, we also apply the same restriction to Android (we only enable persistence in a PWA on Android). In practice, only enabling persistence in the PWA on mobile has proven to be a reasonable restriction for our users. When someone logs into our app on mobile, we warn them that persistence isn't supported unless they install the app and we invite them to install the PWA version on their device (and we provide instructions for doing so via https://github.com/khmyznikov/pwa-install).
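The persistence rule described above (persist on desktop, and on mobile only when running as an installed PWA) reduces to a small decision function. The sketch below is an illustration, not the commenter's implementation: `chooseStorageMode` and the `Environment` fields are hypothetical, and how the two flags are detected is left to the caller. One common way to detect an installed PWA is `window.matchMedia("(display-mode: standalone)").matches`, but that is an assumption about their setup.

```typescript
// Hypothetical inputs: how these flags are detected is up to the application.
interface Environment {
  isMobile: boolean;
  isInstalledPwa: boolean;
}

type StorageMode = "persisted" | "memory";

// Desktop always persists (with a leader tab). Mobile persists only inside an
// installed PWA, where a single tab is guaranteed and no leader election is needed.
function chooseStorageMode(env: Environment): StorageMode {
  if (!env.isMobile) {
    return "persisted";
  }
  return env.isInstalledPwa ? "persisted" : "memory";
}

const mobileBrowserMode = chooseStorageMode({ isMobile: true, isInstalledPwa: false });
const mobilePwaMode = chooseStorageMode({ isMobile: true, isInstalledPwa: true });
```

When the result is `"memory"`, the app can show the install prompt mentioned above instead of silently dropping persistence.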
Browsers (well, Chrome) are currently exploring APIs for allowing concurrent access to a SQLite database from multiple tabs. I.e. you could acquire a sync access handle to the sqlite database file without locking the file for other threads. I think there's a blog post on web.dev that explores this option as well as an open issue in one of the standards repos somewhere. This would allow each tab to create its own dedicated worker that connects to sqlite using the
Worth noting that, for web apps, in practice if you're using persisted sqlite you also need to have an in-memory database. This is because you're (probably) going to want to apply optimistic updates and have them rendered in the app in less than 16ms for achieving 60fps. For a browser based web application I think this effectively requires maintaining an in-memory database in addition to a persisted database. E.g. hypothetically you could use a wasm build of sqlite (running in a worker) as the source of truth for your app and that data would be persisted. But while sqlite itself might be able to resolve queries fast enough to render a frame in less than 16ms, sending the query to the worker via postMessage and getting a response on the main thread via postMessage can't be guaranteed within any time frame. In practice I've found that postMessage calls across threads in the browser can take surprisingly long to resolve (e.g. 100s of ms or longer). My perception is that the slowdown isn't due to sqlite, but is instead just how long the browser can take to postMessage across threads (if you're sending 100s or 1000s of postMessages within a second, the browser queues them and resolves them one by one and the last one can be resolved 100s of ms after the first).
The solution that our app uses is to have an in-memory sqlite database on the main thread and then a persisted sqlite database hosted by a leader tab. Both these databases have the same schema. The in-memory db is the "source of truth" for rendering the app on the main thread. When resolving a query we typically synchronously serve the results from the in-memory db but also send the query to the persisted db async. When the persisted query resolves we load the results into the tab's in-memory db and rerender data as appropriate. If we send a query to the server, we load the results both into the tab's in-memory db and also send those results to the persisted db.
While we don't need to use in-memory sqlite as the synchronous datastore on the main thread, it's an attractive option since we're already using persisted sqlite and we can reuse the query logic in both places.
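The read path described above (answer synchronously from the in-memory database, then refresh from the persisted one) could be sketched as follows. All names here are hypothetical: `SyncStore` stands in for the in-memory database on the main thread, `AsyncStore` for the persisted database behind the leader tab, and `queryWithRefresh` is not an API of sqlite-wasm or SQLocal.

```typescript
type Row = Record<string, unknown>;

// Stand-in for the in-memory database on the main thread.
interface SyncStore {
  query(sql: string): Row[];
  load(rows: Row[]): void;
}

// Stand-in for the persisted database reached via messages to the leader tab.
interface AsyncStore {
  query(sql: string): Promise<Row[]>;
}

// Returns the in-memory result immediately so a frame can render, then merges
// the persisted result into memory and reports the refreshed rows via callback.
function queryWithRefresh(
  sql: string,
  memory: SyncStore,
  persisted: AsyncStore,
  onRefresh: (rows: Row[]) => void,
): Row[] {
  const immediate = memory.query(sql);
  void persisted.query(sql).then((rows) => {
    memory.load(rows);
    onRefresh(memory.query(sql));
  });
  return immediate;
}
```

The caller renders the returned rows right away and rerenders when `onRefresh` fires, so a slow round trip to the worker delays only the refresh, not the first paint.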
Thanks for the great library! I'm running into some insert performance degradation similar to this issue: sqlite/sqlite-wasm#61
One of the things mentioned to try there is to use the SAH backend. This backend does not allow for concurrency, but that is not really required for my use case, so it would be nice to use that backend.