Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow using sqlite's OPFS_SAH backend #39

Open
chvp opened this issue Aug 16, 2024 · 12 comments
Open

Allow using sqlite's OPFS_SAH backend #39

chvp opened this issue Aug 16, 2024 · 12 comments
Labels
enhancement New feature or request

Comments

@chvp
Copy link

chvp commented Aug 16, 2024

Thanks for the great library! I'm running into some insert performance degradation similar to this issue: sqlite/sqlite-wasm#61

One of the things mentioned to try there is to use the SAH backend. This backend does not allow for concurrency, but that is not really required for my use case, so it would be nice to use that backend.

@DallasHoff
Copy link
Owner

SQLocal does not use that backend because of the drawbacks it has (summarized here). However, I think for large batches of inserts, the bigger bottleneck is serializing all of the data between the main thread and the web worker that the database runs in.

SQLocal is mainly for interacting with the database from the main thread and aims to abstract away the worker, but I would be open to making a utility of some sort that makes it easier to run bulk inserts directly in the worker to avoid the serialization overhead. Do you think that would help in your use case?

@chvp
Copy link
Author

chvp commented Aug 30, 2024

I looked at the timings using a profiler and 99% of the time is spent in the sqlite wasm. So I don't think the serialisation overhead is the issue here. Note that I do agree that the default should probably remain the regular OPFS VFS, but for applications where multiple tabs/downloading the sqlite file is not a requirement, it could be nice to have an option to use the SAH backend.

@DallasHoff
Copy link
Owner

Interesting. Can you give some more details on what your use case is? How much data are you inserting? Is it a one-time setup or a recurring process? Why is multi-tab support not needed? Mind sharing what your app is used for?

@chvp
Copy link
Author

chvp commented Sep 2, 2024

It's a music app where a lot of metadata is loaded beforehand (to allow offline browsing). Since it also supports playback, it doesn't really make sense to have multiple tabs open.

About 100MB of data is synced, one or two times per day usually.

@DallasHoff
Copy link
Owner

What kind of performance are you seeing when you do that sync? How long do the inserts take in total? I'm looking at the benchmarks for the SAH VFS versus the original OPFS VFS, and it appears that INSERT performance is virtually identical between them. The main area where they differ is actually that SELECTs are much faster on the SAH VFS, so I'm not sure switching to SAH would help in your use case.

@chvp
Copy link
Author

chvp commented Sep 3, 2024

I use upserts to add the data, which does make it take a long time, since it includes selections under the hood. It goes from a few hundred milliseconds on the SAH backend to almost a minute on the regular one.

@DallasHoff
Copy link
Owner

Thanks, that's useful to know. I'll have to investigate the SAH VFS a bit more to see how viable it is. I did some experimentation with it, but I kept running into it throwing disk I/O errors intermittently. In the mean time, any further insights you have would be helpful, and PRs are of course welcome too.

@DallasHoff DallasHoff added the enhancement New feature or request label Sep 14, 2024
@jorroll
Copy link

jorroll commented Nov 20, 2024

throwing disk I/O errors intermittently

If I remember correctly, when using the oo1.OpfsDb sqlite constructor, you're running one sqlite instance per-tab. Because only one instance can operate on the sqlite database file at a time, the different instances take turns taking out locks on the database file, performing the queries, and then releasing the locks. If one sqlite instance (i.e. tab) has a lock on the database file and then another instance (i.e. tab) attempts to acquire a lock on the database file, the browser raises that as a generic I/O error. Currently sqlite has no way of differentiating between I/O errors because of transaction contention or I/O errors because of other things. For this reason, when using the oo1.OpfsDb constructor, sqlite will retry queries with a delay when it runs into an I/O error (I think it's up to 5 times).

I'm guessing that the poolUtil.OpfsSAHPoolDb constructor encounters the same number of true I/O errors that the oo1.OpfsDb constructor does, but the oo1.OpfsDb is automatically retrying the requests for you (for unrelated reasons as described above) and maybe the requests are working the second time. If this guess is correct, then the fix for the I/O errors you saw with the poolUtil.OpfsSAHPoolDb constructor might just be to catch I/O errors and automatically retry them some number of times @DallasHoff.

The project I'm working on is using sqlite-wasm with the poolUtil.OpfsSAHPoolDb constructor in production. We've definitely run into challenges with it, mostly around leader election and message passing, but I/O errors haven't been a notable problem. I can't actually think of any I/O errors we've seen, but we have seen some database file corruption though it's been a negligible amount so far. I was also able to communicate with one of the Notion developers after they adopted sqlite-wasm, and they reported that the file corruption they're seeing is comparable to what they see with the native nodejs version of sqlite.

We did run into an issue on initialization of the sqlite database where the browser would report that the file was locked even though it wasn't, but in that case the solution was to retry the request. More info in this issue (sqlite/sqlite-wasm#79).

This all being said, while I/O issues haven't been a problem there have been other challenges (basically entirely with Safari--god I hate Safari 😡). E.g. when using the SAH approach you need to elect a leader tab to house the sqlite instance. We use the web locks API to elect a leader tab. Occasionally we've been hit with a browser bug (basically, a Safari-only bug though technically it's happened elsewhere) where the browser fails to release a web lock when the tab is destroyed. This can cause a new leader to fail to be elected. Safari seems to have gotten better about this behavior though as I haven't run into it recently.

A bigger problem is that, on iOS, Safari will aggressively suspend background tabs. When the background tab is the leader, that's a problem. Sending the leader a postMessage doesn't seem guaranteed to wake it up on iOS and, regardless, too often we're not able to reliably communicate with the leader on iOS. For this reason, we ended up disabling sqlite persistence on iOS unless our application is running as a PWA (since in that case we know the current tab is the leader tab).

The reason we chose the SAH approach is because, from what I've read, the non-SAH approach basically becomes unusable because of transaction contention when there are too many tabs open. We figured, if we need to control concurrent access to the database because of transaction contention, we might as well go with the SAH approach since it's the most performant. Has transaction contention not been a problem for SQLocal @DallasHoff?

@DallasHoff
Copy link
Owner

DallasHoff commented Nov 23, 2024

That's a lot of good insights. Thank you, @jorroll!

I just did the same test I did before with the SAH VFS, and I'm not seeing the I/O errors anymore. That could be due to a fix made between then and now to sqlite-wasm or the changes to SQLocal itself, which now has its own locking mechanism so that if a transaction is attempted while another transaction is in progress on the same database, the second transaction will wait until the first is done.

I've heard of that approach to using the SAH VFS of electing a leader tab using Web Locks. My initial idea for using the SAH VFS was to have sqlite-wasm run in a SharedWorker and then each SQLocal instance initialized for the same database path would connect to that SharedWorker instead of a dedicated Worker. If that were possible, it would make robustly supporting the SAH VFS easy, but unfortunately, there's 3 major issues:

  • SharedWorker is not supported on Android Chrome (Interop would be nice)
  • SharedWorker is not supported by coincident (which SQLocal uses to support user-defined database functions)
  • FileSystemSyncAccessHandle, the central browser API that the SAH VFS uses, is not supported inside SharedWorker

All of these issues are fixable by the respective parties, but I do not expect that any time soon, so it seems like the "leader tab" approach is the only viable one, even though it's not as robust as any of us would prefer.

An idea I had to mitigate the issues with leader tabs being put to sleep or Web Locks not getting released is to have the leader tab periodically send out a "heartbeat" message on a BroadcastChannel. If the other tabs stopped receiving that, they would elect a new leader. This would only be a fallback to the Web Lock approach, though, since it would be slower to wait on that timeout, but worst case scenario, a leader would always get elected whether the Web Lock is released instantly or the heartbeat timeout is reached. What do you think?

I'd like to work on supporting more VFS's with SQLocal soon because having every SQLocal function work the same no matter which VFS you use and provide that abstraction makes it really easy for users of the library to switch between VFS's or fall back as needed. The first step of this was making in-memory databases support SQLocal's full feature set, and I'll be releasing those changes very soon. After that, I'll come back to investigating the other VFS's again, especially the SAH VFS.

@6peterlu 6peterlu mentioned this issue Dec 5, 2024
@6peterlu
Copy link

6peterlu commented Dec 5, 2024

Hello! I put together a very basic PR to just allow the configuration to use SAH VFS, since I have similar needs as @jorroll, and am going to implement a leader tab/locking mechanism as in https://www.notion.com/blog/how-we-sped-up-notion-in-the-browser-with-wasm-sqlite. Currently I'm yarn patching this library to achieve this functionality but I'd love the option to just use the main branch here.

@6peterlu
Copy link

6peterlu commented Dec 5, 2024

@DallasHoff the heartbeat idea is very interesting and makes a lot of sense to me also!

@jorroll
Copy link

jorroll commented Dec 5, 2024

An idea I had to mitigate the issues with leader tabs being put to sleep or Web Locks not getting released is to have the leader tab periodically send out a "heartbeat" message on a BroadcastChannel. If the other tabs stopped receiving that, they would elect a new leader. This would only be a fallback to the Web Lock approach, though, since it would be slower to wait on that timeout, but worst case scenario, a leader would always get elected whether the Web Lock is released instantly or the heartbeat timeout is reached. What do you think?

While I'm not sure, I suspect this will not work @DallasHoff.

If the leader still exists but is unresponsive, it might (probably does) still have a lock on the database file. I'm not sure if there's currently any way to "steal" an OPFS file lock. I.e. you might find that, while you can detect that the leader is unresponsive, you still cannot elect a new leader because the old leader still has a lock on the database file. Mind, I don't know if this is the case, but it's a potential problem. It's also worth stating that, in practice, the problem of the leader being suspended is only a problem we've seen in Safari and (to my memory) only a problem on iOS. Our solution to this problem is to only use a persisted sqlite database on mobile if our app is running as a progressive web app (PWA). If our app is running as a PWA, then we know there is only ever a single tab and we know that the current tab is the leader tab. It allows us to avoid electing a leader and avoid using a shared worker. Because SharedWorkers aren't supported on Android, we also apply the same restriction to Android (we only enable persistence in a PWA on Android).

In practice, only enabling persistence in the PWA on mobile has proven to be a reasonable restriction for our users. When someone logs into our app on mobile, we warn them that persistence isn't supported unless they install the app and we invite them to install the PWA version on their device (and we provide instructions for doing so via https://github.com/khmyznikov/pwa-install).

I've heard of that approach to using the SAH VFS of electing a leader tab using Web Locks. My initial idea for using the SAH VFS was to have sqlite-wasm run in a SharedWorker and then each SQLocal instance initialized for the same database path would connect to that SharedWorker instead of a dedicated Worker. If that were possible, it would make robustly supporting the SAH VFS easy, but unfortunately, there's 3 major issues:

Browsers (well, Chrome) are currently exploring APIs for allowing concurrent access to a SQLite database from multiple tabs. I.e. you could acquire a sync access handle to the sqlite database file without locking the file for other threads. I think there's a blog post on web.dev that explores this option as well as an open issue in one of the standards repos somewhere. This would allow each tab to create it's own dedicated worker that connects to sqlite using the poolUtil.OpfsSAHPoolDb constructor (while also gracefully handling transaction contention without i/o errors). I strongly suspect this will become an option before accessing sqlite from a SharedWorker becomes an option. It's a different way of solving the same problem. I'd prefer the SharedWorker approach (or just allow SharedWorkers to create dedicated workers...which is also being explored somewhere but I don't remember the link) but w/e.

The first step of this was making in-memory databases support SQLocal's full feature set, and I'll be releasing those changes very soon.

Worth noting that, for web apps, in practice if you're using persisted sqlite you also need to have an in-memory database. This is because you're (probably) going to want to apply optimistic updates and have them rendered in the app in less than 16ms for achieving 60fps. For a browser based web application I think this effectively requires maintaining an in-memory database in addition to a persisted database. E.g. hypothetically you could use a wasm build of sqlite (running in a worker) as the source of truth for you app and that data would be persisted. But while sqlite itself might be able to resolve queries fast enough to render a frame in less than 16ms, sending the query to the worker via postMessage and getting a response on the main thread via postMessage can't be guaranteed within any time frame. In practice I've found that postMessage's across threads in the browser can take surprisingly long to resolve (e.g. 100s of ms or longer). My perception is that the slowdown isn't due to sqlite, but is instead just how long the browser can take to postMessage across threads (if you're sending 100s or 1000s of postMessages within a second, the browser queues them and resolves them one by one and the last one can be resolved 100s of ms after the first).

The solution that our app uses is to have an in-memory sqlite database on the main thread and then a persisted sqlite database hosted by a leader tab. Both these databases have the same schema. The in-memory db is the "source of truth" for rendering the app on the main thread. When resolving a query we typically synchronously serve the results from the in-memory db but also send the query to the persisted db async. When the persisted query resolves we load the results into the tab's in-memory db and rerender data as appropriate. If we send a query to the server, we load the results both into the tab's in-memory db and also send those results to the persisted db. While we don't need to use in-memory sqlite as the synchronous datastore on the main thread, it's an attractive option since we're already using persisted sqlite and we can reuse the query logic in both places.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

4 participants