Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Suggestions on improving SQLite storage backend performance #285

Open
justjake opened this issue Nov 25, 2024 · 1 comment
Open

Suggestions on improving SQLite storage backend performance #285

justjake opened this issue Nov 25, 2024 · 1 comment

Comments

@justjake
Copy link

Here's a few suggestions after reading the benchmark description on the docs website and doing a brief code-review of the sqlite backend after seeing such abysmal performance on the benchmark. Like, sqlite is going to be a slower backend than RocksDB but it doesn't have to be this much slower!

  1. Cache prepared statement objects with their connection, rather than re-creating them each transaction. Compiling text to SQLite's bytecode is not free. At least these are amortized for the duration of the transaction object lifetime, but for repetitive small transactions such as those in the pokec.rs benchmark, re-compiling the queries for each iteration adds massive overhead.
  2. Switch to WITHOUT ROWID table like this: CREATE TABLE cozo (...) WITHOUT ROWID. Since cozo uses SQLite as a KV store and doesn't use incremental BLOB IO, switching to WITHOUT ROWID table will remove some overhead. Right now the table uses two btrees: one btree indexes the key blob and maps to the ROWID, and then the actual table btree maps from rowid to the tuple. Switching to WITHOUT ROWID will remove the indirection and actually store the tuples in a btree by key. Docs: https://www.sqlite.org/withoutrowid.html
@justjake justjake changed the title Suggestions on improving SQLite storage backend Suggestions on improving SQLite storage backend performance Nov 25, 2024
@justjake
Copy link
Author

Another thing - open_thread_safe wraps every call to a SQLite connection or its derived objects like statements in a mutex, which makes it safe for multiple threads to share a single connection without causing undefined behavior. However, if a single cozo tx struct is only ever accessed from a single thread, or is itself protected by a lock when used from multiple threads, then open_thread_safe is overkill, and you can instead use OpenFlags::with_no_mutex() since cozo is already guaranteeing single-thread-exclusive use of the underlying sqlite objects. see https://www.sqlite.org/threadsafe.html ; this one is confusing because "NOMUTEX" means "its safe to use in a multi-threaded program but not from multiple threads at the same time", whereas "FULLMUTEX" means "i don't know what I'm doing, please add locks to every function call".

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant