Running multiple processes on a shared outlines cache database #2306

e-tornike · 2024-09-16T12:51:16Z

When the caching feature of outlines is used by multiple running processes, they may access the cache database simultaneously, which results in disk I/O or malformed database image errors:

sqlite3.OperationalError: disk I/O error

sqlite3.DatabaseError: database disk image is malformed

This is further described in vllm-project/vllm#4193 and dottxt-ai/outlines#827.

A workaround for users who don't use guided decoding is described in vllm-project/vllm#7831.

Another possible workaround is to set a unique cache directory for each process using the environment variable OUTLINES_CACHE_DIR.
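A minimal sketch of what this could look like, assuming the variable is set before outlines is imported in each process. The helper name set_per_process_outlines_cache, its base_dir and rank parameters, and the directory layout are hypothetical and only for illustration; the only name taken from outlines is OUTLINES_CACHE_DIR.

```python
import os
import tempfile
from pathlib import Path
from typing import Optional


def set_per_process_outlines_cache(
    base_dir: Optional[str] = None, rank: Optional[int] = None
) -> Path:
    """Point OUTLINES_CACHE_DIR at a directory unique to this process.

    Hypothetical helper, not part of outlines or this framework.
    If `rank` is given, the directory is named after it; otherwise
    the process ID is used.
    """
    # Assumption: a subdirectory of the system temp dir is an acceptable default.
    base = Path(base_dir) if base_dir else Path(tempfile.gettempdir()) / "outlines_cache"
    suffix = f"rank{rank}" if rank is not None else f"pid{os.getpid()}"
    cache_dir = base / suffix
    cache_dir.mkdir(parents=True, exist_ok=True)

    # Affects only this process and any children it spawns afterwards.
    os.environ["OUTLINES_CACHE_DIR"] = str(cache_dir)
    return cache_dir
```

Calling this once at process start, before the first outlines import, should give each process its own cache database. Naming by rank keeps the cache reusable across runs, while naming by PID creates a fresh directory per run that has to be cleaned up separately.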

An implementation in this framework (here) creates a separate cache database for each model "rank". Does the use_cache argument here aim to solve this issue, or does it have a different use?


baberabb · 2024-09-17T17:35:55Z

Hi! The use_cache argument currently implemented here is for caching evaluation results, so that you can continue where you left off in case of an error.

e-tornike · 2024-09-18T11:02:33Z

Okay, thank you.

Would there be interest in integrating a workaround (e.g., dynamically setting the OUTLINES_CACHE_DIR) into the framework?
