Fix docstring & concurrency issue with duckdb #42
Discovered lea this afternoon after reading a Carbonfact job opening and wanted to know more about it!
So here is my attempt at fixing the main branch, hope you do not mind 😊
I read `CONTRIBUTING.md` and set up my environment as specified there.
✅ The first issue is due to a docstring typo, fixed in 9cee9d2.
The second one was introduced in 0ed11a9 when bumping `duckdb` to `1.0`. By bisecting, I found that the issue actually first appears with `duckdb==0.10.1` (i.e. it works with `0.10.0`) and is likely related to a deadlock between threads: using only one thread fixes the issue, two threads seem flaky, and more => 💀

This also seems to be discussed in DuckDB's docs:
The culprit (l.66):
This inevitably leads to tests hanging, both in CI and locally, until they end up being killed after a few hours.
Duplicating the connection using `self.con.cursor()` looks like the easiest short-term way to fix this issue, since the DuckDB client is likely to be used in concurrent scenarios. It is also used by the `read_sql` function:

One might also create a connection on the fly, without storing it, using a context manager such as:
In the long term, using one explicit connection per thread would be a better and more elegant pattern (e.g. via dependency injection).
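
As a rough sketch of what that longer-term pattern could look like: a small holder receives a connection factory from the caller (the dependency injection part) and lazily creates one connection per thread. `PerThreadConnections` and its `get` method are hypothetical names, and the holder is deliberately database-agnostic.

```python
import threading
from typing import Any, Callable


class PerThreadConnections:
    """Hypothetical helper: hands each thread its own connection."""

    def __init__(self, connect: Callable[[], Any]):
        # The factory is injected by the caller, so this class does not
        # need to know which database driver is in use.
        self._connect = connect
        self._local = threading.local()

    def get(self) -> Any:
        # Create the connection on first use in the current thread,
        # then keep returning the same one for that thread.
        con = getattr(self._local, "con", None)
        if con is None:
            con = self._connect()
            self._local.con = con
        return con
```

With DuckDB, the injected factory could for instance be the `cursor` method of a parent connection, so that every thread works on its own duplicate of that connection.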
🟢 Tests pass locally and in my fork
Thanks!