"\\" in row breaks PyDruid - JSONDecodeError('Unterminated string...') #242
dklei
changed the title
JSONDecodeError('Unterminated string starting at: line 1 column 85919 (char 85918)')
"\\" in row breaks PyDruid - JSONDecodeError('Unterminated string starting at: line 1 column 85919 (char 85918)')
Nov 27, 2020
dklei
changed the title
"\\" in row breaks PyDruid - JSONDecodeError('Unterminated string starting at: line 1 column 85919 (char 85918)')
"\\" in row breaks PyDruid - JSONDecodeError('Unterminated string...')
Nov 27, 2020
Hey, I'm hitting this issue. Is it possible to fix this soon?
This is still an issue in v0.6.6:
To replicate:
from pydruid.db.api import rows_from_chunks
bad_json = """[
{
"id": 1,
"value": "hi"
},
{
"id": 2,
"value": "C:\\\\"
},
{
"id": 3,
"value": "this row is missing..."
}
]"""
for row in rows_from_chunks([bad_json]):
    print(f"row from bad json: {row}")
print("that's all!")
This prints:
There are rows missing! The suggested change in #262 seems to fix this problem. If I paste in the updated function definition from that PR and then rerun the above script, it prints the expected result:
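For what it's worth, the same document appears to be valid JSON as far as the standard library is concerned, which suggests the rows are lost in the chunk-splitting step rather than in the data. A minimal check, using only `json.loads` (the compact layout of `bad_json` here is just for brevity):

```python
import json

# Same three rows as the reproduction above. In Python source, "C:\\\\"
# becomes the JSON text "C:\\", i.e. the value C: followed by one backslash.
bad_json = """[
  {"id": 1, "value": "hi"},
  {"id": 2, "value": "C:\\\\"},
  {"id": 3, "value": "this row is missing..."}
]"""

rows = json.loads(bad_json)
for row in rows:
    print(f"row from json.loads: {row}")
```

If all three rows come back here, the input itself is fine and the trailing backslash is only a problem for code that scans the string by hand.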
Hi,
I'm using pydruid.db.connector to run a query that pulls a row whose returned content ends in "...\\", and this appears to break pydruid: it either drops rows from the data or fails with a JSONDecodeError.
E.g. "SELECT x FROM y" -> [{"x": "some row"},{"x": "...\\"},{"x": "another row"},{"x": "more rows"}]
2020-11-27 10:44:23: [CRITICAL] JSONDecodeError('Unterminated string starting at: line 1 column 85919 (char 85918)')
2020-11-27 10:44:23: [CRITICAL] Traceback (most recent call last):
File "xxxxx", line 291, in main
data_paths = pull_data(tracker.last_data_dt, tracker.next_data_dt)
File "xxxxx", line 162, in pull_data
data_path = collector.execute_and_save()
File "xxxxx", line 226, in execute_and_save
for i, row in enumerate(cursor):
File "xxxxx", line 181, in _get_cursor
raise err
File "xxxxx", line 164, in _get_cursor
raise err
File "xxxxx", line 161, in _get_cursor
r = next(cursor)
File "/xxxx/venv/lib64/python3.8/site-packages/pydruid/db/api.py", line 62, in g
return f(self, *args, **kwargs)
File "/xxxx/venv/lib64/python3.8/site-packages/pydruid/db/api.py", line 320, in next
return next(self._results)
File "/xxxx/venv/lib64/python3.8/site-packages/pydruid/db/api.py", line 370, in _stream_query
for row in rows_from_chunks(chunks):
File "/xxxx/venv/lib64/python3.8/site-packages/pydruid/db/api.py", line 420, in rows_from_chunks
for row in json.loads(
File "/usr/lib64/python3.8/json/__init__.py", line 370, in loads
return cls(**kw).decode(s)
File "/usr/lib64/python3.8/json/decoder.py", line 337, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/usr/lib64/python3.8/json/decoder.py", line 353, in raw_decode
obj, end = self.scan_once(s, idx)
json.decoder.JSONDecodeError: Unterminated string starting at: line 1 column 85919 (char 85918)
Any rows following the {"x": "...\\"} row either return no data or raise a JSONDecodeError. I'm guessing this is because pydruid.db.api.rows_from_chunks tries to parse the JSON itself, and looks for "\\" as the end of strings?
I have attached a script and a dummy JSON file (scratch.zip) that show the rows being dropped by the function, but this does not trigger the JSONDecodeError. That appears to happen only when I read this row and the surrounding rows from the database.
Many thanks in advance
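One way to sidestep hand-written string scanning would be to let the standard library's `json.JSONDecoder.raw_decode` find where each object ends. The sketch below is not pydruid's code and not the change proposed in #262. `iter_rows` is a hypothetical name, and it assumes the response is a top-level JSON array of objects delivered as string chunks.

```python
import json


def iter_rows(chunks):
    """Yield one dict per object from a JSON array streamed as str chunks.

    Hypothetical helper for illustration only; assumes a top-level array
    whose elements are all JSON objects.
    """
    decoder = json.JSONDecoder()
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        while True:
            # Drop whitespace and the array punctuation between objects.
            buffer = buffer.lstrip(" \t\r\n[,")
            if not buffer.startswith("{"):
                break
            try:
                # raw_decode parses one complete value and reports where it
                # ended, so escapes such as \\ are handled by the real parser.
                row, end = decoder.raw_decode(buffer)
            except json.JSONDecodeError:
                # The object is still incomplete; wait for the next chunk.
                break
            yield row
            buffer = buffer[end:]


# A response split at an awkward place, mid-way through the escaped backslash.
document = '[{"x": "some row"},{"x": "...\\\\"},{"x": "another row"}]'
split_at = document.index("\\") + 1
for row in iter_rows([document[:split_at], document[split_at:]]):
    print(row)
```

This re-parses the buffer whenever an object is incomplete, so it trades some speed for correctness. It is meant to show the idea rather than to be a drop-in replacement.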