-
Notifications
You must be signed in to change notification settings - Fork 193
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix fromjson() to support reading from stdin #667
base: master
Are you sure you want to change the base?
Conversation
Pull Request Test Coverage Report for Build 9050397434Details
💛 - Coveralls |
@@ -16,7 +16,7 @@ | |||
from petl.util.base import data, Table, dicts as _dicts, iterpeek | |||
|
|||
|
|||
def fromjson(source, *args, **kwargs): | |||
def fromjson(source=None, *args, **kwargs): |
Check warning
Code scanning / Prospector (reported by Codacy)
Keyword argument before variable positional arguments list in the definition of fromjson function (keyword-arg-before-vararg) Warning
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code scanning / Prospector (reported by Codacy)
Keyword argument before variable positional arguments list in the definition of fromjson function (keyword-arg-before-vararg)
This wouldn't break existing code?
Maybe read_source_from_arg can be taught to read even when stdin is passed as argument as in:
table1 = etl.fromjson(sys.stdin, header=["foo", "bar"])
print(table1)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I might be wrong, but I don't think it should break existing code.
All these assertions pass -
>>> def f(s, *a, **kw):
... return (s, a, kw)
...
>>> def fn(s=None, *a, **kw):
... return (s, a, kw)
...
>>> assert f("hello") == fn("hello")
>>> assert f(None) == fn(None)
>>> assert f("hello", 1, 2) == fn("hello", 1, 2)
>>> assert f("hello", k=1) == fn("hello", k=1)
>>> assert f(s="hello") == fn(s="hello")
>>> assert f("hello", 1, 2, k=1, j=2) == fn("hello", 1, 2, k=1, j=2)
re. read_source_from_arg()
- my interest is in the petl
executable, i.e., I would love for syntax like this to work in shell: echo '{"a":2, "b":4}' | petl 'fromjson(lines=True)'
.
What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that:
- Using petl as a command line tool is an interesting use case and we should give it some love by:
a. Adding more examples in the documentation.
b. Mention this usage pattern in the repository Readme/Frontage - Currently fromjson() source argument is not consistent with other functions like:
a. tojson() and tojsonarrays()
b. fromcsv() - We value not breaking existing code.
a. But I haven't had the time to check if this is the case yet.
b. If not it will be a good change for API consistency
c. I would love to hear more opinions about this change.
Adding to the investigation: The fromjson() parameters args and kwargs are forwarded to the json library call: def fromjson(source, *args, **kwargs):
"""..."""
...
...
dicts = json.load(f, *self.args, **self.kwargs)
... |
But the json library calls used take only one positional argument followed by others keyword arguments: def json.load(fp, *, cls=None, object_hook=None, parse_float=None, parse_int=None, parse_constant=None, object_pairs_hook=None, **kw):
...
def json.loads(s, *, cls=None, object_hook=None, parse_float=None, parse_int=None, parse_constant=None, object_pairs_hook=None, **kw):
... It looks like the args parameter is simply ignored by json.load and json.loads calls. Now I'm wondering if the function syntax shouldn't be: def fromjson(source=None, *, **kwargs):
"""..."""
...
|
After looking at the source code, I've concluded that your proposal looks good. Also, some considerations:
So, what are your plans for this PR? |
Thanks @juarezr , this sounds good.
Does that sound reasonable and I can update the PR? |
After checking it, I've found that the change in signature won't work: >>> def fromjson(source=None, *, **kwargs):
... print("source:", source, "kwargs:", kwargs)
...
File "<stdin>", line 1
SyntaxError: named arguments must follow bare * So we should keep the args argument as it is currently, and proceed. Eventually, we could remove the self.args property and the forwarding to json.load functions as it is pointless in: dicts = json.load(f, *self.args, **self.kwargs) But it shouldn't be mandatory. |
- adds a simple test invoking the petl executable - installs the package in CI so the executable is available.
""", shell=True, check=True, capture_output=True) | ||
assert result.stdout == b'foo\r\na\r\n' | ||
|
||
def test_json_stdin(): |
Check warning
Code scanning / Pylintpython3 (reported by Codacy)
Missing function or method docstring Warning test
@@ -0,0 +1,22 @@ | |||
from __future__ import print_function, division, absolute_import |
Check warning
Code scanning / Pylintpython3 (reported by Codacy)
Missing module docstring Warning test
|
||
import subprocess | ||
|
||
def test_executable(): |
Check warning
Code scanning / Pylintpython3 (reported by Codacy)
Missing function or method docstring Warning test
|
||
import subprocess | ||
|
||
def test_executable(): |
Check warning
Code scanning / Pylint (reported by Codacy)
Missing function docstring Warning test
@@ -0,0 +1,22 @@ | |||
from __future__ import print_function, division, absolute_import |
Check warning
Code scanning / Pylint (reported by Codacy)
Missing module docstring Warning test
""", shell=True, check=True, capture_output=True) | ||
assert result.stdout == b'foo\r\na\r\n' | ||
|
||
def test_json_stdin(): |
Check warning
Code scanning / Pylint (reported by Codacy)
Missing function docstring Warning test
@@ -0,0 +1,22 @@ | |||
from __future__ import print_function, division, absolute_import | |||
|
|||
import subprocess |
Check warning
Code scanning / Bandit (reported by Codacy)
Consider possible security implications associated with the subprocess module. Warning test
( echo '{"foo": "a", "bar": "b"}' ; echo '{"foo": "c", "bar": "d"}' ) | | ||
petl 'fromjson(lines=True).tocsv()' | ||
""", shell=True) | ||
assert result == b'foo,bar\r\na,b\r\nc,d\r\n' |
Check notice
Code scanning / Semgrep (reported by Codacy)
The application was found using assert in non-test code. Note test
(echo foo,bar ; echo a,b; echo c,d) | | ||
petl 'fromcsv().cut("foo").head(1).tocsv()' | ||
""", shell=True) | ||
assert result == b'foo\r\na\r\n' |
Check notice
Code scanning / Semgrep (reported by Codacy)
The application was found using assert in non-test code. Note test
echo '[{"foo": "a", "bar": "b"}]' | | ||
petl 'fromjson().tocsv()' | ||
""", shell=True) | ||
assert result == b'foo,bar\r\na,b\r\n' |
Check notice
Code scanning / Semgrep (reported by Codacy)
The application was found using assert in non-test code. Note test
assert result == b'foo\r\na\r\n' | ||
|
||
def test_json_stdin(): | ||
result = subprocess.check_output(""" |
Check warning
Code scanning / Bandit (reported by Codacy)
Starting a process with a partial executable path Warning test
import subprocess | ||
|
||
def test_executable(): | ||
result = subprocess.check_output(""" |
Check warning
Code scanning / Bandit (reported by Codacy)
Starting a process with a partial executable path Warning test
echo '[{"foo": "a", "bar": "b"}]' | | ||
petl 'fromjson().tocsv()' | ||
""", shell=True) | ||
assert result == b'foo,bar\r\na,b\r\n' |
Check warning
Code scanning / Bandit (reported by Codacy)
Use of assert detected. The enclosed code will be removed when compiling to optimised byte code. Warning test
result = subprocess.check_output(""" | ||
echo '[{"foo": "a", "bar": "b"}]' | | ||
petl 'fromjson().tocsv()' | ||
""", shell=True) |
Check failure
Code scanning / Bandit (reported by Codacy)
subprocess call with shell=True seems safe, but may be changed in the future, consider rewriting without shell Error test
petl 'fromjson().tocsv()' | ||
""", shell=True) | ||
assert result == b'foo,bar\r\na,b\r\n' | ||
result = subprocess.check_output(""" |
Check warning
Code scanning / Bandit (reported by Codacy)
Starting a process with a partial executable path Warning test
(echo foo,bar ; echo a,b; echo c,d) | | ||
petl 'fromcsv().cut("foo").head(1).tocsv()' | ||
""", shell=True) | ||
assert result == b'foo\r\na\r\n' |
Check warning
Code scanning / Bandit (reported by Codacy)
Use of assert detected. The enclosed code will be removed when compiling to optimised byte code. Warning test
( echo '{"foo": "a", "bar": "b"}' ; echo '{"foo": "c", "bar": "d"}' ) | | ||
petl 'fromjson(lines=True).tocsv()' | ||
""", shell=True) | ||
assert result == b'foo,bar\r\na,b\r\nc,d\r\n' |
Check warning
Code scanning / Bandit (reported by Codacy)
Use of assert detected. The enclosed code will be removed when compiling to optimised byte code. Warning test
result = subprocess.check_output(""" | ||
(echo foo,bar ; echo a,b; echo c,d) | | ||
petl 'fromcsv().cut("foo").head(1).tocsv()' | ||
""", shell=True) |
Check failure
Code scanning / Bandit (reported by Codacy)
subprocess call with shell=True seems safe, but may be changed in the future, consider rewriting without shell Error test
result = subprocess.check_output(""" | ||
( echo '{"foo": "a", "bar": "b"}' ; echo '{"foo": "c", "bar": "d"}' ) | | ||
petl 'fromjson(lines=True).tocsv()' | ||
""", shell=True) |
Check failure
Code scanning / Bandit (reported by Codacy)
subprocess call with shell=True seems safe, but may be changed in the future, consider rewriting without shell Error test
I've made some progress, but not all checks pass yet.
I'll try to look into this again next week, but admit I'm beginning to question how much time I can put into this; small fix, lots of overhead 😅 |
Trivial PR to support reading from stdin (
source=None
) infromjson()
.Two things you may want me to do before merging -
source=None
in anyio
module (e.g., also not incsv
etc). Generally, perhaps I missed them but I didn't see tests that fork/exec thebin/petl
executable at all...?source=None
to other modules. I rangit grep 'def from' | grep source | grep -v source=None
and found a few more that miss it, let me know if you'd like me to add it to any/all of them.Modules which appear to be missing
source=None
:avro
,bcolz
,pytables
(maybe),xml
Checklist
Use this checklist to ensure the quality of pull requests that include new code and/or make changes to existing code.
tox
/pytest
master
branch and tested before sending the PR