Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add support for JSON Lines format in 'target' #24

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

apalala
Copy link

@apalala apalala commented May 20, 2018

Try to read a JSON object per input line when reading the whole input as JSON fails. This makes glom behave more like jq.

@codecov-io
Copy link

codecov-io commented May 20, 2018

Codecov Report

Merging #24 into master will decrease coverage by 0.59%.
The diff coverage is 27.27%.

Impacted file tree graph

@@            Coverage Diff            @@
##           master      #24     +/-   ##
=========================================
- Coverage   83.72%   83.12%   -0.6%     
=========================================
  Files           9        9             
  Lines         805      812      +7     
  Branches      133      136      +3     
=========================================
+ Hits          674      675      +1     
- Misses         91       96      +5     
- Partials       40       41      +1
Impacted Files Coverage Δ
glom/cli.py 39.17% <27.27%> (-1.94%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d06e3d1...5e053fb. Read the comment docs.

@mahmoud
Copy link
Owner

mahmoud commented May 23, 2018

Hey Juacarlo! First off, thanks for this, definitely want this sort of direction in the CLI. I think I see an issue with the implementation, however.

I think to really make this part of the functionality production-ready, we need to switch to more of a streaming approach, which means switching up the interface to use a file-like interface, and reading one line at a time. As it stands, this will hold the whole dataset in memory.

Feel free to take a swing at that, but if you don't have the time, I was planning on doing it myself in the near future anyways, and I'll be sure to give you due credit :)

@apalala
Copy link
Author

apalala commented May 23, 2018

Hey Mahmoud!

You are right! In fact, I have memory problems with the software I use for JSONL data because it doesn't stream the lines.

I'll take a shot at it.

glom/cli.py Outdated
target = json.loads(target_text)
try:
target = json.loads(target_text)
except Exception: # JSONDecodeError not available in 2.7
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use ValueError since JSONDecodeError is a subclass.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants