Skip to content
This repository has been archived by the owner on Dec 21, 2023. It is now read-only.

SFrame.read_json should accept a JSON string not just a path to JSON file. #2756

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

dhivyaaxim
Copy link
Contributor

This PR resolves #2564
SFrame.read_json should accept a JSON string in addition to files

g = SFrame({'X1':g})
return g.unpack('X1','')
elif type(url)==str and url[0] =='{' :
url=pandas.read_json(url)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually we don't want to introduce pandas if it's not necessary due to the reason that pandas import time is lengthy (~260ms) and duplicated with SFrame in terms of functionality. Could you use json lib to read the json as a dictionary (you can lazily import here)? Then you can pass the dictionary to SFrame constructor.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Jarvi-Izana - Even without this change pandas has already been loaded. So I don't think we should worry about that load time in this pull request.
The reason pandas is being used here is to prevent having to write the JSON string to disk when creating a SFrame. I don't think anything in the standard json lib is going to be able to help with that.

Copy link
Collaborator

@guihao-liang guihao-liang Dec 5, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we wan to defer the load when necessary. In general, we shouldn't eagerly load pandas since we encourage people to use sframe.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm confused. Nothing in this pull request changes when pandas is loaded/imported. It's only adding an additional usage of pandas when a user calls SFrame.read_json.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's right. The lazy loading in the future should handle it.

@guihao-liang
Copy link
Collaborator

it would be better to rename this pr. :-)

@TobyRoseman TobyRoseman changed the title SFrame.read_json_should_accept_a_JSON_string_in_addition_to_files_2564 SFrame.read_json should accept a JSON string not just a path to JSON file. Jan 8, 2020
Copy link
Collaborator

@TobyRoseman TobyRoseman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The

elif type(url)==str and url[0] =='{' :
url=pandas.read_json(url)
url=SFrame(url)
return url
elif orient == "lines":
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we support JSON strings with lines orient? If we can, I think that should be part of this pull request. If we can't support this we should mention that in the docstring.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it should be possible. It looks like pandas.read_json has a similar orient parameter.

@@ -1603,13 +1603,18 @@ def read_json(cls,
[3 rows x 1 columns]
"""
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think url is the right name for the first parameter any more. Maybe we should call it url_or_string.

@@ -1603,13 +1603,18 @@ def read_json(cls,
[3 rows x 1 columns]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The docstring needs to be updated to talk about this new functionality.

@@ -201,6 +201,13 @@ def test_auto_parse_csv_with_bom(self):
sf = SFrame.read_csv(csvfile.name, header=True)
self.assertEqual(sf.dtype, [float, int, str])
self.__test_equal(sf, df)
def test_read_json(self):
x='{"row 1":{"col 1":"a","col 2":"b"},"row 2":{"col 1":"c","col 2":"d"}}'
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this is Panda's default JSON format which is different than what our SFrame expects. If you were to write this string to a file and try to read it with tc.SFrame.read_json(...), you would get an error.

The expected JSON format should not be different based on weather a string literal is used or a filename is given.

@TobyRoseman
Copy link
Collaborator

There is a significant amount of work needed to merge this pull request. The default input JSON format should be the same regardless of whether a JSON file or a JSON string is used. Passing a JSON string should also work for orient="lines".

I'm going to move the issue (#2564) for this pull request to the next milestone.

@dhivyaaxim - please let us know if you would like to continue to work on this pull. I'm happy to answer any questions.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

SFrame.read_json should accept a JSON string in addition to files
3 participants