Skip to content
This repository has been archived by the owner on Dec 21, 2023. It is now read-only.

SFrame.read_json should accept a JSON string not just a path to JSON file. #2756

Open
wants to merge 1 commit into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
19 changes: 12 additions & 7 deletions src/python/turicreate/data_structures/sframe.py
Original file line number Diff line number Diff line change
Expand Up @@ -1603,13 +1603,18 @@ def read_json(cls,
[3 rows x 1 columns]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The docstring needs to be updated to talk about this new functionality.

"""
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think url is the right name for the first parameter any more. Maybe we should call it url_or_string.

if orient == "records":
g = SArray.read_json(url)
if len(g) == 0:
return SFrame()
if g.dtype != dict:
raise RuntimeError("Invalid input JSON format. Expected list of dictionaries")
g = SFrame({'X1':g})
return g.unpack('X1','')
if type(url)==str and url[0] !='{' :
g = SArray.read_json(url)
if len(g) == 0:
return SFrame()
if g.dtype != dict:
raise RuntimeError("Invalid input JSON format. Expected list of dictionaries")
g = SFrame({'X1':g})
return g.unpack('X1','')
elif type(url)==str and url[0] =='{' :
url=pandas.read_json(url)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually we don't want to introduce pandas if it's not necessary due to the reason that pandas import time is lengthy (~260ms) and duplicated with SFrame in terms of functionality. Could you use json lib to read the json as a dictionary (you can lazily import here)? Then you can pass the dictionary to SFrame constructor.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Jarvi-Izana - Even without this change pandas has already been loaded. So I don't think we should worry about that load time in this pull request.
The reason pandas is being used here is to prevent having to write the JSON string to disk when creating a SFrame. I don't think anything in the standard json lib is going to be able to help with that.

Copy link
Collaborator

@guihao-liang guihao-liang Dec 5, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we wan to defer the load when necessary. In general, we shouldn't eagerly load pandas since we encourage people to use sframe.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm confused. Nothing in this pull request changes when pandas is loaded/imported. It's only adding an additional usage of pandas when a user calls SFrame.read_json.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's right. The lazy loading in the future should handle it.

url=SFrame(url)
return url
elif orient == "lines":
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we support JSON strings with lines orient? If we can, I think that should be part of this pull request. If we can't support this we should mention that in the docstring.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it should be possible. It looks like pandas.read_json has a similar orient parameter.

g = cls.read_csv(url, header=False,na_values=['null'],true_values=['true'],false_values=['false'],
_only_raw_string_substitutions=True)
Expand Down
7 changes: 7 additions & 0 deletions src/python/turicreate/test/test_sframe.py
Original file line number Diff line number Diff line change
Expand Up @@ -201,6 +201,13 @@ def test_auto_parse_csv_with_bom(self):
sf = SFrame.read_csv(csvfile.name, header=True)
self.assertEqual(sf.dtype, [float, int, str])
self.__test_equal(sf, df)
def test_read_json(self):
x='{"row 1":{"col 1":"a","col 2":"b"},"row 2":{"col 1":"c","col 2":"d"}}'
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this is Panda's default JSON format which is different than what our SFrame expects. If you were to write this string to a file and try to read it with tc.SFrame.read_json(...), you would get an error.

The expected JSON format should not be different based on weather a string literal is used or a filename is given.

sf=SFrame.read_json(x)
df =pd.read_json(x)
df=df.reset_index()
df=df.drop(['index'],axis=1)
self.__test_equal(sf, df)

def test_auto_parse_csv(self):
with tempfile.NamedTemporaryFile(mode='w', delete=False) as csvfile:
Expand Down