SFrame.read_json should accept a JSON string not just a path to JSON file. #2756
base: main
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -1603,13 +1603,18 @@ def read_json(cls, | |
[3 rows x 1 columns] | ||
""" | ||
I don't think |
||
if orient == "records": | ||
g = SArray.read_json(url) | ||
if len(g) == 0: | ||
return SFrame() | ||
if g.dtype != dict: | ||
raise RuntimeError("Invalid input JSON format. Expected list of dictionaries") | ||
g = SFrame({'X1':g}) | ||
return g.unpack('X1','') | ||
if type(url)==str and url[0] !='{' : | ||
g = SArray.read_json(url) | ||
if len(g) == 0: | ||
return SFrame() | ||
if g.dtype != dict: | ||
raise RuntimeError("Invalid input JSON format. Expected list of dictionaries") | ||
g = SFrame({'X1':g}) | ||
return g.unpack('X1','') | ||
elif type(url)==str and url[0] =='{' : | ||
url=pandas.read_json(url) | ||
Actually, we don't want to introduce pandas if it's not necessary, because pandas import time is lengthy (~260ms) and its functionality duplicates SFrame's. Could you use the json lib to read the JSON as a dictionary (you can lazily import it here)? Then you can pass the dictionary to the SFrame constructor.
@Jarvi-Izana - Even without this change
We want to defer the load until it is necessary. In general, we shouldn't eagerly load pandas, since we encourage people to use SFrame.
I'm confused. Nothing in this pull request changes when
That's right. The lazy loading in the future should handle it. |
||
url=SFrame(url) | ||
return url | ||
elif orient == "lines": | ||
Can we support JSON strings with lines orient? If we can, I think that should be part of this pull request. If we can't support this, we should mention that in the docstring.
I think it should be possible. It looks like |
||
g = cls.read_csv(url, header=False,na_values=['null'],true_values=['true'],false_values=['false'], | ||
_only_raw_string_substitutions=True) | ||
|
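The review suggestions above (lazily import `json` instead of pandas, and handle string input for both orients) could be sketched roughly as follows. This is only an illustration under stated assumptions: `looks_like_json_literal` and `parse_json_source` are hypothetical helper names that do not exist in SFrame, and the check on the first non-whitespace character is a heuristic, not the project's chosen detection rule.

```python
def looks_like_json_literal(source):
    """Heuristic (an assumption): inline JSON starts with '[' or '{'.

    Leading whitespace is skipped, and an empty string is treated as
    "not a literal" instead of raising IndexError as url[0] would.
    """
    stripped = source.lstrip()
    return stripped[:1] in ("[", "{")


def parse_json_source(source, orient="records"):
    """Parse an inline JSON string into a list of dicts.

    Hypothetical helper for illustration only; not part of SFrame.
    """
    import json  # lazy import, as the reviewer suggested

    if not looks_like_json_literal(source):
        raise ValueError("not an inline JSON literal; treat it as a path or URL")

    if orient == "records":
        # A single JSON document that should be a list of dictionaries.
        rows = json.loads(source)
    elif orient == "lines":
        # One JSON dictionary per non-empty line.
        rows = [json.loads(line) for line in source.splitlines() if line.strip()]
    else:
        raise ValueError("orient must be 'records' or 'lines'")

    # Same validation message the existing file-based branch uses.
    if not isinstance(rows, list) or not all(isinstance(r, dict) for r in rows):
        raise RuntimeError("Invalid input JSON format. Expected list of dictionaries")
    return rows
```

The returned list could presumably then be wrapped the same way the existing branch wraps its result (`SFrame({'X1': ...}).unpack('X1', '')`), assuming the list is first turned into an `SArray`; that step is left out here so the sketch runs without the library installed.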
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -201,6 +201,13 @@ def test_auto_parse_csv_with_bom(self): | |
sf = SFrame.read_csv(csvfile.name, header=True) | ||
self.assertEqual(sf.dtype, [float, int, str]) | ||
self.__test_equal(sf, df) | ||
def test_read_json(self): | ||
x='{"row 1":{"col 1":"a","col 2":"b"},"row 2":{"col 1":"c","col 2":"d"}}' | ||
So this is pandas' default JSON format, which is different from what our SFrame expects. If you were to write this string to a file and try to read it with
The expected JSON format should not be different based on whether a string literal is used or a filename is given. |
||
sf=SFrame.read_json(x) | ||
df =pd.read_json(x) | ||
df=df.reset_index() | ||
df=df.drop(['index'],axis=1) | ||
self.__test_equal(sf, df) | ||
|
||
def test_auto_parse_csv(self): | ||
with tempfile.NamedTemporaryFile(mode='w', delete=False) as csvfile: | ||
|
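The reviewer's point about the test, that a string literal and a file holding the same text should be held to the same expected format, can be checked with a round trip through a temporary file. The sketch below uses only the standard library; `load_records` is a hypothetical stand-in for a reader that accepts either a path or an inline string, not the real `SFrame.read_json`.

```python
import json
import os
import tempfile


def load_records(source):
    """Hypothetical stand-in reader: accepts a file path or an inline JSON string."""
    if os.path.exists(source):
        with open(source) as f:
            text = f.read()
    else:
        text = source
    rows = json.loads(text)
    # Apply the same format rule regardless of where the text came from.
    if not isinstance(rows, list) or not all(isinstance(r, dict) for r in rows):
        raise RuntimeError("Invalid input JSON format. Expected list of dictionaries")
    return rows


def read_from_string_and_file(text):
    """Return (rows_from_string, rows_from_file) for the same JSON text."""
    with tempfile.NamedTemporaryFile(mode="w", suffix=".json", delete=False) as f:
        f.write(text)
        path = f.name
    try:
        return load_records(text), load_records(path)
    finally:
        os.remove(path)


records = '[{"col 1": "a", "col 2": "b"}, {"col 1": "c", "col 2": "d"}]'
from_string, from_file = read_from_string_and_file(records)
assert from_string == from_file
```

Under this rule, the index-keyed string used in `test_read_json` would be rejected in both cases, since its top-level value is a dictionary and not a list of dictionaries.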
The docstring needs to be updated to talk about this new functionality.