-
Notifications
You must be signed in to change notification settings - Fork 1.1k
SFrame.read_json should accept a JSON string not just a path to JSON file. #2756
base: main
Are you sure you want to change the base?
Conversation
g = SFrame({'X1':g}) | ||
return g.unpack('X1','') | ||
elif type(url)==str and url[0] =='{' : | ||
url=pandas.read_json(url) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually we don't want to introduce pandas if it's not necessary due to the reason that pandas import time is lengthy (~260ms) and duplicated with SFrame in terms of functionality. Could you use json lib to read the json as a dictionary (you can lazily import here)? Then you can pass the dictionary to SFrame constructor.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Jarvi-Izana - Even without this change pandas
has already been loaded. So I don't think we should worry about that load time in this pull request.
The reason pandas
is being used here is to prevent having to write the JSON string to disk when creating a SFrame
. I don't think anything in the standard json
lib is going to be able to help with that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we wan to defer the load when necessary. In general, we shouldn't eagerly load pandas since we encourage people to use sframe.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm confused. Nothing in this pull request changes when pandas
is loaded/imported. It's only adding an additional usage of pandas
when a user calls SFrame.read_json
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's right. The lazy loading in the future should handle it.
it would be better to rename this pr. :-) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The
elif type(url)==str and url[0] =='{' : | ||
url=pandas.read_json(url) | ||
url=SFrame(url) | ||
return url | ||
elif orient == "lines": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we support JSON strings with lines orient? If we can, I think that should be part of this pull request. If we can't support this we should mention that in the docstring.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it should be possible. It looks like pandas.read_json
has a similar orient
parameter.
@@ -1603,13 +1603,18 @@ def read_json(cls, | |||
[3 rows x 1 columns] | |||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think url
is the right name for the first parameter any more. Maybe we should call it url_or_string
.
@@ -1603,13 +1603,18 @@ def read_json(cls, | |||
[3 rows x 1 columns] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The docstring needs to be updated to talk about this new functionality.
@@ -201,6 +201,13 @@ def test_auto_parse_csv_with_bom(self): | |||
sf = SFrame.read_csv(csvfile.name, header=True) | |||
self.assertEqual(sf.dtype, [float, int, str]) | |||
self.__test_equal(sf, df) | |||
def test_read_json(self): | |||
x='{"row 1":{"col 1":"a","col 2":"b"},"row 2":{"col 1":"c","col 2":"d"}}' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So this is Panda's default JSON format which is different than what our SFrame expects. If you were to write this string to a file and try to read it with tc.SFrame.read_json(...)
, you would get an error.
The expected JSON format should not be different based on weather a string literal is used or a filename is given.
There is a significant amount of work needed to merge this pull request. The default input JSON format should be the same regardless of whether a JSON file or a JSON string is used. Passing a JSON string should also work for I'm going to move the issue (#2564) for this pull request to the next milestone. @dhivyaaxim - please let us know if you would like to continue to work on this pull. I'm happy to answer any questions. |
This PR resolves #2564
SFrame.read_json should accept a JSON string in addition to files