ValueError: Invalid character while parsing year ('N', Index: 0) #79

Closed
scottyhq opened this issue Oct 24, 2024 · 3 comments

Comments

@scottyhq
Contributor

For a heterogeneous collection of STAC Items in which some contain a timestamp property such as updated and others do not, coercing to timestamps fails. The code seems to be trying to convert a pyarrow 'None' string (a stringified missing value) to a timestamp:

[ciso8601.parse_rfc3339(str(t)) for t in column], pa.timestamp("us", tz="UTC")
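To illustrate why the message complains about 'N' at index 0, here is a minimal sketch of the same failure mode using only the standard library. It assumes `datetime.fromisoformat` as a stand-in for `ciso8601.parse_rfc3339` (the exact error text differs), and the variable names are hypothetical:

```python
from datetime import datetime

# A missing `updated` value, as seen for items that lack the property.
missing = None

# Stringifying a missing value yields the literal text "None".
as_text = str(missing)

try:
    # Stand-in for ciso8601.parse_rfc3339: the parser expects a year first,
    # but the input starts with the letter "N".
    datetime.fromisoformat(as_text)
    failed = False
except ValueError:
    failed = True

print(repr(as_text), failed)
```

The point is only that the parser never sees a null; it sees a four-letter string.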

I think this scenario might be common for APIs that return metadata that changes over time. I came across it using this public endpoint: https://docs.canopy.umbra.space/docs/archive-catalog-searching-via-stac-api

I tried a quick fix that seems to work, but I'm not sure it's the best approach: I just removed ciso8601 and let Arrow handle the casting 😅.

Alternatively, using pandas to coerce timestamps is also mentioned in #31 (comment).

@kylebarron
Collaborator

I think you can fix it either way, such as by avoiding casting None to string here. But I also didn't know that pyarrow was able to cast the strings to dates, and so that's more appealing to me.
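For reference, the first option (not stringifying None) could look roughly like the sketch below. This is not the library's code: `parse_optional_timestamp` is a hypothetical helper, and `datetime.fromisoformat` stands in for `ciso8601.parse_rfc3339` so the example has no third-party dependency.

```python
from datetime import datetime, timezone

def parse_optional_timestamp(value):
    """Parse an ISO 8601 string, passing None through untouched."""
    if value is None:
        # Keep missing values as None instead of turning them into "None".
        return None
    return datetime.fromisoformat(value)

# Hypothetical column: one item has `updated`, the other does not.
column = ["2024-08-24T17:52:27.135933+00:00", None]
parsed = [parse_optional_timestamp(v) for v in column]

print(parsed)
```

A list built this way still contains None for the missing entries, so it can be handed to an array constructor that understands nulls.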

We shouldn't use pandas because this arrow module is intended not to depend on pandas.

@scottyhq
Contributor Author

"But I also didn't know that pyarrow was able to cast the strings to dates"

I'm new to arrow, so I definitely fumbled around a bit!

I thought this would work: pa.scalar('2024-08-24T17:52:27.135933+00:00', type=pa.timestamp('us', tz='UTC')) but it raises ArrowTypeError: object of type <class 'str'> cannot be converted to int

But it works if you first go to a pyarrow string and then cast: pa.scalar(timestamp_str, type=pa.string()).cast(pa.timestamp('us', tz='UTC'))

@scottyhq
Contributor Author

closed by #80
