Redundant data in timeseries analysis #482

Open
AdityaR-Bits opened this issue Jul 16, 2022 · 0 comments

When running the analysis on the NYC Taxi dataset, I found that most of the JSON spec created by the Altair backend (more than 90%) was data for a single temporal timeseries plot, in which each datapoint covered a very short time interval across a very wide overall range. When timed separately, this plot took a long time to render. The other recommended plots had been binned (by month, year, or day of the week) and so were very fast. In cases like this, where a single plot accounts for the majority of the rendering time, could we give the user an option to render or skip such a chart?
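To illustrate why the unbinned temporal plot dominates the spec, here is a minimal sketch using pandas only. It does not use Lux or the real NYC Taxi data: the `trips` frame is synthetic, and `MAX_POINTS` and `should_render` are hypothetical names for the kind of render-or-skip check suggested above, not existing Lux settings.

```python
import pandas as pd

# Synthetic stand-in for a pickup-time column: one trip per minute for a year.
# (Made-up values for illustration; not the real NYC Taxi data.)
pickups = pd.date_range("2021-01-01", periods=365 * 24 * 60, freq="min")
trips = pd.DataFrame({"pickup": pickups, "fare": 10.0})

# Unbinned: one plotted point per distinct timestamp.
unbinned_points = trips["pickup"].nunique()

# Binned by calendar month: one plotted point per month.
monthly = trips.groupby(trips["pickup"].dt.to_period("M"))["fare"].count()
binned_points = len(monthly)

# Hypothetical guard: flag a chart whose point count exceeds a budget.
MAX_POINTS = 5_000  # assumed threshold, not a Lux configuration option


def should_render(n_points: int, max_points: int = MAX_POINTS) -> bool:
    """Return True if the chart is small enough to render without asking."""
    return n_points <= max_points


print(unbinned_points, binned_points)
print(should_render(unbinned_points), should_render(binned_points))
```

A check along these lines could run before the spec is generated, so that an oversized chart is either skipped or shown only after the user opts in.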

To Reproduce

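import pandas as pd
import lux

# Disable sampling so the full dataset is used, and show the Lux view by default
lux.config.sampling = False
lux.config.default_display = "lux"
df = pd.read_csv("./data/nyc_taxi.csv")
# Parse the pickup and dropoff columns as datetimes
df['tpep_pickup_datetime'] = pd.to_datetime(df.tpep_pickup_datetime, format="%Y-%m-%d")
df['tpep_dropoff_datetime'] = pd.to_datetime(df.tpep_dropoff_datetime, format="%Y-%m-%d")
df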

This is the graph in particular:
[Image: plot]
