Redundant data in timeseries analysis #482

AdityaR-Bits · 2022-07-16T01:43:48Z

When running the analysis on the NYC Taxi dataset, I found that the JSON spec created by the Altair backend devoted most of its data (more than 90%) to a single temporal timeseries plot, in which each data point covered a very short time interval across a vast overall range. This plot took a long time to render when timed separately. The other recommended plots had been binned (monthly, yearly, or by day of the week) and so were very fast. In cases like this, where a single plot takes the majority of the time, could we give the user an option to render or skip such a chart?
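To illustrate the gap between the unbinned plot and the binned ones, here is a minimal pandas-only sketch. It uses synthetic timestamps rather than the actual NYC Taxi file, and `is_too_dense` with its `max_points` threshold is a hypothetical helper made up for illustration, not part of Lux or Altair.

```python
import pandas as pd

# Synthetic stand-in for a pickup-timestamp column (assumed data, not the real file):
# one record every 7 minutes, covering roughly one year.
pickups = pd.Series(pd.date_range("2021-01-01", periods=75_000, freq="7min"))

# Unbinned temporal plot: one data point per distinct timestamp.
raw_points = pickups.nunique()

# Monthly binning: one data point per calendar month.
monthly_counts = pickups.dt.to_period("M").value_counts().sort_index()
binned_points = len(monthly_counts)


def is_too_dense(timestamps: pd.Series, max_points: int = 10_000) -> bool:
    """Hypothetical check: would an unbinned temporal plot exceed max_points?"""
    return timestamps.nunique() > max_points


print(f"raw points: {raw_points}, monthly bins: {binned_points}")
print(f"skip or ask before rendering: {is_too_dense(pickups)}")
```

A check along these lines could decide whether to prompt the user before rendering the unbinned chart; the right threshold would depend on the rendering backend and is left open here.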

To Reproduce

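import pandas as pd
import lux

# Disable sampling so Lux works on the full dataset
lux.config.sampling = False
lux.config.default_display = "lux"
df = pd.read_csv("./data/nyc_taxi.csv")
# Parse the pickup and dropoff columns as datetimes
df['tpep_pickup_datetime'] = pd.to_datetime(df.tpep_pickup_datetime, format="%Y-%m-%d")
df['tpep_dropoff_datetime'] = pd.to_datetime(df.tpep_dropoff_datetime, format="%Y-%m-%d")
# Display the dataframe to trigger the Lux recommendations
df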

This is the graph in particular
