feat: use dateutils.relativedelta instead of timedelta #64
base: main
@mdancho84 fancy taking a look whenever you have the time? |
Sorry must have missed this message. I'll install and test out with pytimetk. |
I just updated to feat/relativedelta and saw no breaking changes. I'll test the month in a minute and report back. |
Hey @mdancho84
I should avoid pinging during weekends 🙈
Sure thanks! There is no rush 😁 I could also follow a similar approach to what we do in Narwhals and run the pytimetk test suite in CI as a downstream library. I will open an issue to keep track of that |
That would be great just to make sure my examples don't fail. No worries about weekends. I get slammed by emails regardless. It was my bad. |
I'm seeing a failure with 'months':
# imports
import numpy as np
import pandas as pd
import pytimetk as tk
# Get data
df = tk.datasets.load_dataset('bike_sales_sample')
df['order_date'] = pd.to_datetime(df['order_date'])
df.glimpse()
# aggregate sales by month
sales_by_month = df \
    .groupby('category_2') \
    .summarize_by_time(
        date_column = 'order_date',
        value_column = 'total_price',
        agg_func = ['sum'],
        freq = 'MS'
    )
sales_by_month
# make cross validation sets
from pytimetk import TimeSeriesCV
tscv = TimeSeriesCV(
    frequency="months",
    train_size=24,
    forecast_horizon=12,
    gap=12,
)
|
Let me doublecheck that the branch was installed. |
That was the problem. I hadn't upgraded. The solution was to uninstall and reinstall at your last commit/branch. |
I am running into issues with the 'month' test I've put together inside of pytimetk. I have 1 year of data (12 months) and am using the specification below. Note: my pytimetk tests for daily still work fine.
tscv = TimeSeriesCV(
    frequency="months",
    train_size=6,
    forecast_horizon=3,
    gap=0,
)
Test:
# imports
import numpy as np
import pandas as pd
import pytimetk as tk
# Get data
df = tk.datasets.load_dataset('bike_sales_sample')
df['order_date'] = pd.to_datetime(df['order_date'])
df.glimpse()
# aggregate sales by month
sales_by_month = df \
    .groupby('category_2') \
    .summarize_by_time(
        date_column = 'order_date',
        value_column = 'total_price',
        agg_func = ['sum'],
        freq = 'MS'
    )
sales_by_month \
    .groupby('category_2') \
    .plot_timeseries("order_date", "total_price_sum", smooth=False, plotly_dropdown = True)
# Set index
df = sales_by_month.copy()
df.set_index("order_date", inplace=True)
# Create an X dataframe and a y series
X, y = df.loc[:, ["category_2"]], df["total_price_sum"]
X
y
# make cross validation sets
from pytimetk import TimeSeriesCV
tscv = TimeSeriesCV(
    frequency="months",
    train_size=6,
    forecast_horizon=3,
    gap=0,
)
splits = tscv.split(X, y)
for i, (X_train, X_forecast, y_train, y_forecast) in enumerate(list(splits)):
    print(f"Split {i+1}")
    print(X_train)
    print(X_forecast)
tscv.glimpse(y)
tscv.plot(X, y)
Output
The output from printing the splits suggests it's only making 1 split:
|
It seems that the data has one year only:
df.index.min(), df.index.max()
(Timestamp('2011-01-01 00:00:00'), Timestamp('2011-12-01 00:00:00')) |
Yes, is that a problem? |
Whoops, sorry, I thought it was 1 year frequency, ignore me |
So the reason is that there are 11 months between the min and max dates, with a 6-month training window and a 3-month forecast horizon (and stride as well). This would make the second split start on 2010-12-01, which is before the min date, so the loop exits. If you were to compute this in forward mode, you would get 2 splits, the second of which has a test size of 2 months (from 2011-10-01 to 2011-12-01). One way to achieve that in backward mode would be to specify the end date in the call to split:
from datetime import datetime
...
tscv = TimeSeriesCV(
    frequency="months",
    train_size=6,
    forecast_horizon=3,
    gap=0,
)
splits = tscv.split(X, y, end_dt=datetime(2012, 1, 1))
for i, _ in enumerate(list(splits)):
    print(f"Split {i+1}")
|
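To make the arithmetic above concrete, here is a minimal sketch of the backward walk using `dateutil.relativedelta`. It is not the library's actual implementation: `backward_splits` is a hypothetical helper, and it assumes the stride equals the forecast horizon and that a split is dropped as soon as its training window would start before the first date in the data.

```python
from datetime import datetime
from dateutil.relativedelta import relativedelta


def backward_splits(start, end, train_months, horizon_months, gap_months=0):
    """Hypothetical helper: walk backward from `end` and collect
    (train_start, train_end, forecast_start, forecast_end) tuples."""
    train = relativedelta(months=train_months)
    horizon = relativedelta(months=horizon_months)
    gap = relativedelta(months=gap_months)
    stride = horizon  # assumption: stride defaults to the forecast horizon

    splits = []
    forecast_end = end
    while True:
        forecast_start = forecast_end - horizon
        train_end = forecast_start - gap
        train_start = train_end - train
        if train_start < start:
            # The training window would begin before the data does: stop.
            break
        splits.append((train_start, train_end, forecast_start, forecast_end))
        forecast_end = forecast_end - stride
    return splits


data_min, data_max = datetime(2011, 1, 1), datetime(2011, 12, 1)

# Anchored at the last observed date: the second split would need
# training data from 2010-12-01, so only one split survives.
default = backward_splits(data_min, data_max, train_months=6, horizon_months=3)

# Anchored one month later: the second training window now starts
# exactly on 2011-01-01, so two splits fit.
extended = backward_splits(data_min, datetime(2012, 1, 1), train_months=6, horizon_months=3)

print(len(default), len(extended))
```

Under these assumptions, moving the anchor from 2011-12-01 to 2012-01-01 is what lets the second training window land on the first date in the data instead of one month before it.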
Ok that's interesting. Thanks for looking into it. All of the original examples I put together are working. I'll play around with it and see if there's anything else. But this looks great. Thanks so much for adding the new frequencies. |
Description
This enables working with month and year time frequencies.
Closes #63 and #8
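As a small illustration of why month and year frequencies call for `relativedelta` (this is a sketch, not code from the PR diff): `datetime.timedelta` only accepts fixed-length units such as days and weeks, whereas `dateutil.relativedelta` shifts by calendar months and years and clips to the last valid day of the month.

```python
from datetime import datetime, timedelta
from dateutil.relativedelta import relativedelta

start = datetime(2011, 1, 31)

# timedelta has no notion of a month; approximating one as 30 days
# overshoots February entirely.
fixed = start + timedelta(days=30)

# relativedelta moves by calendar month and clips to the month's last day.
calendar = start + relativedelta(months=1)

# Years work the same way.
next_year = datetime(2011, 1, 1) + relativedelta(years=1)

print(fixed.date(), calendar.date(), next_year.date())
```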