Loading and Saving Weights #1542
After doing some low-level weight-save testing, I did see accuracy results close to NP's. However, the saved weights also make 'consistency' in the forecast harder to achieve, at least in my testing so far. In general, I suspect weight saving and loading is a bigger topic in forecasting (especially for volatile datasets) than simply implementing it the way it is done in standard PyTorch workflows.
The save and load methods have been slightly improved and now live in neural_prophet/neuralprophet/utils.py (Line 25 in dbd0458).
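For reference, a minimal sketch of how those helpers are typically used, assuming they wrap torch.save/torch.load around the whole forecaster object as described later in this thread; the exact import path and signatures may differ by version:

import pandas as pd
from neuralprophet import NeuralProphet
from neuralprophet.utils import save, load  # assumed import path; may differ by version

df = pd.read_csv("timeseries.csv")  # placeholder file with 'ds' and 'y' columns

m = NeuralProphet(n_lags=24, epochs=10)
m.fit(df, freq="H")

save(m, "forecaster.np")    # serializes the entire forecaster object
m2 = load("forecaster.np")  # restores it, ready for predict()
forecast = m2.predict(df)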
I would love to learn more about what you did and found here!
We currently support saving/loading of the entire object (via torch). If you are looking to train a model and then re-train/fine-tune the already trained model, that is currently not supported, but it should be possible to support with a limited set of changes. It mainly requires adjusting some dataset and model initialization checks, and adjusting the re-training schedule. I'd be happy to support you if you decide to add this with a PR! If you were planning to save and reload only the weights, without all the Python/torch objects, that would be a larger project requiring sophisticated save and load methods that are highly dependent on many areas of the code.
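To illustrate the distinction being drawn here, a small generic PyTorch sketch (not NeuralProphet-specific, model and file names are placeholders): saving the entire object captures everything needed to restore it, while a state_dict holds only the tensors, so the model configuration must be reconstructed identically before the weights can be loaded.

import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 1))

# Option 1: save the whole object (the approach NeuralProphet currently takes for the forecaster).
torch.save(model, "model_full.pt")
restored_full = torch.load("model_full.pt")

# Option 2: save only the weights; the architecture must be rebuilt identically before loading.
torch.save(model.state_dict(), "model_weights.pt")
rebuilt = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 1))
rebuilt.load_state_dict(torch.load("model_weights.pt"))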
Yes: taking saved weights, dynamically updating them to file during the run, then using them on retrain. Here is a basic demo I was working with to play around with that logic:

import os
import gc

import numpy as np
import pandas as pd
import torch
import torch.nn as nn
import torch.optim as optim
import torch.multiprocessing as mp
from sklearn.preprocessing import MinMaxScaler
from torch.utils.data import DataLoader, TensorDataset
# import hiddenlayer as hl  # imported in the original demo but unused here


def clear_memory():
    if torch.cuda.is_available():
        torch.cuda.empty_cache()
    gc.collect()


def main():
    # If needed:
    # os.environ['CUDA_LAUNCH_BLOCKING'] = '1'
    # mp.set_start_method('spawn', force=True)
    clear_memory()

    gpu_index = 0
    if torch.cuda.is_available():
        gpu_index = min(gpu_index, torch.cuda.device_count() - 1)
        torch.cuda.set_device(gpu_index)
        device = torch.device(f"cuda:{gpu_index}")
        print(f"Using device {gpu_index}: {torch.cuda.get_device_name(gpu_index)}")
    else:
        device = torch.device("cpu")
        print("No CUDA device available. Using CPU instead.")

    csv_file_path = 'timeseries.csv'
    use_regressors = True
    lags = 120
    batch_size = 12
    epochs = 1200
    learning_rate = 0.023
    model_dir = 'weights'
    model_path = os.path.join(model_dir, 'time_series_model.pth')
    best_loss_file = os.path.join(model_dir, 'best_loss.txt')
    force_train = True

    data = pd.read_csv(csv_file_path, parse_dates=['time'])
    data = data.sort_values('time')
    data = data.drop_duplicates(subset='time', keep='last')
    data.set_index('time', inplace=True)

    # Use a separate scaler per column so the inverse transform of the prediction
    # uses the 'close' scaling rather than whichever column was fitted last.
    price_scaler = MinMaxScaler(feature_range=(-1, 1))
    scaled_prices = price_scaler.fit_transform(data['close'].values.reshape(-1, 1)).flatten()
    scaled_high, scaled_low = None, None
    if use_regressors:
        high_scaler = MinMaxScaler(feature_range=(-1, 1))
        low_scaler = MinMaxScaler(feature_range=(-1, 1))
        scaled_high = high_scaler.fit_transform(data['high'].values.reshape(-1, 1)).flatten()
        scaled_low = low_scaler.fit_transform(data['low'].values.reshape(-1, 1)).flatten()

    def create_sequences(data, lags, include_regressors=False, high_data=None, low_data=None):
        X, y = [], []
        for i in range(lags, len(data)):
            sequence = data[i - lags:i]
            if include_regressors:
                assert high_data is not None and low_data is not None, "Regressor data must be provided"
                sequence = np.column_stack((sequence, high_data[i - lags:i], low_data[i - lags:i]))
            else:
                sequence = np.array(sequence).reshape(-1, 1)
            X.append(sequence)
            y.append(data[i])
        return np.array(X), np.array(y)

    X, y = create_sequences(scaled_prices, lags, use_regressors, scaled_high, scaled_low)
    X_tensor = torch.tensor(X, dtype=torch.float32).to(device)
    y_tensor = torch.tensor(y, dtype=torch.float32).to(device)
    dataset = TensorDataset(X_tensor, y_tensor)
    dataloader = DataLoader(dataset, batch_size=batch_size, shuffle=True, num_workers=0)

    class TimeSeriesNN(nn.Module):
        def __init__(self, input_size, hidden_size=50, num_layers=2):
            super(TimeSeriesNN, self).__init__()
            self.lstm = nn.LSTM(input_size, hidden_size, num_layers, batch_first=True)
            self.linear = nn.Linear(hidden_size, 1)

        def forward(self, x):
            _, (hidden, _) = self.lstm(x)
            out = self.linear(hidden[-1])
            return out

    input_features = 3 if use_regressors else 1
    model = TimeSeriesNN(input_size=input_features).to(device)
    criterion = nn.MSELoss()
    optimizer = optim.Adam(model.parameters(), lr=learning_rate)

    # Warm-start from previously saved weights when available; otherwise start fresh.
    if os.path.exists(model_path) and not force_train:
        model.load_state_dict(torch.load(model_path, map_location=device))
        with open(best_loss_file, 'r') as file:
            best_loss = float(file.read().strip())
        print("Loaded existing model weights.")
    else:
        best_loss = float('inf')

    model.train()
    for epoch in range(epochs):
        for inputs, targets in dataloader:
            inputs, targets = inputs.to(device), targets.to(device)
            optimizer.zero_grad()
            outputs = model(inputs)
            loss = criterion(outputs, targets.unsqueeze(-1))
            loss.backward()
            optimizer.step()
        print(f'Epoch {epoch + 1}/{epochs}, Loss: {loss.item()}')
        # Checkpoint the weights to file whenever the loss improves.
        if loss.item() < best_loss:
            best_loss = loss.item()
            if not os.path.exists(model_dir):
                os.makedirs(model_dir)
            torch.save(model.state_dict(), model_path)
            with open(best_loss_file, 'w') as file:
                file.write(f'{best_loss}')
            print("Saved improved model weights.")

    model.eval()
    with torch.no_grad():
        if use_regressors:
            last_sequence = np.column_stack((scaled_prices[-lags:], scaled_high[-lags:], scaled_low[-lags:]))
        else:
            last_sequence = scaled_prices[-lags:].reshape(-1, 1)
        # The batch_first LSTM expects shape (batch, seq_len, n_features).
        last_sequence = last_sequence.reshape(1, lags, input_features)
        last_sequence = torch.tensor(last_sequence, dtype=torch.float32).to(device)
        predicted_scaled_price = model(last_sequence).item()
        predicted_price = price_scaler.inverse_transform([[predicted_scaled_price]])[0][0]

    # The 'time' column was parsed as datetimes on read, so the index is already a timestamp.
    last_timestamp = pd.to_datetime(data.index[-1], utc=True)
    forecasted_timestamp = last_timestamp + pd.Timedelta(hours=1)
    print(f"Trained up to: {last_timestamp.strftime('%Y-%m-%d %H:%M:%S %Z')}")
    print(f"Forecasted for: {forecasted_timestamp.strftime('%Y-%m-%d %H:%M:%S %Z')}")
    print(f"Predicted value for the next time step: ${predicted_price:.2f}")

    clear_memory()


if __name__ == '__main__':
    main()
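The core pattern in this demo is the standard PyTorch checkpoint loop: keep the best loss on disk next to the state_dict, reload both on the next run when force_train is off, and continue optimizing from the restored weights. That warm-start behavior is essentially what the thread is asking NeuralProphet to support natively.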
Please feel free to create a PR that does away with this assertion: neural_prophet/neuralprophet/forecaster.py, Line 963 in dbd0458.
Evidently, the PR will also need to support the new logic of a pre-initialized model being trained again, including adequate test cases. Happy to help if you get stuck!
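Purely as an illustration of the workflow such a PR would enable (hypothetical: the second fit() call below is exactly what the current assertion blocks, and the import path for save/load is assumed):

import pandas as pd
from neuralprophet import NeuralProphet
from neuralprophet.utils import save, load  # assumed import path; may differ by version

df_old = pd.read_csv("timeseries_old.csv")  # placeholder files with 'ds' and 'y' columns
df_new = pd.read_csv("timeseries_new.csv")

m = NeuralProphet(n_lags=24, epochs=20)
m.fit(df_old, freq="H")
save(m, "forecaster.np")

# Later run: restore the fitted forecaster and continue training on new data.
m = load("forecaster.np")
m.fit(df_new, freq="H")  # currently rejected by the assertion; the PR would allow this warm start
forecast = m.predict(df_new)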
Thanks! I'm going to deep dive into this further and review some test results. I plan on making a PR for this in the near future!
Excellent @quant-exchange! Looking forward to your PR.
Addressing this will also require #828 |
I notice this merge from the past: https://github.com/ourownstory/neural_prophet/pull/691/files
Just to make sure I understand: I no longer see these methods in forecaster.py. Is there an official way to save the weights to file and load them on the next run, so that NP isn't a completely new instantiation each time? In other words, outside of the epoch window, is there no continuous machine learning capability with the current library?
If that's the case, I wouldn't mind figuring out how to make a PR to solve this if possible.