How to adjust the period_len parameter for my data #6

Open
Weidong725 opened this issue Jul 5, 2024 · 8 comments

Comments

@Weidong725

I read your source code and found that the input data is indexed by "y-MM-dd HH:mm:ss" and has n columns of data. My input data is indexed only by "y-MM-dd": each row represents one day, and each day has 96 time points. How can I find the right "period_len", and does the network need any adjustment for the way I input my data?

@lss-1138
Owner

lss-1138 commented Jul 5, 2024

Is your data formatted as follows?

yy-MM-dd, point1, point2, ..., point96

This appears to be univariate data. You can convert it into the regular format:

yy-MM-dd HH:mm:ss, point1
yy-MM-dd HH:mm:ss, point2
...

You can use the following code to convert your dataset. This way, your data can fit into this framework's models without significant adjustments. You can try period_len=4 and set enc_in=1. The latter indicates the number of variables in the dataset.

import pandas as pd

# Load the wide-format file: first column is the date, the remaining 96 columns are the points
data = pd.read_csv('your_data.csv', header=None)

# Time-of-day labels for each point, assuming 15-minute intervals
timestamps = pd.date_range(start='00:00', periods=96, freq='15min').strftime('%H:%M:%S')

# Collect one record per (day, point) in a list and build the DataFrame once;
# DataFrame.append was removed in pandas 2.0
records = []
for _, row in data.iterrows():
    date = row[0]
    for i in range(1, 97):
        records.append({
            'datetime': f"{date} {timestamps[i - 1]}",
            'value': row[i],
        })

converted_data = pd.DataFrame(records)

# Save the converted data to a new CSV file
converted_data.to_csv('converted_data.csv', index=False)

You can try the above method and see if it works. If it doesn't, feel free to reach out for further assistance.
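
For larger files, the same reshaping can be done without a Python-level loop. The sketch below is one possible vectorized variant, not part of the original answer. The two-day sample frame and the column name `date` are assumptions made for illustration, and 15-minute spacing is assumed as above.

```python
import numpy as np
import pandas as pd

points_per_day = 96

# Hypothetical two-day sample in the wide layout: date, point1, ..., point96
wide = pd.DataFrame(
    np.arange(2 * points_per_day, dtype=float).reshape(2, points_per_day)
)
wide.insert(0, "date", ["2024-07-01", "2024-07-02"])

# Offset of each point within a day, assuming 15-minute spacing
offsets = pd.to_timedelta(np.arange(points_per_day) * 15, unit="min")

# Repeat each date 96 times and add the within-day offsets
days = pd.to_datetime(wide["date"])
stamps = np.repeat(days.to_numpy(), points_per_day) + np.tile(
    offsets.to_numpy(), len(wide)
)

# Row-major flattening keeps the points of each day in time order
long = pd.DataFrame({
    "datetime": stamps,
    "value": wide.drop(columns="date").to_numpy().ravel(),
})
print(long.head())
```

The result has one row per timestamp and a single value column, which matches the univariate layout suggested above (enc_in=1).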

@autocodor

With 96 data points per day, why not set period_len to 96 instead of 4? Would this affect the experimental results?

@lss-1138
Owner

lss-1138 commented Jul 6, 2024

As discussed in Appendix C.2 and shown in Table 9, in scenarios with very long periods, an appropriate sparse strategy can be more effective. For instance, on the ETTm1 dataset, which has the same period of 96, resampling with too large a period results in very short subsequences with sparse connections, leading to underutilization of information. In such cases, setting the period length to a value between 2 and 6, i.e., adopting a denser sparse strategy, can be beneficial. Therefore, we recommend setting period_len=4 here.
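
To make the trade-off concrete, here is a minimal sketch of the downsampling arithmetic. The look-back length of 672 (seven days of 96 points) and the helper `downsample` are hypothetical, chosen only for illustration.

```python
def downsample(series, period_len):
    """Split a series into period_len subsequences, one per phase offset."""
    return [series[phase::period_len] for phase in range(period_len)]


seq_len = 672  # hypothetical look-back window: 7 days x 96 points
series = list(range(seq_len))

shapes = {}
for period_len in (96, 4):
    subs = downsample(series, period_len)
    # (number of subsequences, points in each subsequence)
    shapes[period_len] = (len(subs), len(subs[0]))
    print(f"period_len={period_len}: {shapes[period_len]}")
```

Under these assumptions, period_len=96 leaves only 7 points in each subsequence, whereas period_len=4 gives 4 subsequences of 168 points each, so the predictor applied to each subsequence sees far more history.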

@autocodor

Thank you very much.

@Weidong725
Author

Thank you for your patient answers. While reading your paper, I also noticed that the sparse technique can be used in conjunction with GRU or Transformer. Where in the network do you place the GRU or Transformer?

@lss-1138
Owner

Yes, the sparse technique can be combined with GRU or Transformer. Please refer to our response to Issue #8, where we provide implementation code showing how to integrate the sparse technique with Transformer. The implementation for GRU is similar. If you need the code for that, we are happy to share it.

@Weidong725
Author

Yes, I think I need the GRU-related code. Can you provide it? Thanks a lot!

@lss-1138
Owner

Of course. You can try the following implementation code. Use self.no_sparse to control whether to apply the sparse technique.

import torch
import torch.nn as nn


class Model(nn.Module):
    def __init__(self, configs):
        super(Model, self).__init__()

        # get parameters
        self.seq_len = configs.seq_len
        self.pred_len = configs.pred_len
        self.enc_in = configs.enc_in

        self.period_len = configs.period_len
        self.model_type = configs.model_type
        # True: plain GRU baseline; False: GRU combined with the sparse technique
        self.no_sparse = configs.no_sparse

        if self.no_sparse:
            self.gru = nn.GRU(input_size=1, hidden_size=64, num_layers=1, bias=True, batch_first=True, bidirectional=False)
            self.output = nn.Linear(64, self.pred_len)
        else:
            self.seg_num_x = self.seq_len // self.period_len
            self.seg_num_y = self.pred_len // self.period_len

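            # Parenthesized so the kernel size is always odd; together with the padding
            # this keeps the convolution output length equal to seq_len
            self.conv1d = nn.Conv1d(in_channels=1, out_channels=1, kernel_size=1 + 2 * (self.period_len // 2),
                                    stride=1, padding=self.period_len // 2, padding_mode="zeros", bias=False)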
            self.gru = nn.GRU(input_size=1, hidden_size=64, num_layers=1, bias=True, batch_first=True, bidirectional=False)
            self.output = nn.Linear(64, self.seg_num_y)


    def forward(self, x):
        batch_size = x.shape[0]

        if self.no_sparse:
            seq_mean = torch.mean(x, dim=1).unsqueeze(1)
            x = (x - seq_mean)

            x = x.permute(0, 2, 1).reshape(-1, self.seq_len, 1)
            _, hn = self.gru(x)
            y = self.output(hn).view(-1, self.enc_in, self.pred_len).permute(0, 2, 1)

            y = y + seq_mean
        else:
            # normalization and permute     b,s,c -> b,c,s
            seq_mean = torch.mean(x, dim=1).unsqueeze(1)

            x = (x - seq_mean).permute(0, 2, 1)

            x = self.conv1d(x.reshape(-1, 1, self.seq_len)).reshape(-1, self.enc_in, self.seq_len) + x


            # b,c,s -> bc,n,w -> bc,w,n -> bcw,n,1
            x = x.reshape(-1, self.seg_num_x, self.period_len).permute(0, 2, 1).reshape(-1, self.seg_num_x, 1)
            _, hn = self.gru(x)
            y = self.output(hn).view(-1, self.period_len, self.seg_num_y) # bc, w, m
            # bc,w,m -> bc,m,w -> b,c,s
            y = y.permute(0, 2, 1).reshape(batch_size, self.enc_in, self.pred_len)

            y = y.permute(0, 2, 1) + seq_mean

        return y
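
The reshape and permute steps in the sparse branch can be checked in isolation. The following NumPy sketch mirrors them (`transpose` plays the role of `permute`). In place of the GRU and linear head it uses a stand-in that simply copies the last m points of each subsequence, and all sizes are hypothetical.

```python
import numpy as np

B, C = 2, 3                      # hypothetical batch size and channel count
seq_len, pred_len = 16, 8        # hypothetical window lengths
w = 4                            # period_len
n, m = seq_len // w, pred_len // w   # seg_num_x, seg_num_y

x = np.arange(B * C * seq_len, dtype=float).reshape(B, C, seq_len)

# b,c,s -> bc,n,w -> bc,w,n : row p holds every w-th point starting at phase p
sub = x.reshape(-1, n, w).transpose(0, 2, 1)

# Stand-in for the GRU + linear head: keep the last m points of each subsequence
pred = sub[:, :, -m:]            # bc,w,m

# bc,w,m -> bc,m,w -> b,c,pred_len : interleave the phases back into time order
y = pred.transpose(0, 2, 1).reshape(B, C, pred_len)

print(sub.shape, pred.shape, y.shape)
```

Because the stand-in copies values unchanged, y should equal the last pred_len points of x, which confirms that the upsampling step restores the original time order.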
