diff --git a/_posts/2023-10-05-(CL_paper100)Nonst_Trans.md b/_posts/2023-10-05-(CL_paper100)Nonst_Trans.md index 21f9cbe97435..e21d7924e137 100644 --- a/_posts/2023-10-05-(CL_paper100)Nonst_Trans.md +++ b/_posts/2023-10-05-(CL_paper100)Nonst_Trans.md @@ -17,6 +17,19 @@ https://github.com/thuml/Nonstationary_Transformers. 0. Abstract 0. Introduction +0. Related Works + 0. Deep Models for TSF + 0. Stationarization for TSF + +0. Non-stationary Transformers + 0. Series Stationarization + 0. De-stationary Attention + +0. Experiments + 0. Experimental Setups + 0. Main Results + 0. Ablation Study +
@@ -24,7 +37,7 @@ https://github.com/thuml/Nonstationary_Transformers. Previous studies : use stationarization to attenuate the non-stationarity of TS -$\rightarrow$ can be less instructive for real-world TS +$$\rightarrow$$ can be less instructive for real-world TS
@@ -52,7 +65,7 @@ Non-stationarity of data However, non-stationarity is the inherent property -$\rightarrow$ also good guidance for discovering temporal dependencies +$$\rightarrow$$ also good guidance for discovering temporal dependencies
@@ -62,7 +75,7 @@ Example) Figure 1 - ( Figure 1 (b) ) Transformers trained on the stationarized series tend to generate indistinguishable attentions - $\rightarrow$ ***over-stationarization*** problem + $$\rightarrow$$ ***over-stationarization*** problem - unexpected side-effect ... makes Transformers fail to capture eventful temporal dependencies @@ -166,27 +179,122 @@ De-stationary Attention mechanism
-Self-Attention: $\operatorname{Attn}(\mathbf{Q}, \mathbf{K}, \mathbf{V})=\operatorname{Softmax}\left(\frac{\mathbf{Q K}^{\top}}{\sqrt{d_k}}\right) \mathbf{V}$. +Self-Attention: $$\operatorname{Attn}(\mathbf{Q}, \mathbf{K}, \mathbf{V})=\operatorname{Softmax}\left(\frac{\mathbf{Q K}^{\top}}{\sqrt{d_k}}\right) \mathbf{V}$$. Bring the vanished non-stationary information back to its calculation - approximate the - - positive scaling scalar $\tau=\sigma_{\mathbf{x}}^2 \in \mathbb{R}^{+}$ - - shifting vector $\boldsymbol{\Delta}=\mathbf{K} \mu_{\mathbf{Q}} \in \mathbb{R}^{S \times 1}$, + - positive scaling scalar $$\tau=\sigma_{\mathbf{x}}^2 \in \mathbb{R}^{+}$$ + - shifting vector $$\boldsymbol{\Delta}=\mathbf{K} \mu_{\mathbf{Q}} \in \mathbb{R}^{S \times 1}$$, which are defined as de-stationary factors. -- try to learn de-stationary factors directly from the statistics of unstationarized $\mathbf{x}, \mathbf{Q}$ and $\mathbf{K}$ by MLP +- try to learn de-stationary factors directly from the statistics of unstationarized $$\mathbf{x}, \mathbf{Q}$$ and $$\mathbf{K}$$ by MLP
-$\log \tau=\operatorname{MLP}\left(\sigma_{\mathbf{x}}, \mathbf{x}\right)$. +$$\log \tau=\operatorname{MLP}\left(\sigma_{\mathbf{x}}, \mathbf{x}\right)$$. -$\boldsymbol{\Delta}=\operatorname{MLP}\left(\mu_{\mathbf{x}}, \mathbf{x}\right)$. -$\operatorname{Attn}\left(\mathbf{Q}^{\prime}, \mathbf{K}^{\prime}, \mathbf{V}^{\prime}, \tau, \boldsymbol{\Delta}\right)=\operatorname{Softmax}\left(\frac{\tau \mathbf{Q}^{\prime} \mathbf{K}^{\prime}+\mathbf{1} \boldsymbol{\Delta}^{\top}}{\sqrt{d_k}}\right) \mathbf{V}^{\prime}$. +$$\boldsymbol{\Delta}=\operatorname{MLP}\left(\mu_{\mathbf{x}}, \mathbf{x}\right)$$. +$$\operatorname{Attn}\left(\mathbf{Q}^{\prime}, \mathbf{K}^{\prime}, \mathbf{V}^{\prime}, \tau, \boldsymbol{\Delta}\right)=\operatorname{Softmax}\left(\frac{\tau \mathbf{Q}^{\prime} \mathbf{K}^{\prime \top}+\mathbf{1} \boldsymbol{\Delta}^{\top}}{\sqrt{d_k}}\right) \mathbf{V}^{\prime}$$.
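To make the mechanism concrete, here is a minimal NumPy sketch of De-stationary Attention (an illustrative reimplementation, not the authors' code): the learned MLP projectors are replaced by externally supplied de-stationary factors $$\tau$$ and $$\boldsymbol{\Delta}$$, and with $$\tau=1$$, $$\boldsymbol{\Delta}=\mathbf{0}$$ it reduces to vanilla self-attention.

```python
import numpy as np

def softmax(z):
    # row-wise softmax with max-subtraction for numerical stability
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def destationary_attention(Q, K, V, tau=1.0, delta=None):
    """Attn(Q', K', V', tau, Delta) = Softmax((tau Q'K'^T + 1 Delta^T) / sqrt(d_k)) V'.

    Q, K, V : (S, d_k) arrays computed from the stationarized series.
    tau     : positive scaling scalar (sigma_x^2 in the paper, learned by an MLP).
    delta   : (S, 1) shifting vector (K mu_Q in the paper, learned by an MLP).
    """
    S, d_k = Q.shape
    if delta is None:
        delta = np.zeros((S, 1))
    # 1 Delta^T adds the same row vector to every row of the score matrix
    scores = (tau * Q @ K.T + delta.T) / np.sqrt(d_k)
    return softmax(scores) @ V
```

The de-stationary factors rescale and shift the pre-softmax scores, so attention computed on the stationarized $$\mathbf{Q}^{\prime}, \mathbf{K}^{\prime}$$ approximates the attention of the raw, unstationarized series.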
# 4. Experiments +## (1) Experimental Setups + +### a) Datasets + +- Electricity +- ETT datasets +- Exchange +- ILI +- Traffic +- Weather + +
### b) Degree of stationarity

Augmented Dickey-Fuller (ADF) test statistic

- smaller value = higher degree of stationarity

![figure2](/assets/img/ts/img472.png)

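In practice one would compute this with `statsmodels.tsa.stattools.adfuller`; the sketch below is an illustrative no-lag (plain Dickey-Fuller) version, which is enough to see why "smaller value = higher stationarity": the statistic is the t-statistic of $$\rho$$ in the regression $$\Delta y_t = \rho\, y_{t-1} + c + \varepsilon_t$$, and a strongly negative $$\rho$$ means the series is pulled back toward its mean.

```python
import numpy as np

def dickey_fuller_stat(y):
    """t-statistic of rho in: diff(y)_t = rho * y_{t-1} + c + eps_t.

    No-lag (non-augmented) illustrative version; a strongly negative
    statistic is evidence of stationarity.
    """
    dy = np.diff(y)
    X = np.column_stack([y[:-1], np.ones_like(dy)])  # lagged level + intercept
    beta = np.linalg.lstsq(X, dy, rcond=None)[0]     # OLS estimate of (rho, c)
    resid = dy - X @ beta
    s2 = resid @ resid / (len(dy) - 2)               # residual variance
    se = np.sqrt(s2 * np.linalg.inv(X.T @ X)[0, 0])  # std. error of rho-hat
    return beta[0] / se
```

White noise (stationary) gives a strongly negative statistic, while a random walk (non-stationary) gives one near zero, matching the reading above.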
+ +### c) Baselines + +- pass + +
+ +## (2) Main Results + +### a) Forecasting + +MTS Forecasting + +![figure2](/assets/img/ts/img473.png) + +
+ +UTS Forecasting + +![figure2](/assets/img/ts/img474.png) + +
### b) Framework Generality

![figure2](/assets/img/ts/img475.png)

Conclusion: Non-stationary Transformer is an **effective and lightweight** framework that can be widely **applied to Transformer-based models** and improves their performance on non-stationary series

## (3) Ablation Study

### a) Qualitative evaluation

Dataset: ETTm2

Models:

- vanilla Transformer
- Transformer with only Series Stationarization
- Non-stationary Transformer

+ +![figure2](/assets/img/ts/img476.png) + +
+ +### b) Quantitative performance + +![figure2](/assets/img/ts/img477.png) + +
## (4) Model Analysis

### a) Over-stationarization problem

Transformers with ...

- v1) Transformer + Ours ( = Non-stationary Transformer )
- v2) Transformer + RevIN
- v3) Transformer + Series Stationarization

![figure2](/assets/img/ts/img478.png)

Result

- v2 & v3) tend to output series with an unexpectedly high degree of stationarity

diff --git a/_posts/non-stationary_transformer.pdf b/_posts/non-stationary_transformer.pdf deleted file mode 100644 index efa297187131..000000000000 Binary files a/_posts/non-stationary_transformer.pdf and /dev/null differ diff --git a/assets/img/ts/img472.png b/assets/img/ts/img472.png new file mode 100644 index 000000000000..749fd6949fd3 Binary files /dev/null and b/assets/img/ts/img472.png differ diff --git a/assets/img/ts/img473.png b/assets/img/ts/img473.png new file mode 100644 index 000000000000..611e9a19e77b Binary files /dev/null and b/assets/img/ts/img473.png differ diff --git a/assets/img/ts/img474.png b/assets/img/ts/img474.png new file mode 100644 index 000000000000..434c23d1c7dc Binary files /dev/null and b/assets/img/ts/img474.png differ diff --git a/assets/img/ts/img475.png b/assets/img/ts/img475.png new file mode 100644 index 000000000000..635b1a6f8b2a Binary files /dev/null and b/assets/img/ts/img475.png differ diff --git a/assets/img/ts/img476.png b/assets/img/ts/img476.png new file mode 100644 index 000000000000..e7a34abacb56 Binary files /dev/null and b/assets/img/ts/img476.png differ diff --git a/assets/img/ts/img477.png b/assets/img/ts/img477.png new file mode 100644 index 000000000000..da73e5668b96 Binary files /dev/null and b/assets/img/ts/img477.png differ diff --git a/assets/img/ts/img478.png b/assets/img/ts/img478.png new file mode 100644 index 000000000000..38421469e003 Binary files /dev/null and b/assets/img/ts/img478.png differ