Stationarity of the original series #33

Closed
MislavSag opened this issue May 31, 2021 · 12 comments

Comments

@MislavSag

Hi,

Thanks for the great package.

Should the input series be stationary, or can it have a unit root? For example, should we use prices or differenced log prices (returns) as the input series?

Do we have to do any other preprocessing before applying the main function?

@hendersontrent
Owner

hendersontrent commented May 31, 2021

Hi,

Thanks!

There's no requirement for stationarity. No explicit preprocessing is required beyond what you want to do (e.g. normalisation, if required) -- the function can accept any numeric time-series vector. You should just be able to feed in prices as an input series.
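As a minimal sketch of the optional preprocessing mentioned above: the simulated price series and the `z_score` helper are illustrative assumptions, not part of the package. The call to `Rcatch22::catch22_all()` is the one used elsewhere in this thread, and it is guarded so the snippet still runs where the package is not installed.

```r
# Simulated daily prices (illustrative data, not from this thread)
set.seed(42)
prices <- 100 * exp(cumsum(rnorm(250, mean = 0, sd = 0.01)))

# Optional transformation 1: log returns, which are usually much closer
# to stationary than the price level itself
log_returns <- diff(log(prices))

# Optional transformation 2: z-score normalisation (hypothetical helper)
z_score <- function(x) (x - mean(x)) / sd(x)
prices_z <- z_score(prices)

# Any of these numeric vectors can be passed to the feature function.
# Guarded so the sketch also runs where Rcatch22 is not installed.
if (requireNamespace("Rcatch22", quietly = TRUE)) {
  features_prices  <- Rcatch22::catch22_all(prices)
  features_returns <- Rcatch22::catch22_all(log_returns)
}
```

Whether to use prices or returns is then a modelling choice rather than a requirement of the function.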

Happy for me to close this issue?

@MislavSag
Author

@hendersontrent, thanks for the info. I have found your theft package too. Good work.
This can be closed.

@hendersontrent
Owner

Thanks. theft will be submitted to CRAN shortly after some final cleanup and documentation fixes.

@MislavSag
Author

Sorry for reopening the issue. Should all functions be used on a rolling window? This seems reasonable because each function returns only one value.

@hendersontrent
Owner

If using a rolling window approach, then yes, it seems reasonable to compute features over each window. This has been used in the literature for both forecasting and classification purposes.
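A minimal base-R sketch of computing features over each window follows. `roll_features` and `toy_features` are hypothetical names introduced only for illustration; the toy feature set stands in for the package's functions so the snippet runs without extra dependencies.

```r
# Apply a feature function to every complete window of length k.
# `feature_fun` must return a numeric vector of fixed length (ideally named).
# Rows for incomplete windows at the start are left as NA.
roll_features <- function(x, k, feature_fun) {
  stopifnot(is.numeric(x), k >= 1, length(x) >= k)
  n_windows <- length(x) - k + 1
  first <- feature_fun(x[1:k])
  out <- matrix(NA_real_, nrow = length(x), ncol = length(first),
                dimnames = list(NULL, names(first)))
  out[k, ] <- first
  for (i in seq_len(n_windows)[-1]) {
    # Window i covers x[i], ..., x[i + k - 1]; store it at its last index
    out[i + k - 1, ] <- feature_fun(x[i:(i + k - 1)])
  }
  out
}

# Stand-in feature set (an assumption, not the catch22 features)
toy_features <- function(w) {
  c(mean = mean(w), sd = sd(w), range = diff(range(w)))
}

set.seed(1)
y <- cumsum(rnorm(100))
feats <- roll_features(y, k = 22, feature_fun = toy_features)
```

To use the real feature set, `feature_fun` could wrap `Rcatch22::catch22_all()` and return its values as a named numeric vector, assuming the result exposes the feature names and values as columns.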

@MislavSag
Author

Thanks for the feedback.

@MislavSag
Author

I have tried to use a rolling window with the runner package, but it crashes RStudio.
For example, this works:

runner(
  x = y,
  f = function(x) {
    x + 3
  },
  k = 22,
  na_pad = TRUE
)

but this crashes RStudio:

runner(
  x = y,
  f = function(x) {
    Rcatch22::catch22_all(x)
  },
  k = 22,
  na_pad = TRUE
)

I have also tried the slider package, but it crashes RStudio too.

@MislavSag
Author

One update. If I calculate all functions inside a loop, memory usage grows with every step until it eventually reaches 100%. Here is sample code:

for (s in price_sybmols) {
  print(s)

  # data sample
  sample_ <- copy(prices)
  sample_ <- sample_[symbol == s]

  # create catch 22 features
  n <- 22
  sample_[, `:=`(
    CO_Embed2_Dist_tau_d_expfit_meandiff = frollapply(adjClose, n, Rcatch22::CO_Embed2_Dist_tau_d_expfit_meandiff),
    CO_f1ecac = frollapply(adjClose, n, Rcatch22::CO_f1ecac),
    CO_FirstMin_ac = frollapply(adjClose, n, Rcatch22::CO_FirstMin_ac),
    CO_HistogramAMI_even_2_5 = frollapply(adjClose, n, Rcatch22::CO_HistogramAMI_even_2_5),
    CO_trev_1_num = frollapply(adjClose, n, Rcatch22::CO_trev_1_num),
    DN_HistogramMode_10 = frollapply(adjClose, n, Rcatch22::DN_HistogramMode_10),
    DN_HistogramMode_5 = frollapply(adjClose, n, Rcatch22::DN_HistogramMode_5),
    DN_OutlierInclude_n_001_mdrmd = frollapply(adjClose, n, Rcatch22::DN_OutlierInclude_n_001_mdrmd),
    DN_OutlierInclude_p_001_mdrmd = frollapply(adjClose, n, Rcatch22::DN_OutlierInclude_p_001_mdrmd),
    FC_LocalSimple_mean1_tauresrat = frollapply(adjClose, n, Rcatch22::FC_LocalSimple_mean1_tauresrat),
    FC_LocalSimple_mean3_stderr = frollapply(adjClose, n, Rcatch22::FC_LocalSimple_mean3_stderr),
    IN_AutoMutualInfoStats_40_gaussian_fmmi = frollapply(adjClose, n, Rcatch22::IN_AutoMutualInfoStats_40_gaussian_fmmi),
    MD_hrv_classic_pnn40 = frollapply(adjClose, n, Rcatch22::MD_hrv_classic_pnn40),
    PD_PeriodicityWang_th0_01 = frollapply(adjClose, n, Rcatch22::PD_PeriodicityWang_th0_01),
    SB_BinaryStats_diff_longstretch0 = frollapply(adjClose, n, Rcatch22::SB_BinaryStats_diff_longstretch0),
    SB_BinaryStats_mean_longstretch1 = frollapply(adjClose, n, Rcatch22::SB_BinaryStats_mean_longstretch1),
    SB_MotifThree_quantile_hh = frollapply(adjClose, n, Rcatch22::SB_MotifThree_quantile_hh),
    SB_TransitionMatrix_3ac_sumdiagcov = frollapply(adjClose, n, Rcatch22::SB_TransitionMatrix_3ac_sumdiagcov),
    SC_FluctAnal_2_dfa_50_1_2_logi_prop_r1 = frollapply(adjClose, n, Rcatch22::SC_FluctAnal_2_dfa_50_1_2_logi_prop_r1),
    SC_FluctAnal_2_rsrangefit_50_1_logi_prop_r1 = frollapply(adjClose, n, Rcatch22::SC_FluctAnal_2_rsrangefit_50_1_logi_prop_r1),
    SP_Summaries_welch_rect_area_5_1 = frollapply(adjClose, n, Rcatch22::SP_Summaries_welch_rect_area_5_1),
    SP_Summaries_welch_rect_centroid = frollapply(adjClose, n, Rcatch22::SP_Summaries_welch_rect_centroid)
  )]

  # save
  cols <- c("symbol", "date",
            colnames(sample_)[which(colnames(sample_) == "CO_Embed2_Dist_tau_d_expfit_meandiff"):ncol(sample_)])
  fwrite(sample_[, ..cols], paste0("D:/fundamental_data/catch22/", s, "-", n, ".csv"))
}

@hendersontrent
Owner

I'll run your code later and explore. There is a known memory leak (see here) and it becomes particularly evident if lots of loops are run. This could be the issue?

@MislavSag
Author

I run lots of loops (that is, I call the functions inside lapply), because I am calculating the functions on a rolling window.
My memory usage grows very fast, which makes the functions almost unusable. And the problem is not with one function, but with many (all?) of them.

@MislavSag
Author

Since you use Rcpp, maybe the problem is connected to this: https://stackoverflow.com/questions/47885216/memory-leaks-in-a-simple-rcpp-function

@MislavSag
Author

Maybe you can try to include gc() at the end of the function?
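One caveat on this suggestion: `gc()` only reclaims memory held by R objects, so memory allocated in compiled code and never freed would not be released by it. A rough way to tell the two cases apart is to track the R heap across iterations, as in the sketch below. `r_heap_mb` and `work` are hypothetical stand-ins, not functions from the package.

```r
# Memory currently used by R objects, in Mb, measured after a collection.
# Column 2 of gc()'s result is the "(Mb)" figure next to the "used" column.
r_heap_mb <- function() sum(gc()[, 2])

# Hypothetical stand-in for one rolling-feature computation
work <- function(i) {
  x <- rnorm(1e5)
  mean(x)
}

n_iter <- 20
heap <- numeric(n_iter)
for (i in seq_len(n_iter)) {
  work(i)
  heap[i] <- r_heap_mb()  # record the R heap size after each step
}

# Change in R-managed memory between the first and last iteration
growth_mb <- heap[n_iter] - heap[1]
```

If `growth_mb` stays near zero while the process memory reported by the operating system keeps climbing, the growth is probably outside R's heap, and calling `gc()` inside the function is unlikely to help.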
