Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Memory leak #36

Closed
MislavSag opened this issue Jul 26, 2021 · 2 comments · Fixed by #41
Closed

Memory leak #36

MislavSag opened this issue Jul 26, 2021 · 2 comments · Fixed by #41

Comments

@MislavSag
Copy link

Hi,

I have commented already this problem on this and the theft package, but I think this issue deserves its own issue number.

As far as I know, the only way to use catch22 features is to calculate them on a rolling/expanding window.
But when I use rolling windows on the big dataset (not really big, bt above 1 mil rows), the RAM starts to increase rapidly and it crashed R studio for me.

I am not sure if the problem is in C++ code or somewhere else.

I already have posted the code sample, but I am repeating it here:

for (s in price_sybmols) {
  print(s)

  # data sample
  sample_ <- copy(prices)
  sample_ <- sample_[symbol == s]

  # create catch 22 features
  n <- 22
  sample_[, `:=`(
    CO_Embed2_Dist_tau_d_expfit_meandiff = frollapply(adjClose, n, Rcatch22::CO_Embed2_Dist_tau_d_expfit_meandiff),
    CO_f1ecac = frollapply(adjClose, n, Rcatch22::CO_f1ecac),
    CO_FirstMin_ac = frollapply(adjClose, n, Rcatch22::CO_FirstMin_ac),
    CO_HistogramAMI_even_2_5 = frollapply(adjClose, n, Rcatch22::CO_HistogramAMI_even_2_5),
    CO_trev_1_num = frollapply(adjClose, n, Rcatch22::CO_trev_1_num),
    DN_HistogramMode_10 = frollapply(adjClose, n, Rcatch22::DN_HistogramMode_10),
    DN_HistogramMode_5 = frollapply(adjClose, n, Rcatch22::DN_HistogramMode_5),
    DN_OutlierInclude_n_001_mdrmd = frollapply(adjClose, n, Rcatch22::DN_OutlierInclude_n_001_mdrmd),
    DN_OutlierInclude_p_001_mdrmd = frollapply(adjClose, n, Rcatch22::DN_OutlierInclude_p_001_mdrmd),
    FC_LocalSimple_mean1_tauresrat = frollapply(adjClose, n, Rcatch22::FC_LocalSimple_mean1_tauresrat),
    FC_LocalSimple_mean3_stderr = frollapply(adjClose, n, Rcatch22::FC_LocalSimple_mean3_stderr),
    IN_AutoMutualInfoStats_40_gaussian_fmmi = frollapply(adjClose, n, Rcatch22::IN_AutoMutualInfoStats_40_gaussian_fmmi),
    MD_hrv_classic_pnn40 = frollapply(adjClose, n, Rcatch22::MD_hrv_classic_pnn40),
    PD_PeriodicityWang_th0_01 = frollapply(adjClose, n, Rcatch22::PD_PeriodicityWang_th0_01),
    SB_BinaryStats_diff_longstretch0 = frollapply(adjClose, n, Rcatch22::SB_BinaryStats_diff_longstretch0),
    SB_BinaryStats_mean_longstretch1 = frollapply(adjClose, n, Rcatch22::SB_BinaryStats_mean_longstretch1),
    SB_MotifThree_quantile_hh = frollapply(adjClose, n, Rcatch22::SB_MotifThree_quantile_hh),
    SB_TransitionMatrix_3ac_sumdiagcov = frollapply(adjClose, n, Rcatch22::SB_TransitionMatrix_3ac_sumdiagcov),
    SC_FluctAnal_2_dfa_50_1_2_logi_prop_r1 = frollapply(adjClose, n, Rcatch22::SC_FluctAnal_2_dfa_50_1_2_logi_prop_r1),
    SC_FluctAnal_2_rsrangefit_50_1_logi_prop_r1 = frollapply(adjClose, n, Rcatch22::SC_FluctAnal_2_rsrangefit_50_1_logi_prop_r1),
    SP_Summaries_welch_rect_area_5_1 = frollapply(adjClose, n, Rcatch22::SP_Summaries_welch_rect_area_5_1),
    SP_Summaries_welch_rect_centroid = frollapply(adjClose, n, Rcatch22::SP_Summaries_welch_rect_centroid)
  )]

  # save
  cols <- c("symbol", "date",
            colnames(sample_)[which(colnames(sample_) == "CO_Embed2_Dist_tau_d_expfit_meandiff"):ncol(sample_)])
  fwrite(sample_[, ..cols], paste0("D:/fundamental_data/catch22/", s, "-", n, ".csv"))
}
@hendersontrent
Copy link
Owner

Hi. Yes this is a known issue for catch22. We are working to resolve it. As for usage of features, you are currently computing subsequence features on each window. Another (and more common) approach is to compute global features over the entire time series. This reduces your dataset to a # of Time Series x 22 Feature matrix (although wrangled into a tidy "long" format in Rcatch22). This is much less memory intensive, and might be of use to you unless you have a good motivation for adoption the rolling window approach?

@MislavSag
Copy link
Author

Sorry for late response. Yes, I need to apply rolling window approach. The reason is that I want to make predictions after every window. I think it's common approach (together with expanding approach) in time series forecasting.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants