Memory leak #36

MislavSag · 2021-07-26T09:54:05Z

Hi,

I have already commented on this problem here and in the theft package, but I think it deserves its own issue.

As far as I know, the only way to use catch22 features is to calculate them on a rolling/expanding window.
But when I use rolling windows on a big dataset (not really big, but above 1 million rows), RAM usage increases rapidly, and it crashed RStudio for me.

I am not sure if the problem is in C++ code or somewhere else.

I have already posted the code sample, but I am repeating it here:

for (s in price_sybmols) {
  print(s)

  # data sample
  sample_ <- copy(prices)
  sample_ <- sample_[symbol == s]

  # create catch 22 features
  n <- 22
  sample_[, `:=`(
    CO_Embed2_Dist_tau_d_expfit_meandiff = frollapply(adjClose, n, Rcatch22::CO_Embed2_Dist_tau_d_expfit_meandiff),
    CO_f1ecac = frollapply(adjClose, n, Rcatch22::CO_f1ecac),
    CO_FirstMin_ac = frollapply(adjClose, n, Rcatch22::CO_FirstMin_ac),
    CO_HistogramAMI_even_2_5 = frollapply(adjClose, n, Rcatch22::CO_HistogramAMI_even_2_5),
    CO_trev_1_num = frollapply(adjClose, n, Rcatch22::CO_trev_1_num),
    DN_HistogramMode_10 = frollapply(adjClose, n, Rcatch22::DN_HistogramMode_10),
    DN_HistogramMode_5 = frollapply(adjClose, n, Rcatch22::DN_HistogramMode_5),
    DN_OutlierInclude_n_001_mdrmd = frollapply(adjClose, n, Rcatch22::DN_OutlierInclude_n_001_mdrmd),
    DN_OutlierInclude_p_001_mdrmd = frollapply(adjClose, n, Rcatch22::DN_OutlierInclude_p_001_mdrmd),
    FC_LocalSimple_mean1_tauresrat = frollapply(adjClose, n, Rcatch22::FC_LocalSimple_mean1_tauresrat),
    FC_LocalSimple_mean3_stderr = frollapply(adjClose, n, Rcatch22::FC_LocalSimple_mean3_stderr),
    IN_AutoMutualInfoStats_40_gaussian_fmmi = frollapply(adjClose, n, Rcatch22::IN_AutoMutualInfoStats_40_gaussian_fmmi),
    MD_hrv_classic_pnn40 = frollapply(adjClose, n, Rcatch22::MD_hrv_classic_pnn40),
    PD_PeriodicityWang_th0_01 = frollapply(adjClose, n, Rcatch22::PD_PeriodicityWang_th0_01),
    SB_BinaryStats_diff_longstretch0 = frollapply(adjClose, n, Rcatch22::SB_BinaryStats_diff_longstretch0),
    SB_BinaryStats_mean_longstretch1 = frollapply(adjClose, n, Rcatch22::SB_BinaryStats_mean_longstretch1),
    SB_MotifThree_quantile_hh = frollapply(adjClose, n, Rcatch22::SB_MotifThree_quantile_hh),
    SB_TransitionMatrix_3ac_sumdiagcov = frollapply(adjClose, n, Rcatch22::SB_TransitionMatrix_3ac_sumdiagcov),
    SC_FluctAnal_2_dfa_50_1_2_logi_prop_r1 = frollapply(adjClose, n, Rcatch22::SC_FluctAnal_2_dfa_50_1_2_logi_prop_r1),
    SC_FluctAnal_2_rsrangefit_50_1_logi_prop_r1 = frollapply(adjClose, n, Rcatch22::SC_FluctAnal_2_rsrangefit_50_1_logi_prop_r1),
    SP_Summaries_welch_rect_area_5_1 = frollapply(adjClose, n, Rcatch22::SP_Summaries_welch_rect_area_5_1),
    SP_Summaries_welch_rect_centroid = frollapply(adjClose, n, Rcatch22::SP_Summaries_welch_rect_centroid)
  )]

  # save
  cols <- c("symbol", "date",
            colnames(sample_)[which(colnames(sample_) == "CO_Embed2_Dist_tau_d_expfit_meandiff"):ncol(sample_)])
  fwrite(sample_[, ..cols], paste0("D:/fundamental_data/catch22/", s, "-", n, ".csv"))
}


hendersontrent · 2021-07-26T09:59:18Z

Hi. Yes, this is a known issue for catch22, and we are working to resolve it. As for usage of the features, you are currently computing subsequence features on each window. Another (and more common) approach is to compute global features over the entire time series. This reduces your dataset to a (number of time series) x 22 feature matrix (although it is wrangled into a tidy "long" format in Rcatch22). This is much less memory intensive and might be of use to you, unless you have a good motivation for adopting the rolling window approach?
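A minimal sketch of the global-feature approach described above, assuming `Rcatch22::catch22_all()` returns a long data frame with `names` and `values` columns. The `prices` table here is toy data standing in for the one in the original post, and the symbols `AAA` and `BBB` are hypothetical.

```r
library(data.table)
library(Rcatch22)

# Toy stand-in for the `prices` table from the original post (hypothetical data).
set.seed(1)
prices <- data.table(
  symbol   = rep(c("AAA", "BBB"), each = 500),
  adjClose = cumsum(rnorm(1000)) + 100
)

# One call per symbol over the whole series, instead of one call per window.
# The result is in long format: one row per (symbol, feature) pair.
global_features <- prices[, catch22_all(adjClose), by = symbol]

# Reshape to one row per time series and one column per feature.
feature_matrix <- dcast(global_features, symbol ~ names, value.var = "values")
```

This makes one feature call per series rather than one per window, which is why it is much lighter on memory. It does not replace the rolling computation when a prediction is needed after every window.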

MislavSag · 2021-08-09T14:29:39Z

Sorry for the late response. Yes, I need to apply the rolling window approach, because I want to make predictions after every window. I think it's a common approach (together with the expanding window approach) in time series forecasting.

hendersontrent mentioned this issue Jun 2, 2022

Interpolated feature, bug fixes, unit tests #41

Merged

hendersontrent closed this as completed in #41 Jun 2, 2022
