I have already commented on this problem in this and the theft package, but I think it deserves its own issue number.
As far as I know, the only way to use catch22 features is to calculate them on a rolling/expanding window.
But when I use rolling windows on a big dataset (not really big, but above 1 million rows), RAM usage starts to increase rapidly, and it crashed RStudio for me.
I am not sure if the problem is in C++ code or somewhere else.
I have already posted the code sample, but I am repeating it here:
library(data.table)  # provides copy(), frollapply(), fwrite() and the `:=` syntax

for (s in price_sybmols) {
  print(s)

  # data sample: rows for the current symbol only
  sample_ <- copy(prices)
  sample_ <- sample_[symbol == s]

  # compute each catch22 feature on a rolling window of n observations
  n <- 22
  sample_[, `:=`(
    CO_Embed2_Dist_tau_d_expfit_meandiff = frollapply(adjClose, n, Rcatch22::CO_Embed2_Dist_tau_d_expfit_meandiff),
    CO_f1ecac = frollapply(adjClose, n, Rcatch22::CO_f1ecac),
    CO_FirstMin_ac = frollapply(adjClose, n, Rcatch22::CO_FirstMin_ac),
    CO_HistogramAMI_even_2_5 = frollapply(adjClose, n, Rcatch22::CO_HistogramAMI_even_2_5),
    CO_trev_1_num = frollapply(adjClose, n, Rcatch22::CO_trev_1_num),
    DN_HistogramMode_10 = frollapply(adjClose, n, Rcatch22::DN_HistogramMode_10),
    DN_HistogramMode_5 = frollapply(adjClose, n, Rcatch22::DN_HistogramMode_5),
    DN_OutlierInclude_n_001_mdrmd = frollapply(adjClose, n, Rcatch22::DN_OutlierInclude_n_001_mdrmd),
    DN_OutlierInclude_p_001_mdrmd = frollapply(adjClose, n, Rcatch22::DN_OutlierInclude_p_001_mdrmd),
    FC_LocalSimple_mean1_tauresrat = frollapply(adjClose, n, Rcatch22::FC_LocalSimple_mean1_tauresrat),
    FC_LocalSimple_mean3_stderr = frollapply(adjClose, n, Rcatch22::FC_LocalSimple_mean3_stderr),
    IN_AutoMutualInfoStats_40_gaussian_fmmi = frollapply(adjClose, n, Rcatch22::IN_AutoMutualInfoStats_40_gaussian_fmmi),
    MD_hrv_classic_pnn40 = frollapply(adjClose, n, Rcatch22::MD_hrv_classic_pnn40),
    PD_PeriodicityWang_th0_01 = frollapply(adjClose, n, Rcatch22::PD_PeriodicityWang_th0_01),
    SB_BinaryStats_diff_longstretch0 = frollapply(adjClose, n, Rcatch22::SB_BinaryStats_diff_longstretch0),
    SB_BinaryStats_mean_longstretch1 = frollapply(adjClose, n, Rcatch22::SB_BinaryStats_mean_longstretch1),
    SB_MotifThree_quantile_hh = frollapply(adjClose, n, Rcatch22::SB_MotifThree_quantile_hh),
    SB_TransitionMatrix_3ac_sumdiagcov = frollapply(adjClose, n, Rcatch22::SB_TransitionMatrix_3ac_sumdiagcov),
    SC_FluctAnal_2_dfa_50_1_2_logi_prop_r1 = frollapply(adjClose, n, Rcatch22::SC_FluctAnal_2_dfa_50_1_2_logi_prop_r1),
    SC_FluctAnal_2_rsrangefit_50_1_logi_prop_r1 = frollapply(adjClose, n, Rcatch22::SC_FluctAnal_2_rsrangefit_50_1_logi_prop_r1),
    SP_Summaries_welch_rect_area_5_1 = frollapply(adjClose, n, Rcatch22::SP_Summaries_welch_rect_area_5_1),
    SP_Summaries_welch_rect_centroid = frollapply(adjClose, n, Rcatch22::SP_Summaries_welch_rect_centroid)
  )]

  # save the identifier columns plus every feature column for this symbol
  cols <- c("symbol", "date",
            colnames(sample_)[which(colnames(sample_) == "CO_Embed2_Dist_tau_d_expfit_meandiff"):ncol(sample_)])
  fwrite(sample_[, ..cols], paste0("D:/fundamental_data/catch22/", s, "-", n, ".csv"))
}
Hi. Yes, this is a known issue for catch22, and we are working to resolve it. As for usage of the features, you are currently computing subsequence features on each window. Another (and more common) approach is to compute global features over the entire time series. This reduces your dataset to a (number of time series) x 22 feature matrix (although it is wrangled into a tidy "long" format in Rcatch22). This is much less memory-intensive and might be of use to you, unless you have a good motivation for adopting the rolling window approach?
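For reference, the global-feature approach described in this comment could look roughly like the sketch below. It is only an illustration: the two toy series and the `feature_fun` wrapper are made up for the example, and it assumes `Rcatch22::catch22_all()` returns a data frame with `names` and `values` columns. If Rcatch22 is not installed, a two-feature stand-in is used so that the snippet still runs.

```r
# Toy data: two hypothetical series standing in for per-symbol price vectors.
set.seed(1)
series <- list(AAA = cumsum(rnorm(500)), BBB = cumsum(rnorm(500)))

# Use Rcatch22 when it is installed; otherwise fall back to a small stand-in
# so the sketch stays runnable.
feature_fun <- if (requireNamespace("Rcatch22", quietly = TRUE)) {
  function(x) {
    f <- Rcatch22::catch22_all(x)  # assumed: columns `names` and `values`
    setNames(f$values, f$names)
  }
} else {
  function(x) c(mean = mean(x), sd = sd(x))
}

# One feature vector per whole series: one row per series, one column per feature.
feature_matrix <- do.call(rbind, lapply(series, feature_fun))
print(dim(feature_matrix))
```

Each series is passed to the feature functions exactly once, so the number of calls grows with the number of series rather than with the number of rows.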
Sorry for the late response. Yes, I need to apply the rolling window approach. The reason is that I want to make a prediction after every window. I think it's a common approach (together with the expanding window approach) in time series forecasting.
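To make the rolling-window use case concrete, here is a minimal base R sketch that produces one feature vector per window end, which is the shape needed to predict after every window. `rolling_features()` and `simple_features()` are hypothetical helpers written for this example and are not part of Rcatch22 or data.table. Whether this pattern avoids the memory growth reported above is untested.

```r
# Hypothetical helper: apply `feature_fun` to every window of length `n` and
# return a matrix with one row per window end and one column per feature.
rolling_features <- function(x, n, feature_fun) {
  stopifnot(is.numeric(x), n >= 1, length(x) >= n)
  ends <- seq(n, length(x))  # index of the last observation in each window
  rows <- lapply(ends, function(i) feature_fun(x[(i - n + 1):i]))
  out <- do.call(rbind, rows)
  rownames(out) <- as.character(ends)
  out
}

# Stand-in feature function so the sketch runs without Rcatch22.
simple_features <- function(w) c(mean = mean(w), range = max(w) - min(w))

x <- c(1, 2, 3, 4, 5, 6)
feats <- rolling_features(x, n = 3, feature_fun = simple_features)
print(feats)
```

To use catch22 here, `feature_fun` could wrap `Rcatch22::catch22_all()` and return `setNames(f$values, f$names)`, so that each window gets all features from a single call instead of 22 separate rolling passes.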