Using extract_relevant_features on a financial data set, I've hit a roadblock where test_feature_significance never seems to return. Tested and repeatable on multiple systems, and always on the same data column (Volume).
Anything else we need to know?:
EfficientFCParameters
/home/ryan/xgboost/venv/lib/python3.10/site-packages/tsfresh/utilities/dataframe_functions.py:198: RuntimeWarning: The columns ['Volume__ar_coefficient__coeff_0__k_10' 'Volume__ar_coefficient__coeff_1__k_10' 'Volume__ar_coefficient__coeff_2__k_10' 'Volume__ar_coefficient__coeff_3__k_10' 'Volume__ar_coefficient__coeff_4__k_10' 'Volume__ar_coefficient__coeff_5__k_10' 'Volume__ar_coefficient__coeff_6__k_10' 'Volume__ar_coefficient__coeff_7__k_10' 'Volume__ar_coefficient__coeff_8__k_10' 'Volume__ar_coefficient__coeff_9__k_10' 'Volume__query_similarity_count__query_None__threshold_0.0'] did not have any finite values. Filling with zeros.
warnings.warn(
/home/ryan/xgboost/venv/lib/python3.10/site-packages/tsfresh/feature_selection/relevance.py:222: RuntimeWarning: [test_feature_significance] Constant features: Volume__symmetry_looking__r_0.0, Volume__large_standard_deviation__r_0.5, Volume__large_standard_deviation__r_0.55, Volume__large_standard_deviation__r_0.6000000000000001, Volume__large_standard_deviation__r_0.65, Volume__large_standard_deviation__r_0.7000000000000001, Volume__large_standard_deviation__r_0.75, Volume__large_standard_deviation__r_0.8, Volume__large_standard_deviation__r_0.8500000000000001, Volume__large_standard_deviation__r_0.9, Volume__large_standard_deviation__r_0.9500000000000001, Volume__partial_autocorrelation__lag_0, Volume__number_peaks__n_10, Volume__number_peaks__n_50, Volume__ar_coefficient__coeff_0__k_10, Volume__ar_coefficient__coeff_1__k_10, Volume__ar_coefficient__coeff_2__k_10, Volume__ar_coefficient__coeff_3__k_10, Volume__ar_coefficient__coeff_4__k_10, Volume__ar_coefficient__coeff_5__k_10, Volume__ar_coefficient__coeff_6__k_10, Volume__ar_coefficient__coeff_7__k_10, Volume__ar_coefficient__coeff_8__k_10, Volume__ar_coefficient__coeff_9__k_10, Volume__ar_coefficient__coeff_10__k_10, Volume__value_count__value_0, Volume__value_count__value_-1, Volume__range_count__max_1__min_-1, Volume__range_count__max_0__min_-1000000000000.0, Volume__number_crossing_m__m_0, Volume__number_crossing_m__m_-1, Volume__count_above__t_0, Volume__count_below__t_0, Volume__query_similarity_count__query_None__threshold_0.0
warnings.warn(
The log seems incomplete, but this is all the logging I have been able to get before the hang.
I've attempted to remove some of the features that caused significant repetitive errors but the issue persists:
params = EfficientFCParameters()
del params['fft_coefficient']
del params['agg_linear_trend']
del params['ratio_beyond_r_sigma']
Minimal Example
from tsfresh.utilities.dataframe_functions import make_forecasting_frame
from tsfresh import extract_relevant_features, feature_extraction
from tsfresh.feature_extraction import EfficientFCParameters
import pandas as pd
data = pd.read_csv('session_3000.csv')
column = 'Volume'
x = data[['id', 'Datetime', column]].rename(columns={column: 'value'})
x['kind'] = column # Add kind column to differentiate between series
df_shifted, y = make_forecasting_frame(x['value'], kind=column, max_timeshift=20, rolling_direction=1)
extracted_features = extract_relevant_features(
    df_shifted,
    y=y,
    default_fc_parameters=EfficientFCParameters(),
    column_id='id',
    column_sort='time',
    column_kind='kind',
    column_value='value',
    n_jobs=16,
)
kind_to_fc_parameters = feature_extraction.settings.from_columns(extracted_features)
Environment:
Python version: 3.10.12
Operating System: Ubuntu 22.04.3 LTS & WSL2
tsfresh version: 0.20.2
Install method (conda, pip, source): pip
Hi @bulldog5046 - sorry for the late response.
The problem in your case is that your target is integer-valued but has many distinct values. Our internal automatic ML task inference therefore assumes you want to do a classification task with a multiclass target, which requires many 1-vs-rest comparisons (and probably hundreds of feature selection runs). By setting ml_task="regression", you can tell tsfresh to treat your problem as a regression problem (which it is), and feature selection will finish much faster :)
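For anyone landing here with the same hang, a minimal sketch of the suggested fix follows. It reuses the call from the minimal example and only adds ml_task="regression". The helper looks_like_multiclass, its max_classes threshold, the wrapper select_features_as_regression, and the y_demo numbers are hypothetical names made up for illustration; they are not part of tsfresh and only roughly mirror the explanation above.

```python
import pandas as pd


def looks_like_multiclass(y: pd.Series, max_classes: int = 10) -> bool:
    """Hypothetical helper (not a tsfresh function): flag an integer-valued
    target with many distinct values, which is the situation described above."""
    return bool(pd.api.types.is_integer_dtype(y) and y.nunique() > max_classes)


def select_features_as_regression(df_shifted: pd.DataFrame, y: pd.Series) -> pd.DataFrame:
    """Same call as in the minimal example, with the task set explicitly."""
    # Imported lazily so the helper above can be tried without tsfresh installed.
    from tsfresh import extract_relevant_features
    from tsfresh.feature_extraction import EfficientFCParameters

    return extract_relevant_features(
        df_shifted,
        y=y,
        default_fc_parameters=EfficientFCParameters(),
        column_id="id",
        column_sort="time",
        column_kind="kind",
        column_value="value",
        n_jobs=16,
        ml_task="regression",  # skip the automatic task inference
    )


# Made-up stand-in for an integer-valued target such as Volume.
y_demo = pd.Series(range(1000, 1300, 3))
print(looks_like_multiclass(y_demo))
```

With the real data, df_shifted and y would come from make_forecasting_frame as in the minimal example, and select_features_as_regression(df_shifted, y) would replace the original extract_relevant_features call.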
session_3000.csv