New features
- 🔥 Now also supports feature-extraction on numeric-index data (and thus not only time-based data)
- 💚 Seamless integration with tsfresh, check out the example below:
import scipy.stats as ss
from tsfresh.feature_extraction import MinimalFCParameters
from tsflex.features import FeatureCollection, MultipleFeatureDescriptors
from tsflex.features.integrations import tsfresh_settings_wrapper
from tsflex.utils.data import load_empatica_data

# Load sequence-indexed data (in this case a time-index)
df_tmp, df_acc = load_empatica_data(['tmp', 'acc'])

# Construct your feature extraction configuration & extract features
fc = FeatureCollection(
    MultipleFeatureDescriptors(
        functions=tsfresh_settings_wrapper(MinimalFCParameters()) + [ss.skew],
        series_names=["TMP", "ACC_x", "ACC_y"],
        windows=["5min", "15min"],
        strides="5min",
    )
)
fc.calculate(data=[df_tmp, df_acc], return_df=True)
- ⚡ Optimized strided-rolling feature-extraction, see the newly generated benchmark ⬇️
- Added FeatureCollection.reduce(), which is handy when feature selection is part of your machine-learning pipeline
- 🐻 chunk_data() now also supports DataFrame-dicts as input, which is more convenient when you want to specify the sample frequencies for DataFrames with many columns.
- 🌻 SeriesPipeline is now more compose-like as it now accepts SeriesPipeline instances
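To make the numeric-index item above concrete, here is a minimal, standard-library sketch of strided-rolling feature extraction over a sample-number index instead of timestamps. It is an illustration of the idea only, not tsflex's implementation: `strided_rolling` is a hypothetical helper, and the half-open window convention `[start, start + window)` is an assumption made for this sketch.

```python
from bisect import bisect_left
from statistics import mean


def strided_rolling(index, values, window, stride, func):
    """Apply `func` to every half-open window [start, start + window).

    Hypothetical helper for illustration; it is not part of tsflex.
    `index` must be sorted and have the same length as `values`.
    """
    results = []
    if not index:
        return results
    start = index[0]
    # Only emit windows that fit entirely inside the index range.
    while start + window <= index[-1]:
        lo = bisect_left(index, start)
        hi = bisect_left(index, start + window)
        if lo < hi:  # skip windows that contain no samples
            results.append((start, func(values[lo:hi])))
        start += stride
    return results


# A numeric (sample-number) index instead of a time-index.
index = list(range(10))
values = [float(v) for v in index]
features = strided_rolling(index, values, window=4, stride=2, func=mean)
print(features)
```

In this sketch, `window` and `stride` are expressed in the same unit as the index, which mirrors how a time-based configuration would use durations such as "5min".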
Changes
- 🧵 Changed pathos ➡️ multiprocess as multiprocessing back-end
- 🔧 Moved the bound_method argument to FeatureCollection.calculate()
- 📝 Rewrote strided-rolling back-end in a more OO manner (introduced the segmenter module), which complies with our roadmap of providing more segmenting functionality
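As a rough picture of what a bound method decides when several series with different start and end points are segmented together, the sketch below computes the overall range under three strategies. The option names `inner`, `outer` and `inner-outer` are an assumption not stated in these notes, and `segment_bounds` is a hypothetical helper rather than tsflex code.

```python
def segment_bounds(ranges, bound_method="inner"):
    """Return the (start, end) range over which windows would be generated.

    Hypothetical helper for illustration; not part of tsflex.
    `ranges` holds one (start, end) pair per input series.
    """
    starts = [start for start, _ in ranges]
    ends = [end for _, end in ranges]
    if bound_method == "inner":
        # Range covered by all series.
        return max(starts), min(ends)
    if bound_method == "outer":
        # Range covered by at least one series.
        return min(starts), max(ends)
    if bound_method == "inner-outer":
        # Latest start combined with the latest end.
        return max(starts), max(ends)
    raise ValueError(f"unknown bound_method: {bound_method!r}")


# Two series that only partially overlap.
ranges = [(0, 100), (10, 120)]
for method in ("inner", "outer", "inner-outer"):
    print(method, segment_bounds(ranges, method))
```

Passing such a choice at calculation time, rather than fixing it when the collection is constructed, lets the same feature configuration be reused with different alignment strategies.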