Releases: splicemachine/ml-workflow
2.8.0-k8
What's new?
This release has 40 commits and a number of major enhancements.
Major Enhancements
- Airflow Support - Feature Statistics calculations, backfill, and pipeline support for feature sets (@myles-novick, #157 #171 #173)
- JWT Support for the feature store and mlmanager (@myles-novick, #126)
- MLflow 1.15 upgrade (@Ben-Epstein, #129)
- Support for deploying fastai, statsmodels, and spacy models to kubernetes natively (@Ben-Epstein, #131)
- New HTTP artifact store for mlflow (@Ben-Epstein, #155)
- New, cleaner documentation! See it here
Other Changes
- Support returning training sets as pandas dataframes (@Ben-Epstein, #158)
- feature_exists, feature_set_exists, and training_view_exists functions (@Ben-Epstein, #132 #161 #159 #164)
- Enabling custom CORS support via environment variable (@Ben-Epstein, #136)
- Versioning for training sets (@Ben-Epstein, #138)
- Advanced feature search (@Ben-Epstein, #145)
- Database deployed models now propagate errors to the user instead of throwing Unexpected Exceptions (@Ben-Epstein, #166)
Bug Fixes
- Fix for the feature store VTI function for TimeSnap, which was missing the schema name (@Ben-Epstein, #128)
- Database model deployment via new VTI was failing when executing via OLAP (@Ben-Epstein, #130)
- Various bug fixes for the new Feature Store UI (@Ben-Epstein, #135)
- Added missing validation on aggregation feature sets (@Ben-Epstein, #147 #148 #156)
- TimestampSnap function was sometimes 12 hours off (@sergioferragut, #174)
Breaking Changes
- You must run the upgrade script when moving from 2.7.0 to 2.8.0 in order for the feature store to function properly
This release is in tandem with the pysplice release
2.7.0-k8
What's New?
- Initial Airflow support for the Feature Store (@myles-novick)(#108)
- New Feature Store Java functions for time-window aggregations (@sergioferragut)(#107)
- Improvements to K8s model deployment using secrets, and an added label for our new network policies (@Ben-Epstein)(#110)
- Script to set up a local running mock feature store for easier development and testing (@sergioferragut, @Ben-Epstein)(#113)
- MLflow UI iframe bug fix (@edriggers)(#112)
- `get_feature_vector` bug fix for not returning values under certain conditions (@myles-novick)(#115)
- Added ability to create a feature set with a list of features in a single API call (@Ben-Epstein)(#114)
- New model deployment VTI triggers for database deployment, improving performance by orders of magnitude (@Ben-Epstein)(#109)
- `remove_training_view` API (#106)
- Bobby now removes crashing Kubernetes model deployment pods (@Ben-Epstein)(#117)
- Docs for feature store API now show up in cloud deployment (@Ben-Epstein )(#118)
- A new, more robust way of passing in datatypes to the REST api (@Ben-Epstein )(#120)
- `update_feature_metadata` route added to update tags, descriptions, and attributes of features (@Ben-Epstein)(#122)
- Added parameters to the `deployments` route that can return the deployments created from a particular feature or feature set (@Ben-Epstein)(#121)
- Enabled the returning of primary keys from a call to `get_feature_vector` (@myles-novick)(#124)
- New metrics added to the dashboard: most recently created features and most used features (@Ben-Epstein)(#123)
- Bug fix for `/features` throwing a 500 error (@Ben-Epstein)(#125)
- Moved all `-description` routes to `-details` to match the UI pages (@Ben-Epstein)(#125)
- Better validation of data types for Features and Feature Set primary keys (@Ben-Epstein)(#125)
- New Pipeline, Source, and AggregationFeatureSet abilities. This will enable us to create and manage feature set pipelines, automate them (once Airflow is fully integrated), and fully backfill features (@sergioferragut )(#125)
Breaking Changes
- Data types must now be provided in the new standard format. You can no longer pass in a feature data type as `Varchar(500)`, for example; you must conform to the new `DataType` schema:
```python
class DataType(BaseModel):
    """
    A class for representing a SQL data type as an object. Data types can have length, precision,
    and scale values depending on their type (VARCHAR(50) and DECIMAL(15,2), for example).
    This class enables breaking those data types up into objects.
    """
    data_type: str
    length: Optional[int] = None
    precision: Optional[int] = None
    scale: Optional[int] = None
```
So `{feature_data_type: varchar(500)}` is now `{feature_data_type: {data_type: varchar, length: 500}}`.
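For migrating existing payloads, a small helper can translate old-style type strings into the new `DataType` shape. This is an illustrative sketch, not part of the API; the regex and the field mapping (one argument maps to `length`, two arguments map to `precision`/`scale`) are assumptions based on the schema above.

```python
import re

def to_datatype(sql_type: str) -> dict:
    """Convert an old-style SQL type string like 'VARCHAR(500)' or
    'DECIMAL(15,2)' into a dictionary shaped like the new DataType schema.
    Illustrative helper only, not part of the feature store API."""
    match = re.fullmatch(r"\s*(\w+)\s*(?:\(\s*(\d+)\s*(?:,\s*(\d+)\s*)?\))?\s*", sql_type)
    if not match:
        raise ValueError(f"Unrecognized SQL type: {sql_type}")
    name, first, second = match.groups()
    body = {"data_type": name.lower()}
    if first is not None and second is not None:
        body["precision"], body["scale"] = int(first), int(second)  # e.g. DECIMAL(15,2)
    elif first is not None:
        body["length"] = int(first)                                 # e.g. VARCHAR(500)
    return body

print(to_datatype("VARCHAR(500)"))   # {'data_type': 'varchar', 'length': 500}
print(to_datatype("DECIMAL(15,2)"))  # {'data_type': 'decimal', 'precision': 15, 'scale': 2}
```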
- All `-description` routes must now hit the `-details` routes. Simply replace all occurrences in your API calls, as all have been changed.
This release is in tandem with the pysplice release which contains the matching Python APIs to these new REST APIs
2.6.0-k8
What's New?
This release comes with a number of improvements to the Feature Store, both enhanced functionality and improved performance
- New Feature Set architecture redesign improving performance and I/O for offline/online tables (#96) (@sergioferragut , @myles-novick )
- `get_feature_vector_sql` bug fix: requested features are now returned in the proper order (#97) (@myles-novick)
- Code refactor for better usability (#98) (@myles-novick )
- New `attributes` metadata parameter for features that accepts a dictionary of key-value pairs. `tags` now accepts a list of strings.
- `undeploy_kubernetes` function to remove kubernetes model deployments (#100) (@Ben-Epstein)
- Ability for users to drop feature sets in certain scenarios (#102) (@Ben-Epstein)
- Allow labels in `get_training_view` to force the proper point-in-time joins against the label and for better metadata tracking (#103) (@myles-novick, @sergioferragut)
- Bug fix: validation of primary keys in `create_training_view` (#104) (@myles-novick)
- Upgrade scripts for this release (@myles-novick ) (@sergioferragut )
- Fix for a database connection bug that caused a segmentation fault after long stale connections (54a77a6) (@Ben-Epstein)
- Moved the table creation to a pre-app script to avoid write-write conflicts across worker threads (384e021) (@Ben-Epstein, @abaveja313 )
Breaking Changes
The `tags` parameter no longer accepts a dictionary; it now accepts a list of strings. Any dictionary previously passed as `tags` must be moved to the `attributes` parameter, which now accepts a dictionary.
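A migration of existing metadata might look like the following sketch. The variable names and values are illustrative, and the actual feature-creation call (which takes these parameters) lives in the pysplice client.

```python
# Old-style (pre-2.6.0) metadata: tags was a dictionary of key-value pairs.
old_tags = {"owner": "risk-team", "pii": "false"}

# New-style (2.6.0+): tags is a list of strings, attributes holds the dictionary.
attributes = dict(old_tags)  # the old dictionary moves to attributes
tags = sorted(old_tags)      # or any list of descriptive strings

print(tags)        # ['owner', 'pii']
print(attributes)  # {'owner': 'risk-team', 'pii': 'false'}
```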
This release is in tandem with the PySplice release
Spark3 Release
This release is a Spark3 support of 2.5.1-k8.
No other changes were made except adding Spark3 support and removing Spark 2.4 support. All future releases will be Spark3-only.
PATCH Release for Feature Store
Important
Use this release instead of the release 2.5.0-k8. There is one (1) commit change to this release.
What's changed?
The release 2.5.0-k8 had a bug that caused both the bobby pod and the feature store pod to attempt to create each other's tables. The conflict eventually resolved itself (so tests did not catch the issue), but this is not desirable behavior. The issue was discovered via manual testing and inspection of the deployment logs.
2.5.0-k8
What's New?
The Server Side Feature Store!
- A fully functional server-side API for the Feature Store (@myles-novick)(095aac2)
- Full SQLAlchemy implementation for the Feature Store (@myles-novick)(da5de71, 310be3e)
- Unified exception handling for FastAPI errors in Splice Machine (@myles-novick)(df7fd92)
- Feature Store unit testing infra and a preliminary suite of tests (@Ben-Epstein)(#87)
- Spark 3 support and Spark 2 revert (@Ben-Epstein)(69e63e2, da62337)
- Added documentation for the feature store (@Ben-Epstein)(f3a6ed0)
Breaking Changes
There should not be any breaking changes in this release. Please upgrade your pysplice package to take advantage of the new feature store API.
This release is in tandem with pysplice
There is no upgrade script for this release as no table structures have changed, only new tables have been added.
2.4.0-k8
What's New?
- Major improvement to Database Connection engine for thread safe database connections (@abaveja313 )(#79)
- Datetime columns no longer being converted to dates in SQLAlchemy binds (@Ben-Epstein ) (splicemachine/splice_sqlalchemy#16)
- docker-compose-template.yaml has been moved to a standard docker-compose.yaml so that docker image versions are kept in sync with branches and releases. A `.env` file is now used to manage environment variables that remain private (@Ben-Epstein)(#79)
- Full documentation for the README so other people can use the repo (@Ben-Epstein)(#79)
- SQL Migration script for the new release (@Ben-Epstein ) (#79)
- Feature Store API updates for Beta launch (@Ben-Epstein )(#78)
- Better support for SparkML K means clustering (@Ben-Epstein )(#77)
- Moved call to get_transaction_id from client to server so users don't need the permissions to make the call (@Ben-Epstein )(#75)
- Support for a non-cloud environment to run with ml-workflow, and support for non-k8s environments to not crash the system (@Ben-Epstein )(#63)
- Bobby acts as an operator for K8s deployments, bringing the pods back up after bobby crashes or the database is paused/resumed (@Ben-Epstein, @sergioferragut)(https://github.com/splicemachine/ml-workflow/pull/62/files)
- The deploy_kubernetes function now waits for the pod to be ready so users know when the endpoint is active (@Ben-Epstein )(https://github.com/splicemachine/ml-workflow/pull/62/files)
This PR is in tandem with the client side pysplice release.
The SQL migration script is attached to the release, and in the releases directory.
2.3.0-k8
What's New?
- Database Deployment Migrated to Server side running on Bobby pod (@abaveja313, @Ben-Epstein )
- Initial K8s deployment code available - known bug with init container hanging, expected to be working in next release (@abaveja313 )
- Models are now logged as MLModels instead of the raw model binary (@abaveja313 )
- Model caching for database deployment (@Ben-Epstein )
- Fix for artifacts downloading without file extension (@Ben-Epstein )
- Model deployment metadata managed by Bobby (@abaveja313 )
BREAKING CHANGES
- The models table no longer exists. The deployed model is instead stored in a new column of the Artifacts table called `database_binary`. You must run the migration scripts to alter the artifacts table, otherwise existing deployments won't work.
- Models currently saved in the database with `log_model` will not be deployable, as we have changed the model saving format from the raw model to MLModel. You must read in the model binary, deserialize it, and re-log the model under a new run.
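The deserialize-and-re-log step can be sketched as follows. A stand-in object replaces the real model binary here (retrieval from the artifacts table is not shown), and the final `log_model` call, which depends on your MLflow flavor, appears only as a comment.

```python
import pickle

# Stand-in for a trained model; in practice the raw bytes come from the
# old artifacts table.
original = {"coef": [0.1, 0.2]}
raw_binary = pickle.dumps(original)  # what the pre-2.3.0 table stored

model = pickle.loads(raw_binary)     # step 1: deserialize the old binary
assert model == original

# Step 2: re-log under a new run so it is saved in MLModel format, e.g.:
#   with mlflow.start_run():
#       mlflow.sklearn.log_model(model, "model")
```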
This release is in tandem with the PySplice release.
Upgrade scripts from 2.2.0 are attached below and available here
PATCH fix for View creation
This is a patch release for 2.2.0-k8, fixing the view creation to avoid write-write conflicts
2.2.0-k8
What's New?
- Stronger AWS Sagemaker deployment support using k8s ServiceAccounts
- Model metadata tracking for in-db deployed models using the MODEL_METADATA and LIVE_MODEL_STATUS table and view
- Support for in-db deployment for Keras linear models (LSTMs/RNNs/CNNs not yet supported).
- Support for in-db deployment of XGBoost using H2O/SKlearn implementations
- SKLearn bug fix with fastnumbers
- SKlearn better support for non-double return types
- Upgrade from pickle -> cloudpickle for sklearn model serialization, adding support for both external and lambda functions inside SKLearn Pipelines
- Merged in-db deployment from a two-table design to a one-table design. All features + model prediction(s) are stored in a single table
- Support for deploying models to an existing table
- Support for selecting which columns from a table are used in the model prediction. This allows you to deploy models to a "subset" of a table.
- Better support for in-db deployment for sklearn Pipelines that have predict parameters
- `deploy_db` API cleanup: removed the model parameter and made run_id required; the model is pulled behind the scenes. The df parameter is optional and not required if deploying a model to an existing table.
- General code cleanup
BREAKING CHANGES
- `deploy_db` will no longer work with the old parameters. The new parameter set and order is required.
- `createTable` from the `PySpliceContext` now has its parameters ordered (dataframe, schema_table_name) instead of the other way around, to match all other APIs in the module.
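The argument-order change can be illustrated with a stand-in function; `PySpliceContext`'s real `createTable` of course writes to the database, and the table name and rows below are hypothetical.

```python
# Old call (pre-2.2.0):  splice.createTable(schema_table_name, dataframe)
# New call (2.2.0+):     splice.createTable(dataframe, schema_table_name)

def createTable(dataframe, schema_table_name):
    # Stand-in for PySpliceContext.createTable, illustrating the new order only.
    return f"created {schema_table_name} with {len(dataframe)} rows"

rows = [(1, "a"), (2, "b")]  # stand-in for a Spark DataFrame
print(createTable(rows, "SPLICE.MY_TABLE"))  # created SPLICE.MY_TABLE with 2 rows
```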
This release is in tandem with the PySplice release.
Upgrade scripts from 2.1.0 are attached below
UPDATE
Please see the patch release for an important fix.