This repository has been archived by the owner on Apr 15, 2022. It is now read-only.

Releases: splicemachine/ml-workflow

2.8.0-k8

21 May 19:30
e9edea5

What's new?

This release has 40 commits and a number of major enhancements.

Major Enhancements

Other Changes

Bug Fixes

Breaking Changes

  • When moving from 2.7.0 to 2.8.0, you must run the upgrade script; the feature store will not function properly without it.

This release is in tandem with the pysplice release

2.7.0-k8

06 Apr 22:54
27ed0eb

What's New?

  • Initial Airflow support from the Feature Store (@myles-novick )(#108)
  • New Feature Store Java functions for time-window aggregations (@sergioferragut )(#107)
  • Improvement to K8s model deployment using secrets and an added label for our new network policies (@Ben-Epstein )(#110)
  • Script to set up a local running mock feature store for easier development and testing (@sergioferragut , @Ben-Epstein ) (#113)
  • Fix for the MLflow UI iframe bug (@edriggers )(#112)
  • Bug fix for get_feature_vector not returning values under certain conditions (@myles-novick )(#115)
  • Added ability to create a feature set with a list of features in a single API call (@Ben-Epstein )(#114)
  • New model deployment VTI triggers for database deployment, improving performance by orders of magnitude (@Ben-Epstein )(#109)
  • remove_training_view API (#106)
  • Bobby now removes crashing Kubernetes model deployment pods (@Ben-Epstein )(#117)
  • Docs for feature store API now show up in cloud deployment (@Ben-Epstein )(#118)
  • A new, more robust way of passing in datatypes to the REST api (@Ben-Epstein )(#120)
  • update_feature_metadata route added to update tags, descriptions, and attributes of features (@Ben-Epstein )(#122)
  • Added parameters to the deployments route that can return the deployments created from a particular feature or feature set (@Ben-Epstein )(#121)
  • Enabled the returning of primary keys from a call to get_feature_vector (@myles-novick )(#124)
  • New metrics added to the dashboard - most recently created features and most used features (@Ben-Epstein )(#123)
  • Bug fix for /features throwing a 500 error (@Ben-Epstein )(#125)
  • Moved all -description routes to -details to match the UI pages. (@Ben-Epstein )(#125)
  • Better validation for data types for Features and Feature Set primary keys (@Ben-Epstein )(#125)
  • New Pipeline, Source, and AggregationFeatureSet abilities. This will enable us to create and manage feature set pipelines, automate them (once Airflow is fully integrated), and fully backfill features (@sergioferragut )(#125)

Breaking Changes

  • Data types must now be provided in the new standard format. For example, you can no longer pass a feature data type as Varchar(500). You must conform to the new DataType schema:
from typing import Optional

from pydantic import BaseModel


class DataType(BaseModel):
    """
    A class for representing a SQL data type as an object. Data types can have length, precision,
    and scale values depending on their type (VARCHAR(50) or DECIMAL(15,2), for example).
    This class breaks those data types up into objects.
    """
    data_type: str
    length: Optional[int] = None
    precision: Optional[int] = None
    scale: Optional[int] = None

So {feature_data_type: varchar(500)} is now {feature_data_type: {data_type: varchar, length: 500}}
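If you have many existing type strings to convert, the mapping above can be sketched as a small helper. This is only an illustration, not part of the release: the function name to_data_type is hypothetical, and it assumes that DECIMAL and NUMERIC take (precision, scale) while every other parameterized type takes a single length.

```python
import re
from typing import Any, Dict

# Assumption: these types take (precision[, scale]);
# any other parameterized type is treated as taking (length).
_PRECISION_TYPES = {"DECIMAL", "NUMERIC"}

# Matches a type name followed by an optional "(n)" or "(n, m)".
_TYPE_PATTERN = re.compile(
    r"^\s*([A-Za-z_ ]+?)\s*(?:\(\s*(\d+)\s*(?:,\s*(\d+)\s*)?\))?\s*$"
)


def to_data_type(sql_type: str) -> Dict[str, Any]:
    """Convert an old-style type string into the new DataType-shaped dict."""
    match = _TYPE_PATTERN.match(sql_type)
    if match is None:
        raise ValueError(f"Unrecognized SQL type: {sql_type!r}")
    name, first, second = match.groups()
    result: Dict[str, Any] = {"data_type": name}
    if first is None:
        # No parameters, e.g. INTEGER.
        return result
    if name.upper() in _PRECISION_TYPES:
        result["precision"] = int(first)
        if second is not None:
            result["scale"] = int(second)
    else:
        if second is not None:
            raise ValueError(f"Unexpected second parameter in {sql_type!r}")
        result["length"] = int(first)
    return result


print(to_data_type("varchar(500)"))
print(to_data_type("DECIMAL(15,2)"))
```

The resulting dict can then be used as the value of feature_data_type in the request body.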

  • All calls to -description routes must now use the corresponding -details routes. Simply replace all occurrences in your API calls, as every one of these routes has changed.

This release is in tandem with the pysplice release which contains the matching Python APIs to these new REST APIs

2.6.0-k8

12 Mar 19:25
fda8faa

What's New?

This release comes with a number of improvements to the Feature Store, both enhanced functionality and improved performance

  • New Feature Set architecture redesign improving performance and I/O for offline/online tables (#96) (@sergioferragut , @myles-novick )
  • get_feature_vector_sql bug fix. Returns the proper order of requested features now (#97) (@myles-novick )
  • Code refactor for better usability (#98) (@myles-novick )
  • New attributes metadata parameter for features that accepts dictionary of key-values. tags now accepts a list of strings.
  • undeploy_kubernetes function to remove kubernetes model deployments (#100) (@Ben-Epstein )
  • Ability for users to drop feature sets in certain scenarios (#102) (@Ben-Epstein )
  • Allow labels in get_training_view to force the proper point-in-time joins against the label and for better metadata tracking(#103) (@myles-novick, @sergioferragut )
  • Bug Fix: Validation of primary keys in create_training_view (#104) (@myles-novick )
  • Upgrade scripts for this release (@myles-novick ) (@sergioferragut )
  • Fix to a database connection bug that causes a segmentation fault after long stale connections. (54a77a6) (@Ben-Epstein )
  • Moved the table creation to a pre-app script to avoid write-write conflicts across worker threads (384e021) (@Ben-Epstein, @abaveja313 )

Breaking Changes

The tags parameter no longer accepts a dictionary; it now accepts a list of strings. Any dictionary you previously passed as tags must now be passed as the attributes parameter, which accepts a dictionary of key-values.
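As a rough sketch of that migration, the helper below moves a dictionary-valued tags argument over to attributes before the arguments are sent to the API. The function name migrate_tags and the shape of the keyword dictionary are assumptions made for illustration; they are not part of pysplice or the REST API.

```python
from typing import Any, Dict


def migrate_tags(feature_kwargs: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy in which a dict-valued 'tags' has been moved to 'attributes'."""
    migrated = dict(feature_kwargs)
    tags = migrated.get("tags")
    if isinstance(tags, dict):
        existing = migrated.get("attributes") or {}
        # Existing attributes win if the same key appears in both places.
        migrated["attributes"] = {**tags, **existing}
        del migrated["tags"]
    # A list-valued 'tags' is already in the new format and is left untouched.
    return migrated


old_style = {"name": "customer_age", "tags": {"team": "ml"}}
print(migrate_tags(old_style))
```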

This release is in tandem with the PySplice release

Spark3 Release

06 Mar 03:44

This release adds Spark3 support to 2.5.1-k8.

No other changes were made except adding spark3 support and removing spark2.4 support. All future releases will be spark3 only.

PATCH Release for Feature Store

24 Feb 21:44

Important

Use this release instead of release 2.5.0-k8. It differs from that release by one (1) commit.

What's changed?

Release 2.5.0-k8 had a bug that caused both the bobby pod and the feature store pod to attempt to create each other's tables. The conflict eventually resolves itself (so tests did not catch the issue), but this is not desirable behavior. The issue was discovered through manual testing and inspection of the deployment logs.

2.5.0-k8

24 Feb 16:09
2a80479

What's New?

The Server Side Feature Store!

Breaking Changes

There should not be any breaking changes in this release. Please upgrade your pysplice package to take advantage of the new feature store API.

This release is in tandem with pysplice
There is no upgrade script for this release as no table structures have changed, only new tables have been added.

2.4.0-k8

07 Jan 20:11
a46a689

What's New?

This PR is in tandem with the client side pysplice release.


The SQL migration script is attached to the release, and in the releases directory.

2.3.0-k8

15 Sep 22:32
1ed899b

What's New?

  • Database Deployment Migrated to Server side running on Bobby pod (@abaveja313, @Ben-Epstein )
  • Initial K8s deployment code available - known bug with init container hanging, expected to be working in next release (@abaveja313 )
  • Models are now logged as MLModels instead of the raw model binary (@abaveja313 )
  • Model caching for database deployment (@Ben-Epstein )
  • Fix for artifacts downloading without file extension (@Ben-Epstein )
  • Model deployment metadata managed by Bobby (@abaveja313 )

BREAKING CHANGES

  • The models table no longer exists. The deployment model is instead stored in a new column of the Artifacts table called database_binary. You must run the migration scripts to alter the artifacts table, otherwise existing deployments won't work
  • Models currently saved in the database with log_model will not be deployable as we have changed the model saving format from model to MLModel. You must read in the model binary, deserialize it, and re-log the model under a new run.

This release is in tandem with the PySplice release.

Upgrade scripts from 2.2.0 are attached below and available here

PATCH fix for View creation

01 Jul 22:07
88e335b

This is a patch release for 2.2.0-k8, fixing the view creation to avoid write-write conflicts

2.2.0-k8

22 Jun 22:18
f540a13

What's New?

  • Stronger AWS Sagemaker deployment support using k8s ServiceAccounts
  • Model metadata tracking for in-db deployed models using the MODEL_METADATA and LIVE_MODEL_STATUS table and view
  • Support for in-db deployment for Keras linear models (LSTMs/RNNs/CNNs not yet supported).
  • Support for in-db deployment of XGBoost models using H2O/SKlearn implementations
  • SKLearn bug fix with fastnumbers
  • SKlearn better support for non-double return types
  • Upgrade from pickle -> cloudpickle for sklearn model serialization, adding support for both external and lambda functions inside SKLearn Pipelines
  • Merge in-db deployment to a 1 table design from a 2-table design. All features + model prediction(s) are stored in a single table
  • Support for deploying models to an existing table
  • Support for selecting which columns from a table are used in the model prediction. This allows you to deploy models to a "subset" of a table.
  • Better support for in-db deployment for sklearn Pipelines that have predict parameters
  • deploy_db api cleanup: Removed the model parameter and made run_id required. The model is pulled behind the scenes. The DF parameter is optional and not required when deploying a model to an existing table.
  • General code cleanup
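
To illustrate why the move from pickle to cloudpickle matters for lambda functions, the sketch below shows the standard-library pickle refusing to serialize a lambda. The helper name stdlib_pickle_fails is hypothetical, and the cloudpickle part only runs if that third-party package happens to be installed; none of this is code from the release itself.

```python
import pickle


def stdlib_pickle_fails(obj) -> bool:
    """Return True if the standard-library pickle cannot serialize obj."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError, TypeError):
        return True
    return False


# A lambda has no importable name, so pickle cannot store it by reference.
double = lambda x: 2 * x

print(stdlib_pickle_fails(double))

try:
    import cloudpickle  # third-party; may not be installed
except ImportError:
    cloudpickle = None

if cloudpickle is not None:
    # cloudpickle serializes the function body itself, so the lambda round-trips.
    restored = pickle.loads(cloudpickle.dumps(double))
    print(restored(21))
```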

BREAKING CHANGES

  • deploy_db will no longer work with old parameters. New parameter set and order is required.
  • createTable from the PySpliceContext now has parameters ordered dataframe, schema_table_name instead of the other way around to match all other APIs in the module.

This release is in tandem with the PySplice release.

Upgrade scripts from 2.1.0 are attached below

UPDATE

Please see the patch release for an important fix.