Releases: splicemachine/ml-workflow
2.8.0-k8
What's new?
This release has 40 commits and a number of major enhancements.
Major Enhancements
- Airflow Support - Feature Statistics calculations, backfill, and pipeline support for feature sets (@myles-novick, #157 #171 #173)
- JWT Support for the feature store and mlmanager (@myles-novick, #126)
- MLflow 1.15 upgrade (@Ben-Epstein, #129)
- Support for deploying fastai, statsmodels, and spacy models to kubernetes natively (@Ben-Epstein, #131)
- New HTTP artifact store for mlflow (@Ben-Epstein, #155)
- New, cleaner documentation! See it here
Other Changes
- Support returning training sets as pandas dataframes (@Ben-Epstein, #158)
- feature_exists, feature_set_exists, and training_view_exists functions (@Ben-Epstein, #132 #161 #159 #164)
- Enabling custom CORS support via environment variable (@Ben-Epstein, #136)
- Versioning for training sets (@Ben-Epstein, #138)
- Advanced feature search (@Ben-Epstein, #145)
- Database deployed models now propagate errors to the user instead of throwing Unexpected Exceptions (@Ben-Epstein, #166)
Bug Fixes
- Fix for the feature store VTI function for TimeSnap, which was missing the schema name (@Ben-Epstein, #128)
- Database model deployment via new VTI was failing when executing via OLAP (@Ben-Epstein, #130)
- Various bug fixes for the new Feature Store UI (@Ben-Epstein, #135)
- Added missing validation on aggregation feature sets (@Ben-Epstein, #147 #148 #156)
- TimestampSnap function was sometimes 12 hours off (@sergioferragut, #174)
Breaking Changes
- You must run the upgrade script when moving from 2.7.0 to 2.8.0 in order for the feature store to function properly
This release is in tandem with the pysplice release
2.7.0-k8
What's New?
- Initial Airflow support for the Feature Store (@myles-novick)(#108)
- New Feature Store Java functions for time-window aggregations (@sergioferragut)(#107)
- Improvements to K8s model deployment using secrets, and an added label for our new network policies (@Ben-Epstein)(#110)
- Script to set up a local running mock feature store for easier development and testing (@sergioferragut, @Ben-Epstein)(#113)
- MLflow UI iframe bug fix (@edriggers)(#112)
- `get_feature_vector` bug fix for not returning values under certain conditions (@myles-novick)(#115)
- Added ability to create a feature set with a list of features in a single API call (@Ben-Epstein)(#114)
- New model deployment VTI triggers for database deployment, improving performance by orders of magnitude (@Ben-Epstein)(#109)
- `remove_training_view` API (#106)
- Bobby now removes crashing Kubernetes model deployment pods (@Ben-Epstein)(#117)
- Docs for feature store API now show up in cloud deployment (@Ben-Epstein )(#118)
- A new, more robust way of passing in datatypes to the REST api (@Ben-Epstein )(#120)
- `update_feature_metadata` route added to update tags, descriptions, and attributes of features (@Ben-Epstein)(#122)
- Added parameters to the `deployments` route that can return the deployments created from a particular feature or feature set (@Ben-Epstein)(#121)
- Enabled the returning of primary keys from a call to `get_feature_vector` (@myles-novick)(#124)
- New metrics added to the dashboard: most recently created features and most used features (@Ben-Epstein)(#123)
- Bug fix for `/features` throwing a 500 error (@Ben-Epstein)(#125)
- Moved all `-description` routes to `-details` to match the UI pages (@Ben-Epstein)(#125)
- Better validation of data types for Features and Feature Set primary keys (@Ben-Epstein)(#125)
- New Pipeline, Source, and AggregationFeatureSet abilities. This will enable us to create and manage feature set pipelines, automate them (once Airflow is fully integrated), and fully backfill features (@sergioferragut )(#125)
Breaking Changes
- Data types must now be provided in the new standard format. You can no longer pass in a feature data type as `Varchar(500)`, for example; you must conform to the new `DataType` schema:
```python
class DataType(BaseModel):
    """
    A class for representing a SQL data type as an object. Data types can have length, precision,
    and scale values depending on their type (VARCHAR(50) and DECIMAL(15,2), for example).
    This class enables breaking those data types up into objects.
    """
    data_type: str
    length: Optional[int] = None
    precision: Optional[int] = None
    scale: Optional[int] = None
```
So `{feature_data_type: varchar(500)}` is now `{feature_data_type: {data_type: varchar, length: 500}}`.
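For migrating existing payloads, a small helper can translate old-style type strings into the new `DataType` shape. This is an illustrative sketch, not part of the API; the regex and the field mapping (one argument maps to `length`, two arguments map to `precision`/`scale`) are assumptions based on the schema above.

```python
import re

def to_datatype(sql_type: str) -> dict:
    """Convert an old-style SQL type string like 'VARCHAR(500)' or
    'DECIMAL(15,2)' into a dictionary shaped like the new DataType schema.
    Illustrative helper only, not part of the feature store API."""
    match = re.fullmatch(r"\s*(\w+)\s*(?:\(\s*(\d+)\s*(?:,\s*(\d+)\s*)?\))?\s*", sql_type)
    if not match:
        raise ValueError(f"Unrecognized SQL type: {sql_type}")
    name, first, second = match.groups()
    body = {"data_type": name.lower()}
    if first is not None and second is not None:
        body["precision"], body["scale"] = int(first), int(second)  # e.g. DECIMAL(15,2)
    elif first is not None:
        body["length"] = int(first)                                 # e.g. VARCHAR(500)
    return body

print(to_datatype("VARCHAR(500)"))   # {'data_type': 'varchar', 'length': 500}
print(to_datatype("DECIMAL(15,2)"))  # {'data_type': 'decimal', 'precision': 15, 'scale': 2}
```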
- All `-description` routes must now hit the `-details` routes. Simply replace all occurrences in your API calls, as all have been changed.
This release is in tandem with the pysplice release which contains the matching Python APIs to these new REST APIs
2.6.0-k8
What's New?
This release comes with a number of improvements to the Feature Store, both enhanced functionality and improved performance
- New Feature Set architecture redesign improving performance and I/O for offline/online tables (#96) (@sergioferragut , @myles-novick )
- `get_feature_vector_sql` bug fix: requested features are now returned in the proper order (#97) (@myles-novick)
- Code refactor for better usability (#98) (@myles-novick )
- New `attributes` metadata parameter for features that accepts a dictionary of key-value pairs. `tags` now accepts a list of strings.
- `undeploy_kubernetes` function to remove kubernetes model deployments (#100) (@Ben-Epstein)
- Ability for users to drop feature sets in certain scenarios (#102) (@Ben-Epstein)
- Allow labels in `get_training_view` to force the proper point-in-time joins against the label and for better metadata tracking (#103) (@myles-novick, @sergioferragut)
- Bug fix: validation of primary keys in `create_training_view` (#104) (@myles-novick)
- Upgrade scripts for this release (@myles-novick ) (@sergioferragut )
- Fix for a database connection bug that caused a segmentation fault after long stale connections (54a77a6) (@Ben-Epstein)
- Moved the table creation to a pre-app script to avoid write-write conflicts across worker threads (384e021) (@Ben-Epstein, @abaveja313 )
Breaking Changes
The `tags` parameter no longer accepts a dictionary; it now accepts a list of strings. Any dictionary previously passed as `tags` must be moved to the `attributes` parameter, which now accepts a dictionary.
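A migration of existing metadata might look like the following sketch. The variable names and values are illustrative, and the actual feature-creation call (which takes these parameters) lives in the pysplice client.

```python
# Old-style (pre-2.6.0) metadata: tags was a dictionary of key-value pairs.
old_tags = {"owner": "risk-team", "pii": "false"}

# New-style (2.6.0+): tags is a list of strings, attributes holds the dictionary.
attributes = dict(old_tags)  # the old dictionary moves to attributes
tags = sorted(old_tags)      # or any list of descriptive strings

print(tags)        # ['owner', 'pii']
print(attributes)  # {'owner': 'risk-team', 'pii': 'false'}
```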
This release is in tandem with the PySplice release
Spark3 Release
This release is a Spark3 support of 2.5.1-k8.
No other changes were made except adding Spark3 support and removing Spark 2.4 support. All future releases will be Spark3-only.
PATCH Release for Feature Store
Important
Use this release instead of the release 2.5.0-k8. There is one (1) commit change to this release.
What's changed?
The release 2.5.0-k8 had a bug that caused both the bobby pod and the feature store pod to attempt to create each other's tables. The conflict eventually resolved itself (so tests did not catch the issue), but this is not desirable behavior. The issue was discovered via manual testing and inspection of the deployment logs.
2.5.0-k8
What's New?
The Server Side Feature Store!
- A fully functional server-side API for the Feature Store (@myles-novick)(095aac2)
- Full SQLAlchemy implementation for the Feature Store (@myles-novick)(da5de71, 310be3e)
- Unified exception handling for FastAPI errors in Splice Machine (@myles-novick)(df7fd92)
- Feature Store unit testing infra and a preliminary suite of tests (@Ben-Epstein)(#87)
- Spark 3 support and Spark 2 revert (@Ben-Epstein)(69e63e2, da62337)
- Added documentation for the feature store (@Ben-Epstein)(f3a6ed0)
Breaking Changes
There should not be any breaking changes in this release. Please upgrade your pysplice package to take advantage of the new feature store API.
This release is in tandem with pysplice
There is no upgrade script for this release as no table structures have changed, only new tables have been added.
2.4.0-k8
What's New?
- Major improvement to Database Connection engine for thread safe database connections (@abaveja313 )(#79)
- Datetime columns no longer being converted to dates in SQLAlchemy binds (@Ben-Epstein ) (splicemachine/splice_sqlalchemy#16)
- docker-compose-template.yaml has been moved to a standard docker-compose.yaml so that docker image versions are kept in sync with branches and releases. A `.env` file is now used to manage environment variables that remain private (@Ben-Epstein)(#79)
- Full documentation for the README so other people can use the repo (@Ben-Epstein)(#79)
- SQL Migration script for the new release (@Ben-Epstein ) (#79)
- Feature Store API updates for Beta launch (@Ben-Epstein )(#78)
- Better support for SparkML K means clustering (@Ben-Epstein )(#77)
- Moved call to get_transaction_id from client to server so users don't need the permissions to make the call (@Ben-Epstein )(#75)
- Support for a non-cloud environment to run with ml-workflow, and support for non-k8s environments to not crash the system (@Ben-Epstein )(#63)
- Bobby acts as an operator for K8s deployments, bringing the pods back up after bobby crashes or the database is paused/resumed (@Ben-Epstein, @sergioferragut)(https://github.com/splicemachine/ml-workflow/pull/62/files)
- The deploy_kubernetes function now waits for the pod to be ready so users know when the endpoint is active (@Ben-Epstein )(https://github.com/splicemachine/ml-workflow/pull/62/files)
This PR is in tandem with the client side pysplice release.
The SQL migration script is attached to the release, and in the releases directory.
2.3.0-k8
What's New?
- Database Deployment Migrated to Server side running on Bobby pod (@abaveja313, @Ben-Epstein )
- Initial K8s deployment code available - known bug with init container hanging, expected to be working in next release (@abaveja313 )
- Models are now logged as MLModels instead of the raw model binary (@abaveja313 )
- Model caching for database deployment (@Ben-Epstein )
- Fix for artifacts downloading without file extension (@Ben-Epstein )
- Model deployment metadata managed by Bobby (@abaveja313 )
BREAKING CHANGES
- The models table no longer exists. The deployed model is instead stored in a new column of the Artifacts table called `database_binary`. You must run the migration scripts to alter the artifacts table, otherwise existing deployments won't work.
- Models currently saved in the database with `log_model` will not be deployable, as we have changed the model saving format from the raw model to MLModel. You must read in the model binary, deserialize it, and re-log the model under a new run.
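The deserialize-and-re-log step can be sketched as follows. A stand-in object replaces the real model binary here (retrieval from the artifacts table is not shown), and the final `log_model` call, which depends on your MLflow flavor, appears only as a comment.

```python
import pickle

# Stand-in for a trained model; in practice the raw bytes come from the
# old artifacts table.
original = {"coef": [0.1, 0.2]}
raw_binary = pickle.dumps(original)  # what the pre-2.3.0 table stored

model = pickle.loads(raw_binary)     # step 1: deserialize the old binary
assert model == original

# Step 2: re-log under a new run so it is saved in MLModel format, e.g.:
#   with mlflow.start_run():
#       mlflow.sklearn.log_model(model, "model")
```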
This release is in tandem with the PySplice release.
Upgrade scripts from 2.2.0 are attached below and available here
PATCH fix for View creation
This is a patch release for 2.2.0-k8, fixing the view creation to avoid write-write conflicts
2.2.0-k8
What's New?
- Stronger AWS Sagemaker deployment support using k8s ServiceAccounts
- Model metadata tracking for in-db deployed models using the MODEL_METADATA and LIVE_MODEL_STATUS table and view
- Support for in-db deployment for Keras linear models (LSTMs/RNNs/CNNs not yet supported).
- Support for in-db deployment of XGBoost using H2O/SKlearn implementations
- SKLearn bug fix with fastnumbers
- SKlearn better support for non-double return types
- Upgrade from pickle -> cloudpickle for sklearn model serialization, adding support for both external and lambda functions inside SKLearn Pipelines
- Merged in-db deployment from a two-table design to a one-table design. All features + model prediction(s) are stored in a single table
- Support for deploying models to an existing table
- Support for selecting which columns from a table are used in the model prediction. This allows you to deploy models to a "subset" of a table.
- Better support for in-db deployment for sklearn Pipelines that have predict parameters
- `deploy_db` API cleanup: removed the model parameter and made run_id required; the model is pulled behind the scenes. The df parameter is optional and not required if deploying a model to an existing table.
- General code cleanup
BREAKING CHANGES
- `deploy_db` will no longer work with the old parameters. The new parameter set and order is required.
- `createTable` from the `PySpliceContext` now has its parameters ordered (dataframe, schema_table_name) instead of the other way around, to match all other APIs in the module.
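The argument-order change can be illustrated with a stand-in function; `PySpliceContext`'s real `createTable` of course writes to the database, and the table name and rows below are hypothetical.

```python
# Old call (pre-2.2.0):  splice.createTable(schema_table_name, dataframe)
# New call (2.2.0+):     splice.createTable(dataframe, schema_table_name)

def createTable(dataframe, schema_table_name):
    # Stand-in for PySpliceContext.createTable, illustrating the new order only.
    return f"created {schema_table_name} with {len(dataframe)} rows"

rows = [(1, "a"), (2, "b")]  # stand-in for a Spark DataFrame
print(createTable(rows, "SPLICE.MY_TABLE"))  # created SPLICE.MY_TABLE with 2 rows
```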
This release is in tandem with the PySplice release.
Upgrade scripts from 2.1.0 are attached below
UPDATE
Please see the patch release for an important fix.