Releases: splicemachine/pysplice
2.8.0-k8
What's new?
This release has 90 commits and a number of major enhancements.
- JWT Support for the Feature Store and MLManager model deployment (@myles-novick, #138)
- MLflow 1.15 upgrade (@Ben-Epstein, #139)
- New native support for MLModel flavors fastai, spacy, and statsmodels (@Ben-Epstein, #139)
- New `feature_exists`, `feature_set_exists`, and `training_view_exists` functions (see the sketch after this list) (@Ben-Epstein, #140 #143)
- Versioning support for training sets (@Ben-Epstein, #144)
- New function to get the features in a feature set (@Ben-Epstein, #146)
- Migration from PySpliceContext artifact store to an HTTP Splice Artifact store for mlflow (@Ben-Epstein, #147)
- Native Feature Search for Jupyter notebook for Feature Store (@Ben-Epstein, #148)
- Extended the `get_training_set` functions to support returning pandas DataFrames and JSON data for users without a Spark session (@Ben-Epstein, #149)
- Function in mlflow to get the deployed models in an environment and their current statuses (@Ben-Epstein, #150)
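A minimal sketch of the new existence checks and the non-Spark return types, assuming a `FeatureStore` client named `fs`; the exact signatures, argument names, and the `return_type` parameter are assumptions, so check the SDK docs for your version.

```python
from splicemachine.features import FeatureStore  # assumed import path

fs = FeatureStore()

# New existence checks (function names from this release; signatures are assumptions)
if not fs.feature_set_exists('retail', 'customer_features'):  # illustrative schema/table
    print('Feature set does not exist yet')

if fs.feature_exists('total_spend') and fs.training_view_exists('churn_view'):
    # get_training_set can now return a pandas DataFrame for users without
    # a Spark session (the return_type parameter name is an assumption)
    df = fs.get_training_set(['total_spend'], return_type='pandas')
```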
Breaking Changes
None. Note, however, that you must be on the matching ml-workflow release for these functions to work, especially the HTTP artifact store.
This release is in tandem with the ml-workflow release
2.7.0-k8
What's new?
Nothing specific to this repo has been added, other than the SDK functions that map to the partnered ml-workflow release.
You can see all changes from the last release here
Breaking Changes
None
2.6.0-k8
What's New?
- New feature set design (#119, @sergioferragut)
- New `attributes` parameter for features that allows key-value pairs (`tags` has been changed to a list; see breaking changes) (#120) (@myles-novick)
- Undeploy Kubernetes function (#121) (@Ben-Epstein)
- Bug Fix: Notebook history tracking was causing errors running mlflow locally (#122) (@Ben-Epstein )
- Delete feature sets is now possible in certain scenarios (#124) (@Ben-Epstein )
- Labels are now allowed in get_training_set without a view, which forces the proper time-consistent joins for training set creation (#125) (@myles-novick )
Breaking Changes
The `tags` parameter no longer accepts a dictionary; it now accepts a list. Existing key-value tags must be moved to the new `attributes` parameter, which accepts a dictionary, as shown below.
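A minimal before/after sketch of the migration, assuming a `FeatureStore` client named `fs`; the `create_feature` call and its parameter names are hypothetical stand-ins for whichever feature-creation API you use.

```python
from splicemachine.features import FeatureStore  # assumed import path

fs = FeatureStore()

# Before 2.6.0-k8, tags accepted key-value pairs (hypothetical call):
# fs.create_feature(..., tags={'team': 'fraud', 'source': 'transactions'})

# From 2.6.0-k8 on, tags is a plain list, and key-value pairs move to attributes
fs.create_feature(
    schema_name='retail',            # illustrative names only
    table_name='customer_features',
    name='total_spend',
    tags=['fraud', 'transactions'],  # now a list of labels
    attributes={'team': 'fraud', 'source': 'transactions'},  # new dict parameter
)
```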
This release is in tandem with the ml-workflow release
Spark3 Release
This release adds Spark3 support to 2.5.1-k8.
No other changes were made beyond adding Spark3 support and removing Spark 2.4 support. All future releases will be Spark3 only.
2.5.0-k8
What's New?
The new Feature Store API!
- (Nearly) full server-side Feature Store API (@myles-novick) (many PRs)
- New APIs for the Feature Store adding functionality (delete features, better authentication, summary statistics) (@myles-novick, @Ben-Epstein) (many PRs)
- Better support for native mlflow model logging calls (@Ben-Epstein)
- `mlflow.watch_job` now throws an exception when the watched job fails (see the sketch after this list) (@Ben-Epstein) (#113)
- Upgrade to Spark3 while maintaining support for Spark2 (@Ben-Epstein) (#109)
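A short sketch of the new failure behavior, assuming the pysplice convention of importing the patched mlflow module from `splicemachine.mlflow_support`; the `deploy_kubernetes` call shown here is a hypothetical source of a job id.

```python
from splicemachine.mlflow_support import *  # assumed import that patches mlflow

job_id = mlflow.deploy_kubernetes(run_id)  # hypothetical call returning a job id
try:
    mlflow.watch_job(job_id)  # streams job logs; now raises if the job fails
except Exception as e:
    print(f'Deployment job failed: {e}')
```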
Breaking Changes
The old Feature Store API may still work, but it is highly recommended to switch to the new Server Side Feature Store API. The client side API will no longer be maintained or supported.
This release is in tandem with the ml-workflow release
There is no upgrade script for this release as no table structures have changed, only new tables have been added.
Patch Release for Feature Store
This is a patch release for 2.4.0-k8
The following features were added:
- Drift detection (@sergioferragut )
- Organized utility modules with helper functions for drift detection and training view SQL creation (@Ben-Epstein, @sergioferragut)
The following were fixed:
- Case sensitivity: The database's case sensitive column and table names were causing searchability issues. To remedy this, all column names, schema names, and table names are stored as UPPERCASE in the metadata, to match the default state of the database storage. (@sergioferragut )
- `datetime.min` (0001-01-01 00:00:00) was causing problems when Spark tried to parse and process it. Because so much of the system runs on Spark, this caused failures down the stack. To remedy this, we've replaced `datetime.min` with `datetime(1900, 1, 1)` (1900-01-01 00:00:00) for unspecified start times on Training Sets (see the sketch below). (@sergioferragut, @Ben-Epstein)
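A minimal illustration of the substitution; the `resolve_start_time` helper is hypothetical and only shows where the new sentinel slots in.

```python
from datetime import datetime

# datetime.min (year 1) breaks Spark's timestamp parsing, so unspecified
# start times are now backed by a Spark-safe sentinel instead
DEFAULT_START_TIME = datetime(1900, 1, 1)

def resolve_start_time(start_time=None):
    """Hypothetical helper: pick a Spark-safe start time for a training set."""
    return start_time if start_time is not None else DEFAULT_START_TIME
```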
2.4.0-k8
What's New?
- K8s deployment has been fixed and stabilized (@Ben-Epstein ) (4521)
- Feature Store API Beta 1 Release (@sergioferragut , @Ben-Epstein ) (#96)
- NSDS Merge Into API (@jpanko1 ) (#95)
- Moved call to get_current_transaction to the server side so users don't need permissions to make that call (@Ben-Epstein ) (#94)
- Better Pandas support and a `fileToTable` function for uploading data to the database (see the sketch after this list) (@Ben-Epstein) (#93)
- `createAndInsertTable` API for NSDS (@Ben-Epstein) (#92)
- MLflow run log history (all cells run, in the order they were executed) automatically recorded at the end of a run (@Ben-Epstein) (#91)
- Case insensitive column names for NSDS (@jpanko1 ) (#86)
- MLFlow model support for pyfunc models (@Ben-Epstein ) (#81)
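A rough sketch of the new data-loading helpers, assuming an active `PySpliceContext` named `splice`; the argument order and the `fileToTable` parameters are assumptions, not documented signatures.

```python
from pyspark.sql import SparkSession
from splicemachine.spark import PySpliceContext

spark = SparkSession.builder.getOrCreate()
splice = PySpliceContext(spark)  # connection details may be required in your environment

# Create a table from a DataFrame and insert its rows in one call
df = spark.createDataFrame([(1, 'a'), (2, 'b')], ['id', 'val'])
splice.createAndInsertTable(df, 'retail.demo_table')  # argument order is an assumption

# Load a local file straight into a database table (parameters are assumptions)
splice.fileToTable('/tmp/data.csv', 'retail.demo_table')
```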
Breaking Changes
- In `mlflow.deploy_db`, the `create_model_table` parameter now defaults to `True` (see the sketch below).
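A minimal sketch of opting out of the new default, assuming the patched mlflow module from `splicemachine.mlflow_support`; the schema, table, and surrounding parameters are illustrative.

```python
from splicemachine.mlflow_support import *  # assumed import that patches mlflow

# create_model_table now defaults to True; pass False explicitly to keep the
# old behavior of deploying into a table you have already created
mlflow.deploy_db(
    'retail',              # schema (illustrative)
    'model_predictions',   # table (illustrative)
    run_id,                # run whose logged model is deployed
    create_model_table=False,
)
```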
This release is in tandem with the ml-workflow release
The upgrade script is available here
2.3.0-k8
What's New?
- The External Native Spark Datasource API is now available (@jpanko1 )
- Added functions to `splicemachine.notebook` to access the Spark UI and the MLflow UI (see the sketch after this list) (@Ben-Epstein)
- Python dependency fixes for the October 2020 pip changes (@Ben-Epstein)
- More graceful errors for unsupported models (#74) (@Ben-Epstein)
- Better checking for spark datatypes (@Ben-Epstein, @ZachC16 )
- Deployment support for non-pipeline models (@Ben-Epstein, @ZachC16 )
- Support for Linear Support Vector Machine Spark Model (@Ben-Epstein, @ZachC16 )
- Better unit testing (@Ben-Epstein @ZachC16)
- New warning for Keras and Spark models when the number of label columns passed in doesn't match the model (@Ben-Epstein, @domclassen)
- Database Deployment Migrated to Server side running on Bobby pod (@abaveja313, @Ben-Epstein )
- Initial K8s deployment code available - known bug with init container hanging, expected to be working in next release (@abaveja313 )
- Models are now logged as MLModels instead of the raw model binary (@abaveja313 )
- Model caching for database deployment (@Ben-Epstein @sergioferragut )
- Fix for artifacts downloading without file extension (@Ben-Epstein )
- Model deployment metadata managed by Bobby (@abaveja313 )
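A short sketch of the new notebook helpers; `get_spark_ui` and `get_mlflow_ui` are the likely function names, but treat the names and parameters as assumptions and check the module's docs.

```python
from splicemachine.notebook import get_spark_ui, get_mlflow_ui  # names assumed

# Render the Spark UI inline in a Jupyter notebook (port is illustrative)
get_spark_ui(port=4040)

# Render the MLflow tracking UI inline
get_mlflow_ui()
```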
BREAKING CHANGES
- The models table no longer exists. The deployed model is instead stored in a new column of the Artifacts table called `database_binary`. You must run the migration scripts to alter the artifacts table, otherwise existing deployments won't work.
- Models currently saved in the database with `log_model` will not be deployable, as we have changed the model saving format from the raw model to MLModel. You must read in the model binary, deserialize it, and re-log the model under a new run (see the sketch below).
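A rough sketch of the re-logging step, assuming the patched mlflow module and that the old binary was pickled; the artifact-retrieval helper and its signature are assumptions.

```python
import cloudpickle
from splicemachine.mlflow_support import *  # assumed import that patches mlflow

# Fetch the raw model binary saved under the old format
# (the retrieval helper and its signature are assumptions)
mlflow.download_artifact('model', local_path='old_model.pkl', run_id=old_run_id)

with open('old_model.pkl', 'rb') as f:
    model = cloudpickle.load(f)  # deserialize the old binary

# Re-log under a new run so it is stored in the MLModel format
with mlflow.start_run():
    mlflow.log_model(model, 'model')
```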
This release is in tandem with the ml-workflow release.
Release 2.2.0
What's New?
- Stronger AWS Sagemaker deployment support using k8s ServiceAccounts
- Model metadata tracking for in-db deployed models using the MODEL_METADATA and LIVE_MODEL_STATUS table and view
- Support for in-db deployment for Keras linear models (LSTMs/RNNs/CNNs not yet supported).
- Support for in-db deployment of XGBoost using the H2O/SKlearn implementations
- SKLearn bug fix with fastnumbers
- SKlearn better support for non-double return types
- Upgrade from pickle -> cloudpickle for sklearn model serialization, adding support for both external and lambda functions inside SKLearn Pipelines
- Moved in-db deployment from a 2-table design to a 1-table design. All features plus the model prediction(s) are stored in a single table
- Support for deploying models to an existing table
- Support for selecting which columns from a table are used in the model prediction. This allows you to deploy models to a "subset" of a table.
- Better support for in-db deployment for sklearn Pipelines that have predict parameters
- `deploy_db` API cleanup: removed the model parameter and made `run_id` required. The model is pulled behind the scenes. The `df` parameter is optional and not required when deploying a model to an existing table (see the sketch below).
- General code cleanup
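A minimal sketch of the cleaned-up call, assuming the patched mlflow module; the schema, table, and `primary_key` values are illustrative, and parameter names other than `run_id` and `df` are assumptions.

```python
from splicemachine.mlflow_support import *  # assumed import that patches mlflow

# New-style call: no model object is passed; run_id is required and the model
# is pulled behind the scenes. df is only needed when creating a new table.
mlflow.deploy_db(
    'retail',             # schema (illustrative)
    'model_predictions',  # table (illustrative)
    run_id,               # required: run whose logged model to deploy
    df=training_df,       # optional: omit when deploying to an existing table
    primary_key={'CUSTOMER_ID': 'INT'},  # illustrative key definition
)
```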
BREAKING CHANGES
- `deploy_db` will no longer work with the old parameters. The new parameter set and order is required.
- `createTable` from the `PySpliceContext` now takes its parameters in the order (dataframe, schema_table_name) instead of the other way around, to match all other APIs in the module (see the sketch below).
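A quick before/after of the reordered call, assuming an active `PySpliceContext` named `splice` and a Spark DataFrame `df`.

```python
# Before 2.2.0: schema_table_name came first
# splice.createTable('retail.demo_table', df)

# From 2.2.0 on: the DataFrame comes first, matching the rest of the module
splice.createTable(df, 'retail.demo_table')
```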
This release is in tandem with the ml-workflow release. Upgrade scripts are attached to that release.
Patch Fix for bad insert logic
Update context.py (#59): fix for DataFrame insert logic