This repository has been archived by the owner on Apr 15, 2022. It is now read-only.

Releases: splicemachine/pysplice

2.8.0-k8

21 May 19:31
88087f2

What's new?

This release has 90 commits and a number of major enhancements.

  • JWT Support for the Feature Store and MLManager model deployment (@myles-novick, #138)
  • MLflow 1.15 upgrade (@Ben-Epstein, #139)
  • New native support for MLModel flavors fastai, spacy, and statsmodels (@Ben-Epstein, #139)
  • feature_exists, feature_set_exists, training_view_exists functions (@Ben-Epstein, #140 #143)
  • Versioning support for training sets (@Ben-Epstein, #144)
  • Get features from feature set function (@Ben-Epstein, #146)
  • Migration from PySpliceContext artifact store to an HTTP Splice Artifact store for mlflow (@Ben-Epstein, #147)
  • Native Feature Search for Jupyter notebook for Feature Store (@Ben-Epstein, #148)
  • Extended get_training_set functions to support returning pandas dataframes and JSON data for users without a Spark Session (@Ben-Epstein, #149)
  • Function in mlflow to get the deployed models in an environment and their current statuses (@Ben-Epstein, #150)

Breaking Changes

None. Note, however, that you must be on the matching ml-workflow release for these functions to work, especially the HTTP artifact store.

This release is in tandem with the ml-workflow release

2.7.0-k8

06 Apr 22:54

What's new?

Nothing specific to this repo has been added, other than the SDK functions that map to the partnered ml-workflow release.

You can see all changes from the last release here

Breaking Changes

None

2.6.0-k8

12 Mar 19:32

What's New?

  • New feature set design (#119 , @sergioferragut )
  • New attributes parameter for features that accepts key-value pairs (tags is now a list; see Breaking Changes) (#120) (@myles-novick )
  • Undeploy Kubernetes function (#121) (@Ben-Epstein )
  • Bug Fix: Notebook history tracking was causing errors running mlflow locally (#122) (@Ben-Epstein )
  • Delete feature sets is now possible in certain scenarios (#124) (@Ben-Epstein )
  • Labels are now allowed in get_training_set without a view, which forces the proper time-consistent joins for training set creation (#125) (@myles-novick )

Breaking Changes

The tags parameter no longer accepts a dictionary; it now accepts a list. Key-value pairs previously passed as tags must now be passed through the new attributes parameter, which accepts a dictionary.
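
A minimal sketch of how calling code might adapt to this change. The helper name split_legacy_tags is hypothetical and not part of pysplice; it only illustrates the shape change described above (a dictionary moves to attributes, tags becomes a list) and makes no assumption about the actual Feature Store function signatures.

```python
from typing import Dict, Iterable, List, Optional, Tuple, Union


def split_legacy_tags(
    tags: Optional[Union[Dict[str, str], Iterable[str]]]
) -> Tuple[List[str], Dict[str, str]]:
    """Hypothetical helper: return (tags_list, attributes_dict).

    Old-style callers passed a dictionary as `tags`. Under the new
    convention that dictionary belongs in `attributes`, and `tags`
    is a plain list.
    """
    if tags is None:
        return [], {}
    if isinstance(tags, dict):
        # Old style: key-value pairs are now attributes.
        return [], dict(tags)
    if isinstance(tags, str):
        # A bare string would otherwise be split into characters.
        raise TypeError("tags must be a list of strings, not a single string")
    # New style: already a list (or other iterable) of tags.
    return list(tags), {}


new_tags, new_attributes = split_legacy_tags({"team": "risk"})
```

The two returned values could then be passed as the tags and attributes arguments respectively.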

This release is in tandem with the ml-workflow release

Spark3 Release

06 Mar 03:45

This release adds Spark3 support to 2.5.1-k8.

No other changes were made apart from adding Spark3 support and removing Spark 2.4 support. All future releases will be Spark3 only.

2.5.0-k8

24 Feb 15:57
b499e07

What's New?

The new Feature Store API!

  • (Nearly) full server side Feature Store API (@myles-novick) (many PRs)
  • New APIs for the feature store for added functionality (delete features, better authentication, summary statistics) (@myles-novick, @Ben-Epstein ) (many PRs)
  • Better support for mlflow native model logging calls (@Ben-Epstein )
  • MLflow watch_job throws an exception when the job fails (@Ben-Epstein )(#113)
  • Upgrade to Spark3, and maintained support for Spark2 (@Ben-Epstein )(#109)

Breaking Changes

The old Feature Store API may still work, but it is highly recommended to switch to the new Server Side Feature Store API. The client side API will no longer be maintained or supported.
This release is in tandem with the ml-workflow release
There is no upgrade script for this release as no table structures have changed, only new tables have been added.

Patch Release for Feature Store

15 Jan 14:42
55dda8d

This is a patch release for 2.4.0-k8

The following features were added:

The following was fixed:

  • Case sensitivity: The database's case sensitive column and table names were causing searchability issues. To remedy this, all column names, schema names, and table names are stored as UPPERCASE in the metadata, to match the default state of the database storage. (@sergioferragut )
  • datetime.min (0001-01-01 00:00:00) caused errors when Spark tried to parse and process it. Because so much of the system runs on Spark, those errors propagated down the stack. To remedy this, we've replaced datetime.min with datetime.datetime(1900, 1, 1) (1900-01-01 00:00:00) for unspecified start times on Training Sets. (@sergioferragut , @Ben-Epstein )
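
The two fixes above can be sketched as follows. This is an illustration only, not the code from the patch: the function names normalize_identifier and resolve_start_time and the constant DEFAULT_START are assumptions made for the example.

```python
from datetime import datetime
from typing import Optional

# Assumed name for the replacement default described in the fix above.
DEFAULT_START = datetime(1900, 1, 1)


def normalize_identifier(name: str) -> str:
    """Store schema, table, and column names as UPPERCASE."""
    return name.strip().upper()


def resolve_start_time(start: Optional[datetime]) -> datetime:
    """Fall back to 1900-01-01 when no usable start time is given.

    datetime.min (0001-01-01) is treated as "unspecified" as well,
    since that is the value reported to cause trouble in Spark.
    """
    if start is None or start == datetime.min:
        return DEFAULT_START
    return start
```

For example, normalize_identifier("retail.Customers") gives "RETAIL.CUSTOMERS", and resolve_start_time(None) gives datetime(1900, 1, 1).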

2.4.0-k8

07 Jan 19:58
5ddcf9b

What's New?

  • K8s deployment has been fixed and stabilized (@Ben-Epstein ) (4521)
  • Feature Store API Beta 1 Release (@sergioferragut , @Ben-Epstein ) (#96)
  • NSDS Merge Into API (@jpanko1 ) (#95)
  • Moved call to get_current_transaction to the server side so users don't need permissions to make that call (@Ben-Epstein ) (#94)
  • Better Pandas support and fileToTable function for uploading data to the database (@Ben-Epstein ) (#93)
  • createAndInsertTable API for NSDS (@Ben-Epstein ) (#92)
  • MLFlow run log history (all cells run in the order they were executed) automatically recorded at the end of a run (@Ben-Epstein ) (#91)
  • Case insensitive column names for NSDS (@jpanko1 ) (#86)
  • MLFlow model support for pyfunc models (@Ben-Epstein ) (#81)

Breaking Changes

  • In mlflow.deploy_db, the create_model_table parameter now defaults to True.

This release is in tandem with the ml-workflow release
The upgrade script is available here

2.3.0-k8

15 Sep 22:42
b62e8a4

What's New?

BREAKING CHANGES

  • The models table no longer exists. The deployment model is instead stored in a new column of the Artifacts table called database_binary. You must run the migration scripts to alter the artifacts table; otherwise, existing deployments won't work.
  • Models currently saved in the database with log_model will not be deployable as we have changed the model saving format from model to MLModel. You must read in the model binary, deserialize it, and re-log the model under a new run.

This release is in tandem with the ml-workflow release.

Upgrade scripts from 2.2.0 are available here

Release 2.2.0

22 Jun 22:12
504f8f2

What's New?

  • Stronger AWS Sagemaker deployment support using k8s ServiceAccounts
  • Model metadata tracking for in-db deployed models using the MODEL_METADATA and LIVE_MODEL_STATUS table and view
  • Support for in-db deployment for Keras linear models (LSTMs/RNNs/CNNs not yet supported).
  • Support for in-db deployment XGBoost using H2O/SKlearn implementations
  • SKLearn bug fix with fastnumbers
  • SKlearn better support for non-double return types
  • Upgrade from pickle -> cloudpickle for sklearn model serialization, adding support for both external and lambda functions inside SKLearn Pipelines
  • Merged in-db deployment from a two-table design into a single-table design. All features + model prediction(s) are stored in a single table
  • Support for deploying models to an existing table
  • Support for selecting which columns from a table are used in the model prediction. This allows you to deploy models to a "subset" of a table.
  • Better support for in-db deployment for sklearn Pipelines that have predict parameters
  • deploy_db API cleanup: Removed the model parameter and made run_id required. The model is pulled behind the scenes. The df parameter is optional and not required if deploying a model to an existing table.
  • General code cleanup
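
As background for the pickle -> cloudpickle item above, the sketch below shows why the standard library's pickle is a problem for lambdas: it serializes functions by reference (module plus name), and a lambda has no importable name. The helper stdlib_can_pickle is a name made up for this example and is not part of pysplice.

```python
import pickle


def stdlib_can_pickle(obj) -> bool:
    """Return True if the standard-library pickle can serialize obj."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


# A lambda such as one might place inside an sklearn Pipeline step.
square = lambda x: x * x

lambda_ok = stdlib_can_pickle(square)   # lambdas cannot be looked up by name
builtin_ok = stdlib_can_pickle(len)     # builtins are pickled by reference
```

cloudpickle, by contrast, serializes such functions by value, which is what makes lambdas inside Pipelines serializable.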

BREAKING CHANGES

  • deploy_db will no longer work with old parameters. New parameter set and order is required.
  • createTable from the PySpliceContext now has parameters ordered dataframe, schema_table_name instead of the other way around to match all other APIs in the module.

This release is in tandem with the ml-workflow release. Upgrade scripts are attached to that release.

Patch Fix for bad insert logic

21 May 00:36
4a88cde
Update context.py (#59)

Fix for df insert