This release adds support for Python 3.9, and updates dependencies to accept the latest versions when possible.
- Add support for Python 3.9 - Issue #127 by @katxiao
- Add pip check workflow - Issue #124 by @pvk-developer
- Fix meta.yaml dependencies - PR #119 by @fealho
- Upgrade dependency ranges - Issue #118 by @katxiao
This release fixes a bug where passing a JSON file as configuration for a multi-table synthesizer crashed the model. It also adds a number of fixes and enhancements, including: (1) a function and CLI command to list the available synthesizer names, (2) a curated set of dependencies, with Gretel made an optional dependency, (3) updating Gretel to use temp directories, (4) using `nvidia-smi` to get the number of GPUs and (5) multiple Dockerfile updates to improve functionality.
- Bug when using JSON configuration for multiple multi-table evaluation - Issue #115 by @pvk-developer
- Use nvidia-smi to get number of gpus - PR #113 by @katxiao
- List synthesizer names - Issue #82 by @fealho
- Use nvidia base for dockerfile - PR #108 by @katxiao
- Add Makefile target to install gretel and ydata - PR #107 by @katxiao
- Curate dependencies and make Gretel optional - PR #106 by @csala
- Update gretel checkpoints to use temp directory - PR #105 by @katxiao
- Initialize variable before reference - PR #104 by @katxiao
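As a rough illustration of the `nvidia-smi` change listed above (PR #113), the sketch below counts GPUs by parsing the output of `nvidia-smi --list-gpus`. It is not the code from that PR: the function names `count_gpu_lines` and `get_num_gpus` are hypothetical, and the fallback to `0` when the tool is missing or fails is an assumption.

```python
import shutil
import subprocess


def count_gpu_lines(listing: str) -> int:
    """Count the GPUs in the text printed by `nvidia-smi --list-gpus`."""
    # Each detected device is printed on its own line, starting with "GPU <index>:".
    return sum(1 for line in listing.splitlines() if line.strip().startswith("GPU "))


def get_num_gpus() -> int:
    """Return the number of visible NVIDIA GPUs, or 0 if none can be detected."""
    # Hypothetical helper, not SDGym's implementation.
    if shutil.which("nvidia-smi") is None:
        return 0  # assumption: no NVIDIA tooling means no usable GPUs
    try:
        result = subprocess.run(
            ["nvidia-smi", "--list-gpus"],
            capture_output=True,
            text=True,
            check=True,
            timeout=10,
        )
    except (subprocess.SubprocessError, OSError):
        return 0  # the command failed or timed out
    return count_gpu_lines(result.stdout)
```

Keeping the parsing in its own function means it can be checked against a sample listing on machines without a GPU.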
This release adds new synthesizers for Gretel and ydata, and creates a Docker image for SDGym. It also includes enhancements to the accepted SDGym arguments, adds a summary command to aggregate metrics, and adds the normalized score to the benchmark results.
- Add normalized score to benchmark results - Issue #102 by @katxiao
- Add max rows and max columns args - Issue #96 by @katxiao
- Automatically detect number of workers - Issue #97 by @katxiao
- Add summary function and command - Issue #92 by @amontanez24
- Allow jobs list/JSON to be passed - Issue #93 by @fealho
- Add ydata to sdgym - Issue #90 by @fealho
- Add dockerfile for sdgym - Issue #88 by @katxiao
- Add Gretel to SDGym synthesizer - Issue #87 by @amontanez24
This release adds new features to store results and cache contents into an S3 bucket as well as a script to collect results from a cache dir and compile a single results CSV file.
- Collect cached results from s3 bucket - Issue #85 by @katxiao
- Store cache contents into an S3 bucket - Issue #81 by @katxiao
- Store SDGym results into an S3 bucket - Issue #80 by @katxiao
- Add a way to collect cached results - Issue #79 by @katxiao
- Allow reading datasets from private s3 bucket - Issue #74 by @katxiao
- Typos in the sdgym.run function docstring documentation - Issue #69 by @sbrugman
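To make the "collect cached results" idea above more concrete, here is a minimal sketch that merges per-run CSV files from a local cache directory into a single results CSV. It assumes the cache holds one CSV per run. The `collect_results` name and the file layout are hypothetical, and this is not the SDGym script itself.

```python
import csv
from pathlib import Path


def collect_results(cache_dir, output_path):
    """Merge every CSV found under cache_dir into one CSV; return the row count."""
    # Hypothetical helper. Write output_path outside cache_dir so that a
    # previous merged file is not picked up as input on the next run.
    rows, fieldnames = [], []
    for path in sorted(Path(cache_dir).rglob("*.csv")):
        with path.open(newline="") as handle:
            reader = csv.DictReader(handle)
            # Keep the union of all column names, in first-seen order.
            for name in reader.fieldnames or []:
                if name not in fieldnames:
                    fieldnames.append(name)
            rows.extend(reader)
    with Path(output_path).open("w", newline="") as handle:
        # Columns missing from a given run are written as empty cells.
        writer = csv.DictWriter(handle, fieldnames=fieldnames, restval="")
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

Reading from an S3 bucket instead would need an S3 client in place of the local directory walk. The merge step would stay the same.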
Major rework of the SDGym functionality to support a collection of new features:
- Add relational and timeseries model benchmarking.
- Use SDMetrics for model scoring.
- Update datasets format to match SDV metadata based storage format.
- Centralize default datasets collection in the `sdv-datasets` S3 bucket.
- Add options to download and use datasets from different S3 buckets.
- Rename synthesizers to baselines and adapt to the new metadata format.
- Add model execution and metric computation time logging.
- Add optional synthetic data and error traceback caching.
This version adds a rework of the benchmark function and a few new synthesizers.
- New CLI with `run`, `make-leaderboard` and `make-summary` commands
- Parallel execution via Dask or Multiprocessing
- Download datasets without executing the benchmark
- Support for Python from 3.6 to 3.8
New synthesizers:
- `sdv.tabular.CTGAN`
- `sdv.tabular.CopulaGAN`
- `sdv.tabular.GaussianCopulaOneHot`
- `sdv.tabular.GaussianCopulaCategorical`
- `sdv.tabular.GaussianCopulaCategoricalFuzzy`
New updated leaderboard and minor improvements.
- Add parameters for PrivBNSynthesizer - Issue #37 by @csala
New Benchmark API and lots of improved documentation.
- The benchmark function now returns a complete leaderboard instead of only one score
- Class Synthesizers can be directly passed to the benchmark function
- One hot encoding errors in the Independent, VEEGAN and Medgan Synthesizers.
- Proper usage of the `eval` mode during sampling.
- Fix improperly configured datasets.
First release to PyPI.