Adding Forest-Flow: Generating and Imputing Tabular Data via Diffusion and Flow-based Gradient-Boosted Trees #69

Merged 30 commits, Nov 26, 2023

Commits
5f3896c
add optional time vector in sample_location_and_conditional_flow
kilianFatras Nov 3, 2023
f51f52b
ForestFlow example
AlexiaJM Nov 3, 2023
031f5e8
add initial test for t Tensor
kilianFatras Nov 6, 2023
e394b02
fix bugs pad_t_like_x in FM and SI classes
kilianFatras Nov 6, 2023
a659457
add tests over guidance functions
kilianFatras Nov 6, 2023
06097b1
Update runner-requirements.txt with forestdiffusion
AlexiaJM Nov 6, 2023
b714d9c
Update setup.py
AlexiaJM Nov 6, 2023
4d88b81
Update version.py
AlexiaJM Nov 6, 2023
8a37334
remove installs
AlexiaJM Nov 6, 2023
3fce406
update tests with pytest
kilianFatras Nov 6, 2023
448f14b
add forest-flow to requirement, changed example to forest-flow in set…
AlexiaJM Nov 7, 2023
96bba7f
call ForestDiffusion as ForestFlow
AlexiaJM Nov 7, 2023
555e9a2
Merge branch 'forest_flow' into forest_flow
AlexiaJM Nov 7, 2023
30b37c8
Merge pull request #66 from AlexiaJM/forest_flow
kilianFatras Nov 10, 2023
c04fbd4
update indentation, add seeds and remove unsued cells
kilianFatras Nov 10, 2023
64e63ac
update Forest-Flow description in notebooks
kilianFatras Nov 10, 2023
39450a3
fix readme typo
kilianFatras Nov 10, 2023
2f85d0a
typo in implemented papers
kilianFatras Nov 10, 2023
f8fd3b4
conflicts with main branch readme
kilianFatras Nov 10, 2023
4dd566d
provide more details in the notebook
kilianFatras Nov 10, 2023
4654873
initial Readme in tabular example folder
kilianFatras Nov 10, 2023
1ae5690
update Readme
kilianFatras Nov 10, 2023
703dbb6
Merge branch 'main' into forest_flow
kilianFatras Nov 10, 2023
6acfca7
pep 8
kilianFatras Nov 10, 2023
8fbf8f7
change test name folder
kilianFatras Nov 14, 2023
a134c52
Delete tests/test directory
kilianFatras Nov 14, 2023
3c7165f
remove unnecessary requirements
kilianFatras Nov 23, 2023
d3fe6f2
Merge branch 'main' into forest_flow
kilianFatras Nov 23, 2023
42273f3
corrects SB test t bug
kilianFatras Nov 23, 2023
cac49f1
pep8 in setup
kilianFatras Nov 23, 2023
8 changes: 4 additions & 4 deletions README.md
@@ -23,7 +23,7 @@

## Description

Conditional Flow Matching (CFM) is a fast way to train continuous normalizing flow (CNF) models. CFM is a simulation-free training objective for continuous normalizing flows that allows conditional generative modeling and speeds up training and inference. CFM's performance closes the gap between CNFs and diffusion models. To spread its use within the machine learning community, we have built a library focused on Flow Matching methods: TorchCFM. TorchCFM is a library showing how Flow Matching methods can be trained and used to deal with image generation, single-cell dynamics and (soon) SO(3) data and tabular data.
Conditional Flow Matching (CFM) is a fast way to train continuous normalizing flow (CNF) models. CFM is a simulation-free training objective for continuous normalizing flows that allows conditional generative modeling and speeds up training and inference. CFM's performance closes the gap between CNFs and diffusion models. To spread its use within the machine learning community, we have built a library focused on Flow Matching methods: TorchCFM. TorchCFM is a library showing how Flow Matching methods can be trained and used to deal with image generation, single-cell dynamics, tabular data and soon SO(3) data.

<p align="center">
<img src="assets/169_generated_samples_otcfm.png" width="600"/>
@@ -107,8 +107,8 @@ List of implemented papers:
- Building Normalizing Flows with Stochastic Interpolants (Albergo et al. 2023a) [Paper](https://openreview.net/forum?id=li7qeBbCR1t)
- Action Matching: Learning Stochastic Dynamics From Samples (Neklyudov et al. 2022) [Paper](https://arxiv.org/abs/2210.06662) [Code](https://github.com/necludov/jam)
- Concurrent work to our OT-CFM method: Multisample Flow Matching: Straightening Flows with Minibatch Couplings (Pooladian et al. 2023) [Paper](https://arxiv.org/abs/2304.14772)
- Soon: SE(3)-Stochastic Flow Matching for Protein Backbone Generation (Bose et al.) [paper](https://arxiv.org/abs/2310.02391)
- Soon: Generating and Imputing Tabular Data via Diffusion and Flow-based Gradient-Boosted Trees (Jolicoeur-Martineau et al.) [paper](https://arxiv.org/abs/2309.09968)
- Generating and Imputing Tabular Data via Diffusion and Flow-based Gradient-Boosted Trees (Jolicoeur-Martineau et al.) [Paper](https://arxiv.org/abs/2309.09968) [Code](https://github.com/SamsungSAILMontreal/ForestDiffusion)
- Soon: SE(3)-Stochastic Flow Matching for Protein Backbone Generation (Bose et al.) [Paper](https://arxiv.org/abs/2310.02391)

## How to run

@@ -155,7 +155,7 @@ python -m ipykernel install --user --name=torchcfm

## Project Structure

The directory structure of a new project looks like this:
The directory structure looks like this:

```
…
```
2 changes: 2 additions & 0 deletions examples/notebooks/mnist_example.ipynb
@@ -253,6 +253,8 @@
"outputs": [],
"source": [
"# follows example from https://github.com/google-research/torchsde/blob/master/examples/cont_ddpm.py\n",
"\n",
"\n",
"class SDE(torch.nn.Module):\n",
" noise_type = \"diagonal\"\n",
" sde_type = \"ito\"\n",
2 changes: 2 additions & 0 deletions examples/notebooks/training-8gaussians-to-moons.ipynb
@@ -843,6 +843,8 @@
],
"source": [
"# %%time\n",
"\n",
"\n",
"class MLP2(torch.nn.Module):\n",
" def __init__(self, dim, out_dim=None, w=64, time_varying=False):\n",
" super().__init__()\n",
20 changes: 20 additions & 0 deletions examples/tabular/README.md
@@ -0,0 +1,20 @@
# Forest-Flow experiment on the Iris dataset using TorchCFM

This notebook is a self-contained example showing how to train the Forest-Flow method to generate tabular data [(Jolicoeur-Martineau et al. 2023)](https://arxiv.org/abs/2309.09968). The idea behind Forest-Flow is to **learn the vector field of Independent Conditional Flow Matching (I-CFM) with XGBoost models** instead of neural networks. The motivation is that tree-based models (forests) currently tend to work better than neural networks on tabular data tasks. This choice raises some difficulties, such as how to approximate the Flow Matching loss without gradient-based minibatch training, and the notebook shows how to handle them on a minimal example. The method, its training procedure, and the experiments are described in [(Jolicoeur-Martineau et al. 2023)](https://arxiv.org/abs/2309.09968). The full code is available in the [ForestDiffusion repository](https://github.com/SamsungSAILMontreal/ForestDiffusion).
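
To make the idea concrete, here is a minimal sketch of the regression problem behind it, not the notebook's actual code. It assumes a discretized time grid, several noise duplicates per data row, and one regressor per time level. A least-squares linear model stands in for XGBoost so that the block depends only on NumPy. The toy Gaussian data, the values of `n_t` and `K`, and the helper names `fit_linear` and `predict_linear` are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "tabular" data: 500 rows, 2 correlated Gaussian features.
# This is a hypothetical stand-in for a real table such as Iris.
n, d = 500, 2
mix = np.array([[1.0, 0.5], [0.0, 0.5]])
x1 = rng.normal(size=(n, d)) @ mix + np.array([3.0, -1.0])

n_t = 20  # number of discrete time levels (assumed value)
K = 50    # noise duplicates per data row (assumed value)
ts = np.linspace(0.0, 1.0, n_t)

# Duplicate every row K times and pair each copy with its own noise sample.
# Averaging over many noise draws replaces the minibatch sampling that a
# neural network would use to approximate the Flow Matching loss.
x1_dup = np.tile(x1, (K, 1))
x0 = rng.normal(size=x1_dup.shape)


def fit_linear(x, y):
    """Stand-in regressor: least squares with a bias column (NOT XGBoost)."""
    xb = np.hstack([x, np.ones((len(x), 1))])
    w, *_ = np.linalg.lstsq(xb, y, rcond=None)
    return w


def predict_linear(w, x):
    return np.hstack([x, np.ones((len(x), 1))]) @ w


# One regressor per time level: inputs x_t, targets u_t = x1 - x0.
models = []
for t in ts:
    xt = (1.0 - t) * x0 + t * x1_dup  # straight-line interpolation noise -> data
    ut = x1_dup - x0                  # conditional vector field target
    models.append(fit_linear(xt, ut))

# Generate new rows: forward Euler from noise (t=0) to data (t=1).
# The model fitted at t=1 is not needed by forward Euler.
x = rng.normal(size=(2000, d))
h = 1.0 / (n_t - 1)
for w in models[:-1]:
    x = x + h * predict_linear(w, x)

print("data mean:     ", x1.mean(axis=0))
print("generated mean:", x.mean(axis=0))
```

The linear stand-in is only adequate here because the toy data is Gaussian. For real tabular data, the regressor at each time level would be replaced by gradient-boosted trees (for example `xgboost.XGBRegressor`, fitted separately for each output feature), which is the substitution Forest-Flow makes.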

To run our Jupyter notebooks, install our package:

```bash
cd ../../

# install torchcfm
pip install -e '.[forest-flow]'

# install ipykernel
conda install -c anaconda ipykernel

# install conda env in jupyter notebook
python -m ipykernel install --user --name=torchcfm

# launch our notebooks with the torchcfm kernel
```