Problem about training and sampling with the flow-matching model #2

Open · Saoge123 opened this issue Jan 23, 2024 · 8 comments

@Saoge123

Hi,
Thanks for your great work! We ran into some problems when using your code (supplementary materials) to train a model and generate molecules.

  1. The performance of the flow-matching (FM) model seems to be inconsistent with the statements in your paper. As shown in the figure below, the FM model (cyan) is much worse than the EDM (purple). Although the FM model has been trained for far fewer steps than the EDM, the trend of the metrics is clear.
    (figure: validity/stability curves during training, FM in cyan vs. EDM in purple)

  2. We also tried to generate some molecules, but the checkpoint of the pretrained FM model cannot be found in your code.

@Saoge123 (Author)

The training configuration is the following:
args.n_epochs = 3000
args.exp_name = 'edm_qm9'
args.n_stability_samples = 1000
args.diffusion_noise_schedule = 'polynomial_2'
args.diffusion_noise_precision = 1e-5
args.diffusion_steps = 1000
args.diffusion_loss_type = 'l2'
args.batch_size = 64
args.nf = 256
args.n_layers = 9
args.lr = 1e-4
args.normalize_factors = [1,4,10]
args.test_epochs = 20
args.ema_decay = 0.9999
args.probabilistic_model = 'flow_matching'
args.node_classifier_model_ckpt = ''
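
For reference, args.probabilistic_model = 'flow_matching' together with the l2 loss corresponds to a conditional flow-matching objective. A minimal, generic sketch of such a loss (placeholder model and shapes; this is not the repo's exact implementation) could look like:

```python
import torch

def flow_matching_loss(model, x1):
    """Generic conditional flow-matching (rectified-flow style) l2 loss.

    x1: a batch of data samples, shape (B, ...). `model(x_t, t)` is assumed
    to predict the velocity field; both are placeholders, not this repo's API.
    """
    b = x1.shape[0]
    t = torch.rand(b, device=x1.device).view(b, *([1] * (x1.dim() - 1)))
    x0 = torch.randn_like(x1)                 # Gaussian prior sample
    x_t = (1 - t) * x0 + t * x1               # linear interpolation path
    v_target = x1 - x0                        # target velocity along the path
    v_pred = model(x_t, t.view(b))            # model predicts velocity at (x_t, t)
    return ((v_pred - v_target) ** 2).mean()  # l2, matching args.diffusion_loss_type
```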

@jingjing-gong (Contributor)

The checkpoint path is: efm_gen/checkpoints/generative_model_ema_0.npy
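
A minimal loading sketch, assuming the .npy file is actually a torch state dict saved with torch.save (as in the EDM codebase this code appears to build on), and that `model` is the generative model rebuilt from the training args:

```python
import torch

def load_ema_checkpoint(model, ckpt_path='efm_gen/checkpoints/generative_model_ema_0.npy'):
    # Assumption: despite the .npy extension, the file holds a torch state dict
    # (saved with torch.save), so torch.load can read it directly.
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    state_dict = torch.load(ckpt_path, map_location=device)
    model.load_state_dict(state_dict)
    return model.to(device).eval()
```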

@jingjing-gong (Contributor)

We are still working on re-formatting the code to make it more readable. We will also release a reproducible training script at that time.

@Saoge123 (Author)

Thanks for your quick reply. args.pickle is needed when we run eval_sample.py.

@jingjing-gong (Contributor)


Sorry for the sloppiness; we have uploaded an args.pickle to Google Drive. Here is the link: https://drive.google.com/file/d/1ebAcJ79AMeYq1uzcmcnVBUFIYn92--nt/view?usp=drive_link

Hope you find it useful :-)
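
For anyone loading it, a minimal sketch (the file path below is just an assumption; put args.pickle wherever eval_sample.py expects the model directory):

```python
import pickle

def load_train_args(pickle_path='efm_gen/args.pickle'):  # path is an assumption
    # eval_sample.py is expected to reuse the pickled training arguments
    # to rebuild the model with the same hyperparameters before sampling.
    with open(pickle_path, 'rb') as f:
        args = pickle.load(f)
    return args
```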

@jingjing-gong (Contributor)

Is the checkpoint working out for you?

@Saoge123 (Author)

Thanks very much for your help! The sampling code is working; we are looking forward to your re-formatted training code.

@Frankie123421

Hi, thanks for your nice work. I've trained the model on the QM9 dataset using the pre-released code from the supplementary materials and the hyperparameters from args.pickle, and I found that the performance is still worse than expected. Specifically, validity and molecule stability converge to approximately 0.87 and 0.77, respectively. Even though I have only trained for around 720 epochs, the curve trend suggests that further training would bring no improvement or only marginal gains. (However, the released checkpoint is at around 3000 epochs?) Could you please provide some tips on how to further improve the performance? Thanks.

(figure: validity and molecule-stability curves over ~720 training epochs)
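
For context on the metric itself: validity on QM9 is commonly computed as the fraction of generated molecules that RDKit can parse and sanitize. A minimal sketch (illustrative only, not this repo's evaluation code):

```python
from rdkit import Chem

def validity(smiles_list):
    # Fraction of generated molecules that RDKit can parse and sanitize;
    # molecule stability is a stricter, bond-based check and is not shown here.
    valid = [s for s in smiles_list if Chem.MolFromSmiles(s) is not None]
    return len(valid) / max(len(smiles_list), 1)
```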
