
About the rich feedback model release #3

Open
srymaker opened this issue Jun 20, 2024 · 9 comments

@srymaker

Thanks for your great work!
Will the rich feedback model be released? I'd love to test the model and apply it to my own tasks!

@leebird
Collaborator

leebird commented Jun 22, 2024

Hello, we don't currently have plans to release the model.

@densechen

@leebird Looking forward to the rich feedback model...

@udrs

udrs commented Jun 25, 2024

Looking forward

@leebird
Collaborator

leebird commented Jun 25, 2024

Thanks for all the interest in our work! Due to company policies (related to productization, etc.), we cannot open-source the model. We have included details on how to reproduce the results in our paper. If you have further questions, please email the corresponding authors, and we'd be happy to help you reproduce the results.

@srymaker
Author

Hello, could you tell me how you trained the reward model, e.g., which layers were frozen and which tuning method was used?

@leebird
Collaborator

leebird commented Jun 29, 2024

Hi @srymaker, we fine-tuned all the layers in the model, including the ViT component. We tried freezing the ViT component, but it didn't work well, especially for the heatmap tasks. Experimental details, including hyperparameters and the optimizer, can be found in Section 9 of the paper.
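
For illustration only, here is a minimal PyTorch sketch of the difference between fine-tuning all layers and the frozen-ViT ablation described above. The `.vit` attribute name and the helper itself are assumptions for this sketch, not the paper's actual code.

```python
import torch.nn as nn

def configure_trainable(model: nn.Module, freeze_vit: bool = False) -> None:
    # Full fine-tuning: every parameter, including the ViT component, is updated.
    for param in model.parameters():
        param.requires_grad = True
    if freeze_vit:
        # Ablation: keep the vision tower fixed
        # (reported above to hurt the heatmap tasks).
        for param in model.vit.parameters():
            param.requires_grad = False
```

Calling `configure_trainable(model)` corresponds to the full fine-tuning setting, while `configure_trainable(model, freeze_vit=True)` corresponds to the frozen-ViT ablation.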

@srymaker
Author

srymaker commented Jul 1, 2024

Thank you for your answer. Does "all layers" refer to the encoder and decoder in T5?

@leebird
Collaborator

leebird commented Jul 7, 2024

@srymaker yes, all the layers are from the ViT and the T5 encoder/decoder. Note that there is a pretraining stage for the ViT and T5 layers on multimodal data, as they were originally pretrained on unimodal data only.
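
For readers trying to picture this setup, here is a rough, unofficial sketch of the data flow implied by that description: a ViT encodes the image into patch features, those features are projected into the T5 embedding space, concatenated with the text-token embeddings, and the T5 encoder/decoder runs on the fused sequence. All module names, dimensions, and the projection layer are assumptions for illustration, not the paper's code.

```python
import torch
import torch.nn as nn

class VitT5Sketch(nn.Module):
    def __init__(self, vit: nn.Module, t5: nn.Module, vit_dim: int, t5_dim: int):
        super().__init__()
        self.vit = vit                          # vision encoder -> patch features
        self.t5 = t5                            # T5 encoder/decoder stack
        self.proj = nn.Linear(vit_dim, t5_dim)  # map ViT features to the T5 embedding width

    def forward(self, pixel_values, text_embeds, decoder_input_ids):
        image_feats = self.vit(pixel_values)    # (B, num_patches, vit_dim)
        image_embeds = self.proj(image_feats)   # (B, num_patches, t5_dim)
        # Prepend the projected image tokens to the text-token embeddings and
        # run the fused sequence through the T5 encoder/decoder.
        fused = torch.cat([image_embeds, text_embeds], dim=1)
        return self.t5(inputs_embeds=fused, decoder_input_ids=decoder_input_ids)
```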

@Nieleilei

> @srymaker yes, all the layers are from the ViT and the T5 encoder/decoder. Note that there is a pretraining stage for the ViT and T5 layers on multimodal data, as they were originally pretrained on unimodal data only.

Hello, for the unimodal pretraining tasks, is only the natural-image captioning task on the WebLI dataset used? What other tasks are included?
