GFN LLM FineTuning #191
Comments
Introduction
GFlowNet fine-tuning of language models (https://arxiv.org/abs/2310.04363) consists of fine-tuning the parameters of an LLM transformer (say, decoder-only) so that it generates sequences of tokens that satisfy some constraints or preferences. This entails that:
States and actions
The goal is to take inspiration from the code that accompanies the original paper to implement GFlowNet fine-tuning with torchgfn. Following the above, the environment corresponds to the set of all possible token sequences, up to a maximum length. Actions correspond to adding a token, so the action space corresponds to the vocabulary. Different language models have different tokenizers with different vocabulary sizes, so the total number of tokens should be a parameter of the environment. States correspond to sequences of tokens. States can be represented as:
Fortunately, the preprocessor object needed to define an environment can handle casting between these two types!
First steps towards a PR
Following the introduction above, one can implement the environment, which takes the language model and the tokenizer as input and defines the preprocessor, the states class, and the actions class. It is important to look at
To test the implementation, we should be able to instantiate the environment with an arbitrary language model / tokenizer and generate random sequences. The generation will require the Hugging Face transformers library.
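To make the states-and-actions description and the proposed test concrete, here is a minimal, library-free sketch. It is not torchgfn's API: the class name `TokenSequenceEnv`, its methods, and the choice of an end-of-sequence token id as the terminating action are all assumptions made for illustration, and a uniform random policy stands in for a language model.

```python
import random
from dataclasses import dataclass
from typing import Tuple

State = Tuple[int, ...]  # a state is the sequence of token ids generated so far


@dataclass(frozen=True)
class TokenSequenceEnv:
    """Hypothetical toy environment; not part of torchgfn."""

    vocab_size: int  # number of tokens in the tokenizer's vocabulary
    max_len: int     # maximum sequence length
    eos_id: int      # assumption: emitting this token ends the sequence

    def initial_state(self) -> State:
        # Generation starts from the empty sequence.
        return ()

    def is_terminal(self, state: State) -> bool:
        ended = len(state) > 0 and state[-1] == self.eos_id
        return ended or len(state) >= self.max_len

    def step(self, state: State, action: int) -> State:
        """Append one token id (the action) to the sequence."""
        if self.is_terminal(state):
            raise ValueError("cannot extend a terminal state")
        if not 0 <= action < self.vocab_size:
            raise ValueError(f"action {action} is outside the vocabulary")
        return state + (action,)


def sample_random_sequence(env: TokenSequenceEnv, rng: random.Random) -> State:
    """Roll out a uniform random policy until the state is terminal."""
    state = env.initial_state()
    while not env.is_terminal(state):
        state = env.step(state, rng.randrange(env.vocab_size))
    return state


if __name__ == "__main__":
    env = TokenSequenceEnv(vocab_size=8, max_len=6, eos_id=0)
    rng = random.Random(0)
    for _ in range(3):
        print(sample_random_sequence(env, rng))
```

In a real implementation, `vocab_size` and `eos_id` would presumably be read from the tokenizer, and the uniform policy would be replaced by the language model's next-token distribution.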
We need to re-implement this paper as part of the torchgfn library: https://github.com/GFNOrg/gfn-lm-tuning
In particular, many quality-of-life improvements would be welcome, such as being able to initialize a policy from a Hugging Face model with a clean API, etc.
EDIT by @saleml : See proposed motivation and plan in the first comment below