failing to use BART models - Breaking the generation loop! #42
Comments
Hi, could you please provide your training hyperparameters or the whole Python code?
Hi @unnir, sure, here is the code. We train on the California dataset. In the code below, the total number of epochs is 8*9.

```python
from pathlib import Path
from shutil import rmtree

from be_great import GReaT

# `data` (the California DataFrame), `base` (the model name) and `llm`
# (a short label used in directory names) are defined earlier.
batch_size = 32
steps = len(data) // batch_size
epochs = [0, 1, 2, 3, 4, 5, 6, 7]
columns = data.columns

for epoch in epochs:
    for idx, column in enumerate(columns):
        print(f'{epoch=} -> {column=}')
        great = GReaT(base,  # Name of the large language model used (see HuggingFace for more options)
                      batch_size=batch_size,
                      epochs=epoch*len(data.columns) + idx + 1,  # Cumulative epoch target: each fit call adds one epoch
                      save_steps=steps,  # Save model weights every x steps
                      logging_steps=steps,  # Log the loss and learning rate every x steps
                      experiment_dir=f"aleks_{llm}_trainer",  # Name of the directory where all intermediate steps are saved
                      )
        if epoch == 0 and idx == 0:
            trainer = great.fit(data, conditional_col=column)
        else:
            trainer = great.fit(data, conditional_col=column, resume_from_checkpoint=True)
            # Remove the checkpoint that training was just resumed from
            rmtree(Path(f"aleks_{llm}_trainer")/f"checkpoint-{epoch*len(data.columns)*steps + idx*steps}")

great.save(f"aleks_california_{llm}")

# List the checkpoint directories that remain
for path in Path(f"aleks_{llm}_trainer").iterdir():
    if path.is_dir():
        print(f'{path=}')
```
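For anyone checking the epoch and checkpoint arithmetic in the loop above, here is a minimal sketch that reproduces only the bookkeeping, without training anything. `schedule` is a hypothetical helper, not part of be_great, and the 20640 rows and 9 columns are assumptions based on the standard California housing dataset; the actual `data` used in the thread may differ.

```python
def schedule(n_rows, batch_size, n_outer, n_columns):
    """Yield (outer epoch, column index, cumulative epoch target,
    step number of the checkpoint left by the previous fit call)."""
    steps = n_rows // batch_size  # same formula as `steps` in the loop
    for epoch in range(n_outer):
        for idx in range(n_columns):
            # Value passed as `epochs=` to GReaT on this iteration
            target = epoch * n_columns + idx + 1
            # Equals epoch*n_columns*steps + idx*steps from the rmtree call
            prev_checkpoint = (target - 1) * steps
            yield epoch, idx, target, prev_checkpoint


# Assumed sizes: 20640 rows, 9 columns (8 features plus the target)
rows = list(schedule(n_rows=20640, batch_size=32, n_outer=8, n_columns=9))
print(len(rows))   # number of fit calls, i.e. 8*9
print(rows[0])     # first call: no earlier checkpoint exists (step 0)
print(rows[-1])    # last call: cumulative epoch target and previous checkpoint step
```

Under these assumptions the first iteration points at `checkpoint-0`, which never exists, so the cleanup step only makes sense from the second fit call onward.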
My suggestion, again, is to train the model longer, but I will try to reproduce the error and debug it.
Hi,
I'm trying to use, for example, 'sshleifer/distilbart-cnn-6-6', and it fails with the following message:
An error has occurred: Breaking the generation loop! To address this issue, consider fine-tuning the GReaT model for an longer period. This can be achieved by increasing the number of epochs. Alternatively, you might consider increasing the max_length parameter within the sample function. For example: model.sample(n_samples=10, max_length=2000) If the problem persists despite these adjustments, feel free to raise an issue on our GitHub page at: https://github.com/kathrinse/be_great/issues
Aleksandar