I was running training on an A6000 GPU and ran into an error.
When I ran "python3 run_squad.py --task korquad --config_file xlm-roberta.json", the run crashed about a third of the way through, and I would like to ask how to resolve this.
Thank you for providing such a good training model.
05/12/2022 09:45:19 - INFO - __main__ - ***** Running training *****
05/12/2022 09:45:19 - INFO - __main__ - Num examples = 65811
05/12/2022 09:45:19 - INFO - __main__ - Num Epochs = 7
05/12/2022 09:45:19 - INFO - __main__ - Train batch size per GPU = 4
05/12/2022 09:45:19 - INFO - __main__ - Total train batch size (w. parallel, distributed & accumulation) = 4
05/12/2022 09:45:19 - INFO - __main__ - Gradient Accumulation steps = 1
05/12/2022 09:45:19 - INFO - __main__ - Total optimization steps = 115171
05/12/2022 09:52:44 - INFO - __main__ - Loading features from cached file data/cached_dev_xlm-roberta-base_512
05/12/2022 09:52:48 - INFO - __main__ - ***** Running evaluation 4000 *****
05/12/2022 09:52:48 - INFO - __main__ - Num examples = 6742
05/12/2022 09:52:48 - INFO - __main__ - Batch size = 16
Traceback (most recent call last):
  File "run_squad.py", line 483, in <module>
    main(cli_args)
  File "run_squad.py", line 435, in main
    global_step, tr_loss = train(args, train_dataset, model, tokenizer)
  File "run_squad.py", line 184, in train
    results = evaluate(args, model, tokenizer, global_step=global_step)
  File "run_squad.py", line 259, in evaluate
    output = [to_list(output[i]) for output in outputs]
  File "run_squad.py", line 259, in <listcomp>
    output = [to_list(output[i]) for output in outputs]
  File "run_squad.py", line 58, in to_list
    return tensor.detach().cpu().tolist()
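The traceback ends at `tensor.detach().cpu().tolist()` without the actual exception text, so it is hard to tell which error was raised; CUDA-side failures are often only reported at the first call that synchronizes with the device, such as `.cpu()`. As a debugging sketch (not a confirmed fix, and assuming the same run_squad.py command shown above), rerunning with blocking kernel launches usually makes the traceback point at the operation that actually failed:

```python
# Debugging sketch, not a confirmed fix: force synchronous CUDA kernel launches
# so the real error surfaces at the failing line instead of at .cpu()/.tolist().
# CUDA_LAUNCH_BLOCKING is a standard CUDA/PyTorch environment variable; the
# script and flags below are the ones from the command quoted above.
import os
import subprocess

env = dict(os.environ, CUDA_LAUNCH_BLOCKING="1")
subprocess.run(
    ["python3", "run_squad.py", "--task", "korquad", "--config_file", "xlm-roberta.json"],
    env=env,
    check=True,
)
```

The shell equivalent is simply prefixing the original command with CUDA_LAUNCH_BLOCKING=1.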