Hi,
I was trying to follow the tutorial in the notebook. When I changed the YAML config from `gpu-id: -1` to `gpu-id: 0`, which should enable GPU training, an error occurred. The log output and the error are below:
2020-05-24 13:00:57.181 [root setup:380] This is run ID: c5854f3d72844dd8b842c49c4a29f9fc
2020-05-24 13:00:57.181 [root setup:383] Inside experiment ID: 0 (None)
2020-05-24 13:00:57.182 [root setup:386] Local output directory is: runs/nuqe
2020-05-24 13:00:57.182 [root setup:389] Logging execution to MLflow at: None
2020-05-24 13:00:57.186 [root setup:395] Using GPU: 0
2020-05-24 13:00:57.186 [root setup:400] Artifacts location: None
2020-05-24 13:00:57.193 [kiwi.lib.train run:154] Training the NuQE model
2020-05-24 13:00:59.819 [kiwi.lib.train run:187] NuQE(
  (_loss): CrossEntropyLoss()
  (source_emb): Embedding(6437, 50, padding_idx=1)
  (target_emb): Embedding(7493, 50, padding_idx=1)
  (embeddings_dropout): Dropout(p=0.5, inplace=False)
  (linear_1): Linear(in_features=300, out_features=400, bias=True)
  (linear_2): Linear(in_features=400, out_features=400, bias=True)
  (linear_3): Linear(in_features=400, out_features=200, bias=True)
  (linear_4): Linear(in_features=200, out_features=200, bias=True)
  (linear_5): Linear(in_features=400, out_features=100, bias=True)
  (linear_6): Linear(in_features=100, out_features=50, bias=True)
  (linear_out): Linear(in_features=50, out_features=2, bias=True)
  (gru_1): GRU(400, 200, batch_first=True, bidirectional=True)
  (gru_2): GRU(200, 200, batch_first=True, bidirectional=True)
  (dropout_in): Dropout(p=0.0, inplace=False)
  (dropout_out): Dropout(p=0.0, inplace=False)
)
2020-05-24 13:00:59.819 [kiwi.lib.train run:188] 2347752 parameters
2020-05-24 13:00:59.819 [kiwi.trainers.trainer run:75] Epoch 1 of 3
2020-05-24 13:01:13.122 [kiwi.metrics.stats log:60] tags_F1_MULT: 0.0275, tags_F1_OK: 0.9294, tags_F1_BAD: 0.0296, tags_CORRECT: 0.8683, loss_loss: 892.0779
2020-05-24 13:01:26.385 [kiwi.metrics.stats log:60] tags_F1_MULT: 0.1496, tags_F1_OK: 0.9225, tags_F1_BAD: 0.1622, tags_CORRECT: 0.8582, loss_loss: 835.9351
Batches: 100%|██████████████████████████| 211/211 [00:27<00:00, 7.58 batches/s]
2020-05-24 13:01:27.717 [kiwi.metrics.stats log:60] tags_F1_MULT: 0.2363, tags_F1_OK: 0.8934, tags_F1_BAD: 0.2645, tags_CORRECT: 0.8139, loss_loss: 786.3296
2020-05-24 13:01:29.716 [kiwi.metrics.stats log:60] EVAL_tags_F1_MULT: 0.2828, EVAL_tags_F1_OK: 0.9003, EVAL_tags_F1_BAD: 0.3141, EVAL_tags_CORRECT: 0.8259, EVAL_loss_loss: 789.3109
2020-05-24 13:01:29.717 [root save:183] Saving training state to runs/nuqe/epoch_1
2020-05-24 13:01:29.829 [root save_latest:241] Saving training state to runs/nuqe/temp_latest_epoch
2020-05-24 13:01:29.830 [kiwi.trainers.callbacks _remove_snapshot:178] Removing previous snapshot: runs/nuqe/latest_epoch
2020-05-24 13:01:29.830 [kiwi.trainers.callbacks save_latest:252] Moving runs/nuqe/temp_latest_epoch to runs/nuqe/latest_epoch
Traceback (most recent call last):
  File "/opt/conda/bin/kiwi", line 8, in <module>
    sys.exit(main())
  File "/opt/conda/lib/python3.7/site-packages/kiwi/__main__.py", line 22, in main
    return kiwi.cli.main.cli()
  File "/opt/conda/lib/python3.7/site-packages/kiwi/cli/main.py", line 71, in cli
    train.main(extra_args)
  File "/opt/conda/lib/python3.7/site-packages/kiwi/cli/pipelines/train.py", line 142, in main
    train.train_from_options(options)
  File "/opt/conda/lib/python3.7/site-packages/kiwi/lib/train.py", line 123, in train_from_options
    trainer = run(ModelClass, output_dir, pipeline_options, model_options)
  File "/opt/conda/lib/python3.7/site-packages/kiwi/lib/train.py", line 204, in run
    trainer.run(train_iter, valid_iter, epochs=pipeline_options.epochs)
  File "/opt/conda/lib/python3.7/site-packages/kiwi/trainers/trainer.py", line 79, in run
    self.checkpointer(self, valid_iterator, epoch=epoch)
  File "/opt/conda/lib/python3.7/site-packages/kiwi/trainers/callbacks.py", line 115, in __call__
    predictions = trainer.predict(valid_iterator)
  File "/opt/conda/lib/python3.7/site-packages/kiwi/trainers/trainer.py", line 167, in predict
    model_pred = self.model.predict(batch)
  File "/opt/conda/lib/python3.7/site-packages/kiwi/models/model.py", line 137, in predict
    mask = self.get_mask(batch, input_key)
  File "/opt/conda/lib/python3.7/site-packages/kiwi/models/model.py", line 205, in get_mask
    input_tensor != pad_id, dtype=torch.uint8
RuntimeError: expected device cuda:0 but got device cpu
Thanks!
Tim
I had a look and found that the problem exists in OpenKiwi 0.1.2 and has been fixed in the latest release, 0.1.3. The simple fix for this tutorial is to change the openkiwi version in the requirements.txt file from 0.1.2 to 0.1.3, which has been done in pull request #6.
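For anyone hitting the same error before pulling that change, here is a sketch of the edit. It assumes requirements.txt pins the version with `==`; the exact line in the tutorial's file isn't quoted in this thread.

```text
# requirements.txt: bump the pinned OpenKiwi release (0.1.2 -> 0.1.3)
openkiwi==0.1.3
```

In an environment that is already set up, upgrading the installed package in place (`pip install openkiwi==0.1.3`) should have the same effect.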