
Training problem #4

Open
weiweili123 opened this issue Jan 12, 2018 · 7 comments

@weiweili123

Hello! I used resnet18 to extract features from the dataset, but I don't know how to train with them; the training code seems to be written for bottom-up-attention features. I'm also not sure how to use bottom-up-attention to extract features from the AI Challenger training set.

@ruotianluo
Owner

They should be compatible; you can train directly.

@weiweili123
Author

Hello! I modified run_train.sh as follows:
#!/bin/sh

# larger batch

id="dense_box_bn"$1
ckpt_path="log_"$id
if [ ! -d $ckpt_path ]; then
    mkdir $ckpt_path
fi
if [ ! -f $ckpt_path"/infos_"$id".pkl" ]; then
    start_from=""
else
    start_from="--start_from "$ckpt_path
fi

python train.py --caption_model denseatt --input_json data/chinese_talk.json --input_label_h5 data/chinese_talk_label.h5 --input_fc_dir data/chinese_talk_fc --input_att_dir data/chinese_talk_att --seq_per_img 512 --batch_size 50 --beam_size 1 --learning_rate 5e-4 --learning_rate_decay_start 0 --scheduled_sampling_start 0 --checkpoint_path $ckpt_path $start_from --save_checkpoint_every 3000 --language_eval 1 --val_images_use 10000 --max_epoch 37 --rnn_size 1300 --use_bn 1

if [ ! -d xe/$ckpt_path ]; then
    cp -r $ckpt_path xe/
fi

python train.py --caption_model denseatt --input_json data/chinese_talk.json --input_label_h5 data/chinese_talk_label.h5 --input_fc_dir data/chinese_talk_fc --input_att_dir data/chinese_talk_att --seq_per_img 5 --batch_size 50 --beam_size 1 --learning_rate 5e-5 --learning_rate_decay_start 0 --learning_rate_decay_every 55 --learning_rate_decay_rate 0.1 --scheduled_sampling_start 0 --checkpoint_path $ckpt_path --start_from $ckpt_path --save_checkpoint_every 3000 --language_eval 1 --val_images_use 10000 --self_critical_after 37 --rnn_size 1300 --use_bn 1

Result of running it:
zou@zou:~/Image_Captioning_chinese$ sh run_train.sh
/usr/local/lib/python2.7/dist-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from float to np.floating is deprecated. In future, it will be treated as np.float64 == np.dtype(float).type.
from ._conv import register_converters as _register_converters

DataLoader loading json file: data/chinese_talk.json
vocab size is 4461
DataLoader loading h5 file: data/chinese_talk_fc data/chinese_talk_att data/cocotalk_box data/chinese_talk_label.h5
max sequence length in data is 20
read 240000 image features
assigned 220000 images to split train
assigned 10000 images to split val
assigned 10000 images to split test
Traceback (most recent call last):
File "train.py", line 228, in <module>
train(opt)
File "train.py", line 114, in train
data = loader.get_batch('train')
File "/home/zou/Image_Captioning_chinese/dataloader.py", line 164, in get_batch
data['att_feats'][i*seq_per_img:(i+1)*seq_per_img, :att_batch[i].shape[0]] = att_batch[i]
ValueError: could not broadcast input array from shape (14,14,512) into shape (512,14,14)
Terminating BlobFetcher
/usr/local/lib/python2.7/dist-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from float to np.floating is deprecated. In future, it will be treated as np.float64 == np.dtype(float).type.
from ._conv import register_converters as _register_converters
DataLoader loading json file: data/chinese_talk.json
vocab size is 4461
DataLoader loading h5 file: data/chinese_talk_fc data/chinese_talk_att data/cocotalk_box data/chinese_talk_label.h5
max sequence length in data is 20
read 240000 image features
assigned 220000 images to split train
assigned 10000 images to split val
assigned 10000 images to split test
Traceback (most recent call last):
File "train.py", line 228, in <module>
train(opt)
File "train.py", line 48, in train
with open(os.path.join(opt.start_from, 'infos_'+opt.id+'.pkl')) as f:
IOError: [Errno 2] No such file or directory: 'log_dense_box_bn/infos_.pkl'

I'm not sure whether this is a problem with my training parameters, or whether the features extracted with resnet18 have a shape that doesn't match what the later code expects?

@ruotianluo
Owner

Oh, it seems they are not fully compatible after all. You don't need to change seq_per_img. When the dataloader loads the att features, you can reshape the 14x14x512 array into 196x512.
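A minimal sketch of that reshape, assuming the att feature is loaded as a NumPy array of shape (14, 14, 512); the helper name is hypothetical, not from the repo:

```python
import numpy as np

def flatten_att_feat(att_feat):
    """Flatten a (H, W, C) attention feature map into (H*W, C) per-region
    features, e.g. (14, 14, 512) -> (196, 512), so the dataloader can
    stack them without the shape-broadcast error above."""
    h, w, c = att_feat.shape
    return att_feat.reshape(h * w, c)

feat = np.zeros((14, 14, 512), dtype=np.float32)
print(flatten_att_feat(feat).shape)  # (196, 512)
```

This would go where the dataloader reads each att feature from disk, before it is copied into the batch array.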

@weiweili123
Author

OK. One more question: when extracting features with resnet101, my 1080 card (8 GB) runs out of memory every time after processing about 3000 images. How much GPU memory did you use for feature extraction, and does the code have a multi-GPU option?

@ruotianluo
Owner

A 1080 Ti. No multi-GPU.

@weiweili123
Author

OK, thank you!

@zhufeijuanjuan

(quoting weiweili123's comment above: the modified run_train.sh and the same broadcast-shape ValueError / infos IOError output)

Hi, I ran into this problem too. How did you solve it? Thanks.
