Incomplete descriptions generated after reinforcement learning? #283

After reinforcement learning, the generated description will be incomplete, such as:
a motorcycle parked in a parking lot with a ..

Comments

Ruotian Luo:
Yes, this does happen. Check your bad ending rate and see roughly how large it is. If you trained with this repository, the rate should not be particularly large. The cause is the CIDEr metric. If you add something like a bad-ending penalty to the reward, I think that should alleviate the problem.
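A minimal sketch of how the bad ending rate could be measured; the BAD_ENDING_WORDS set and the captions argument are illustrative assumptions, not part of this repository:

# Hypothetical heuristic: a caption "ends badly" if its last word is a
# determiner / preposition / conjunction, i.e. the sentence was cut off.
BAD_ENDING_WORDS = {'a', 'an', 'the', 'with', 'of', 'on', 'in', 'and', 'to'}

def bad_ending_rate(captions):
    """captions: list of generated caption strings (eos already stripped)."""
    bad = 0
    for cap in captions:
        words = cap.strip().split()
        if words and words[-1] in BAD_ENDING_WORDS:
            bad += 1
    return bad / max(len(captions), 1)

# The caption from this issue counts as a bad ending:
print(bad_ending_rate(['a motorcycle parked in a parking lot with a']))  # 1.0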
Xiong-can:
The bad ending rate is about 1/2. How should I add a bad-ending penalty reward to alleviate this problem? And may I ask what the bad ending rate of your model is? Thanks!
Ruotian Luo:
That doesn't sound right. The original SCST paper adds an eos token when computing CIDEr (the original CIDEr does not). When I tried it before, not adding eos gave about 1/3 bad endings. Did you follow that way of computing CIDEr?
Xiong-can:
My CIDEr is computed with the evaluation method from the M2 (Meshed-Memory) paper, and eos should already be included when the reinforcement learning score is computed. This is the main code of my reinforcement learning part:

Rewards
caps_gen = text_field.decode(out.view(-1, seq_len))  # decode sampled token ids into caption strings
caps_gt = list(itertools.chain(*([c, ] * beam_size for c in data["text"])))  # repeat each reference set once per beam
caps_gen, caps_gt = tokenizer_pool.map(evaluation.PTBTokenizer.tokenize, [caps_gen, caps_gt])  # PTB-tokenize both sides
reward = cider.compute_score(caps_gt, caps_gen)[1].astype(np.float32)  # per-caption CIDEr scores used as reward
reward = torch.from_numpy(reward).to(device).view(detections.shape[0], beam_size)  # reshape to (batch_size, beam_size)
reward_baseline = torch.mean(reward, -1, keepdim=True)  # baseline = mean reward over the beam
loss = -torch.mean(log_prob, -1) * (reward - reward_baseline)  # self-critical policy-gradient loss
loss = loss.mean()
This is how the CIDEr score is computed:
def compute_cider(self):
    def counts2vec(cnts):
        """
        Function maps counts of ngram to vector of tfidf weights.
        The function returns vec, an array of dictionary that store mapping of n-gram and tf-idf weights.
        The n-th entry of array denotes length of n-grams.
        :param cnts:
        :return: vec (array of dict), norm (array of float), length (int)
        """
        vec = [defaultdict(float) for _ in range(self.n)]
        length = 0
        norm = [0.0 for _ in range(self.n)]
        for (ngram, term_freq) in cnts.items():
            # give word count 1 if it doesn't appear in reference corpus
            df = np.log(max(1.0, self.doc_frequency[ngram]))
            # ngram index
            n = len(ngram) - 1
            # tf (term_freq) * idf (precomputed idf) for n-grams
            vec[n][ngram] = float(term_freq) * (self.ref_len - df)
            # compute norm for the vector. the norm will be used for computing similarity
            norm[n] += pow(vec[n][ngram], 2)
            if n == 1:
                length += term_freq
        norm = [np.sqrt(n) for n in norm]
        return vec, norm, length

    def sim(vec_hyp, vec_ref, norm_hyp, norm_ref, length_hyp, length_ref):
        '''
        Compute the cosine similarity of two vectors.
        :param vec_hyp: array of dictionary for vector corresponding to hypothesis
        :param vec_ref: array of dictionary for vector corresponding to reference
        :param norm_hyp: array of float for vector corresponding to hypothesis
        :param norm_ref: array of float for vector corresponding to reference
        :param length_hyp: int containing length of hypothesis
        :param length_ref: int containing length of reference
        :return: array of score for each n-grams cosine similarity
        '''
        delta = float(length_hyp - length_ref)
        # measure cosine similarity
        val = np.array([0.0 for _ in range(self.n)])
        for n in range(self.n):
            # ngram
            for (ngram, count) in vec_hyp[n].items():
                # vrama91 : added clipping
                val[n] += min(vec_hyp[n][ngram], vec_ref[n][ngram]) * vec_ref[n][ngram]
            if (norm_hyp[n] != 0) and (norm_ref[n] != 0):
                val[n] /= (norm_hyp[n] * norm_ref[n])
            assert (not math.isnan(val[n]))
            # vrama91: added a length based gaussian penalty
            val[n] *= np.e ** (-(delta ** 2) / (2 * self.sigma ** 2))
        return val

    scores = []
    for test, refs in zip(self.ctest, self.crefs):
        # compute vector for test captions
        vec, norm, length = counts2vec(test)
        # compute vector for ref captions
        score = np.array([0.0 for _ in range(self.n)])
        for ref in refs:
            vec_ref, norm_ref, length_ref = counts2vec(ref)
            score += sim(vec, vec_ref, norm, norm_ref, length, length_ref)
        # change by vrama91 - mean of ngram scores, instead of sum
        score_avg = np.mean(score)
        # divide by number of references
        score_avg /= len(refs)
        # multiply score by 10
        score_avg *= 10.0
        # append score of an image to the score list
        scores.append(score_avg)
    return scores

def compute_score(self):
    # compute cider score
    score = self.compute_cider()
    # debug
    # print score
    return np.mean(np.array(score)), np.array(score)
Ruotian Luo:
The M2 one is problematic. My 1/3 result came from running M2. As I recall, M2 does not add eos.
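A rough sketch of adding eos before the CIDEr call, in the spirit of the SCST paper; the '<eos>' token string and the with_eos helper are assumptions to be adapted to the actual tokenizer:

EOS = '<eos>'

def with_eos(caps):
    # caps may be a dict {image_id: [captions]} (PTBTokenizer output) or a plain list.
    if isinstance(caps, dict):
        return {k: [c.strip() + ' ' + EOS for c in v] for k, v in caps.items()}
    return [c.strip() + ' ' + EOS for c in caps]

# In the reward snippet above this would replace the compute_score line:
#   reward = cider.compute_score(with_eos(caps_gt), with_eos(caps_gen))[1].astype(np.float32)
# Note: the references used to precompute the CIDEr document frequencies must get the
# same eos during preprocessing, otherwise the eos n-grams are missing from the
# document-frequency table and receive an inflated idf weight.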
Xiong-can:
Then how should I alleviate this problem? Should I add a bad-ending penalty reward, or modify the CIDEr computation? Is the CIDEr-D computation just CIDEr with an added penalty factor?
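For reference (not an authoritative answer): the sim() code posted above already contains CIDEr-D's two additions over plain CIDEr, namely clipping of the candidate n-gram counts (the min(...) term) and a Gaussian length penalty. With candidate length $l_c$, reference length $l_s$, and the usual default $\sigma = 6$, the penalty factor is

$$e^{-\frac{(l_c - l_s)^2}{2\sigma^2}}$$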
Ruotian Luo:
1. I haven't tried the penalty reward, but I think it would be interesting to try.
2. You can modify the CIDEr computation in M2 to add eos (note that eos needs to be added during preprocessing as well as when the score is computed).
3. Alternatively, you can add an XE loss to balance the training.
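A hedged sketch of option 1 (untested here; names and the penalty magnitude are illustrative): subtract a constant from the CIDEr reward whenever a sampled caption ends on a cut-off word.

BAD_ENDING_WORDS = {'a', 'an', 'the', 'with', 'of', 'on', 'in', 'and', 'to'}  # same illustrative set as the earlier sketch

def apply_bad_ending_penalty(reward, captions, penalty=1.0):
    """reward: 1-D numpy array of per-caption CIDEr scores; captions: list of caption strings."""
    reward = reward.copy()
    for i, cap in enumerate(captions):
        words = cap.strip().split()
        if words and words[-1] in BAD_ENDING_WORDS:
            reward[i] -= penalty  # penalty magnitude would need tuning
    return reward

# In the reward snippet above, this could be applied to the per-caption scores
# before they are reshaped to (batch_size, beam_size).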
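A minimal sketch of option 3, mixing a cross-entropy (XE) term into the self-critical loss; xe_lambda, word_logits, and gt_tokens are illustrative names, not this repository's API:

import torch
import torch.nn.functional as F

def mixed_loss(log_prob, reward, reward_baseline, word_logits, gt_tokens,
               pad_idx=0, xe_lambda=0.1):
    # Self-critical term, as in the reward snippet above.
    rl_loss = (-torch.mean(log_prob, -1) * (reward - reward_baseline)).mean()
    # Cross-entropy on the ground-truth captions helps keep sentence endings well-formed.
    xe_loss = F.cross_entropy(word_logits.reshape(-1, word_logits.size(-1)),
                              gt_tokens.reshape(-1), ignore_index=pad_idx)
    return rl_loss + xe_lambda * xe_loss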