Incomplete descriptions generated after reinforcement learning? #283

Open
Xiong-can opened this issue Dec 18, 2023 · 7 comments

Comments

@Xiong-can

After reinforcement learning, the generated descriptions are incomplete, such as:
a motorcycle parked in a parking lot with a ..

@ruotianluo
Owner

ruotianluo commented Dec 18, 2023 via email

@Xiong-can
Author

Yes, that does happen. Check the bad ending rate and see roughly how large it is. If you train with this repo, that rate shouldn't be particularly large. The cause is a problem with the CIDEr metric. If you add something like a bad-ending penalty reward, I think that should alleviate the problem. (Ruotian Luo, via email)
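The bad-ending penalty mentioned above could be sketched roughly as follows. This is illustrative only: the word list, penalty weight, and function name are assumptions, not part of this repository.

```python
import numpy as np

# Assumed heuristic: a caption ending in a determiner/preposition is a "bad ending"
BAD_LAST_WORDS = {"a", "an", "the", "with", "of", "in", "on", "at", "and"}

def apply_bad_ending_penalty(captions, rewards, penalty=1.0):
    """Subtract a fixed penalty from the reward of any caption whose
    last word signals a truncated sentence."""
    rewards = np.array(rewards, dtype=np.float32)
    for i, cap in enumerate(captions):
        words = cap.strip().split()
        if words and words[-1] in BAD_LAST_WORDS:
            rewards[i] -= penalty
    return rewards

caps = ["a motorcycle parked in a parking lot with a",
        "a motorcycle parked in a parking lot"]
adjusted = apply_bad_ending_penalty(caps, [0.8, 0.8], penalty=1.0)
# first caption is penalized (0.8 - 1.0 = -0.2), second is unchanged
```

This would be applied to the per-caption reward before subtracting the baseline, so that sampled sequences with truncated endings receive a lower advantage.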

The bad ending rate is about 1/2. How should I add a bad ending penalty reward to alleviate this? And could you tell me what the bad ending rate of your model is? Thanks!

@ruotianluo
Owner

ruotianluo commented Dec 18, 2023 via email

@Xiong-can
Author

the original CIDEr does not add it). When I tried it before, not adding eos gave about 1/3 bad endings. Are you following this way of computing CIDEr?

My CIDEr computation uses the evaluation method from the M2 (Meshed-Memory Transformer) paper. When computing scores during reinforcement learning, eos should already be added. This is the main code of my reinforcement learning part:

Rewards

        # decode sampled sequences; repeat each ground-truth caption beam_size times
        caps_gen = text_field.decode(out.view(-1, seq_len))
        caps_gt = list(itertools.chain(*([c, ] * beam_size for c in data["text"])))
        caps_gen, caps_gt = tokenizer_pool.map(evaluation.PTBTokenizer.tokenize, [caps_gen, caps_gt])
        # per-caption CIDEr scores serve as the reward
        reward = cider.compute_score(caps_gt, caps_gen)[1].astype(np.float32)
        reward = torch.from_numpy(reward).to(device).view(detections.shape[0], beam_size)
        # baseline: mean reward over the beam
        reward_baseline = torch.mean(reward, -1, keepdim=True)
        # policy-gradient loss weighted by the advantage (reward - baseline)
        loss = -torch.mean(log_prob, -1) * (reward - reward_baseline)
        loss = loss.mean()
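As suggested earlier in this thread, one fix is to append an explicit eos token to both hypotheses and references before computing CIDEr, so that truncated endings stop matching reference n-grams. A minimal string-level sketch (the token spelling and helper name are assumptions, not the repo's actual implementation):

```python
def append_eos(caption, eos="<eos>"):
    # append an explicit end-of-sentence token before scoring
    return caption.rstrip() + " " + eos

caps_gen = [append_eos(c) for c in ["a motorcycle parked in a lot with a"]]
caps_gt = [[append_eos(c) for c in refs]
           for refs in [["a motorcycle parked in a lot"]]]
# a hypothesis ending in "with a <eos>" now shares fewer high-order n-grams
# with references ending in "lot <eos>", lowering its CIDEr reward
```

The key point is that eos participates in the n-gram counts, so a complete ending is directly rewarded by the metric.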

This is how the CIDEr score is computed:
def compute_cider(self):
    def counts2vec(cnts):
        """
        Function maps counts of ngram to vector of tfidf weights.
        The function returns vec, an array of dictionary that store mapping of n-gram and tf-idf weights.
        The n-th entry of array denotes length of n-grams.
        :param cnts:
        :return: vec (array of dict), norm (array of float), length (int)
        """
        vec = [defaultdict(float) for _ in range(self.n)]
        length = 0
        norm = [0.0 for _ in range(self.n)]
        for (ngram, term_freq) in cnts.items():
            # give word count 1 if it doesn't appear in reference corpus
            df = np.log(max(1.0, self.doc_frequency[ngram]))
            # ngram index
            n = len(ngram) - 1
            # tf (term_freq) * idf (precomputed idf) for n-grams
            vec[n][ngram] = float(term_freq) * (self.ref_len - df)
            # compute norm for the vector. the norm will be used for computing similarity
            norm[n] += pow(vec[n][ngram], 2)

            if n == 1:
                length += term_freq
        norm = [np.sqrt(n) for n in norm]
        return vec, norm, length

    def sim(vec_hyp, vec_ref, norm_hyp, norm_ref, length_hyp, length_ref):
        '''
        Compute the cosine similarity of two vectors.
        :param vec_hyp: array of dictionary for vector corresponding to hypothesis
        :param vec_ref: array of dictionary for vector corresponding to reference
        :param norm_hyp: array of float for vector corresponding to hypothesis
        :param norm_ref: array of float for vector corresponding to reference
        :param length_hyp: int containing length of hypothesis
        :param length_ref: int containing length of reference
        :return: array of score for each n-grams cosine similarity
        '''
        delta = float(length_hyp - length_ref)
        # measure cosine similarity
        val = np.array([0.0 for _ in range(self.n)])
        for n in range(self.n):
            # ngram
            for (ngram, count) in vec_hyp[n].items():
                # vrama91: added clipping
                val[n] += min(vec_hyp[n][ngram], vec_ref[n][ngram]) * vec_ref[n][ngram]

            if (norm_hyp[n] != 0) and (norm_ref[n] != 0):
                val[n] /= (norm_hyp[n] * norm_ref[n])

            assert(not math.isnan(val[n]))
            # vrama91: added a length based gaussian penalty
            val[n] *= np.e ** (-(delta ** 2) / (2 * self.sigma ** 2))
        return val

    scores = []
    for test, refs in zip(self.ctest, self.crefs):
        # compute vector for test captions
        vec, norm, length = counts2vec(test)
        # compute vector for ref captions
        score = np.array([0.0 for _ in range(self.n)])
        for ref in refs:
            vec_ref, norm_ref, length_ref = counts2vec(ref)
            score += sim(vec, vec_ref, norm, norm_ref, length, length_ref)
        # change by vrama91 - mean of ngram scores, instead of sum
        score_avg = np.mean(score)
        # divide by number of references
        score_avg /= len(refs)
        # multiply score by 10
        score_avg *= 10.0
        # append score of an image to the score list
        scores.append(score_avg)
    return scores

def compute_score(self):
    # compute cider score
    score = self.compute_cider()
    # debug
    # print score
    return np.mean(np.array(score)), np.array(score)
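Note that the `sim` function above already contains the two CIDEr-D modifications: n-gram count clipping and the Gaussian length penalty `np.e ** (-(delta ** 2) / (2 * self.sigma ** 2))`. The penalty's behavior can be checked in isolation (sigma = 6.0 is the usual default; this standalone function is just an illustration, not part of the scorer):

```python
import numpy as np

def length_penalty(length_hyp, length_ref, sigma=6.0):
    # Gaussian penalty on the hypothesis/reference length difference,
    # as applied per n-gram order inside sim()
    delta = float(length_hyp - length_ref)
    return np.e ** (-(delta ** 2) / (2 * sigma ** 2))

length_penalty(10, 10)  # 1.0: equal lengths, no penalty
length_penalty(4, 10)   # ~0.61: a caption 6 tokens short loses ~39% of its score
```

So a length mismatch already reduces the score multiplicatively, but only gently for small mismatches, which is why truncated captions can still score well.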

@ruotianluo
Owner

ruotianluo commented Dec 19, 2023 via email

@Xiong-can
Author

The M2 one is problematic. My 1/3 result came from running M2. As far as I remember, M2 does not add eos. (Ruotian Luo, via email)

Then how should I alleviate this problem? Should I add a bad ending penalty reward, or modify the CIDEr computation? Is the CIDEr-D computation exactly the one that adds a penalty factor?

@ruotianluo
Owner

ruotianluo commented Dec 20, 2023 via email
