Incomplete descriptions generated after reinforcement learning? #283

Open
Xiong-can opened this issue Dec 18, 2023 · 7 comments

Comments

@Xiong-can

After reinforcement learning, the generated descriptions are incomplete, such as:
a motorcycle parked in a parking lot with a ..

@ruotianluo
Owner

ruotianluo commented Dec 18, 2023 via email

@Xiong-can
Author

Yes, that does happen. Check the bad ending rate and see roughly how large it is. If you train with this repo, that rate shouldn't be particularly large. The cause is a problem with the CIDEr metric. If you add something like a bad-ending penalty reward, I think that should alleviate the problem. (Ruotian Luo, via email)
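The bad-ending penalty mentioned above could be sketched roughly as follows. This is illustrative only: the word list, penalty weight, and function name are assumptions, not part of this repository.

```python
import numpy as np

# Assumed heuristic: a caption ending in a determiner/preposition is a "bad ending"
BAD_LAST_WORDS = {"a", "an", "the", "with", "of", "in", "on", "at", "and"}

def apply_bad_ending_penalty(captions, rewards, penalty=1.0):
    """Subtract a fixed penalty from the reward of any caption whose
    last word signals a truncated sentence."""
    rewards = np.array(rewards, dtype=np.float32)
    for i, cap in enumerate(captions):
        words = cap.strip().split()
        if words and words[-1] in BAD_LAST_WORDS:
            rewards[i] -= penalty
    return rewards

caps = ["a motorcycle parked in a parking lot with a",
        "a motorcycle parked in a parking lot"]
adjusted = apply_bad_ending_penalty(caps, [0.8, 0.8], penalty=1.0)
# first caption is penalized (0.8 - 1.0 = -0.2), second is unchanged
```

This would be applied to the per-caption reward before subtracting the baseline, so that sampled sequences with truncated endings receive a lower advantage.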

The bad ending rate is about 1/2. How should I add a bad ending penalty reward to alleviate this? And could you tell me what the bad ending rate of your model is? Thanks!

@ruotianluo
Owner

ruotianluo commented Dec 18, 2023 via email

@Xiong-can
Author

the original CIDEr does not add it). When I tried it before, not adding eos gave about 1/3 bad endings. Are you following this way of computing CIDEr?

My CIDEr computation uses the evaluation method from the M2 (Meshed-Memory Transformer) paper. When computing scores during reinforcement learning, eos should already be added. This is the main code of my reinforcement learning part:

Rewards

        # decode sampled sequences; repeat each ground-truth caption beam_size times
        caps_gen = text_field.decode(out.view(-1, seq_len))
        caps_gt = list(itertools.chain(*([c, ] * beam_size for c in data["text"])))
        caps_gen, caps_gt = tokenizer_pool.map(evaluation.PTBTokenizer.tokenize, [caps_gen, caps_gt])
        # per-caption CIDEr scores serve as the reward
        reward = cider.compute_score(caps_gt, caps_gen)[1].astype(np.float32)
        reward = torch.from_numpy(reward).to(device).view(detections.shape[0], beam_size)
        # baseline: mean reward over the beam
        reward_baseline = torch.mean(reward, -1, keepdim=True)
        # policy-gradient loss weighted by the advantage (reward - baseline)
        loss = -torch.mean(log_prob, -1) * (reward - reward_baseline)
        loss = loss.mean()
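As suggested earlier in this thread, one fix is to append an explicit eos token to both hypotheses and references before computing CIDEr, so that truncated endings stop matching reference n-grams. A minimal string-level sketch (the token spelling and helper name are assumptions, not the repo's actual implementation):

```python
def append_eos(caption, eos="<eos>"):
    # append an explicit end-of-sentence token before scoring
    return caption.rstrip() + " " + eos

caps_gen = [append_eos(c) for c in ["a motorcycle parked in a lot with a"]]
caps_gt = [[append_eos(c) for c in refs]
           for refs in [["a motorcycle parked in a lot"]]]
# a hypothesis ending in "with a <eos>" now shares fewer high-order n-grams
# with references ending in "lot <eos>", lowering its CIDEr reward
```

The key point is that eos participates in the n-gram counts, so a complete ending is directly rewarded by the metric.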

This is how the CIDEr score is computed:
def compute_cider(self):
    def counts2vec(cnts):
        """
        Function maps counts of ngram to vector of tfidf weights.
        The function returns vec, an array of dictionary that store mapping of n-gram and tf-idf weights.
        The n-th entry of array denotes length of n-grams.
        :param cnts:
        :return: vec (array of dict), norm (array of float), length (int)
        """
        vec = [defaultdict(float) for _ in range(self.n)]
        length = 0
        norm = [0.0 for _ in range(self.n)]
        for (ngram, term_freq) in cnts.items():
            # give word count 1 if it doesn't appear in reference corpus
            df = np.log(max(1.0, self.doc_frequency[ngram]))
            # ngram index
            n = len(ngram) - 1
            # tf (term_freq) * idf (precomputed idf) for n-grams
            vec[n][ngram] = float(term_freq) * (self.ref_len - df)
            # compute norm for the vector. the norm will be used for computing similarity
            norm[n] += pow(vec[n][ngram], 2)

            if n == 1:
                length += term_freq
        norm = [np.sqrt(n) for n in norm]
        return vec, norm, length

    def sim(vec_hyp, vec_ref, norm_hyp, norm_ref, length_hyp, length_ref):
        '''
        Compute the cosine similarity of two vectors.
        :param vec_hyp: array of dictionary for vector corresponding to hypothesis
        :param vec_ref: array of dictionary for vector corresponding to reference
        :param norm_hyp: array of float for vector corresponding to hypothesis
        :param norm_ref: array of float for vector corresponding to reference
        :param length_hyp: int containing length of hypothesis
        :param length_ref: int containing length of reference
        :return: array of score for each n-grams cosine similarity
        '''
        delta = float(length_hyp - length_ref)
        # measure cosine similarity
        val = np.array([0.0 for _ in range(self.n)])
        for n in range(self.n):
            # ngram
            for (ngram, count) in vec_hyp[n].items():
                # vrama91: added clipping
                val[n] += min(vec_hyp[n][ngram], vec_ref[n][ngram]) * vec_ref[n][ngram]

            if (norm_hyp[n] != 0) and (norm_ref[n] != 0):
                val[n] /= (norm_hyp[n] * norm_ref[n])

            assert(not math.isnan(val[n]))
            # vrama91: added a length based gaussian penalty
            val[n] *= np.e ** (-(delta ** 2) / (2 * self.sigma ** 2))
        return val

    scores = []
    for test, refs in zip(self.ctest, self.crefs):
        # compute vector for test captions
        vec, norm, length = counts2vec(test)
        # compute vector for ref captions
        score = np.array([0.0 for _ in range(self.n)])
        for ref in refs:
            vec_ref, norm_ref, length_ref = counts2vec(ref)
            score += sim(vec, vec_ref, norm, norm_ref, length, length_ref)
        # change by vrama91 - mean of ngram scores, instead of sum
        score_avg = np.mean(score)
        # divide by number of references
        score_avg /= len(refs)
        # multiply score by 10
        score_avg *= 10.0
        # append score of an image to the score list
        scores.append(score_avg)
    return scores

def compute_score(self):
    # compute cider score
    score = self.compute_cider()
    # debug
    # print score
    return np.mean(np.array(score)), np.array(score)
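Note that the `sim` function above already contains the two CIDEr-D modifications: n-gram count clipping and the Gaussian length penalty `np.e ** (-(delta ** 2) / (2 * self.sigma ** 2))`. The penalty's behavior can be checked in isolation (sigma = 6.0 is the usual default; this standalone function is just an illustration, not part of the scorer):

```python
import numpy as np

def length_penalty(length_hyp, length_ref, sigma=6.0):
    # Gaussian penalty on the hypothesis/reference length difference,
    # as applied per n-gram order inside sim()
    delta = float(length_hyp - length_ref)
    return np.e ** (-(delta ** 2) / (2 * sigma ** 2))

length_penalty(10, 10)  # 1.0: equal lengths, no penalty
length_penalty(4, 10)   # ~0.61: a caption 6 tokens short loses ~39% of its score
```

So a length mismatch already reduces the score multiplicatively, but only gently for small mismatches, which is why truncated captions can still score well.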

@ruotianluo
Owner

ruotianluo commented Dec 19, 2023 via email

@Xiong-can
Author

The M2 one is problematic. My 1/3 result came from running M2. As far as I remember, M2 does not add eos. (Ruotian Luo, via email)

Then how should I alleviate this problem? Should I add a bad ending penalty reward, or modify the CIDEr computation? Is the CIDEr-D computation exactly the one that adds a penalty factor?

@ruotianluo
Owner

ruotianluo commented Dec 20, 2023 via email
