When computing the similarity of embeddings, why not use L2 distance or cosine distance directly, rather than this? #19

Open
qxzhou1010 opened this issue Jul 29, 2020 · 2 comments

qxzhou1010 commented Jul 29, 2020

```java
private float evaluate(float[][] embeddings) {
    float[] embeddings1 = embeddings[0];
    float[] embeddings2 = embeddings[1];
    float dist = 0;
    for (int i = 0; i < 192; i++) {
        dist += Math.pow(embeddings1[i] - embeddings2[i], 2);
    }
    float same = 0;
    for (int i = 0; i < 400; i++) {
        float threshold = 0.01f * (i + 1);
        if (dist < threshold) {
            same += 1.0 / 400;
        }
    }
    return same;
}
```
- I don't quite understand why the similarity between the two final embeddings is computed this way.
@h3clikejava

Isn't this just the cosine formula? The key question is what the 400 means and where it comes from.

Owner

syaringan357 commented Aug 17, 2020

```python
import numpy as np


def evaluate(embeddings):
    # Calculate evaluation metrics
    # 400 thresholds: 0.01, 0.02, ..., 4.00
    thresholds = np.arange(0, 4, 0.01)
    thresholds = thresholds + 0.01
    embeddings1 = embeddings[0]
    embeddings2 = embeddings[1]
    assert (embeddings1.shape[0] == embeddings2.shape[0])

    # Squared L2 distance between the two embeddings
    diff = np.subtract(embeddings1, embeddings2)
    dist = np.sum(np.square(diff))
    # Fraction of thresholds that the distance falls below
    predict_issame = np.less(dist, thresholds)
    return np.mean(predict_issame)
```

This is the source code.
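
One way to read the function, offered as a rough sketch and not as the author's stated rationale: `dist` is a squared L2 distance, and the returned score is the fraction of the 400 thresholds 0.01, 0.02, ..., 4.00 that exceed it. That makes the score roughly `1 - dist / 4`, clamped to [0, 1]. If the embeddings are L2-normalized (an assumption; nothing in this thread confirms it), then `dist = 2 - 2 * cos`, so the largest possible distance is 4, which would account for 400 steps of 0.01. The helper names below (`threshold_score`, `l2_normalize`, `dot`) are hypothetical, and plain Python stands in for NumPy to keep the sketch dependency-free.

```python
import math


def threshold_score(emb1, emb2, n_thresholds=400, step=0.01):
    """Fraction of thresholds step, 2*step, ..., n_thresholds*step above the squared L2 distance."""
    dist = sum((a - b) ** 2 for a, b in zip(emb1, emb2))
    passed = sum(1 for k in range(1, n_thresholds + 1) if dist < step * k)
    return passed / n_thresholds


def l2_normalize(v):
    """Scale v to unit length (assumed preprocessing, not confirmed by the thread)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a, b):
    """Dot product; equals cosine similarity only when a and b are unit vectors."""
    return sum(x * y for x, y in zip(a, b))


# Two arbitrary example vectors, normalized to unit length.
a = l2_normalize([1.0, 2.0, 3.0])
b = l2_normalize([2.0, 1.0, 0.5])

dist = sum((x - y) ** 2 for x, y in zip(a, b))
score = threshold_score(a, b)

print("squared L2 distance:", dist)
print("2 - 2*cos          :", 2 - 2 * dot(a, b))
print("threshold score    :", score)
print("1 - dist/4         :", 1 - dist / 4)
```

Under that assumption the score is a quantized, rescaled form of cosine similarity, `(1 + cos) / 2` to within one step of 1/400, so it ranks pairs the same way as using L2 or cosine distance directly and only maps the result onto a 0 to 1 scale.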
