[question] How is M computed in wasserstein layer? #257

Open

kirk86 opened this issue May 1, 2021 · 4 comments

Comments

kirk86 commented May 1, 2021

Hi,
I was wondering if anyone could shed some light on the following questions regarding the wasserstein layer.

  • How is M computed? Where can I find this info?
  • What is the pred variable? Is it logits, softmax outputs, or something else?
  • Same for label: is it hard labels like [3, 2, 5, 7, 9, 1, 0], one-hot encodings, or something else?

Do the above variables change according to the problem (multi-class classification), for instance MNIST with hard labels (i.e. a single unique label per example) vs. MNIST with multiple labels per example?

pluskid (Owner) commented May 3, 2021

M is the ground metric matrix, which should be defined beforehand according to your prior knowledge of the label space (i.e. how similar label a is to label b).

Both pred and label should be k_class-by-n_eg. Each column of pred and label should be normalized to sum to 1, so pred should be the softmax output. If you have hard labels, they need to be one-hot encoded, and if there are multiple labels for one instance, the multi-hot labels need to be re-normalized to sum to 1. Hope this helps.
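To make that layout concrete, here is a minimal NumPy sketch of pred and label as k x n arrays whose columns sum to 1 (the names and values are illustrative only, not part of the layer's API):

```python
import numpy as np

k, n = 10, 4  # k classes, n examples in the batch

# pred: k x n, each column a probability distribution (e.g. a softmax output)
logits = np.random.randn(k, n)
pred = np.exp(logits - logits.max(axis=0, keepdims=True))
pred /= pred.sum(axis=0, keepdims=True)

# Hard labels -> one-hot columns (each column already sums to 1)
hard = np.array([3, 2, 5, 7])
label = np.zeros((k, n))
label[hard, np.arange(n)] = 1.0

# Multi-label case: mark every active class, then re-normalize each column to sum to 1
multi = np.zeros((k, n))
multi[[3, 5], 0] = 1.0      # example 0 carries labels 3 and 5
multi[[1], 1] = 1.0         # example 1 carries label 1
multi[[0, 2, 9], 2] = 1.0   # example 2 carries labels 0, 2 and 9
multi[[7], 3] = 1.0         # example 3 carries label 7
multi /= multi.sum(axis=0, keepdims=True)
```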

kirk86 (Author) commented May 3, 2021

Thanks that helps!

M is the ground metric according to your prior knowledge of the label space (i.e. how similar label a is to label b)

So basically M is just an adjacency matrix between all pairs of labels?

Given k = num_classes and n = num_samples, is M then k x k, with M[i, j] = |i - j| (the l1-norm between label indices)?

pluskid (Owner) commented May 5, 2021

I'm not sure I follow your notation. Basically, M is a k x k matrix, where M[i, j] is the distance between class i and class j. The distance depends on the specific application. For example, if you have classes cat, dog, car, etc., then you might have some ground metric under which cat and car are farther apart than cat and dog. In the most extreme case, where you have no information about the classes at all, you can use an uninformative metric that makes every class equidistant from every other class; but in that case this is probably not a good application scenario for the Wasserstein loss anyway.
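As an illustration (nothing shipped with the library; the per-class embeddings here are hypothetical), one way to build such a ground metric is from feature vectors describing each class, with a uniform metric as the uninformative fallback:

```python
import numpy as np

k = 10  # number of classes

# Informative case: suppose each class has a feature vector that captures semantic
# similarity (hypothetical embeddings, e.g. word vectors for the class names).
class_embeddings = np.random.randn(k, 16)
diff = class_embeddings[:, None, :] - class_embeddings[None, :, :]
M = np.linalg.norm(diff, axis=-1)  # M[i, j] = distance between class i and class j

# Uninformative fallback: every class equidistant from every other class.
M_uniform = 1.0 - np.eye(k)
```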

kirk86 (Author) commented May 6, 2021

Basically, M is a k x k matrix

Yep, that's what my notation says as well.

if you have classes cat, dog, car, etc., then you might have some ground metric under which cat and car are farther apart than cat and dog

Thanks! Do you have any references or examples of such ground metrics?

I've used the l1-norm between labels to generate M in my notation above.
When I tried training with SGD, the model would not get past 40-50% accuracy.
Adam gave better results, but still nowhere near what I get when training without the Wasserstein loss. Any thoughts?
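For concreteness, one reading of "the l1-norm between labels" for k ordinal class indices (e.g. MNIST digits) is the sketch below; whether this is a meaningful ground metric depends on whether nearby indices really correspond to similar classes:

```python
import numpy as np

k = 10  # e.g. the ten MNIST digit classes

# M[i, j] = |i - j|: treats label indices as points on a line, which is only a
# sensible ground metric if neighbouring indices really are similar classes.
idx = np.arange(k)
M = np.abs(idx[:, None] - idx[None, :]).astype(float)
M /= M.max()  # optional rescaling so the largest ground distance is 1
```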
