
Can't position encoding be like temporal encoding #15

Open
deepaksuresh opened this issue Feb 24, 2019 · 1 comment

Comments

@deepaksuresh

Why should the position encoding change along the dimensions of the word embedding?
Shouldn't the entire embedding just be multiplied element-wise by a single constant?
Consider the sentence "john, went, to, the, hallway": wouldn't it suffice to multiply "john" element-wise by a small constant, say 0.1, and the last word, "hallway", by a larger one?
I am trying to understand the reason for varying the position-encoding weight along the dimensions of a word embedding.
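To make that concrete, here is a minimal sketch of the scalar scheme I have in mind (the embedding values and the 0.1–1.0 weights are made up for illustration):

```python
import numpy as np

# Sketch of the scalar scheme: every dimension of a word's embedding is scaled
# by the same position-dependent constant before the bag-of-words sum.
rng = np.random.default_rng(0)
d = 4
words = ["john", "went", "to", "the", "hallway"]
emb = {w: rng.normal(size=d) for w in words}

weights = np.linspace(0.1, 1.0, len(words))   # 0.1 for "john", ..., 1.0 for "hallway"
sentence_vec = sum(wt * emb[w] for wt, w in zip(weights, words))
print(sentence_vec.shape)                      # (4,)
```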

@tesatory
Contributor

So there are two different things in the model:

  1. Temporal embeddings: these are added to each sentence representation to preserve the ordering of the sentences. They are just like the "position embeddings" used in Transformer models.
  2. Position encoding: this is a simple trick to preserve the ordering of words within a sentence. Because we are doing bag-of-words, anything like "adding an embedding" will not work, which is why we do a multiplication instead. But multiplying by a scalar is not a good idea, because you can't distinguish "0.5*room + 0.5*room" from "1.0*room" (see the sketch below). In addition, a word multiplied by 0.1 is less likely to have an effect on the output, so it becomes an unnecessary bias in the model.
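
Here is a minimal sketch of that difference, assuming the position-encoding formula from the End-To-End Memory Networks paper, l[j, k] = (1 - j/J) - (k/d)(1 - 2j/J), where J is the sentence length and d the embedding dimension (the function name and the toy embedding are mine):

```python
import numpy as np

# Position-encoding (PE) weights: each weight depends on BOTH the word's
# position j and the embedding dimension k, so word order survives the
# bag-of-words sum.
def position_encoding(J, d):
    """Return a (J, d) matrix of PE weights for a J-word sentence."""
    j = np.arange(1, J + 1)[:, None]   # word positions 1..J
    k = np.arange(1, d + 1)[None, :]   # embedding dimensions 1..d
    return (1.0 - j / J) - (k / d) * (1.0 - 2.0 * j / J)

d = 4
room = np.ones(d)   # toy embedding for "room" (values are made up)

# Scalar weighting is ambiguous: two "room"s at 0.5 each equal one at 1.0.
print(np.allclose(0.5 * room + 0.5 * room, 1.0 * room))          # True

# With PE, "room room" and "room" get different representations, because the
# weight vectors depend on the position, the dimension, and the length J.
two_rooms = (position_encoding(2, d) * np.stack([room, room])).sum(axis=0)
one_room = (position_encoding(1, d) * room[None, :]).sum(axis=0)
print(np.allclose(two_rooms, one_room))                          # False
```

Because the PE weight vectors vary across positions and sentence lengths, the two-word and one-word cases above end up with different weight vectors and therefore different sentence representations, whereas the scalar scheme collapses them.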
