Why should the position encoding change along the dimension of the word embedding?
Shouldn't the entire embedding just be multiplied element-wise by a single constant?
Consider the sentence "john, went, to, the, hallway". Doesn't it suffice to multiply "john" element-wise by a small constant, say 0.1, and the last word, "hallway", by a larger one?
I am trying to understand the reason behind varying the weight of the position encoding along the dimensions of a word embedding.
deepaksuresh changed the title from "Can't position embedding be like temporal embedding" to "Can't position encoding be like temporal encoding" on Feb 24, 2019.
Temporal embeddings: these are added to each sentence representation to preserve the ordering of the sentences in the story. They play the same role as the "position embeddings" used in Transformer models.
Position encoding: this is a simple trick to preserve the ordering of words within a sentence. Because we encode a sentence as a bag of words, anything like "adding an embedding" will not work (the word vectors are summed, so an added position vector would contribute the same offset regardless of which word sits at which position), which is why we use a multiplication instead. But multiplying by a scalar is not a good idea either, because you cannot distinguish `0.5*room + 0.5*room` from `1.0*room`. In addition, a word multiplied by 0.1 is less likely to affect the output, so such a scalar becomes an unnecessary bias in the model.
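For concreteness, here is a minimal NumPy sketch of a per-dimension position-encoding scheme of the kind described in the End-to-End Memory Networks paper, `l_kj = (1 - j/J) - (k/d)(1 - 2j/J)`, contrasted with a single scalar weight per position. The toy vocabulary, random embeddings, and variable names are illustrative assumptions, not this repo's actual code.

```python
import numpy as np

def position_encoding(sentence_len: int, embed_dim: int) -> np.ndarray:
    """Weight matrix L[j, k] = (1 - j/J) - (k/d) * (1 - 2j/J),
    with 1-indexed word position j and embedding dimension k
    (the scheme described in the End-to-End Memory Networks paper)."""
    J, d = sentence_len, embed_dim
    j = np.arange(1, J + 1)[:, None]  # word positions, shape (J, 1)
    k = np.arange(1, d + 1)[None, :]  # embedding dims,  shape (1, d)
    return (1 - j / J) - (k / d) * (1 - 2 * j / J)

# Toy embeddings for "john went to the hallway" (random, for illustration only).
rng = np.random.default_rng(0)
words = ["john", "went", "to", "the", "hallway"]
E = np.stack([rng.normal(size=8) for _ in words])    # (J, d)

# Bag-of-words sentence representation with position encoding:
# weight every word element-wise by its position row, then sum.
L = position_encoding(len(words), E.shape[1])        # (J, d)
m = (L * E).sum(axis=0)

# Reversing the word order changes the representation, so ordering survives the sum.
m_reversed = (L * E[::-1]).sum(axis=0)
print(np.allclose(m, m_reversed))                    # False

# With a single scalar per position the encoding is degenerate:
# 0.5*room + 0.5*room collapses to exactly 1.0*room.
room = rng.normal(size=8)
print(np.allclose(0.5 * room + 0.5 * room, 1.0 * room))  # True

# Temporal embeddings are a separate, additive term per memory slot:
# m_i = (L * E_i).sum(axis=0) + T_A[i], with T_A learned and indexed by the
# sentence's position in the story (a sketch of the idea, not this repo's code).
```

The point of the sketch is that the weight varies with the embedding dimension `k` as well as the position `j`, so different orderings of the same words produce different sums, whereas a single scalar per position cannot achieve this without also shrinking some words' contributions toward zero.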