First off, thanks for the implementation of this code, it's great!
I'm interested in applying DIM to non-image data, i.e., I just have a collection of feature vectors (not images) that I'd like to encode and maximise information between the original feature vectors and their new embeddings. I'm trying to translate the problem from 2D inputs to 1D inputs.
I have three questions:
1. Does doing this even make sense? I can't see why the principle of maximising information between the original representation and the embedding wouldn't apply to 1D inputs.
2. How can I implement this? As far as I understand it, the local embeddings are 2D feature maps and the global embeddings are 1D vectors. Obviously in the 1D setting the 2D feature maps disappear, but the 1D global embeddings remain the same. Could the local embeddings be replaced with 1D embeddings of some sort (rather than 2D maps)? The discriminator models that use 2D convolutions would then need to be updated accordingly.
3. Why does the GlobalDiscriminator model have 2D convolutional layers? My understanding was that, for the global discriminator, the local feature maps should be flattened and concatenated with the global embedding, but from the code it seems the local feature maps are further processed before being concatenated with the global embedding. Could you clarify this, please?
Thanks in advance!
Q3

The local feature maps are kept in 2D, and the global feature vector is "tiled" (it's a 1x1 tile) across the spatial grid and then concatenated with them. This preserves the spatially embedded information in the local maps. Check out Figure 5 in the appendix of the paper.
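For concreteness, here is a minimal PyTorch sketch of what "tile then concatenate" looks like. This is not the repo's actual code; the shapes and variable names are illustrative assumptions.

```python
# Hedged sketch: tiling a global vector over a 2D local feature map, then
# concatenating along the channel dimension, so 2D convolutions can still
# be applied to the fused tensor (as in a global-discriminator-style input).
import torch

batch, c_local, h, w = 8, 128, 7, 7    # local feature map: (N, C, H, W) -- assumed shapes
c_global = 64                           # global feature vector: (N, C_g) -- assumed shape

local_feat = torch.randn(batch, c_local, h, w)
global_feat = torch.randn(batch, c_global)

# "Tile" the 1x1 global vector across the spatial grid of the local map...
global_tiled = global_feat.unsqueeze(-1).unsqueeze(-1).expand(-1, -1, h, w)

# ...then concatenate along channels. The spatial layout of the local map is preserved.
fused = torch.cat([local_feat, global_tiled], dim=1)   # (N, C + C_g, H, W)
print(fused.shape)  # torch.Size([8, 192, 7, 7])
```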
Q2
Yes. Imagine you have a 1D feature of width 100. You compute a local map using (3x1) convolutions with, say, 20 filters; with padding this gives you a local feature map of width 100 and depth 20. Then, to take the extreme example, you could apply a (100x1) "convolution" with 30 filters. This gives you back a global vector of width 1 and depth 30 that integrates information over the entire sequence.
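A rough sketch of that 1D analogue in PyTorch, using the numbers from the example above (everything else, including the module name, is an illustrative assumption):

```python
# Sketch of a 1D encoder: a Conv1d "local" stage that preserves the width of 100,
# followed by a full-width convolution that collapses the sequence into a global vector.
import torch
import torch.nn as nn

class Encoder1D(nn.Module):
    def __init__(self, in_channels=1, local_channels=20, global_channels=30, width=100):
        super().__init__()
        # kernel_size=3 with padding=1 keeps the width at 100 -> local map (N, 20, 100)
        self.local_conv = nn.Conv1d(in_channels, local_channels, kernel_size=3, padding=1)
        # a kernel spanning the full width (100) integrates over the whole sequence -> (N, 30, 1)
        self.global_conv = nn.Conv1d(local_channels, global_channels, kernel_size=width)

    def forward(self, x):                              # x: (N, 1, 100)
        local_feat = torch.relu(self.local_conv(x))    # (N, 20, 100)
        global_feat = self.global_conv(local_feat)     # (N, 30, 1)
        return local_feat, global_feat.squeeze(-1)     # (N, 20, 100), (N, 30)

enc = Encoder1D()
local_feat, global_feat = enc(torch.randn(8, 1, 100))
print(local_feat.shape, global_feat.shape)  # torch.Size([8, 20, 100]) torch.Size([8, 30])
```

The 1D local/global discriminators would then take these shapes instead of 2D maps, swapping the 2D convolutions for 1D ones.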
Q1
Yes, it makes sense. However, depending on your application, I would also research BYOL or Barlow Twins; there has been a lot of progress in unsupervised/semi-supervised learning of late. I would also look into a transformer-based architecture rather than convolutions, as transformers are considerably more powerful for 1D data since they allow "set"-to-"set" mappings. Depending on your sequence length, a transformer could be a viable approach.