First off, thanks for the implementation of this code, it's great!
I'm interested in applying DIM to non-image data, i.e., I just have a collection of feature vectors (not images) that I'd like to encode and maximise information between the original feature vectors and their new embeddings. I'm trying to translate the problem from 2D inputs to 1D inputs.
I have three questions:
1. Does doing this even make sense? I can't see why the principle of maximising information between the original representation and the embedding wouldn't apply to 1D inputs.
2. How can I implement this? As far as I understand it, the local embeddings are 2D feature maps and the global embeddings are 1D vectors. Obviously in the 1D setting the 2D feature maps disappear, but the 1D global embeddings remain the same. Could the local embeddings be replaced with 1D embeddings of some sort (rather than 2D maps)? The discriminator models that use 2D convolutions would then need to be updated accordingly.
3. Why does the GlobalDiscriminator model have 2D convolutional layers? My understanding was that, for the global discriminator, the local feature maps should be flattened and concatenated with the global embedding, but from the code it seems the local feature maps are further processed before being concatenated with the global embedding. Could you clarify this, please?
Thanks in advance!
Q3

The local feature maps are kept in 2D, and the global feature vector is "tiled" (it's a 1x1 tile) across the spatial grid and then concatenated with them. This preserves the spatially embedded information in the local maps. Check out Figure 5 in the appendix of the paper.
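For concreteness, here is a minimal PyTorch sketch of what "tile then concatenate" looks like. This is not the repo's actual code; the shapes and variable names are illustrative assumptions.

```python
# Hedged sketch: tiling a global vector over a 2D local feature map, then
# concatenating along the channel dimension, so 2D convolutions can still
# be applied to the fused tensor (as in a global-discriminator-style input).
import torch

batch, c_local, h, w = 8, 128, 7, 7    # local feature map: (N, C, H, W) -- assumed shapes
c_global = 64                           # global feature vector: (N, C_g) -- assumed shape

local_feat = torch.randn(batch, c_local, h, w)
global_feat = torch.randn(batch, c_global)

# "Tile" the 1x1 global vector across the spatial grid of the local map...
global_tiled = global_feat.unsqueeze(-1).unsqueeze(-1).expand(-1, -1, h, w)

# ...then concatenate along channels. The spatial layout of the local map is preserved.
fused = torch.cat([local_feat, global_tiled], dim=1)   # (N, C + C_g, H, W)
print(fused.shape)  # torch.Size([8, 192, 7, 7])
```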
Q2
Yes. Imagine you have a 1D feature of width 100. You compute a local map using (3x1) convolutions with, say, 20 filters; with padding this gives you a local feature map of width 100 and depth 20. Then, to take the extreme example, you could apply a (100x1) "convolution" with 30 filters. This gives you back a global vector of width 1 and depth 30 that integrates information over the entire sequence.
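A rough sketch of that 1D analogue in PyTorch, using the numbers from the example above (everything else, including the module name, is an illustrative assumption):

```python
# Sketch of a 1D encoder: a Conv1d "local" stage that preserves the width of 100,
# followed by a full-width convolution that collapses the sequence into a global vector.
import torch
import torch.nn as nn

class Encoder1D(nn.Module):
    def __init__(self, in_channels=1, local_channels=20, global_channels=30, width=100):
        super().__init__()
        # kernel_size=3 with padding=1 keeps the width at 100 -> local map (N, 20, 100)
        self.local_conv = nn.Conv1d(in_channels, local_channels, kernel_size=3, padding=1)
        # a kernel spanning the full width (100) integrates over the whole sequence -> (N, 30, 1)
        self.global_conv = nn.Conv1d(local_channels, global_channels, kernel_size=width)

    def forward(self, x):                              # x: (N, 1, 100)
        local_feat = torch.relu(self.local_conv(x))    # (N, 20, 100)
        global_feat = self.global_conv(local_feat)     # (N, 30, 1)
        return local_feat, global_feat.squeeze(-1)     # (N, 20, 100), (N, 30)

enc = Encoder1D()
local_feat, global_feat = enc(torch.randn(8, 1, 100))
print(local_feat.shape, global_feat.shape)  # torch.Size([8, 20, 100]) torch.Size([8, 30])
```

The 1D local/global discriminators would then take these shapes instead of 2D maps, swapping the 2D convolutions for 1D ones.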
Q1
Yes, it makes sense. However, depending on your application, I would also research BYOL or Barlow Twins; there has been a lot of progress in unsupervised/semi-supervised learning of late. I would also look into a transformer-based architecture rather than convolutions, as transformers are considerably more powerful for 1D data since they allow "set"-to-"set" mappings. Depending on your sequence length, a transformer could be a viable approach.