How would I apply this to non-image (1-dimensional) data? #17

Open
JAEarly opened this issue Mar 10, 2022 · 2 comments

Comments

JAEarly commented Mar 10, 2022

First off, thanks for the implementation of this code, it's great!

I'm interested in applying DIM to non-image data, i.e., I just have a collection of feature vectors (not images) that I'd like to encode and maximise information between the original feature vectors and their new embeddings. I'm trying to translate the problem from 2D inputs to 1D inputs.

I have three questions:

  1. Does doing this even make sense? I can't see why the principle of maximising information between the original representation and the embedding wouldn't apply to 1D inputs.
  2. How can I implement this? As far as I understand it, the local embeddings are 2D feature maps, and the global embeddings are 1D vectors. Obviously in the 1D setting, these 2D feature maps disappear, but the 1D global embeddings remain the same. Could the local embeddings be replaced with 1D embeddings of some sort (rather than 2D maps)? The discriminator models that used 2D convolutions would therefore need to be updated.
  3. Why does the GlobalDiscriminator model have 2D convolutional layers? It was my understanding that for the global discriminator the local feature maps should be flattened and concatenated with the global embedding, but based on the code, it seems the local feature maps are being further processed before being concatenated with the global embedding. Could you clarify this, please?

Thanks in advance!

JAEarly changed the title from "How would I apply this to non-image data?" to "How would I apply this to non-image (1-dimensional) data?" on Mar 10, 2022
DuaneNielsen (Owner) commented Apr 11, 2022

Hi, sorry for the delay,

Best to answer question 3 first...

The local feature maps are kept in 2D, and the global feature vector is "tiled" (it's a 1x1 tile, repeated across the spatial grid) and then concatenated with them. This preserves the spatially embedded information in the local maps. Check out Figure 5 in the appendix of the paper.
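For illustration, here is a minimal PyTorch sketch of that tile-and-concatenate pattern. The class name, layer widths, and channel counts are illustrative, not necessarily what this repo uses:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TiledDiscriminator(nn.Module):
    """Scores a (global vector, local map) pair while keeping the local map in 2D."""

    def __init__(self, c_local=128, c_global=64):
        super().__init__()
        # 1x1 convolutions score the concatenated features at every spatial location
        self.c0 = nn.Conv2d(c_local + c_global, 512, kernel_size=1)
        self.c1 = nn.Conv2d(512, 512, kernel_size=1)
        self.c2 = nn.Conv2d(512, 1, kernel_size=1)

    def forward(self, y, M):
        # y: (batch, c_global) global vector, M: (batch, c_local, H, W) local feature map
        # Tile the global vector over the spatial grid of the local map...
        y_tiled = y.unsqueeze(-1).unsqueeze(-1).expand(-1, -1, M.shape[2], M.shape[3])
        # ...and concatenate along the channel dimension, preserving the spatial layout.
        h = torch.cat((M, y_tiled), dim=1)
        h = F.relu(self.c0(h))
        h = F.relu(self.c1(h))
        return self.c2(h)  # one score per spatial location
```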

Q2

Yes. Imagine you had a 1D feature of width 100. You compute a local map using (3x1) convolutions with, say, 20 filters; with padding, this would give you a 100-width x 20-depth local feature map. Then, to give the extreme example, you could do a (100x1) "convolution" with 30 filters. This would give you back a global vector of width 1 and depth 30 that integrates information over the entire sequence.
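A minimal sketch of that 1D example in PyTorch (the batch size, filter counts, and variable names are just the illustrative numbers above):

```python
import torch
import torch.nn as nn

x = torch.randn(8, 1, 100)                                # batch of 8 sequences, 1 channel, width 100

local_enc = nn.Conv1d(1, 20, kernel_size=3, padding=1)    # (3x1) conv, 20 filters, "same" padding
local_map = local_enc(x)                                   # -> (8, 20, 100): width 100, depth 20 local map

global_enc = nn.Conv1d(20, 30, kernel_size=100)            # (100x1) "convolution" spanning the full width
global_vec = global_enc(local_map)                          # -> (8, 30, 1): width 1, depth 30 global vector
```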

Q1

Yes, it makes sense. However, depending on your application, I would also research BYOL or Barlow Twins; there has been a lot of progress in unsupervised/semi-supervised learning of late. I would also look into a transformer-based architecture rather than convolutions, as transformers have a lot more power on 1D data, since they allow "set"-to-"set" mappings. Depending on your sequence length, a transformer could be a viable approach.

JAEarly (Author) commented Apr 20, 2022

Thank you very much for your detailed response; it's very helpful!
