TimeSformer in TF, thanks #13

Open
junyongyou opened this issue Apr 20, 2021 · 7 comments

Comments

@junyongyou

Is anybody willing to implement TimeSformer in TensorFlow 2.x? I am trying to do that, but I am struggling ...

@lucidrains
Owner

@junyongyou hmm, i'm never doing tensorflow

@junyongyou
Author

> @junyongyou hmm, i'm never doing tensorflow

Aha, I know that. I was just checking whether somebody else might be interested :).

@slimaneaymen

slimaneaymen commented Apr 26, 2021

Hi everybody!
I am trying to implement TimeSformer for video classification, using the feature maps of a CNN as input. My data have the shape (4, 50, 1, 1, 256), where:
mini_batch = 4 / frames = 50 / channels = 1 / H = 1 / W = 256
The parameters of the TimeSformer are:
TimeSformer(
    dim = 128,
    image_size = 256,
    patch_size = 16,
    num_frames = 50,
    num_classes = 2,
    depth = 12,
    heads = 8,
    dim_head = 32,
    attn_dropout = 0.,
    ff_dropout = 0.
)
To check whether my network is working, I tried to make it overfit by using only 6 training samples and 2 validation samples of the same shape as before (4, 50, 1, 1, 256).
However, the accuracy oscillates and never exceeds 80%, and my training loss does not decrease: it always stays around 0.6900.
My training function and parameters are:
[Screenshots: Capture1, Capture2, Capture3, image]

I have also tried to train the model on image frames instead of feature maps, with an input of shape (4, 50, 3, 224, 224), where:
mini_batch = 4 / frames = 50 / channels = 3 / H = 224 / W = 224
Unfortunately, I am getting the same results.

I would appreciate any suggestions.
Thank you
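
A training loss that stays near 0.69 is worth comparing against the chance level for two classes. Assuming the loss is a standard cross-entropy with the natural logarithm (the training code appears only in the screenshots, so this is an assumption), a model that gives every class the same probability scores ln(num_classes), which is about 0.6931 for `num_classes = 2`. The sketch below computes that baseline; `chance_level_cross_entropy` and `cross_entropy` are hypothetical helper names, not part of any library.

```python
import math


def chance_level_cross_entropy(num_classes: int) -> float:
    """Cross-entropy (natural log) of a uniform prediction over num_classes."""
    if num_classes < 2:
        raise ValueError("num_classes must be at least 2")
    return -math.log(1.0 / num_classes)


def cross_entropy(probs, target):
    """Cross-entropy of one predicted distribution against an integer label."""
    return -math.log(probs[target])


baseline = chance_level_cross_entropy(2)
print(f"chance-level loss for 2 classes: {baseline:.4f}")  # 0.6931

# A 50/50 prediction gives the same value whatever the true label is.
print(f"{cross_entropy([0.5, 0.5], target=1):.4f}")  # 0.6931
```

If the reported loss really is cross-entropy, a value stuck at this level would suggest that the network is producing near-uniform outputs and has not yet learned anything from the inputs.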

@junyongyou
Author

Hi, I haven't looked into your question carefully. However, I have a feeling that your input shape (H=1) and/or the small number of training/validation samples might be questionable.
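
The concern about H=1 can be made concrete with a little arithmetic. The sketch below assumes that each frame is cut into non-overlapping `patch_size` x `patch_size` patches, as ViT-style models usually do, and that the input layout is (batch, frames, channels, H, W) as described in the thread. `tokens_per_clip` is a hypothetical helper written for this illustration; how the actual implementation reacts to such an input is not confirmed here.

```python
def tokens_per_clip(shape, patch_size):
    """Number of space-time patches for a (batch, frames, channels, H, W) input.

    Raises ValueError when H or W cannot be split into whole patches.
    """
    if len(shape) != 5:
        raise ValueError(f"expected 5 dimensions, got {len(shape)}")
    _, frames, _, height, width = shape
    if height % patch_size or width % patch_size:
        raise ValueError(
            f"H={height}, W={width} are not both divisible by patch_size={patch_size}"
        )
    patches_per_frame = (height // patch_size) * (width // patch_size)
    return frames * patches_per_frame


# Frame input from the thread: 224 / 16 = 14 patches per side.
print(tokens_per_clip((4, 50, 3, 224, 224), patch_size=16))  # 50 * 14 * 14 = 9800

# Feature-map input from the thread: H=1 cannot hold a 16x16 patch.
try:
    tokens_per_clip((4, 50, 1, 1, 256), patch_size=16)
except ValueError as err:
    print("rejected:", err)
```

Under these assumptions the (4, 50, 1, 1, 256) feature maps cannot be patched at all with `patch_size = 16`. The configuration in the thread also sets `image_size = 256` while the frame input is 224 x 224; whether that mismatch matters depends on the implementation.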

@slimaneaymen

Hi @junyongyou,
concerning H, I have also tried H=224 / W=224;
concerning the number of training/validation samples, I have also tried a larger number (420),
but I still get the same results.

@junyongyou
Author

> Hi @junyongyou,
> concerning H, I have also tried H=224 / W=224;
> concerning the number of training/validation samples, I have also tried a larger number (420),
> but I still get the same results.

Sorry, I don't know. Maybe you need to check your data first. From the screenshot, your training loss didn't decrease at all. I have tried the model in my own experiment (not image recognition); it didn't give me very good performance, but it does do something.
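
The thread does not spell out what checking the data would involve. One plausible reading, offered only as a sketch, is to confirm that the labels contain both classes in sensible proportions and that the inputs are finite and not constant. `label_summary` and `feature_summary` are hypothetical helpers, and the toy values are made up for illustration.

```python
import math
from collections import Counter


def label_summary(labels):
    """Return class counts and the accuracy of always predicting the majority class."""
    counts = Counter(labels)
    majority = max(counts.values()) / len(labels)
    return counts, majority


def feature_summary(values):
    """Return (min, max, has_non_finite) for a non-empty flat list of floats."""
    finite = [v for v in values if math.isfinite(v)]
    if not finite:
        raise ValueError("no finite values found")
    has_non_finite = len(finite) != len(values)
    return min(finite), max(finite), has_non_finite


# Made-up toy data, not taken from the thread.
labels = [0, 1, 0, 0, 1, 0]
features = [0.0, 0.5, -1.2, 3.4, float("nan")]

counts, majority = label_summary(labels)
print(counts, f"majority baseline = {majority:.2f}")
print(feature_summary(features))
```

The majority baseline is a useful reference point: an accuracy that hovers around it tells little about whether the model has learned anything, and a NaN or an all-constant input would explain a loss that never moves.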

@slimaneaymen

Could you please explain what you mean by checking the data?
Regarding your experiment, could you tell me what your training hyperparameters were (loss function, learning rate, ...)?
Also, do you think my calculation of the accuracy (the first figure) is right?
Thank you
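
The accuracy code from the first figure is available only as a screenshot, so it cannot be verified here. For comparison, a minimal sketch of one common way to compute classification accuracy is to take the index of the largest logit as the prediction and accumulate correct counts over all batches. The function names and the toy logits below are illustrative assumptions.

```python
def batch_correct(logits, labels):
    """Count rows whose highest-scoring class index equals the label."""
    correct = 0
    for row, label in zip(logits, labels):
        predicted = max(range(len(row)), key=row.__getitem__)
        correct += int(predicted == label)
    return correct


def epoch_accuracy(batches):
    """Accuracy over an iterable of (logits, labels) batches."""
    correct = total = 0
    for logits, labels in batches:
        correct += batch_correct(logits, labels)
        total += len(labels)
    return correct / total if total else 0.0


# Made-up logits for two batches of different sizes.
batches = [
    ([[2.0, -1.0], [0.1, 0.3], [1.5, 0.2], [-0.4, 0.9]], [0, 1, 1, 1]),
    ([[0.2, 0.1], [0.0, 2.0]], [1, 1]),
]
print(epoch_accuracy(batches))  # 4 correct out of 6
```

Accumulating counts rather than averaging per-batch accuracies keeps the result correct when the last batch is smaller: here the per-batch average would be (0.75 + 0.5) / 2 = 0.625 instead of 4/6.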
