TimeSformer in TF, thanks #13

junyongyou · 2021-04-20T19:47:19Z

Is anybody willing to implement TimeSformer in TensorFlow 2.x? I am trying to do that, but am struggling ...

lucidrains · 2021-04-20T20:42:57Z

@junyongyou hmm, i'm never doing tensorflow

junyongyou · 2021-04-20T20:57:41Z

> @junyongyou hmm, i'm never doing tensorflow

Aha, I know that. So I am just seeing whether there might be somebody else :).

slimaneaymen · 2021-04-26T06:13:13Z

Hi everybody!
I am trying to implement TimeSformer for video classification, using the feature maps of a CNN as input. My data has the shape (4,50,1,1,256), where:
mini_batch=4 / frames=50 / channels=1 / H=1 / W=256
The parameters of the TimeSformer are:
TimeSformer(
    dim = 128,
    image_size = 256,
    patch_size = 16,
    num_frames = 50,
    num_classes = 2,
    depth = 12,
    heads = 8,
    dim_head = 32,
    attn_dropout = 0.,
    ff_dropout = 0.
)
To check whether my network is working, I tried to make it overfit by using only 6 training samples and 2 validation samples of the same shape as before (4,50,1,1,256).
But the accuracy oscillates and never exceeds 80%, and my training loss does not decrease; it stays around 0.69.
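A loss that stays near 0.69 has a specific meaning for a two-class problem: a model that assigns probability 0.5 to each class has a cross-entropy of ln 2 ≈ 0.6931, so the network may be predicting at chance level. The sketch below computes that reference value. It assumes the loss in use is cross-entropy with the natural logarithm, which the thread does not confirm because the training code is not preserved; `cross_entropy` and `chance_level_loss` are illustrative helper names, not part of any library.

```python
import math

def cross_entropy(probs, target):
    """Cross-entropy (natural log) of one sample, given its class probabilities."""
    return -math.log(probs[target])

def chance_level_loss(num_classes):
    """Loss of a model that predicts a uniform distribution over the classes."""
    uniform = [1.0 / num_classes] * num_classes
    # With a uniform prediction the loss is the same for every target class.
    return cross_entropy(uniform, target=0)

if __name__ == "__main__":
    # Assumption: num_classes = 2, as in the configuration shown in this post.
    print(f"chance-level loss for 2 classes: {chance_level_loss(2):.4f}")
```

If the reported loss matches this value from the first step onward, it suggests the model output does not depend on the input, or that the optimizer is not updating the weights.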
My training function and parameters are:

I have also tried to train the model on image frames instead of feature-map data, with an input of shape (4,50,3,224,224), where:
mini_batch=4 / frames=50 / channels=3 / H=224 / W=224
Unfortunately, I am getting the same results.
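Both input shapes mentioned above can be compared against the configuration before any training is run. The following is a minimal sketch of such a check, assuming a ViT-style patch embedding in which H and W must be divisible by `patch_size` and are expected to equal `image_size`. The helper names and the `channels = 3` value are assumptions for illustration (the posted configuration does not set a channel count), and whether the library itself rejects these shapes is not established in this thread.

```python
def check_video_shape(shape, image_size, patch_size, num_frames, channels):
    """List mismatches between a (B, F, C, H, W) batch shape and the model config."""
    _, f, c, h, w = shape
    problems = []
    if f != num_frames:
        problems.append(f"frames={f}, expected {num_frames}")
    if c != channels:
        problems.append(f"channels={c}, expected {channels}")
    for name, size in (("H", h), ("W", w)):
        if size % patch_size != 0:
            problems.append(f"{name}={size} is not divisible by patch_size={patch_size}")
        elif size != image_size:
            problems.append(f"{name}={size}, expected image_size={image_size}")
    return problems

def tokens_per_video(image_size, patch_size, num_frames):
    """Number of patch tokens for one video (excluding any class token)."""
    return num_frames * (image_size // patch_size) ** 2

if __name__ == "__main__":
    # Values from the post; channels=3 is an assumed default, not stated there.
    config = dict(image_size=256, patch_size=16, num_frames=50, channels=3)
    for shape in [(4, 50, 1, 1, 256), (4, 50, 3, 224, 224)]:
        print(shape, check_video_shape(shape, **config))
    print("tokens per video:", tokens_per_video(256, 16, 50))
```

Under these assumptions, neither posted shape matches `image_size = 256`, and the feature-map input with H=1 cannot be split into 16x16 patches at all.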

I would appreciate any suggestions.
Thank you.

junyongyou · 2021-04-26T08:59:15Z

Hi, I haven't thought about your question carefully. However, I have a feeling that your input shape (H=1) and/or the small number of training/validation samples might be the problem.

slimaneaymen · 2021-04-26T09:18:10Z

Hi @junyongyou,
Concerning H, I have also tried H=224 / W=224.
Concerning the number of training/validation samples, I have also tried a larger number (420),
but it still gives the same results.

junyongyou · 2021-04-26T09:22:24Z

> Hi @junyongyou,
> Concerning H, I have also tried H=224 / W=224.
> Concerning the number of training/validation samples, I have also tried a larger number (420),
> but it still gives the same results.

Sorry, I don't know. Maybe you need to check your data first. From the screenshot, your training loss didn't decrease at all. I have tried the model in my own experiment (not image recognition); it didn't give me very good performance, but it does do something.
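One possible reading of "check your data" is to confirm that the labels contain more than one class and that the samples actually differ from each other. The sketch below illustrates two such checks. It is an interpretation, not something spelled out in this thread, and the helper names and example labels are hypothetical.

```python
from collections import Counter

def label_summary(labels):
    """Return per-class counts and the accuracy of always predicting the majority class."""
    counts = Counter(labels)
    majority_accuracy = max(counts.values()) / len(labels)
    return dict(counts), majority_accuracy

def all_identical(samples):
    """True if every sample equals the first one, which would point to a loading bug."""
    return all(sample == samples[0] for sample in samples)

if __name__ == "__main__":
    # Hypothetical labels for a 6-sample training set; real labels are not in the thread.
    labels = [0, 1, 0, 1, 1, 0]
    counts, baseline = label_summary(labels)
    print("class counts:", counts)
    print("majority-class baseline accuracy:", baseline)
    # Hypothetical flattened features; identical samples cannot be told apart by any model.
    print("all samples identical:", all_identical([[0.1, 0.2], [0.1, 0.2]]))
```

An accuracy close to the majority-class baseline, together with a flat loss, would be consistent with a model that ignores its input.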

slimaneaymen · 2021-04-26T09:48:31Z

Could you please explain what you mean by checking the data?
Regarding your experiment, could you tell me what hyperparameters you used for training (loss function, learning rate, ...)?
Also, do you think my calculation of the accuracy (the first figure) was right?
Thank you
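The accuracy code from the screenshot is not preserved in this thread, so it cannot be checked directly. For comparison, here is a generic sketch of classification accuracy computed from per-class scores: take the index of the highest score for each sample and compare it with the label. The function names and the example values are illustrative only.

```python
def argmax(values):
    """Index of the largest value in a sequence."""
    return max(range(len(values)), key=lambda i: values[i])

def accuracy(logits, labels):
    """Fraction of samples whose highest-scoring class equals the label."""
    if len(logits) != len(labels):
        raise ValueError("logits and labels must have the same length")
    correct = sum(argmax(row) == label for row, label in zip(logits, labels))
    return correct / len(labels)

if __name__ == "__main__":
    # Hypothetical model outputs for 4 samples and 2 classes.
    logits = [[2.0, -1.0], [0.1, 0.3], [1.5, 0.2], [-0.4, 0.9]]
    labels = [0, 1, 1, 1]
    print("accuracy:", accuracy(logits, labels))
```

Note that with only 2 validation samples the accuracy can take just three values (0, 0.5, or 1), so large swings between epochs are expected on a set that small.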
