
Results on YouTube #3

Open
noUmbrella opened this issue Dec 23, 2019 · 16 comments

Comments

@noUmbrella

Hi, I tested your released code and model on YouTube-VOS, but I can't get the accuracy reported in the paper. Did you test this code on YouTube-VOS?

@seoungwugoh
Owner

seoungwugoh commented Dec 24, 2019

The checkpoint in this repo is different from the one used for YouTube-VOS evaluation.
For YouTube-VOS evaluation, we did not use DAVIS videos for training.
(This gives us a minor improvement.)

However, the provided checkpoint should still give numbers similar to those reported in our paper, with minor degradation (an Overall score about 1-2 points lower).
What were your results?

For YouTube-VOS, there are some differences compared to DAVIS:

  1. Some objects start to appear in the middle of the video. In that case, we overwrite the current mask with the new objects.
  2. While the evaluation server takes results computed every 5 frames, we use all the frames for estimation.
    We first estimate masks for all the frames, then sample the frames to submit from there.
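The two points above could be sketched as follows. This is a minimal NumPy illustration, not code from this repo; the function names and mask layout (integer object IDs per pixel) are assumptions:

```python
import numpy as np

def overwrite_with_new_objects(pred_mask, gt_mask, new_ids):
    """Point 1: where the ground truth introduces a new object,
    overwrite the predicted mask with the new object's ID; all
    other pixels keep their predicted labels."""
    out = pred_mask.copy()
    for obj_id in new_ids:
        out[gt_mask == obj_id] = obj_id
    return out

def sample_for_submission(all_masks, step=5):
    """Point 2: estimate masks for every frame, then keep only
    every `step`-th frame for submission to the server."""
    return all_masks[::step]
```
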

@noUmbrella
Author

Great! It surprised me that using DAVIS videos for training degrades the performance on YouTube-VOS. Thank you for sharing. I will retest it following points 1 and 2 you mentioned. Thanks.

@sourabhswain

@seoungwugoh Can you please tell me how to test the pretrained model on YouTube-VOS? I tried to use the YouTube-VOS dataset instead of DAVIS17; however, I seem to get empty masks as output.

@seoungwugoh
Owner

seoungwugoh commented Jan 9, 2020

Getting an empty mask is likely due to a bug in the code.

@siyueyu

siyueyu commented Jan 9, 2020

@seoungwugoh In the case where some objects start to appear in the middle of the video and the current mask is overwritten with the new objects, will the overwritten mask include the old objects?

@seoungwugoh
Owner

@siyueyu Yes, we overwrite the pixels belonging to the new object. Other pixels remain the same.

@sourabhswain

@seoungwugoh I can get the correct masks as predictions now; however, I keep getting an out-of-memory error when I test on YouTube-VOS. I am using all the validation frames instead of every 5 frames. The GPU I am using is a GTX 1080. Do you recommend any particular configuration for YouTube-VOS? I even played with the mem_every parameter, but I still get out-of-memory issues.

@seoungwugoh
Owner

seoungwugoh commented Jan 21, 2020

For YouTube-VOS, some videos are quite long (>150 frames), which often causes OOM. GPU memory is mostly consumed by a large matrix inner product during memory reading. We used a V100 GPU with 16 GB of memory, and setting a larger mem_every parameter for some videos works well. To drastically reduce memory consumption, you can consider using no intermediate memory frames (infinite mem_every). Another, more extreme solution is to move that inner-product computation to the CPU if you can afford the additional computation time.
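A rough sketch of the memory-saving idea: chunking the large inner product of memory reading so the full affinity matrix never materializes at once. The shapes and names here are assumptions for illustration, not the repo's actual implementation:

```python
import torch

def memory_read_lowmem(mem_key, query_key, chunk=1024):
    # mem_key:   (C, T*H*W)  flattened memory keys
    # query_key: (C, H*W)    query-frame keys
    # The full affinity matrix (T*H*W, H*W) can be huge for long
    # videos; computing it in column chunks caps peak GPU memory.
    # Moving both tensors to CPU first is the more extreme option.
    outs = []
    for start in range(0, query_key.shape[1], chunk):
        q = query_key[:, start:start + chunk]
        outs.append(torch.softmax(mem_key.t() @ q, dim=0))
    return torch.cat(outs, dim=1)
```

The softmax-normalized affinity would then be used to weight the memory values, as in the paper's memory-read operation.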

@sourabhswain

@seoungwugoh Thanks for the suggestion. I ran it without any intermediate memory frames and could obtain results. However, I see that it doesn't consider the masks of objects that start to appear after the first frame; I get no predictions for those objects. Looking at your suggestion above in this thread, you mention that "Some objects start to appear in the middle of video. In that case, we overwrite current mask with the new objects." I already modified dataset.py. Is this already implemented in the uploaded code? If not, can you point out where we need to incorporate those changes? Thank you.

@sourabhswain

@seoungwugoh Also, to add to what I mentioned above, I get a score of 69.4 (compared to 78.4 in the paper) on the YouTube-VOS validation set using the pre-trained model. Since I used no intermediate memory frames, I guess by default it uses only the first and the previous frame.

@npmhung

npmhung commented Jan 30, 2020

@seoungwugoh Hi, I'm trying to fine-tune your model.
In the paper, you state that batch norm is turned off for all experiments.
Just to be clear, do you turn off batch norm only during the main training stage with videos, or also during pre-training with images?

@seoungwugoh
Owner

@sourabhswain The code in this repository does not contain functionality for evaluating on YouTube-VOS. You will have to implement it yourself, but it should not be too difficult. To get a number similar to the paper's, you should estimate masks for objects that start to appear in the middle of the video.

@npmhung We turned off BatchNorm for both pre-training and main training. In other words, we use the mean and variance learned from ImageNet. This can be done simply by setting model.eval() during training.
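The model.eval() trick above can also be applied selectively, keeping only the BatchNorm layers in eval mode while the rest of the network trains normally. A small PyTorch sketch; `freeze_bn` is a hypothetical helper, not part of the repo:

```python
import torch.nn as nn

def freeze_bn(model):
    # Put only BatchNorm layers into eval mode so they keep using
    # the running mean/var learned during ImageNet pre-training,
    # while every other layer stays in training mode.
    for m in model.modules():
        if isinstance(m, nn.BatchNorm2d):
            m.eval()
```

Calling `model.train()` followed by `freeze_bn(model)` at the start of each epoch keeps the running statistics fixed; note that `model.train()` alone would re-enable them.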

@hkchengrex

@seoungwugoh Is it possible for you to also provide the checkpoint used for Youtube-VOS evaluation (I'm ok without the code)? Thanks a lot!

@sourabhswain

@seoungwugoh I made the changes specific to YouTube-VOS and now I get a score of 74.17. It's still a bit off from the score reported in the paper (78.4). Could it be due to the different pretrained model you uploaded here, or do you use different hyperparameters for YouTube-VOS?

@seoungwugoh
Owner

@sourabhswain It would be due to the different weights. The number in the paper (78.4) was measured using the weights trained for YouTube-VOS. Unfortunately, we have no plans to upload the YouTube-VOS testing weights.

@chenz97

chenz97 commented Apr 14, 2020

Hi @seoungwugoh, you mentioned that when objects start to appear in the middle of the video, you overwrite the current mask. So only the previous mask is affected, and the first-frame mask remains unchanged. However, the objects that appear later cannot refer to the first frame for their GT mask (since the "first frame" for them is not the first frame of the video). Can this hurt the performance, or do you have any workaround for this? Thank you!

7 participants