
Dataset section #2

Open
jpainam opened this issue Jul 31, 2024 · 5 comments

jpainam commented Jul 31, 2024

Hi. Thanks for releasing the code.

Can you provide details in the README about the dataset preparation? I see a get_dataset function that generates a toy_dataset with shape (10000, 2), while extracting features from UCF_Crime will likely give me (N, 16, 1152), where N is the number of frames.

jakubmicorek (Owner) commented Aug 6, 2024

Hi,

For the object-centric approach we used the features provided by Accurate-Interpretable-VAD.

For the frame-centric approach we use the Hiera backbone. For 16 consecutive RGB frames of shape [1, 3, 16, 224, 224] we extract a d-dimensional feature vector just before the classification head. In the case of Hiera-Large we obtain a feature vector of shape [1, 1152] for the 16 frames. We use the label of the center frame as the ground-truth label for the window. To get the features and ground-truth labels for the whole video we extract the features in a rolling-window fashion.
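For illustration, here is a minimal sketch of that rolling-window extraction, assuming the whole video is available as a [T, 3, 224, 224] tensor with per-frame ground-truth labels, and using a generic backbone callable in place of Hiera-Large; extract_clip_features and the dummy backbone are hypothetical names for this sketch, not part of the repo:

```python
import torch

def extract_clip_features(frames, frame_labels, backbone, clip_len=16, stride=1):
    """Rolling-window feature extraction.

    frames:       float tensor [T, 3, 224, 224], the whole video as RGB frames
    frame_labels: tensor [T], per-frame ground truth
    backbone:     callable mapping [1, 3, clip_len, 224, 224] -> [1, D]
    Returns one [D] feature and one label per window; each window is labeled
    with the ground truth of its center frame.
    """
    feats, labels = [], []
    num_frames = frames.shape[0]
    for start in range(0, num_frames - clip_len + 1, stride):
        clip = frames[start:start + clip_len]                # [clip_len, 3, 224, 224]
        clip = clip.permute(1, 0, 2, 3).unsqueeze(0)         # [1, 3, clip_len, 224, 224]
        with torch.no_grad():
            feats.append(backbone(clip))                     # [1, D]
        labels.append(frame_labels[start + clip_len // 2])   # center-frame label
    return torch.cat(feats, dim=0), torch.stack(labels)

# Dummy backbone standing in for Hiera-Large (D = 1152); in practice this would
# be the real model's forward pass truncated just before the classification head.
dummy_backbone = torch.nn.Sequential(
    torch.nn.AdaptiveAvgPool3d(1),    # [1, 3, 1, 1, 1]
    torch.nn.Flatten(start_dim=1),    # [1, 3]
    torch.nn.Linear(3, 1152),         # [1, 1152]
)

video = torch.rand(64, 3, 224, 224)   # 64 RGB frames
gt = torch.randint(0, 2, (64,))       # per-frame 0/1 labels
X, y = extract_clip_features(video, gt, dummy_backbone)
print(X.shape, y.shape)               # torch.Size([49, 1152]) torch.Size([49])
```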

jpainam (Author) commented Aug 6, 2024

Can you be more explicit about what you mean by "rolling window fashion"? Given 64 consecutive frames, do you build your windows as

  • [0, 16], [16, 32], [32, 48], [48, 64] (non-overlapping), or
  • [0, 16], [1, 17], [2, 18], [3, 19], ..., [48, 64] (stride 1)?

Both are windowing approaches.
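For concreteness, a tiny snippet contrasting the two indexings for 64 frames and a window length of 16 (plain Python, not from the repo):

```python
num_frames, clip_len = 64, 16

# Option 1: non-overlapping windows -> starts 0, 16, 32, 48
non_overlapping = [(s, s + clip_len) for s in range(0, num_frames - clip_len + 1, clip_len)]

# Option 2: stride-1 rolling windows -> starts 0, 1, 2, ..., 48
rolling = [(s, s + clip_len) for s in range(0, num_frames - clip_len + 1, 1)]

print(non_overlapping)           # [(0, 16), (16, 32), (32, 48), (48, 64)]
print(rolling[:3], rolling[-1])  # [(0, 16), (1, 17), (2, 18)] (48, 64)
```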

@Haifu-Ye


Hello! Have you solved your problem? I'm also trying to reproduce the results on the Avenue dataset, but I'm stuck because I don't have the appropriate processing code for it.

jpainam (Author) commented Aug 13, 2024

@Haifu-Ye I decided to go with the first approach, non-overlapping windows
[0, 16], [16, 32], [32, 48], [48, 64], and use the label of the middle frame as the clip (window) label, i.e., the label of the frame at start_frame + 8.

I'm using UCF-Crime.

But the performance I get is far from the numbers reported in the paper.
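A minimal NumPy sketch of that labeling scheme, assuming a per-frame 0/1 ground-truth array; window_labels is a name made up for illustration, not a helper from the repo:

```python
import numpy as np

def window_labels(frame_labels, clip_len=16):
    """Non-overlapping windows [0, 16), [16, 32), ...; each window takes the
    label of its middle frame, i.e. the frame at start_frame + clip_len // 2."""
    starts = np.arange(0, len(frame_labels) - clip_len + 1, clip_len)
    return frame_labels[starts + clip_len // 2]

gt = np.zeros(64, dtype=int)
gt[20:48] = 1                 # frames 20..47 are anomalous
print(window_labels(gt))      # [0 1 1 0] -- windows starting at 0, 16, 32, 48
```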

@Haifu-Ye


Hi! I want to try the ShanghaiTech dataset, but it seems that the dataset format expected by extract_shanghaitech_frames.py is not the same as that of the official ShanghaiTech dataset. Moreover, the download link for the ShanghaiTech dataset in the script doesn't work, so I'd like to know how other people have solved this problem.
