MediaVideo get_frames method might be unreliable for random frame access #945

h-mayorquin · 2022-09-06T16:36:15Z

While looking at your data model in order to contribute the NWB writing features to sleap-io, I noticed that your MediaVideo class uses the following mechanism to extract frames.

sleap/sleap/io/video.py

Lines 482 to 489 in eec8a00

    
    def get_frame(self, idx: int, grayscale: bool = None) -> np.ndarray:
        """See :class:`Video`."""
        with self.__lock:
            if self.__reader.get(cv2.CAP_PROP_POS_FRAMES) != idx:
                self.__reader.set(cv2.CAP_PROP_POS_FRAMES, idx)
            success, frame = self.__reader.read()

I wanted to give you a heads-up (maybe you already know) that this method is known to be unreliable, and the OpenCV developers have stated (and shown) that repairing or testing this issue is a low priority.

In summary, if you use this method to read sequential data (from the beginning of the video to the end), most users report that it works fine, but random access is known to be unreliable. Some users report offsets between 30 and -6, and another reports a 500 ms offset. I am not sure whether this matters in your application (maybe it does not in training, but it does in labeling?).

Anyway, I stumbled upon this while looking at how OpenCV handles timestamps, which is important for us, and I wanted to let you know about the issue. I am also curious about what you think.

talmo (Maintainer) · 2022-09-06T16:47:15Z

Hi @h-mayorquin,

Indeed this is a problem and something we advise users to work around by using particular compression settings we've tested:

    ffmpeg -y -i "input.mp4" -c:v libx264 -pix_fmt yuv420p -preset superfast -crf 23 "output.mp4"
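
For batch use, the command above can be wrapped in a small script. This is only a minimal sketch, assuming `ffmpeg` is available on the `PATH`; the names `build_reencode_cmd` and `reencode` are hypothetical helpers for illustration and are not part of SLEAP.

```python
import subprocess


def build_reencode_cmd(src: str, dst: str, preset: str = "superfast", crf: int = 23) -> list:
    """Build the ffmpeg argument list for the re-encoding command quoted above."""
    return [
        "ffmpeg", "-y",            # overwrite the output file without asking
        "-i", src,
        "-c:v", "libx264",         # H.264 via x264
        "-pix_fmt", "yuv420p",
        "-preset", preset,
        "-crf", str(crf),
        dst,
    ]


def reencode(src: str, dst: str) -> None:
    """Run ffmpeg; raises subprocess.CalledProcessError if ffmpeg fails."""
    subprocess.run(build_reencode_cmd(src, dst), check=True)
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.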

A solution would be to use libav directly (via pyav) and handle seeking at a low level, but this would be a ton of work.

Other libraries like imageio or scikit-video could also serve as wrappers around just calling ffmpeg, but I think the performance isn't as ideal since they rely on calling the ffmpeg CLI and reading the raw image data over pipes (stdout).

More fundamentally though, this is a video encoder problem. Some codecs just aren't intended for random frame access, so we're asking a lot of ffmpeg to begin with -- and part of why it's not exactly within the scope of OpenCV either.

On our end, the best we can hope for is to at least detect when this happens, and ideally to make it easier to apply the re-encoding fix (#581).
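
One way such detection could look: decode the video once sequentially (which users report as reliable), hash every frame, and then check that each random seek returns the frame with the expected hash. This is a rough sketch and not SLEAP's implementation. All function names are hypothetical, and it assumes decoding is deterministic, so the same frame yields identical pixels on both paths. Runs of identical frames (for example a fully static scene) would hide an offset.

```python
import hashlib


def frame_digest(frame) -> str:
    """Hash raw pixel data; accepts NumPy arrays or bytes-like objects."""
    data = frame.tobytes() if hasattr(frame, "tobytes") else bytes(frame)
    return hashlib.sha1(data).hexdigest()


def sequential_digests(path: str) -> list:
    """Decode the whole video front to back with OpenCV and hash each frame."""
    import cv2  # third-party; imported lazily so the checker below stays generic

    cap = cv2.VideoCapture(path)
    digests = []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        digests.append(frame_digest(frame))
    cap.release()
    return digests


def opencv_seek_reader(path: str):
    """Return read_at(idx), which seeks with CAP_PROP_POS_FRAMES and then reads."""
    import cv2

    cap = cv2.VideoCapture(path)

    def read_at(idx: int):
        cap.set(cv2.CAP_PROP_POS_FRAMES, idx)
        ok, frame = cap.read()
        return frame if ok else None

    return read_at


def find_seek_mismatches(reference: list, read_at, indices) -> dict:
    """Map each badly seeked index to the frame actually returned (None if unknown)."""
    first_seen = {}
    for i, digest in enumerate(reference):
        first_seen.setdefault(digest, i)  # first index at which each digest occurs
    mismatches = {}
    for idx in indices:
        frame = read_at(idx)
        digest = None if frame is None else frame_digest(frame)
        if digest != reference[idx]:
            mismatches[idx] = first_seen.get(digest)
    return mismatches
```

A call such as `find_seek_mismatches(sequential_digests(p), opencv_seek_reader(p), sample)` returns an empty dict when every sampled seek is frame-exact. Because `read_at` is just a callable, the same check can be pointed at other readers.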

Let me know if you have other suggestions!

Cheers,

Talmo

h-mayorquin (Author) · 2022-09-06T19:07:23Z

Hi, @talmo

I went through your documentation (which is super nice btw!). I am confused about one thing: what is the relationship between the compression ratio / decompression speed level set by -preset and the problem of reliably seeking frames in OpenCV? The only connection I can make (after re-reading the thread shared at the beginning) is that this probably leads to fewer duplicate frames and therefore makes the problem less likely to appear (as it is less likely to throw an out-of-frame error). However, it is not clear to me that this addresses the silent error of getting incorrect frames (i.e. asking for frame 1000 and getting frame 1030), if that can be addressed at all. I did some googling and could not find a connection.

h-mayorquin (Author) · 2022-09-06T19:15:23Z

Concerning solutions, I know of only two projects that provide random frame access:

Pims, with their PyAVReaderIndexed reader (this uses pyav).
deffcode, which claims to provide very fast random frame access by piping ffmpeg through subprocesses.

I was testing them today because I was trying to access timestamps through something other than OpenCV (which does not work well for that either, as it does not return the starting time), and they seem easy to use and intuitive to me. I can't say anything about performance. Pims should be as fast as OpenCV, since both use C code to access libav, but the devil is in the details.
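
For reference, random access with the Pims reader mentioned above might look roughly like this. It is a sketch under the assumption that `pims` and `av` are installed; `read_frames_indexed` is a hypothetical helper name, and performance has not been measured here.

```python
def read_frames_indexed(path: str, indices) -> list:
    """Return the requested frames using Pims' PyAVReaderIndexed reader."""
    import pims  # third-party (pip install pims av); imported lazily

    # The indexed reader builds a frame index for the file when it is opened,
    # so individual frames can then be requested by integer position.
    video = pims.PyAVReaderIndexed(path)
    return [video[i] for i in indices]
```

Usage would be along the lines of `read_frames_indexed("output.mp4", [1000, 10, 500])`, with the frames returned in the order requested.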

talmo (Maintainer) · 2022-09-06T20:28:52Z

Hi @h-mayorquin,

I like this blog post from the folks at Loopbio who did a lot of digging into this problem and found (empirically?) that the superfast preset seemed to give fairly performant (and accurate) random frame seeking.

Here's the actual definition of this preset:
https://github.com/mirror/x264/blob/35fe20d1ba49918ec739a5b068c208ca82f977f7/common/base.c#L517-L528

    else if( !strcasecmp( preset, "superfast" ) )
    {
        param->analyse.inter = X264_ANALYSE_I8x8|X264_ANALYSE_I4x4;
        param->analyse.i_me_method = X264_ME_DIA;
        param->analyse.i_subpel_refine = 1;
        param->i_frame_reference = 1;
        param->analyse.b_mixed_references = 0;
        param->analyse.i_trellis = 0;
        param->rc.b_mb_tree = 0;
        param->analyse.i_weighted_pred = X264_WEIGHTP_SIMPLE;
        param->rc.i_lookahead = 0;
    }

This page is a great reference on x264 parameters (+ this and this) and together with the parameter defs can be used to add some more info on what's going on here:

/* inter partitions */
param->analyse.inter = X264_ANALYSE_I8x8|X264_ANALYSE_I4x4;

/* motion estimation algorithm to use (X264_ME_*) */
param->analyse.i_me_method = X264_ME_DIA;

/* subpixel motion estimation quality */
param->analyse.i_subpel_refine = 1;

/* Maximum number of reference frames
--ref <integer> (x264)
-refs <integer> (FFmpeg)
One of H.264's most useful features is the ability to reference frames other than the one immediately prior 
to the current frame. This parameter lets one specify how many references can be used, through a maximum 
of 16. Increasing the number of refs increases the DPB (Decoded Picture Buffer) requirement, which means 
hardware playback devices will often have strict limits to the number of refs they can handle. In live-action 
sources, more references have limited use beyond 4-8, but in cartoon sources up to the maximum value of 16
 is often useful. More reference frames require more processing power because every frame is searched by 
the motion search (except when an early skip decision is made). The slowdown is especially apparent with 
slower motion estimation methods. Recommended default: -refs 6 */
param->i_frame_reference = 1;

/* Allow each macroblock partition to have its own reference number */
param->analyse.b_mixed_references = 0;

/* Trellis RD quantization
--trellis <0,1,2> (x264)
-trellis <0,1,2> (FFmpeg)
0: disabled

1: enabled only on the final encode of a MB

2: enabled on all mode decisions

The main decision made in quantization is which coefficients to round up and which to round down. Trellis 
chooses the optimal rounding choices for the maximum rate-distortion score, to maximize PSNR relative 
to bitrate. This generally increases quality relative to bitrate by about 5% for a somewhat small speed cost. 
It should generally be enabled. Note that trellis requires CABAC.
*/
param->analyse.i_trellis = 0;

/* Macroblock-tree ratecontrol. */
param->rc.b_mb_tree = 0;

/* weighting for P-frames */
param->analyse.i_weighted_pred = X264_WEIGHTP_SIMPLE;

/* Rate control lookahead.
--rc-lookahead
The ratecontrol lookahead (rc-lookahead) setting determines how far ahead the video buffer verifier (VBV) and 
macroblock tree (mbtree) can look. Raising this can slightly increase memory use, but it's generally best to leave 
this as high as possible. */
param->rc.i_lookahead = 0;

Some of these will have an effect on decoding performance, but I'm not sure which ones are responsible for the seeking behavior we want. My guess would be i_frame_reference and analyse.b_mixed_references since they seem to affect the i-frame/b-frame behavior.

In any case, assuming libav has better support for accurate seeking (like ffmpeg), then Pims might be the best option here for broader support of videos encoded differently. Thanks for pointing out PyAVReaderIndexed -- this looks like exactly the right thing to do!

Talmo

h-mayorquin (Author) · 2022-09-06T20:38:12Z

Thanks for your detailed answer. I will look at the links you provided in more detail. Meanwhile, I had already read the blog post by Loopbio, as it is linked in your documentation (which is great!). I am not sure I see anything concerning accuracy, as they seem to focus on speed. I double-checked just now and had no luck. Maybe I am misreading, but I don't see frame accuracy mentioned other than as something desirable. I just reached out to them to find out.

talmo (Maintainer) · 2022-09-06T22:54:44Z

Totally right -- I'm not sure if I'm remembering from a personal conversation with them or just erroneously assumed they checked for accuracy as well. Let me know what they say if they get back to you!

I'll just add that we have tested it empirically by dumping out the raw pixels and comparing the encoded version against them, and found that we did indeed get frame-exact seeking using the superfast preset. That said, I'm not sure this holds under all conditions, for example if the background is dynamic and complex, causing some of the other x264 features to kick in.

Going back to the discussion above, this table actually seems perfect for testing out the encoding parameters that lead to exact seeking using different readers. For example, it seems like superfast disables the MB Tree and lookahead entirely. It also disables mixed references (analyse.b_mixed_references) which might also have an impact here.

One approach would be to encode a test set of videos with different motion complexity and try random seeking with OpenCV vs. Pims vs. the ffmpeg CLI. If that rabbit hole sounds appealing to you, let us know and we might be able to provide support!
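
To narrow down which setting matters, one option is an ablation: start from a slower preset and apply one superfast override per encode through ffmpeg's `-x264-params`. The sketch below only generates the commands. The mapping from the struct fields quoted earlier to x264 option names is my best understanding and should be checked against the x264 documentation; the function name, output file names, and the choice of `medium` as the baseline are all assumptions, and the motion-estimation, subpel, and partition settings are left out.

```python
# x264 option names intended to mirror fields set by the superfast preset.
SUPERFAST_OVERRIDES = {
    "ref": 1,           # i_frame_reference
    "mixed-refs": 0,    # analyse.b_mixed_references
    "trellis": 0,       # analyse.i_trellis
    "mbtree": 0,        # rc.b_mb_tree
    "weightp": 1,       # analyse.i_weighted_pred (1 = simple)
    "rc-lookahead": 0,  # rc.i_lookahead
}


def ablation_commands(src: str, base_preset: str = "medium"):
    """Yield (option_name, ffmpeg argv) pairs, changing one option per encode."""
    for name, value in SUPERFAST_OVERRIDES.items():
        dst = f"{base_preset}_{name}_{value}.mp4"  # hypothetical naming scheme
        yield name, [
            "ffmpeg", "-y", "-i", src,
            "-c:v", "libx264", "-pix_fmt", "yuv420p",
            "-preset", base_preset, "-crf", "23",
            "-x264-params", f"{name}={value}",
            dst,
        ]
```

Each resulting file could then be put through the same random-seek check with every reader, so that a change in seeking accuracy can be attributed to a single option.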
