MediaVideo get_frames method might be unreliable for random frame access #945
Replies: 6 comments
-
Hi @h-mayorquin, indeed this is a problem and something we advise users to work around by using particular compression settings we've tested:
A solution would be to use libav directly (via PyAV) and handle seeking at a low level, but this would be a ton of work. Other libraries like imageio or scikit-video could also serve as wrappers around ffmpeg, but I think their performance isn't ideal since they rely on calling the ffmpeg CLI and reading the raw image data over pipes (stdout).
More fundamentally though, this is a video encoder problem. Some codecs just aren't intended for random frame access, so we're asking a lot of ffmpeg to begin with, which is part of why it's not exactly within the scope of OpenCV either.
On our end, the best we can hope for is to detect when this happens, and ideally to make it easier to apply the re-encoding fix (#581).
Let me know if you have other suggestions!
Cheers,
Talmo
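To give a rough idea of what handling the seek at the libav level could look like, here is a minimal sketch using PyAV. It is not SLEAP's implementation. The function names are hypothetical, and it assumes a constant frame rate, a stream whose timestamps start at 0, and a file where `average_rate` is available. The idea is to seek to the keyframe at or before the target and then decode forward until the requested timestamp is reached.

```python
from fractions import Fraction


def frame_index_to_pts(index, fps, time_base):
    """Convert a frame index to a presentation timestamp in stream ticks.

    Assumes a constant frame rate and a stream that starts at pts 0.
    """
    return int(round(Fraction(index) / Fraction(fps) / Fraction(time_base)))


def read_frame_exact(path, index):
    """Seek to the keyframe at or before `index`, then decode forward to it."""
    import av  # PyAV, imported lazily so the helper above works without it

    with av.open(path) as container:
        stream = container.streams.video[0]
        target = frame_index_to_pts(index, stream.average_rate, stream.time_base)
        # Land on the nearest keyframe at or before the target timestamp.
        container.seek(target, stream=stream, backward=True, any_frame=False)
        # Decode forward from that keyframe until the target is reached.
        for frame in container.decode(stream):
            if frame.pts is not None and frame.pts >= target:
                return frame.to_ndarray(format="bgr24")
    raise IndexError(f"frame {index} not found in {path}")
```

Decoding forward from the keyframe costs more than a bare seek, but it avoids trusting the container's idea of which frame a seek lands on. Videos with a variable frame rate or a non-zero start time would need the index-to-timestamp mapping adjusted.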
-
Hi @talmo, I went through your documentation (which is super nice, btw!). I am confused about something: what is the relationship between the compression ratio / decompression speed level set by
-
Concerning solutions, I know of only two projects that provide random frame access:
I was testing them today because I was trying to access timestamps through something other than OpenCV (it does not work well for that either, as it does not return the starting time), and they seem easy to use and intuitive to me. I can't say anything about performance. Pims should be as fast as OpenCV since both use C code to access libav, but the devil is in the details.
-
Hi @h-mayorquin, I like this blog post from the folks at Loopbio who did a lot of digging into this problem and found (empirically?) that the Here's the actual definition of this preset:
This page is a great reference on x264 parameters (+ this and this) and together with the parameter defs can be used to add some more info on what's going on here:
/* inter partitions */
param->analyse.inter = X264_ANALYSE_I8x8|X264_ANALYSE_I4x4;
/* motion estimation algorithm to use (X264_ME_*) */
param->analyse.i_me_method = X264_ME_DIA;
/* subpixel motion estimation quality */
param->analyse.i_subpel_refine = 1;
/* Maximum number of reference frames
--ref <integer> (x264)
-refs <integer> (FFmpeg)
One of H.264's most useful features is the ability to reference frames other than the one immediately prior
to the current frame. This parameter lets one specify how many references can be used, through a maximum
of 16. Increasing the number of refs increases the DPB (Decoded Picture Buffer) requirement, which means
hardware playback devices will often have strict limits to the number of refs they can handle. In live-action
sources, more references have limited use beyond 4-8, but in cartoon sources up to the maximum value of 16
is often useful. More reference frames require more processing power because every frame is searched by
the motion search (except when an early skip decision is made). The slowdown is especially apparent with
slower motion estimation methods. Recommended default: -refs 6 */
param->i_frame_reference = 1;
/* Allow each macroblock partition to have its own reference number */
param->analyse.b_mixed_references = 0;
/* Trellis RD quantization
--trellis <0,1,2> (x264)
-trellis <0,1,2> (FFmpeg)
0: disabled
1: enabled only on the final encode of a MB
2: enabled on all mode decisions
The main decision made in quantization is which coefficients to round up and which to round down. Trellis
chooses the optimal rounding choices for the maximum rate-distortion score, to maximize PSNR relative
to bitrate. This generally increases quality relative to bitrate by about 5% for a somewhat small speed cost.
It should generally be enabled. Note that trellis requires CABAC.
*/
param->analyse.i_trellis = 0;
/* Macroblock-tree ratecontrol. */
param->rc.b_mb_tree = 0;
/* weighting for P-frames */
param->analyse.i_weighted_pred = X264_WEIGHTP_SIMPLE;
/* Rate control lookahead.
--rc-lookahead
The ratecontrol lookahead (rc-lookahead) setting determines how far ahead the video buffer verifier (VBV) and
macroblock tree (mbtree) can look. Raising this can slightly increase memory use, but it's generally best to leave
this as high as possible. */
param->rc.i_lookahead = 0;
Some of these will have an effect on decoding performance, but I'm not sure which ones are responsible for the seeking behavior we want. My guess would be
In any case, assuming libav has better support for accurate seeking (like ffmpeg), then Pims might be the best option here for broader support of videos encoded differently.
Thanks for pointing out
Talmo
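For experimenting with the fields quoted above one at a time, a small command builder can help. The sketch below is only an illustration. It assumes ffmpeg is built with libx264 and passes the values through ffmpeg's `-x264-params` option using x264's command-line names for those fields. The function name, the `-crf 23` value, and the `yuv420p` pixel format are assumptions, not settings confirmed in this thread, and nothing here establishes which parameter (if any) is responsible for exact seeking.

```python
import shlex

# x264 option names mirroring the fields quoted above, with the values shown
# in the preset definition. Which of these matter for seeking is unknown.
PRESET_X264_PARAMS = {
    "me": "dia",         # param->analyse.i_me_method = X264_ME_DIA
    "subme": 1,          # param->analyse.i_subpel_refine
    "ref": 1,            # param->i_frame_reference
    "mixed-refs": 0,     # param->analyse.b_mixed_references
    "trellis": 0,        # param->analyse.i_trellis
    "mbtree": 0,         # param->rc.b_mb_tree
    "weightp": 1,        # param->analyse.i_weighted_pred = X264_WEIGHTP_SIMPLE
    "rc-lookahead": 0,   # param->rc.i_lookahead
}


def build_reencode_command(src, dst, x264_params=None, crf=23):
    """Return an ffmpeg argument list that re-encodes `src` with libx264."""
    if x264_params is None:
        x264_params = PRESET_X264_PARAMS
    # -x264-params expects key=value pairs separated by colons.
    params = ":".join(f"{key}={value}" for key, value in x264_params.items())
    return [
        "ffmpeg", "-y", "-i", src,
        "-c:v", "libx264",
        "-pix_fmt", "yuv420p",   # assumed pixel format
        "-crf", str(crf),        # assumed quality setting
        "-x264-params", params,
        dst,
    ]


if __name__ == "__main__":
    # Print the command for inspection instead of running it.
    print(shlex.join(build_reencode_command("input.mp4", "output.mp4")))
```

Overriding a single entry, for example `dict(PRESET_X264_PARAMS, ref=4)`, gives one variant per parameter, which is what a seeking test would need to tell the parameters apart.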
-
Thanks for your detailed answer. I will look at the links you provided in more detail. Meanwhile, I had already read the blog post by Loopbio, as it is linked in your documentation (which is great!). I am not sure I see anything there concerning accuracy, as they seem to focus on speed. I double-checked just now and had no luck. Maybe I am misreading, but I don't see frame accuracy mentioned specifically, other than that it is desired. I just reached out to them to find out.
-
Totally right -- I'm not sure if I'm remembering from a personal conversation with them or just erroneously assumed they checked for accuracy as well. Let me know what they say if they get back to you!
I'll just add that we have tested it empirically by dumping out the raw pixels and comparing the encoded version against it, and found that we did indeed get frame-exact seeking using the
Going back to the discussion above, this table actually seems perfect for testing out the encoding parameters that lead to exact seeking using different readers. For example, it seems like
The thing to do could be to try encoding a test set of videos with different motion complexity and trying out random seeking using OpenCV vs Pims vs ffmpeg CLI. If that rabbit hole sounds appealing to you, let us know and we might be able to provide support!
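A comparison like the one suggested above could be sketched in a reader-agnostic way, so the same check runs against OpenCV, Pims, or frames piped from the ffmpeg CLI. This is a hypothetical harness, not the test we ran. It assumes each reader can hand back raw frame bytes, and that a sequential pass and a seek are decoded from the same file by the same reader, so a correct seek yields byte-identical pixels.

```python
import hashlib
import random


def frame_digest(frame_bytes):
    """Hash raw frame bytes so the reference pass stays small in memory."""
    return hashlib.sha1(frame_bytes).hexdigest()


def check_random_access(sequential_frames, read_at, n_samples=20, seed=0):
    """Compare randomly seeked frames against a sequential reference pass.

    sequential_frames: iterable of raw frame bytes, in order.
    read_at: callable(index) -> raw frame bytes obtained by seeking.
    Returns {index: offset} for every mismatch, where offset is how far the
    returned frame is from the requested one, or None if it matches no frame.
    """
    reference = [frame_digest(f) for f in sequential_frames]
    # Map each digest to the first index where it occurs in the reference.
    first_index = {}
    for i, digest in enumerate(reference):
        first_index.setdefault(digest, i)

    rng = random.Random(seed)
    indices = rng.sample(range(len(reference)), min(n_samples, len(reference)))

    mismatches = {}
    for idx in sorted(indices):
        digest = frame_digest(read_at(idx))
        if digest != reference[idx]:
            found = first_index.get(digest)
            mismatches[idx] = None if found is None else found - idx
    return mismatches
```

With an OpenCV reader, `read_at` could wrap a seek followed by `read()` and return `frame.tobytes()`. Videos with long runs of identical frames will give misleading offsets, which is one reason to use test clips with visible motion.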
-
When I was looking at your data model in order to contribute the NWB writing features to sleap-io, I noticed that you are using the following mechanism for extracting frames in your MediaVideo class.
sleap/sleap/io/video.py
Lines 482 to 489 in eec8a00
I wanted to give you a heads-up (maybe you already know) that this method is known not to be reliable, and the developers of OpenCV have stated (and shown) that repairing or testing this issue is a low priority.
In summary, if you use this method to read data sequentially (from the beginning of the video to the end), most users report that it works fine, but random access is known to be unreliable. Some users report offsets between -6 and 30, and another reports a 500 ms offset. I am not sure if this matters in your application (maybe it does not in training but it does in labeling?).
Anyway, I stumbled upon this while looking at how OpenCV handles timestamps, which is important for us, and I wanted to let you know about the issue. I am also curious about what you think.
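Since sequential reading is the mode most users report as reliable, one possible workaround is to reach a frame only by stepping forward and to reopen the file when going backwards. The sketch below is a hypothetical illustration of that idea, not a proposal for the actual MediaVideo code. The class name and the `open_capture` argument are made up, and the capture object is assumed to behave like `cv2.VideoCapture` (`grab()`, `read()`, `release()`).

```python
class SequentialFrameReader:
    """Reach frame `index` by stepping forward only, never by seeking.

    `open_capture` is a zero-argument callable returning an object with the
    cv2.VideoCapture methods grab(), read() and release().
    """

    def __init__(self, open_capture):
        self._open = open_capture
        self._cap = open_capture()
        self._next = 0  # index of the frame the next read() will return

    def get_frame(self, index):
        if index < self._next:
            # Going backwards: reopen and decode from the start again.
            self._cap.release()
            self._cap = self._open()
            self._next = 0
        while self._next < index:
            # grab() advances one frame without converting the image.
            if not self._cap.grab():
                raise IndexError(f"frame {index} is past the end of the video")
            self._next += 1
        ok, frame = self._cap.read()
        if not ok:
            raise IndexError(f"frame {index} could not be read")
        self._next += 1
        return frame
```

With OpenCV this could be created as `SequentialFrameReader(lambda: cv2.VideoCapture("video.mp4"))`. The cost is that a backward jump decodes everything from the start of the file, so it trades speed for predictability and is probably only reasonable for short videos or mostly forward access.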