Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Correction of FrameUtility V3 #48

Open
wants to merge 30 commits into
base: master
Choose a base branch
from

Conversation

moi15moi
Copy link
Contributor

@moi15moi moi15moi commented Nov 7, 2023

Finally, this should be the last version of FrameUtility.
For reference, here are the previous version: #37, #46

Why it is needed?

The previous FrameUtility would not work with VFR video, now yes.
Finally, the previous FrameUtility was more a hack than anything else.

What has been done

  • Add TimeType Enum
  • Add ffmpeg dependency to get the video file timestamps
  • Add mkvtoolnix dependency to get the video file timestamps for mkv file since it is more performant then ffmpeg
  • Be able to create Timestamps from 3 methods (from_fps, from_video_file, from_timestamps_file)
  • Correction how to convert ms to ass_timestamps (Convert.time)
  • Update of the algorithm of Convert.ms_to_frames and Convert.frames_to_ms
  • Add many tests to be sure that we have the right behaviour
  • Update FrameUtility to use the Timestamps class.
  • Add proof for the algorith of ms_to_frames since it isn't intuitive.

What still need to be been done

  • Correct the github action workflow. I don't know why, but when I try to run it with act, the workflow blocks and stop doing anything
  • Update the logic of FrameUtility.add to use Timestamps. I don't understand what is the expected behaviour of this method.
  • Update pyonfx version
  • Pack ffmpeg with pyonfx. We could try to replicate this method

Copy link
Owner

@CoffeeStraw CoffeeStraw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reviewed the general logic of the code, suggested some structural changes and exposed some doubts that I couldn't figure out myself.

Once this is solved I can take a look at the details of the implementation, at the documentation and at the tests.

.github/workflows/ci.yml Outdated Show resolved Hide resolved
pyonfx/timestamps.py Outdated Show resolved Hide resolved
pyonfx/timestamps.py Outdated Show resolved Hide resolved


class Timestamps:
"""Timestamps object contains informations about the timestamps of an video.
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This documentation will need to have a TLDR on top, as a normal user will not be interested in all the technical details. We can put all the technical discussion in a note, and put the TLDR inside the description of the parameter "rounding_method".

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moreover, users reading documentation need to know that they need to call one of the factory methods (from_fps, from_timestamps_file, from_video_file).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this sound good to you?

"""Timestamps object contains informations about the timestamps of an video.
Both Constant Frame Rate (CFR) and Variable Frame Rate (VFR) videos are supported.

To create an Timestamps object, you need to call one of the following method ``from_fps``, ``from_timestamps_file``, ``from_video_file``.
Ex:
	timestamps = Timestamps.from_fps(Fraction(24000, 1001))


Parameters:
	rounding_method (RoundingMethod): A rounding method. 99% of the time, you want to use RoundingMethod.ROUND.
		Note:
			Video player have 2 methods to deal with timestamps. Some floor them and other round them.
			This can lead to difference when displaying the subtitle.
				Ex:
					Player - Method - proof
					mpv    - round  - https://github.com/mpv-player/mpv/blob/7480efa62c0a2a1779b4fdaa804a6512aa488400/sub/sd_ass.c#L499
					FFmpeg - floor  - https://github.com/FFmpeg/FFmpeg/blob/fd1712b6fb8b7acc04ccaa7c02b9a5c9f233cfb3/libavfilter/vf_subtitles.c#L194-L196
					VLC    - floor  - https://code.videolan.org/videolan/vlc/-/blob/df6394ea8003e035a281b6818e6432c7d492ed2f/modules/codec/libass.c#L453-454
									  https://code.videolan.org/videolan/vlc/-/blob/df6394ea8003e035a281b6818e6432c7d492ed2f/include/vlc_tick.h#L120-132
					MPC-HC - floor  - https://github.com/clsid2/mpc-hc/blob/0994fd605a9fb4d15806d0efdd6399ba1bc5f984/src/Subtitles/LibassContext.cpp#L843
			Important note:
				Matroska (.mkv) file are an exception !!!
				If you want to be compatible with mkv, use RoundingMethod.ROUND.
				By default, they only have a precision to milliseconds instead of nanoseconds like most format.
					For more detail see:
						1- https://mkvtoolnix.download/doc/mkvmerge.html#mkvmerge.description.timestamp_scale
						2- https://matroska.org/technical/notes.html#timestampscale-rounding
	timestamps (List[int], optional): A list of [timestamps](https://en.wikipedia.org/wiki/Timestamp) in milliseconds encoded as integers.
							It represent each frame [presentation timestamp (PTS)](https://en.wikipedia.org/wiki/Presentation_timestamp)
	normalize (bool, optional): If True, it will shift the timestamps to make them start from 0. If false, the option does nothing.
	fpms (Fraction, optional): The fpms.
	last_frame_time (Fraction, optional): The last frame time not rounded.
"""

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better, but it's still not correct, because you created a more general method (and not only one that "contains information about the timestamps of a video").

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you update the doc for this?
I don't understand what you want

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I will do it later

pyonfx/timestamps.py Outdated Show resolved Hide resolved
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The math is correct, but we are missing the most important explanation: why, to obtain a frame, you need to add 0.5 to the ms BEFORE multiplying with fps * 1/1000. Why isn't it done just by:

round(ms * fps * 1/1000)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which line are you talking about?
The general formula is: ms=method_of_rounding(frame * fps *1/1000) where method_of_rounding is floor or round

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mhhhh... at a second glance, the formula isn't even right: you are talking about "ms_to_frames", but the formula is actually for "frames_to_ms"

That aside, my understanding is that the result of this formula yields a floating point number, but a frame is an integer. We must then decide which integer we should return.

In python3, there is the round method for that, but reading the code from the other (mpv/mpc-hc/libass...) it seems like everyone has opted for what you implemented as well for PyonFX. I can't understand why there was a need to implement a custom "round" and "floor" method.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mhhhh... at a second glance, the formula isn't even right: you are talking about "ms_to_frames", but the formula is actually for "frames_to_ms"

Warning, I start with ms=method_of_rounding(frame * fps *1/1000) which is effectively frames_to_ms, but from this equation, I isolate frame, so it become ms_to_frames !

In python3, there is the round method for that, but reading the code from the other (mpv/mpc-hc/libass...) it seems like everyone has opted for what you implemented as well for PyonFX. I can't understand why there was a need to implement a custom "round" and "floor" method.

Python round isn't the same has the mathematical one. Don't ask me why, I don't know.
Ex:
Python: round(0.5) = 0
Mathematic: round(0.5) = 1

This is why I implemented a custom round.
Now for the floor, simply to not import the one form math.floor. This is purely esthetic.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mhhhh... Ok.
Then, what I would do is to expose our doubts in the python documentation.

We need to say that we just mimicked what other people did, and that we don't have the full knowledge about this matter. I'll do it later

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then, what I would do is to expose our doubts in the python documentation.

Which doubts are you talking about? The round method doesn't follow the mathematical rounding convention. See this article for more information.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was just wondering why the authors decided to go for a custom rounding method.
It doesn't seem to me that:

# We use the upper bound
upper_bound = (ms + 0.5) * fps * 1/1000
# Then, we trunc the result
trunc_frame = int(upper_bound)

# If the upper_bound equals to the trunc_frame, this means that we don't respect the inequation because it is "greater than", not "greater than or equals".
# So if it happens, this means we need to return the previous frame
if upper_bound == trunc_frame:
    return trunc_frame - 1
else:
    return trunc_frame

does implement the mathematical rounding. To me it seems like just a... strange rounding.

The mathematical rounding for this case would just be:

int(ms * fps * 1/1000 + 0.5)

If you don't have an explanation as of why they implemented this rounding it's ok, but it seems important to me for it to be documented.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was just wondering why the authors decided to go for a custom rounding method.

Because frames_to_ms needs to be exactly the opposite of ms_to_frames. I am the authors of these equation (I mean, I didn't found them online. It is purely math deduction).
To calculate frames_to_ms, it is well known that the equation is: $ms = roundingMethod(frame \times {1 \over fps} \times 1000)$.

Now, from the equation, our goal is to create a formula to be able to calculate an ms_to_frames function. To do so, we need to isolate $frame$ from the equation.

Since there are two possible roundingMethod options (floor, round), there are two different inequations that can be deduced from the original equation, which are written in the document.
Additionally, from the inequation, we can also deduce an algorithm

The mathematical rounding for this case would just be:

int(ms * fps * 1/1000 + 0.5)

No, because $frame = round(ms \times fps \times {1 \over 1000})$ isn't perfectly the inverse of $ms = round(frame \times {1 \over fps} \times 1000)$. Important to note, here the rounding method round up, so, if it encounter $round(0.5)$, it will become $1$.

Here is a proof by contradiction

$fps = 24000/1001$
$ms = 105 $

With your method $frame = round(ms \times fps \times {1 \over 1000})$

$$\begin{gather} frame = round(ms \times fps \times {1 \over 1000}) \\\ frame = round(105 \times {24000 \over 1001} \times {1 \over 1000}) \\\ frame = round({360 \over 143}) \\\ frame = 3 \end{gather}$$

With what is wrotten in the proof doc

$$\begin{gather} upperBound = (ms + 0.5) \times fps * {1 \over 1000} \\\ upperBound = {2532 \over 1001} \\\ truncFrame = int(upperBound ) = 2 \\\ frame = truncFrame = 2 \end{gather}$$

Little reminder to show which frame correspond to which ms

$$\begin{gather} fps = 24000/1001 \\\ Frame_0 : [0, 42[ ms \\\ Frame_1 : [42, 83[ ms \\\ Frame_2 : [83, 125[ ms \\\ Frame_3 : [125, 167[ ms \end{gather}$$

Conclusion

Like you can see, your method return $frame = 3$, but the one in the doc return $frame = 2$ which is the right result.

pyonfx/convert.py Show resolved Hide resolved
pyonfx/convert.py Show resolved Hide resolved
pyonfx/convert.py Show resolved Hide resolved
pyonfx/utils.py Outdated Show resolved Hide resolved
Comment on lines +32 to +34
class RoundingMethod(Enum):
FLOOR: Callable[[Fraction], int] = lambda ms: int(ms)
ROUND: Callable[[Fraction], int] = lambda ms: int(ms + Fraction("0.5"))
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no need to define them as callable, as we never call them. Simply keeping them as enum should be enough.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"""

if os.path.isfile(path_to_timestamps_file_or_content):
with open(path_to_timestamps_file_or_content, "r") as f:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We may need to specify the encoding to "utf-8" because timestamps can contains comments with non-ascii character.

mkvtoolnix doesn't seems to care about the encoding. It will just convert the bytes to char, so equivalent to ascii.

But if we set the encoding to "ascii", it will just raise an exception when the file contains a non-ascii char, so it would be safer to use utf-8

What do you think about it?

ffprobe can be slow for big mkv file (1 gb or more)
mkvextract is way faster, so if it is available, we use it
@CoffeeStraw
Copy link
Owner

CoffeeStraw commented Jan 16, 2024

Had another quick glance at the code: do you think we could avoid uploading a new font (thus making use of the font we already fetch in the workflow)?

Also, how much does it take to generate the fake videos? Could we generate them on the fly while testing to keep the codebase minimal?

Finally, folder name "timestamps" doesn't seem to me to represent well what's inside. If you manage to remove the file I told you, we could drop it and move the python generator file outside of it, naming it as something like "utils.py".

@moi15moi
Copy link
Contributor Author

Had another quick glance at the code: do you think we could avoid uploading a new font (thus making use of the font we already fetch in the workflow)?

If i do that, how could we generate the video locally (sewhen we don't run the workflow but simply running the file tests/timestamps/generate_test_video.py)? We need to have the font to generate the video.

Also, how much does it take to generate the fake videos? Could we generate them on the fly while testing to keep the codebase minimal?

Yes we could. It take less then 10 seconds. Important to note, we need to have installed mkvmerge AND ffmpeg to be able to run that script.

Finally, folder name "timestamps" doesn't seem to me to represent well what's inside. If you manage to remove the file I told you, we could drop it and move the python generator file outside of it, naming it as something like "utils.py".

If you are talking about the folder tests/timestamps, there is no need to rename the file generate_test_video.py to utils.py. I provided the script since I guess it can be useful for the maintenance to know how those videos where generated. A user will never call this script.
All the test in tests/test_timestamps.py use the file in the timestamps folder as source which if why I named it like that.


if ffprobe_output_dict["format"]["format_name"] == "matroska,webm":
# We only do this check for .mkv file. See the note about mkv in the class documentation
time_base = Fraction(ffprobe_output_dict["streams"][0]["time_base"])
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently, the _from_mkvextract method doesn't perform a check for the time_base like ffmpeg because when I wrote this code, it was a information that mkvtoolnix didn't expose.
But, I asked if it could be added here.
It was introduced in version 82.0, which was released on 2 January 2024. We could perform a version check and if the user has a V82 or higher, then we perform the check.

Also, I propose to add a new file named mkvtoolnix.py and extract the timestamps in that file since timestamps.py is becoming too big.

video_path,
"-J",
]
mkvmerge_output = subprocess.run(cmd, capture_output=True, text=True)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add --command-line-charset UTF-8 --output-charset UTF-8 to the command. It is necessary when the user asks for a filepath that contains special characters that the locale encoding (which be queried with locale.getencoding()) doesn't support.

Suggested change
mkvmerge_output = subprocess.run(cmd, capture_output=True, text=True)
cmd = [
mkvmerge_path,
video_path,
"-J",
"--command-line-charset", "UTF-8",
"--output-charset", "UTF-8",
]
mkvmerge_output = subprocess.run(cmd, capture_output=True, text=True, encoding="utf-8")

"0:" + timestamps_file_path
]

mkvextract_output = subprocess.run(cmd, capture_output=True, text=True)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same problem has here

"-print_format",
"json",
]
ffprobe_output = subprocess.run(cmd, capture_output=True, text=True)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ffprobe seems to always output the result in utf-8 has said here.

Suggested change
ffprobe_output = subprocess.run(cmd, capture_output=True, text=True)
ffprobe_output = subprocess.run(cmd, capture_output=True, text=True, encoding="utf-8")

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants