First you need to get set up!
REQUIREMENTS:
- Windows PC (this guide is written for Windows users; unfortunately I don't use Linux or macOS, so I can't help you there)
- CUDA-capable GPU (which should be most modern NVIDIA GPUs)
https://developer.nvidia.com/cuda-toolkit
https://docs.anaconda.com/free/miniconda/index.html
After installation, verify that conda is available:
Open up Command Prompt and verify the install by typing in where conda; it should show the path to the conda executable.
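If you'd rather check from Python (purely optional; `find_exe` is an illustrative helper, not part of the repo), the standard library's shutil.which does the same lookup as `where`:

```python
import shutil

def find_exe(name):
    """Return the full path to an executable on PATH, or None if it's absent.

    This mirrors what `where conda` does in Command Prompt.
    """
    return shutil.which(name)

if __name__ == "__main__":
    path = find_exe("conda")
    if path:
        print("conda found at:", path)
    else:
        print("conda not found - check your Miniconda install and PATH")
```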
https://git-scm.com/download/win
You'll need Git in order to clone the repo!
git clone https://github.com/blewClue215/RVM_ON_SEGMENTS.git
Move into the repo root folder
cd RVM_ON_SEGMENTS
4.1 Open Command Prompt
- Make sure (base) is not active; if it is, then run:
conda deactivate
conda create --name rvm python=3.8
conda activate rvm
4.2 Install Pytorch
- nvcc --version
- PyTorch needs to be installed according to the CUDA version reported by the above command
- CUDA 12.1:
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
- CUDA 11.8:
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
This installs PyTorch, the core dependency that this whole project requires!
Subject to change based on: https://pytorch.org/get-started/locally/
4.3 Once this is done, make sure you're still in the project root folder (/RVM_ON_SEGMENTS):
pip install -r requirements_inference.txt
https://phoenixnap.com/kb/ffmpeg-windows
We need FFmpeg to split the video into segments and to recombine them afterwards.
Before you start inferencing you need to split up long-form videos into segments!
- This makes the overall inference process more reliable, since one small decoding problem won't block the entire run
- It's less of a memory hog; trying to infer on a 15-minute 8K video can eat up a lot of memory or even run out entirely!
- Plus, you can stop the process at any time and restart inference on the remaining segments, because the inferenceCustom.py script is designed for it.
The shorter the segment, the more matting pops you'll see when the video is recombined, but shorter segments also drastically reduce memory usage and improve the reliability of the whole inference process (I like to go with 15 seconds here).
There should be a folder named the same as the video but with a "_segments" suffix, containing the segmented mp4 files plus a file_list.txt.
Inferencing is the act of "inferring" data from the input using the model that has been trained; in this case we want the segments to be used to infer alpha mattes from the model (rvm_mobilenetv3.pth).
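For reference, file_list.txt is just ffmpeg's concat-demuxer format: one `file '...'` line per segment. A quick Python sketch of generating it yourself (the helper name and the sorting choice are mine, not the repo's; the batch script does the same thing with a one-line for loop):

```python
from pathlib import Path

def write_file_list(segments_dir):
    """Write an ffmpeg concat-demuxer file list for every .mp4 in the folder.

    Produces lines like:  file 'output_000.mp4'
    Sorted so the segments are concatenated back in order.
    """
    folder = Path(segments_dir)
    mp4s = sorted(folder.glob("*.mp4"))
    lines = [f"file '{p.name}'" for p in mp4s]
    out = folder / "file_list.txt"
    out.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return out
```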
That's handled by inferenceCustom.py!
All you need to do is to:
I added this because sometimes the inference can run for up to 40 hours on 40 minutes of footage, so I prefer to let it shut itself down after it's done to save power.
Ensure the entries are all True (which means all the segments were processed successfully)!
Note: At any point you can stop or restart the inference process!
Restarting is simply dragging-and-dropping the segment folder onto the script again, at which point inference will restart for any file marked "False" in this .json. Before you restart, though, you might want to check why a segment failed by playing it from the "segments" folder.
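The exact name and shape of that status file aren't shown here, but assuming it's a JSON object mapping each segment filename to a boolean (as the True/False wording suggests), here's a sketch for listing what still needs re-running (`failed_segments` is a hypothetical helper, not part of the repo):

```python
import json
from pathlib import Path
from typing import List

def failed_segments(status_json_path: str) -> List[str]:
    """Return segment filenames whose status is False (i.e. need re-running).

    Assumes the status file is a JSON object like:
        {"Output_0000.mp4": true, "Output_0001.mp4": false, ...}
    """
    status = json.loads(Path(status_json_path).read_text(encoding="utf-8"))
    return [name for name, ok in status.items() if not ok]
```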
Now that inference is done and you have matted video segments, you need to combine them together!
This will produce a "COMPOSITE_SEGMENTS_COMBINED.mp4", but it will not have audio.
This will produce the matted video with the audio from the original video!
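Under the hood, this kind of audio mux is a standard ffmpeg stream-copy. A sketch of the equivalent command (the repo's batch file may use different flags; `build_mux_command` is an illustrative helper, not part of the project):

```python
from typing import List

def build_mux_command(combined_video: str, original_video: str, output: str) -> List[str]:
    """Build an ffmpeg command that takes the video stream from the matted
    combine and the audio stream from the original, without re-encoding."""
    return [
        "ffmpeg",
        "-i", combined_video,   # input 0: matted video (no audio)
        "-i", original_video,   # input 1: original video (has audio)
        "-map", "0:v:0",        # video stream from input 0
        "-map", "1:a:0",        # audio stream from input 1
        "-c", "copy",           # stream copy: fast, no quality loss
        "-shortest",            # stop at the shorter input
        output,
    ]
```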
So by the end of it your folder should look something like this:
Now your matted video is ready to enjoy!
Frozen frames can happen for any number of reasons!
So the steps to troubleshoot are:
- Go to the “segments_matted” folder and find the video segment that had frozen frames, e.g. Output_0001.mp4
- Go to the “segments” folder and find Output_0001.mp4, play it in your media player, and check for skips/weird pixel artifacts/a purple screen
If it does, then that source segment failed to encode properly when segmenting:
- Resegment the video
- Copy the resegmented Output_0001.mp4 and pop it into a new folder, maybe name it “FIX_ME”
- Run inference on “FIX_ME” folder
- Once done, copy “FIX_ME/COMPOSITE/Output_0001.mp4” back to the “segments_matted” folder
If not, try the same steps as above but without resegmenting the video.
If the above does not work, the worst-case scenario is to segment and re-encode to H.265:
- Open up "1. DRAG AND DROP VIDEO TO SEGMENT HERE.bat"
- Copy and replace everything in that file with this:
@echo off
if "%~1" == "" (
echo Drag and drop a video file onto this batch file to split it into segments.
pause
exit /b
)
set /p seg_time="Time in seconds per segment: "
set input_file=%~1
set output_folder=%~dpn1_segments
mkdir "%output_folder%"
ffmpeg -i "%input_file%" -c:v libx265 -crf 18 -preset medium -c:a aac -b:a 128k -f segment -segment_time "%seg_time%" -reset_timestamps 1 "%output_folder%\output_%%03d.mp4"
cd "%output_folder%"
(for %%i in (*.mp4) do @echo file '%%i') > file_list.txt
echo Video has been split into segments and re-encoded to H.265 with minimal loss.
pause
- Save
- Drag and drop your video file onto it.
WARNING: THIS WILL TAKE A LOT LONGER TO SEGMENT THE VIDEO BUT SHOULD MAKE IT WORK!