# TripleX

Welcome to the TripleX repository! This project provides tools for downloading videos from supported websites and processing them using utilities like scene detection, trimming, frame analysis, and dataset creation for model training.
- Reddit: https://www.reddit.com/r/NSFW_API
- Discord: https://discord.gg/bW4Bhkfk

## Table of Contents

- Overview
- Features
- Directory Structure
- Installation
- Usage
- Contributing
- License
- Disclaimer
- Additional Notes
- Contact
## Overview

TripleX is designed to help users download videos from supported websites and perform various processing tasks such as scene detection, trimming unwanted frames, analyzing frames using machine learning models, and creating datasets for training custom AI models. The toolkit is modular, allowing for easy addition of new downloaders and utilities.
## Features

- Video Downloaders: Currently supports downloading videos from xHamster. Designed to be extensible for other sites.
- Scene Detection: Uses PySceneDetect to split videos into individual scenes.
- Frame Trimming: Trims a specified number of frames from the beginning of videos.
- Frame Analysis: Analyzes frames extracted from videos using machine learning models for classification and detection.
- Dataset Creation: Facilitates the creation of datasets for training video generation AI models like Mochi LoRA.
- Modular Utilities: Easily add new utilities or downloaders to extend functionality.
## Directory Structure

```
.
├── LICENSE
├── README.md
├── data
│   ├── clips
│   ├── images
│   └── videos
├── downloaders
│   └── download_xhamster.py
├── guides
│   ├── fine_tuning_hunyuan_video_with_finetrainers.md
│   └── fine_tuning_mochi_with_modal.md
├── requirements.txt
├── setup_models.py
└── utils
    ├── analyze_frames.py
    ├── extract_sharpest_frame.py
    ├── split_by_scene.py
    ├── training
    │   └── hunyuan
    │       └── output_clips_to_hunyuan_dataset
    └── trim_frame_beginning.py
```
- downloaders/: Contains scripts for downloading videos from supported websites.
- models/: Directory where machine learning models will be downloaded and stored.
- data/: Default directory where videos and processed outputs are saved.
  - videos/: Contains downloaded videos.
  - clips/: Contains clips extracted from videos based on scene detection.
  - images/: Contains extracted frames and analysis results.
- requirements.txt: Lists the Python dependencies required for the project.
- setup_models.py: Script to download machine learning models from Google Drive.
- utils/: Contains utility scripts for processing videos.
  - split_by_scene.py: Splits videos into scenes.
  - trim_frame_beginning.py: Trims frames from the beginning of videos.
  - extract_sharpest_frame.py: Extracts the sharpest frame from a video.
  - analyze_frames.py: Analyzes frames using machine learning models.
## Installation

1. Clone the Repository:

   ```bash
   git clone https://github.com/NSFW-API/TripleX.git
   cd TripleX
   ```

2. Create a Virtual Environment:

   ```bash
   python3 -m venv venv
   source venv/bin/activate  # On Windows: venv\Scripts\activate
   ```

3. Install Dependencies:

   ```bash
   pip install -r requirements.txt
   ```

4. Install FFmpeg:

   - macOS:

     ```bash
     brew install ffmpeg
     ```

   - Ubuntu/Debian:

     ```bash
     sudo apt-get install ffmpeg
     ```

   - Windows: Download FFmpeg from the official website and add it to your system PATH.

5. Install TensorFlow and Additional Dependencies:

   ```bash
   pip install tensorflow
   pip install opencv-python numpy
   ```

6. Set Up Machine Learning Models:

   The machine learning models required for frame analysis are stored externally due to their size. Use the provided script to download and set up the models:

   ```bash
   python setup_models.py
   ```

   - The script downloads the necessary model files from Google Drive and places them in the `models/` directory following the required structure.
   - Ensure you have an active internet connection.

   Note: `setup_models.py` handles downloading large model files that cannot be included directly in the repository due to size limitations.
## Usage

### Downloading Videos

The `download_xhamster.py` script allows you to download videos from xHamster.

Note: Ensure you comply with all legal requirements and terms of service when downloading content.

Example usage:

```bash
python downloaders/download_xhamster.py <video_url>
```

- Replace `<video_url>` with the actual URL of the xHamster video.
- The video is automatically saved in the `data/videos` directory; no additional input is required after providing the URL.
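At its core, a downloader ends with an ordinary streamed HTTP download. The sketch below is a generic illustration using `requests` (which is in `requirements.txt`), not the actual xHamster page-scraping logic in `download_xhamster.py`; the function name and `session` parameter are assumptions for illustration.

```python
def download_video(url, dest, session=None, chunk_size=1 << 16):
    # Stream the response to disk in chunks so large videos are never
    # held fully in memory. `session` is injectable, e.g. for testing;
    # by default a plain requests.Session() is used.
    if session is None:
        import requests  # deferred import; requests is in requirements.txt
        session = requests.Session()
    with session.get(url, stream=True, timeout=30) as resp:
        resp.raise_for_status()
        with open(dest, "wb") as f:
            for chunk in resp.iter_content(chunk_size=chunk_size):
                if chunk:
                    f.write(chunk)
    return dest
```

A real downloader would first scrape the video page to resolve the direct media URL, then hand it to a loop like this.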
### Splitting Videos by Scene

The `split_by_scene.py` script splits all videos in `data/videos` into scenes based on content detection.

Example usage:

```bash
python utils/split_by_scene.py
```

Processing:

- The script processes all videos in `data/videos`.
- For each video, it creates a subdirectory within `data/clips` named after the video file (without extension).
- The split scenes are saved in the respective subdirectories.

Notes:

- Content Detection Parameters: The script uses default parameters for scene detection (`threshold=15.0`, `min_scene_len=15`). If you wish to adjust these parameters, modify the default values directly in the script.
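Content-based scene detection boils down to thresholding a frame-to-frame difference score. The sketch below illustrates that idea in isolation; it is not PySceneDetect's actual implementation (which works in HSV space and enforces `min_scene_len`, among other refinements).

```python
import numpy as np

def frame_score(prev, curr):
    # Mean absolute per-pixel difference between consecutive frames --
    # a simplified stand-in for PySceneDetect's content metric.
    return float(np.mean(np.abs(curr.astype(np.int16) - prev.astype(np.int16))))

def find_cuts(frames, threshold=15.0):
    """Return frame indices where a new scene is assumed to start."""
    return [i for i in range(1, len(frames))
            if frame_score(frames[i - 1], frames[i]) >= threshold]
```

With `threshold=15.0` (the script's default), a hard cut produces a large average pixel change and is flagged, while gradual motion between similar frames is not.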
### Trimming Frames from the Beginning

The `trim_frame_beginning.py` script trims a specified number of frames from the beginning of all videos in `data/videos` and its subdirectories.

Example usage:

```bash
python utils/trim_frame_beginning.py [num_frames]
```

- `[num_frames]` (optional): The number of frames to trim from the beginning of each video. If not provided, the default is `5`.

Instructions:

- To trim a specific number of frames:

  ```bash
  python utils/trim_frame_beginning.py 10
  ```

  This trims 10 frames from the beginning of each video.

- To use the default number of frames (5):

  ```bash
  python utils/trim_frame_beginning.py
  ```

Processing:

- The script processes all videos in `data/videos` and its subdirectories.
- It overwrites the original video files after trimming.

Notes:

- Backup: Be cautious when overwriting files. It's recommended to keep backups if you might need the original files later.
- Adjusting the Default Number of Frames: If you frequently use a different number of frames, you can change the default value directly in the script.
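One way to drop the first N frames of a video is FFmpeg's `select` filter. The helper below only builds the command line and is an illustrative sketch (the filter choice, re-encode behaviour, and dropped audio are assumptions, not necessarily what `trim_frame_beginning.py` does internally):

```python
def build_trim_cmd(src, dst, num_frames=5):
    # Keep only frames with index >= num_frames, then shift timestamps
    # back to zero so the output starts cleanly. The video is re-encoded;
    # audio is dropped here (-an) for simplicity.
    return [
        "ffmpeg", "-y", "-i", src,
        "-vf", f"select='gte(n\\,{num_frames})',setpts=PTS-STARTPTS",
        "-an", dst,
    ]
```

The resulting list can be passed to `subprocess.run(...)`; writing to a temporary file and renaming over the source would reproduce the script's overwrite behaviour.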
### Analyzing Frames

The `analyze_frames.py` script analyzes frames extracted from videos using machine learning models. It processes the sharpest frame from each video scene to classify and detect various elements.

Example usage:

```bash
python utils/analyze_frames.py
```

Instructions:

1. Ensure Scenes Are Available: Before running the analysis, split the videos into scenes using `split_by_scene.py`. The scenes should be located in `data/clips`.
2. Ensure Models Are Set Up: Run `setup_models.py` as described in the Installation section to download and set up the required models.
3. Run the Script:

   ```bash
   python utils/analyze_frames.py
   ```

Processing:

- The script processes each video in `data/clips`.
- For each video, it extracts the sharpest frame using `extract_sharpest_frame.py`.
- The extracted frame is analyzed using the following models:
  - Pose Classification: Classifies the pose in the frame.
  - Watermark Detection: Detects any watermarks present.
  - Genital Detection: Identifies regions in the frame.
  - Penetration Detection: Detects specific activities.

Outputs:

- Analysis results are saved in JSON format in `data/images`.
- The analyzed frames are also saved in `data/images`.

Notes:

- Dependencies: Ensure that you have installed all necessary dependencies, including TensorFlow, OpenCV, and NumPy.
- Model Requirements: The models are automatically downloaded and set up using `setup_models.py`.
- Adjustable Parameters: The script processes all videos by default. You can modify it to process specific videos or frames as needed.
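A common sharpness metric for "sharpest frame" selection is the variance of the image's Laplacian: more edge detail means higher variance. The sketch below is self-contained NumPy illustrating the idea; `extract_sharpest_frame.py` may instead use OpenCV's `cv2.Laplacian`, and the function names here are assumptions.

```python
import numpy as np

def laplacian_variance(gray):
    # 4-neighbour discrete Laplacian over the interior pixels; higher
    # variance correlates with more edge detail, i.e. a sharper image.
    g = gray.astype(np.float64)
    lap = (-4.0 * g[1:-1, 1:-1]
           + g[:-2, 1:-1] + g[2:, 1:-1]
           + g[1:-1, :-2] + g[1:-1, 2:])
    return float(lap.var())

def sharpest_index(gray_frames):
    """Index of the grayscale frame with the highest Laplacian variance."""
    return max(range(len(gray_frames)),
               key=lambda i: laplacian_variance(gray_frames[i]))
```

A perfectly flat frame scores zero, while a high-contrast frame scores high, so picking the max over a scene's frames favours the least blurred one.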
### Creating Datasets for Model Training

TripleX can be used to create customized datasets for training video generation AI models like Mochi LoRA. By processing and extracting frames from videos, you can generate datasets suitable for model training.

For a detailed guide on how to use this repository to create a dataset and train a Mochi LoRA model using Modal (a GPU app hosting platform), see "How to Train a Video Model Using TripleX and Mochi LoRA" in `guides/fine_tuning_mochi_with_modal.md`.

Instructions:

1. Prepare Your Dataset:
   - Use the utilities provided in TripleX to download videos, split them into scenes, and extract frames.
   - The frames and metadata generated can form the basis of your training dataset.
2. Follow the Guide:
   - The guide provides step-by-step instructions on how to process the dataset created with TripleX and train a Mochi LoRA model.
   - It includes information on setting up the training environment on Modal, configuring parameters, and running the training process.

Notes:

- Model Training Considerations:
  - Ensure that you have the rights and permissions to use the videos and frames for training purposes.
  - Be mindful of data privacy, legal compliance, and ethical considerations when creating and using datasets.
- Compatibility:
  - The dataset created using TripleX should be compatible with the training requirements of the Mochi LoRA model as described in the guide.
## Contributing

Contributions are welcome! You can contribute to this project in the following ways:

### Adding a New Downloader

- Create a New Downloader Script: Follow the structure of `download_xhamster.py` to create a downloader for another site.
- Place the Script in the `downloaders/` Directory.
- Testing: Thoroughly test your script to ensure it works reliably.
- Documentation: Update the README with instructions on how to use your new downloader.
- Submit a Pull Request: Once you're ready, submit a pull request for review.

### Adding a New Utility

- Create a New Utility Script: Develop your utility and place it in the `utils/` directory.
- Explain the Utility: Provide clear instructions and examples on how to use your utility.
- Dependencies: If your utility requires additional Python packages, update `requirements.txt`.
- Submit a Pull Request: Include information about the utility and its usage in your pull request.
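As a starting point, a new downloader can mirror the one-positional-argument command line used by `download_xhamster.py`. The skeleton below is hypothetical and only illustrates the expected CLI shape, not any site-specific scraping logic:

```python
import argparse

def build_parser():
    # Hypothetical skeleton mirroring download_xhamster.py's CLI:
    # a single positional video URL argument.
    parser = argparse.ArgumentParser(
        description="Download a video into data/videos/.")
    parser.add_argument("video_url", help="URL of the video page")
    return parser

# A real downloader would consume build_parser().parse_args().video_url,
# scrape the page for the direct media URL, and save the file under
# data/videos/ so the rest of the toolkit can find it.
```

Keeping the same CLI across downloaders means the usage instructions in this README stay uniform: `python downloaders/download_<site>.py <video_url>`.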
### Contribution Workflow

1. Fork the Repository: Click the "Fork" button at the top-right corner of the repository page.
2. Clone Your Fork:

   ```bash
   git clone https://github.com/your-username/TripleX.git
   ```

3. Create a New Branch:

   ```bash
   git checkout -b feature/new-utility
   ```

4. Make Your Changes: Add your downloader or utility script.
5. Commit Your Changes:

   ```bash
   git add .
   git commit -m "Add new utility for dataset creation"
   ```

6. Push to Your Fork:

   ```bash
   git push origin feature/new-utility
   ```

7. Open a Pull Request: Go to the original repository and click "New Pull Request."

Note: Be cautious about including large files (e.g., model files) in your commits. If your changes involve large files, consider alternative methods such as hosting the files externally or using Git LFS (Large File Storage).
## License

This project is licensed under the MIT License. See the LICENSE file for details.
## Disclaimer

- Legal Compliance: This toolkit is intended for educational and personal use. Users are responsible for ensuring they comply with all applicable laws, regulations, and terms of service of the websites they interact with.
- Content Rights: Downloading and distributing copyrighted material without permission may infringe on intellectual property rights.
- Content Sensitivity: Some downloaders may interact with websites containing sensitive content. Users should be aware of and comply with all legal age restrictions and content regulations in their jurisdiction.
- No Liability: The authors and contributors of this project are not liable for any misuse of the toolkit.
Thank you for using TripleX! If you have any questions or need assistance, feel free to open an issue on the repository or reach out to the maintainers.
## Additional Notes

- Logging: The scripts log their activities, which can be helpful for debugging.
- Error Handling: The scripts include basic error handling to inform you of issues that may arise during processing.
- Dependencies: Ensure that all dependencies listed in `requirements.txt` are installed in your virtual environment. If you encounter issues, double-check that all required packages are installed.

  ```
  # requirements.txt
  requests
  scenedetect
  beautifulsoup4
  tensorflow
  opencv-python
  numpy
  ```

- FFmpeg: FFmpeg is a crucial dependency for video processing in this toolkit. Ensure that it is correctly installed and accessible from your system's PATH.
- Python Version: This toolkit is developed for Python 3.x. Ensure you are using a compatible version of Python.
- Model Files: The machine learning models required for frame analysis are downloaded using `setup_models.py`. Do not add these large files directly to the repository.

Handling large files:

- Git Limitations: Git repositories have file size limitations. Pushing files larger than 50 MB is generally discouraged, and files over 100 MB are rejected.
- Using `setup_models.py`: The script handles the downloading of large model files from external sources (e.g., Google Drive) and places them in the appropriate directories.
- Modifying `setup_models.py`: If you have new models or updates, modify `setup_models.py` to include the new download links and ensure models are placed correctly.
- Alternative Methods: If you prefer, you can manually download the models and place them in the `models/` directory following the required structure.
- Avoid Committing Large Files: Do not commit files larger than 50 MB to the repository.
- Use External Hosting: For large files, host them on external services like Google Drive, and modify `setup_models.py` to download them.
- Git Large File Storage (LFS): Alternatively, consider using Git LFS for handling large files. Note that users cloning the repository will need to have Git LFS installed.
## Contact

- Maintainer: NSFW API
- Email: [email protected]
- GitHub Issues: Please report any issues or bugs by opening an issue on the repository.
Feel free to reach out if you need any further assistance or have suggestions for improving the toolkit!