Rim


Rim, a Rust-based Multi-Modal Hyper Caption Tool in Parallel, v3.0 released!

Features

  • Supports universal captioning tasks over mixed image/video media
  • Supports OpenAI models on the Azure platform: GPT-4o, GPT-4v
  • Supports Gemini models on Google Cloud Platform: Gemini-1.5-flash, Gemini-1.5-pro
  • Supports multiple prompts, each with a separate namespace
  • Supports optional service selection
  • Supports a QPS setting (default: 20 in parallel)
  • Supports a job limit (default: the first 100 jobs)
  • Supports separate save paths of the form $MODEL/$PROMPT/$File.txt
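
The save-path layout from the last feature can be pictured with a small sketch. This is not Rim's actual code: the function name `caption_path`, the `root` output directory, and the choice to use the media file's stem for `$File` are all assumptions made for illustration.

```rust
use std::path::{Path, PathBuf};

/// Builds `<root>/<model>/<prompt>/<file stem>.txt` for one media file.
/// Hypothetical helper: `root` and the use of the file stem (rather than
/// the full file name) are assumptions, not taken from Rim's source.
fn caption_path(root: &Path, model: &str, prompt: &str, media: &Path) -> Option<PathBuf> {
    // "assets/images/1.png" -> "1"; returns None if the path has no file name.
    let stem = media.file_stem()?;
    let mut file_name = stem.to_os_string();
    file_name.push(".txt");
    Some(root.join(model).join(prompt).join(file_name))
}
```

Under these assumptions, calling `caption_path` with root `out`, model `gpt-4o`, prompt `simple`, and `assets/images/1.png` yields `out/gpt-4o/simple/1.txt`, so captions from different models and prompts never overwrite one another.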

Usage

Tip

rim assets/images/1.png -c config.toml --limit 100 --qps 20

For a single key on a single project, we recommend running rim ${path} -c config.toml --limit 360.
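
To make the interplay of --limit and --qps concrete, here is a minimal sketch of one way such pacing could work. It assumes --limit keeps only the first N jobs and --qps caps how many jobs start per second; Rim's real scheduler is not described in this README, and `plan_starts` is a hypothetical name.

```rust
use std::time::Duration;

/// Returns (start offset, job) pairs for the first `limit` jobs, spacing
/// the starts evenly so that at most `qps` jobs begin in any one second.
/// Hypothetical illustration only; not Rim's actual scheduling logic.
fn plan_starts<T>(jobs: Vec<T>, limit: usize, qps: u32) -> Vec<(Duration, T)> {
    assert!(qps > 0, "qps must be positive");
    // Time between two consecutive job starts, e.g. 50 ms at 20 QPS.
    let gap = Duration::from_secs(1) / qps;
    jobs.into_iter()
        .take(limit) // jobs beyond the limit are dropped
        .enumerate()
        .map(|(i, job)| (gap * i as u32, job))
        .collect()
}
```

With four jobs, a limit of 3, and a QPS of 20, this plan starts three jobs at 0 ms, 50 ms, and 100 ms and skips the fourth.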

Old Usage
  1. Single Image/Video Captioning:
rim -f ${file_path} -c config.toml

Rim generates a *.txt file containing the caption for a single image or video.

  2. Batch Image/Video Captioning:
rim -d ${dir_path} -c config.toml
  3. Batch of Batch:
DATA=/data
for i in "$DATA"/*; do [ -d "$i" ] && ./target/release/rim "$i" -c config.toml --limit 1500 --qps 500; done

For a directory of images or videos, Rim generates a corresponding list of *.txt caption files.


  1. Rim now generates a folder named xxx_cap that contains the *.txt caption files.
  2. A sample configuration can be found in config.toml.

Config

Creating a Sample Configuration (Unix):

cat <<EOF | tee config.toml
[[prompt]]
name = "simple"
value = "Caption this video."

[[prompt]]
name = "example"
value = "Provide a brief summary of the video content focusing on key themes and messages."

[azure]
api = [
    ['https://closedAI-1.openai.azure.com', 'sk-00000000000000000000000000000000', 'gpt-4o'],
    ['https://closedAI-2.openai.azure.com', 'sk-00000000000000000000000000000001', 'gpt-4v']
]

[gemini]
api = [
    ['https://generativelanguage.googleapis.com', 'AIza00000000000000000000000000000000000', 'gemini-1.5-flash-latest'],
    ['https://generativelanguage.googleapis.com', 'AIza00000000000000000000000000000000001', 'gemini-1.5-pro-latest'],
]
EOF
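
Judging from the sample values above, each row of an api array appears to be a triple of endpoint URL, API key, and model name. The sketch below models that reading as a Rust type; the struct `ApiEntry` and its field names are hypothetical and inferred only from the sample, not from Rim's source.

```rust
/// One row of an `api` array, read as [endpoint URL, API key, model name].
/// Hypothetical type: the field names are inferred from the sample values.
#[derive(Debug, PartialEq)]
struct ApiEntry {
    endpoint: String,
    key: String,
    model: String,
}

impl ApiEntry {
    /// Converts a three-element row into a named structure.
    fn from_row(row: [&str; 3]) -> Self {
        let [endpoint, key, model] = row;
        ApiEntry {
            endpoint: endpoint.into(),
            key: key.into(),
            model: model.into(),
        }
    }
}
```

Listing several rows under one service, as the sample does, pairs each key with its own endpoint and model.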

Nightly Build

curl -fsSL https://sh.rustup.rs | sh -s -- -y
. "$HOME/.cargo/env"
rustup update nightly && rustup default nightly

cargo build --release
./target/release/rim "assets/images" -c config.toml

Nightly Build with mirror

curl -fsSL https://sh.rustup.rs | sh -s -- -y
. "$HOME/.cargo/env"
echo """
[source.crates-io]
replace-with = 'mirror'

[source.mirror]
registry = 'sparse+https://mirrors.tuna.tsinghua.edu.cn/crates.io-index/'
""" | tee "${CARGO_HOME:-$HOME/.cargo}/config.toml"
rustup update nightly && rustup default nightly

cargo build --release
./target/release/rim "assets/images" -c config.toml