CLIP (Contrastive Language–Image Pre-training) is a technique that efficiently learns visual concepts from natural language supervision. CLIP has found applications in models such as Stable Diffusion.
This repository acts as a proof of concept (POC) exploring the use of CLIP for natural language video search, as outlined in the article found here.
Adapted for a natural language video search engine, found here.
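As a rough illustration of how CLIP scores a text query against video frames, the sketch below uses the Hugging Face `transformers` implementation of CLIP; the model name and the frame files are assumptions for illustration, and the inference code in this repository may differ.

```python
# Minimal sketch: rank sampled video frames against a text query with CLIP.
# Model choice and frame file names are assumptions; this repo's inference
# code may use a different CLIP implementation.
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Frames sampled from a video (placeholder file names).
frames = [Image.open("frame_000.jpg"), Image.open("frame_001.jpg")]

inputs = processor(
    text=["a man cutting pepper"], images=frames, return_tensors="pt", padding=True
)
outputs = model(**inputs)

# logits_per_image holds the similarity of each frame to the text query;
# the highest-scoring frame best matches the query.
scores = outputs.logits_per_image.softmax(dim=0)
print(scores)
```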
- libzmq
- python >= 3.8
- go >= 1.18
- start up the inference zmq server found in the `./inference` directory: `python3 zmq_server.py` (see the sketch below)
- start up the go server with `go run main.go`
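For context, the inference server communicates over ZeroMQ. The sketch below shows a minimal request–reply loop of the kind such a server might implement; the port, the JSON message shape, and the placeholder `embed` function are assumptions, not the repository's actual protocol — see `./inference/zmq_server.py` for the real implementation.

```python
# Minimal sketch of a ZeroMQ REP inference endpoint (assumed protocol).
import zmq


def embed(text: str) -> list[float]:
    # Placeholder: a real server would run the CLIP text encoder here.
    return [0.0] * 512


ctx = zmq.Context()
sock = ctx.socket(zmq.REP)
sock.bind("tcp://*:5555")  # the Go server would connect with a REQ socket

while True:
    request = sock.recv_json()  # e.g. {"text": "a man cutting pepper"}
    sock.send_json({"embedding": embed(request["text"])})
```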
Before running this example, please ensure that your environment is correctly configured and the application is running without errors.
- index the video clips provided in the `examples/videos` directory:

  ```sh
  curl -X POST -d '{"videoURI": "<path_to_dir>/examples/videos/<video_name>.mp4" }' http://localhost:3000/insert
  ```
note: it can take a moment for the video to become searchable.
- then search for a video (see the programmatic sketch below):

  ```sh
  curl -X POST -d '{"input": "a man cutting pepper", "maxResults": 1 }' http://localhost:3000/search
  ```
Future work:
- CLI to remove the manual setup process
- ability to add dedicated inference machines (currently limited to the same host)