What happened?
I am currently using the LateInteraction and BM25 models from the fastembed library, but my GPU is not being fully utilized. My provider is set to CUDAExecutionProvider, yet only 4 GB out of 24 GB of GPU memory is being used!
What is the expected behaviour?
All of the available GPU memory should be utilized!
A minimal reproducible example
```python
def _initialize_colbert_model(gpu: bool):
    """Initialize the ColBERT model on GPU or CPU based on the flag."""
    providers = ["CUDAExecutionProvider"] if gpu else ["CPUExecutionProvider"]
    logger.info(f"Initializing ColBERT model with {providers}")
    return LateInteractionTextEmbedding(
        "colbert-ir/colbertv2.0",
        providers=providers,
        cuda=gpu,
        parallel=0,
        local_files_only=LOCAL_FILES_ONLY,
    )


def _initialize_sparse_bm25_model(gpu: bool):
    """Initialize the FastEmbedSparse model on GPU or CPU based on the flag."""
    providers = ["CUDAExecutionProvider"] if gpu else None
    logger.info(f"Initializing FastEmbedSparse model with {'GPU' if gpu else 'CPU'}")
    return FastEmbedSparse(
        providers=providers,
        cuda=gpu,
        parallel=0,
        local_files_only=LOCAL_FILES_ONLY,
    )
```
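The two initializers above repeat the same provider-selection conditional. As a minimal sketch of how that could be factored out (the helper name `select_providers` is mine, not part of fastembed; the CPU fallback entry is a common ONNX Runtime pattern that I've added, whereas the original requests CUDA only):

```python
def select_providers(gpu: bool) -> list[str]:
    """Return the ONNX Runtime execution providers to request.

    When a GPU is requested, CPU is kept as a fallback so the model
    can still load if CUDA initialization fails.
    """
    if gpu:
        return ["CUDAExecutionProvider", "CPUExecutionProvider"]
    return ["CPUExecutionProvider"]
```

Both `_initialize_*` functions could then call `select_providers(gpu)` instead of duplicating the conditional.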
Here is my docker-compose file:
```yaml
version: '3.8'
services:
  web:
    build:
      context: .
      dockerfile: Dockerfile-gpu
    environment:
      - GPU_DEPLOYMENT=TRUE
    restart: always
    ports:
      - "5003:5003"
    volumes:
      - ./logs:/app/logs
    deploy:
      resources:
        limits:
          memory: 15g
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
```
What Python version are you on? e.g. python --version
Python 3.12
FastEmbed version
fastembed-gpu==0.4.0
What OS are you seeing the problem on?
Linux
Relevant stack traces and/or logs
No response