I'm using LocalAI with an NVIDIA A800 GPU.
I built it from source with make, e.g.
git clone https://github.com/go-skynet/LocalAI
cd LocalAI
make build
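For comparison, I now suspect the plain make build above only produces CPU backends. If I'm reading the LocalAI build docs correctly, a CUDA-enabled build would look roughly like this (treat it as my guess, not something I've verified on the A800 yet):

# Rebuild with the cuBLAS/CUDA backend (per the LocalAI build docs);
# this needs the CUDA toolkit (nvcc) installed beforehand.
make clean
make BUILD_TYPE=cublas build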
When I load a model, I get output like this:
12:46AM INF [llama-cpp] Attempting to load
12:46AM INF Loading model 'OpenBioLLM-8B-lora.gguf' with backend llama-cpp
12:46AM DBG Loading model in memory from file: /path/to/model.gguf
12:46AM DBG Loading Model OpenBioLLM-8B-lora.gguf with gRPC (file: /path/to/model.gguf) (backend: llama-cpp): {backendString:llama-cpp model:OpenBioLLM-8B-lora.gguf threads:16 assetDir:/tmp/localai/backend_data context:{emptyCtx:{}} gRPCOptions:0xc000431208 externalBackends:map[] grpcAttempts:20 grpcAttemptsDelay:2 singleActiveBackend:false parallelRequests:false}
12:46AM INF GPU device found but no CUDA backend present
12:46AM INF GPU device found but no CUDA backend present
12:46AM INF GPU device found but no CUDA backend present
12:46AM INF GPU device found but no CUDA backend present
12:46AM INF GPU device found but no CUDA backend present
12:46AM INF GPU device found but no CUDA backend present
12:46AM INF GPU device found but no CUDA backend present
12:46AM INF GPU device found but no CUDA backend present
12:46AM INF [llama-cpp] attempting to load with AVX2 variant
12:46AM DBG Loading GRPC Process: /tmp/localai/backend_data/backend-assets/grpc/llama-cpp-avx2
12:46AM DBG GRPC Service for OpenBioLLM-8B-lora.gguf will be running at: '127.0.0.1:42153'
12:46AM DBG GRPC Service state dir: /tmp/go-processmanager2034810530
12:46AM DBG GRPC Service Started
12:46AM DBG GRPC(OpenBioLLM-8B-lora.gguf-127.0.0.1:42153): stdout Server listening on 127.0.0.1:42153
12:46AM DBG GRPC Service Ready
12:46AM DBG GRPC: Loading model with options: {state:{NoUnkeyedLiterals:{} DoNotCompare:[] DoNotCopy:[] atomicMessageInfo:<nil>} sizeCache:0 unknownFields:[] Model:OpenBioLLM-8B-lora.gguf ContextSize:1024 Seed:1974897596 NBatch:512 F16Memory:true MLock:false MMap:true VocabOnly:false LowVRAM:false Embeddings:false NUMA:false NGPULayers:22 MainGPU: TensorSplit: Threads:16 LibrarySearchPath: RopeFreqBase:0 RopeFreqScale:0 RMSNormEps:0 NGQA:0 ModelFile:/path/to/model.gguf Device: UseTriton:false ModelBaseName: UseFastTokenizer:false PipelineType: SchedulerType: CUDA:false CFGScale:0 IMG2IMG:false CLIPModel: CLIPSubfolder: CLIPSkip:0 ControlNet: Tokenizer: LoraBase: LoraAdapter: LoraScale:0 NoMulMatQ:false DraftModel: AudioPath: Quantization: GPUMemoryUtilization:0 TrustRemoteCode:false EnforceEager:false SwapSpace:0 MaxModelLen:0 TensorParallelSize:0 MMProj: RopeScaling: YarnExtFactor:0 YarnAttnFactor:0 YarnBetaFast:0 YarnBetaSlow:0 Type: FlashAttention:false NoKVOffload:false}
12:46AM DBG GRPC(OpenBioLLM-8B-lora.gguf-127.0.0.1:42153): stderr llama_model_loader: loaded meta data with 26 key-value pairs and 291 tensors from /path/to/model.gguf (version GGUF V3 (latest))
12:46AM DBG GRPC(OpenBioLLM-8B-lora.gguf-127.0.0.1:42153): stderr llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
--- (model structure information omitted) ---
12:46AM DBG GRPC(OpenBioLLM-8B-lora.gguf-127.0.0.1:42153): stdout {"timestamp":1722789978,"level":"INFO","function":"initialize","line":502,"message":"initializing slots","n_slots":1}
12:46AM DBG GRPC(OpenBioLLM-8B-lora.gguf-127.0.0.1:42153): stdout {"timestamp":1722789978,"level":"INFO","function":"initialize","line":511,"message":"new slot","slot_id":0,"n_ctx_slot":1024}
12:46AM INF [llama-cpp] Loads OK
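Out of curiosity I also listed the backend binaries my build shipped, using the asset path from the GRPC log line above (what a CUDA build would add here is my assumption):

# Show which llama-cpp variants were compiled into this build.
# I only see CPU variants (e.g. avx2); I'd expect a CUDA-enabled
# build to add a cuda-flavoured binary here as well.
ls /tmp/localai/backend_data/backend-assets/grpc/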
I checked nvidia-smi and it shows the GPU is not being used. The load options above also report CUDA:false, even though NGPULayers is 22. I tried setting f16: true in the model's .yaml file (see the snippet below), but it didn't change anything.
My question is: how can I enable GPU acceleration so that inference speeds up? Thanks!
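For completeness, here is roughly what my model config looks like (the file name and values are illustrative; gpu_layers: 22 matches the NGPULayers:22 in the load options above, and as far as I understand it is the knob that controls GPU offloading):

# openbiollm.yaml - illustrative LocalAI model config
name: openbiollm
backend: llama-cpp
context_size: 1024
f16: true
# Number of layers to offload to the GPU; presumably ignored while the
# binary has no CUDA backend, which would explain what I'm seeing.
gpu_layers: 22
parameters:
  model: OpenBioLLM-8B-lora.gguf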