
Dual 3090 AI Inference Workstation

Videos

Hardware

Software

Configuration

Install the NVIDIA drivers and CUDA dependencies:

https://docs.nvidia.com/cuda/cuda-installation-guide-linux/
Reboot
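
After the reboot, a quick sanity check (not part of the NVIDIA guide, just a convenient way to confirm the install):

nvidia-smi      # should list both RTX 3090s along with the driver and CUDA versions
nvcc --version  # confirms the CUDA compiler is installed; it may need the PATH export shown further below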

Disable Xorg binding to the NVIDIA cards:

You will find that Xorg places processes on your GPUs rather than using the iGPU on your CPU. This can cause out-of-memory errors when running AI workloads.

To prevent this, comment out the NVIDIA section of the Xorg configuration found in /etc/X11/xorg.conf.d/. Xorg will then no longer see the NVIDIA driver and therefore won't use the cards for window management:

#Section "OutputClass"
#    Identifier "nvidia"
#    MatchDriver "nvidia-drm"
#    Driver "nvidia"
#    Option "AllowEmptyInitialConfiguration"
#    Option "PrimaryGPU" "no"
#    Option "SLI" "Auto"
#    Option "BaseMosaic" "on"
#EndSection

Section "OutputClass"
    Identifier "intel"
    MatchDriver "i915"
    Driver "modesetting"
EndSection
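
After restarting the display manager (or rebooting), you can confirm Xorg has let go of the cards; the Xorg path below is the usual Ubuntu location and may differ on other distros:

nvidia-smi
# The process list should no longer show /usr/lib/xorg/Xorg (or gnome-shell) holding memory
# on the 3090s, and both cards should sit near 0 MiB used while idle.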

Install Llama.cpp

Clone llama.cpp

git clone https://github.com/ggerganov/llama.cpp.git
Reference: https://github.com/ggerganov/llama.cpp
Compile llama.cpp with NVIDIA support

export PATH=$PATH:/usr/local/cuda-12.5/bin
make LLAMA_CUDA=1
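
Adjust the CUDA path above to match your installed toolkit version, and add -j$(nproc) to the make command to parallelise the build. A quick sanity check afterwards (binary name assumes a reasonably recent llama.cpp checkout):

nvcc --version          # the CUDA compiler must be on PATH for the build to pick it up
./llama-server --help   # the server binary should exist in the repo root after a successful build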

Load the models

Download the Mixtral 8x7B Instruct GGUF quant:
https://huggingface.co/TheBloke/Mixtral-8x7B-Instruct-v0.1-GGUF/blob/main/mixtral-8x7b-instruct-v0.1.Q4_K_M.gguf
Reference: https://huggingface.co/TheBloke/Mixtral-8x7B-Instruct-v0.1-GGUF/tree/main

Download the Dolphin Starcoder2 7B quant:
https://huggingface.co/bartowski/dolphincoder-starcoder2-7b-GGUF/resolve/main/dolphincoder-starcoder2-7b-Q6_K.gguf?download=true
Reference: https://huggingface.co/bartowski/dolphincoder-starcoder2-7b-GGUF/tree/main

Place the models into the llama.cpp models/ folder.
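
For example, from inside the llama.cpp checkout (direct-download URLs assembled from the Hugging Face links above; adjust if you picked different quants):

wget -P models/ https://huggingface.co/TheBloke/Mixtral-8x7B-Instruct-v0.1-GGUF/resolve/main/mixtral-8x7b-instruct-v0.1.Q4_K_M.gguf
wget -P models/ https://huggingface.co/bartowski/dolphincoder-starcoder2-7b-GGUF/resolve/main/dolphincoder-starcoder2-7b-Q6_K.gguf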

Start the llama.cpp server instances

Start the Instruct Server

./llama-server --port 8080 -m models/mixtral-8x7b-instruct-v0.1.Q4_K_M.gguf -ngl 99

Start the Autocomplete Server

./llama-server --port 8081 -m models/dolphincoder-starcoder2-7b-Q6_K.gguf -ngl 99
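
The -ngl 99 flag offloads all layers to the GPUs. A quick smoke test against the instruct server using llama.cpp's /completion endpoint (the prompt and token count are just examples):

curl http://localhost:8080/completion -H "Content-Type: application/json" -d '{"prompt": "Write a haiku about GPUs.", "n_predict": 64}'

If the JSON response comes back with generated text, the server is working; the same check against port 8081 verifies the autocomplete server.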

Configure VS Code

Install VS Code

Install the VS Code extension "Continue":
https://github.com/continuedev/continue

Configure Continue
Open the Continue configuration file using the VS Code Command Palette.
Reference: https://docs.continue.dev/reference/Model%20Providers/llamacpp

{
  "models": [
    {
      "title": "Mixtral 8x7B",
      "provider": "llama.cpp",
      "model": "mistral-8x7b",
      "apiBase": "http://localhost:8080",
      "systemMessage": "You are an expert software developer. You give helpful and concise responses. if asked to write something like a function, comment or docblock wrap it in code ticks for easy copy paste"
    }
  ],
  "customCommands": [
    {
      "name": "test",
      "prompt": "{{{ input }}}\n\nWrite a comprehensive set of unit tests for the selected code. It should setup, run tests that check for correctness including important edge cases, and teardown. Ensure that the tests are complete and sophisticated. Give the tests just as chat output, don't edit any file.",
      "description": "Write unit tests for highlighted code"
    }
  ],
  "tabAutocompleteModel": {
    "title": "Dolphin Starcoder2",
    "provider": "llama.cpp",
    "model": "starcoder2:7b",
    "apiBase": "http://localhost:8081",
    "useCopyBuffer": false,
    "maxPromptTokens": 4000,
    "prefixPercentage": 0.5,
    "multilineCompletions": "always",
    "debounceDelay": 150
  },
  "allowAnonymousTelemetry": false
}
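
Continue normally stores this file at ~/.continue/config.json; if the models don't appear in the extension, confirm the file saved there and that both llama-server instances are still listening on ports 8080 and 8081.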
