Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[8.x] [ESS] [Serverless] Updates BYO LLM page (backport #6326) #6362

Merged
merged 2 commits into from
Dec 17, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
36 changes: 20 additions & 16 deletions docs/AI-for-security/connect-to-byo.asciidoc
Original file line number Diff line number Diff line change
Expand Up @@ -10,7 +10,7 @@ This page provides instructions for setting up a connector to a large language m

This example uses a single server hosted in GCP to run the following components:

* LM Studio with the https://mistral.ai/technology/#models[Mixtral-8x7b] model
* LM Studio with the https://huggingface.co/mistralai/Mistral-Nemo-Instruct-2407[Mistral-Nemo-Instruct-2407] model
* A reverse proxy using Nginx to authenticate to Elastic Cloud

image::images/lms-studio-arch-diagram.png[Architecture diagram for this guide]
Expand All @@ -20,7 +20,7 @@ NOTE: For testing, you can use alternatives to Nginx such as https://learn.micro
[discrete]
== Configure your reverse proxy

NOTE: If your Elastic instance is on the same host as LM Studio, you can skip this step.
NOTE: If your Elastic instance is on the same host as LM Studio, you can skip this step. Also, check out our https://www.elastic.co/blog/herding-llama-3-1-with-elastic-and-lm-studio[blog post] that walks through the whole process of setting up a single-host implementation.

You need to set up a reverse proxy to enable communication between LM Studio and Elastic. For more complete instructions, refer to a guide such as https://www.digitalocean.com/community/tutorials/how-to-configure-nginx-as-a-reverse-proxy-on-ubuntu-22-04[this one].

Expand Down Expand Up @@ -74,7 +74,14 @@ server {
}
--------------------------------------------------

IMPORTANT: If using the example configuration file above, you must replace several values: Replace `<secret token>` with your actual token, and keep it safe since you'll need it to set up the {elastic-sec} connector. Replace `<yourdomainname.com>` with your actual domain name. Update the `proxy_pass` value at the bottom of the configuration if you decide to change the port number in LM Studio to something other than 1234.
[IMPORTANT]
====
If using the example configuration file above, you must replace several values:

* Replace `<secret token>` with your actual token, and keep it safe since you'll need it to set up the {elastic-sec} connector.
* Replace `<yourdomainname.com>` with your actual domain name.
* Update the `proxy_pass` value at the bottom of the configuration if you decide to change the port number in LM Studio to something other than 1234.
====

[discrete]
=== (Optional) Set up performance monitoring for your reverse proxy
Expand All @@ -85,23 +92,20 @@ You can use Elastic's {integrations-docs}/nginx[Nginx integration] to monitor pe

First, install https://lmstudio.ai/[LM Studio]. LM Studio supports the OpenAI SDK, which makes it compatible with Elastic's OpenAI connector, allowing you to connect to any model available in the LM Studio marketplace.

One current limitation of LM Studio is that when it is installed on a server, you must launch the application using its GUI before doing so using the CLI. For example, by using Chrome RDP with an https://cloud.google.com/architecture/chrome-desktop-remote-on-compute-engine[X Window System]. After you've opened the application the first time using the GUI, you can start it by using `sudo lms server start` in the CLI.
You must launch the application using its GUI before doing so using the CLI. For example, use Chrome RDP with an https://cloud.google.com/architecture/chrome-desktop-remote-on-compute-engine[X Window System]. After you've opened the application the first time using the GUI, you can start it by using `sudo lms server start` in the CLI.

Once you've launched LM Studio:

1. Go to LM Studio's Search window.
2. Search for an LLM (for example, `Mixtral-8x7B-instruct`). Your chosen model must include `instruct` in its name in order to work with Elastic.
3. Filter your search for "Compatibility Guess" to optimize results for your hardware. Results will be color coded:
* Green means "Full GPU offload possible", which yields the best results.
* Blue means "Partial GPU offload possible", which may work.
* Red for "Likely too large for this machine", which typically will not work.
2. Search for an LLM (for example, `Mistral-Nemo-Instruct-2407`). Your chosen model must include `instruct` in its name in order to work with Elastic.
3. After you find a model, view download options and select a recommended version (green). For best performance, select one with the thumbs-up icon that indicates good performance on your hardware.
4. Download one or more models.

IMPORTANT: For security reasons, before downloading a model, verify that it is from a trusted source. It can be helpful to review community feedback on the model (for example using a site like Hugging Face).

image::images/lms-model-select.png[The LM Studio model selection interface]

In this example we used https://huggingface.co/TheBloke/Mixtral-8x7B-Instruct-v0.1-GGUF[`TheBloke/Mixtral-8x7B-Instruct-v0.1.Q3_K_M.gguf`]. It has 46.7B total parameters, a 32,000 token context window, and uses GGUF https://huggingface.co/docs/transformers/main/en/quantization/overview[quanitization]. For more information about model names and format information, refer to the following table.
In this example we used https://huggingface.co/mistralai/Mistral-Nemo-Instruct-2407[`mistralai/Mistral-Nemo-Instruct-2407`]. It has 12B total parameters, a 128,000 token context window, and uses GGUF https://huggingface.co/docs/transformers/main/en/quantization/overview[quanitization]. For more information about model names and format information, refer to the following table.

[cols="1,1,1,1", options="header"]
|===
Expand All @@ -124,18 +128,18 @@ After downloading a model, load it in LM Studio using the GUI or LM Studio's htt
[discrete]
=== Option 1: load a model using the CLI (Recommended)

It is a best practice to download models from the marketplace using the GUI, and then load or unload them using the CLI. The GUI allows you to search for models, whereas the CLI only allows you to import specific paths, but the CLI provides a good interface for loading and unloading.
It is a best practice to download models from the marketplace using the GUI, and then load or unload them using the CLI. The GUI allows you to search for models, whereas the CLI allows you to use `lms get` to search for models. The CLI provides a good interface for loading and unloading.

Use the following commands in your CLI:
Once you've downloaded a model, use the following commands in your CLI:

1. Verify LM Studio is installed: `lms`
2. Check LM Studio's status: `lms status`
3. List all downloaded models: `lms ls`
4. Load a model: `lms load`
4. Load a model: `lms load`.

image::images/lms-cli-welcome.png[The CLI interface during execution of initial LM Studio commands]

After the model loads, you should see a `Model loaded successfully` message in the CLI.
After the model loads, you should see a `Model loaded successfully` message in the CLI.

image::images/lms-studio-model-loaded-msg.png[The CLI message that appears after a model loads]

Expand All @@ -156,8 +160,8 @@ Refer to the following video to see how to load a model using LM Studio's GUI. Y
<img
style="width: 100%; margin: auto; display: block;"
class="vidyard-player-embed"
src="https://play.vidyard.com/FMx2wxGQhquWPVhGQgjkyM.jpg"
data-uuid="FMx2wxGQhquWPVhGQgjkyM"
src="https://play.vidyard.com/c4AxH8d9tWMnwNp5J6bcfX.jpg"
data-uuid="c4AxH8d9tWMnwNp5J6bcfX"
data-v="4"
data-type="inline"
/>
Expand Down
Binary file modified docs/AI-for-security/images/lms-cli-welcome.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file modified docs/AI-for-security/images/lms-model-select.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file modified docs/AI-for-security/images/lms-ps-command.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file modified docs/AI-for-security/images/lms-studio-model-loaded-msg.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.