Bringing Open-sourced LLMs into SAP AI Core

The open-source community surrounding Large Language Models (LLMs) is evolving rapidly, with new models, backends, libraries, and tooling constantly emerging. These developments make it possible to run LLMs locally or in self-hosted environments. SAP AI Core is a service in the SAP Business Technology Platform that is designed to handle the execution and operations of your AI assets in a standardized, scalable, and hyperscaler-agnostic way. This repository serves as a guide on how to bring popular open-source Large Language Models (such as LLaMa 3, Phi-3, Mistral, Mixtral, LLaVA, Gemma, etc.) and open-source text embedding models into SAP AI Core using widely adopted open-source LLM tools or backends, which complements SAP Generative AI Hub with self-hosted open-source LLMs.


Please refer to the blog post about Bring Open-Source LLMs into SAP AI Core for details.

Why run open-sourced LLMs with SAP AI Core?

  • Data Protection & Privacy
  • Security
  • Cost-effectiveness
  • Flexibility of choice for LLMs, LLM backends, etc.
  • Making open-sourced LLMs enterprise-ready

Solution Architecture

In principle, there are three essential parts to bringing an open-source LLM/LMM into SAP AI Core.

  • Commercially viable open-source or open-weight models: e.g. Mistral, Mixtral, LLaVA, etc.
  • Publicly accessible model hub: for instance, the Ollama Model Library tailored for Ollama, or Hugging Face as a general-purpose model repository.
  • Inference server in SAP AI Core: You can bring your own code to implement an inference server, for example, a Custom Inference Server with the Hugging Face Transformers Library. Alternatively, there are open-source, ready-to-use LLM inference servers that can be reused in SAP AI Core, such as Ollama, LocalAI, llama.cpp, and vLLM, with minimal custom code: a custom Dockerfile and a configurable serving template adapted for SAP AI Core. Ollama is recommended for its simplicity and efficiency.

Why leverage Ollama, LocalAI, llama.cpp and vLLM in SAP AI Core?

Ollama, LocalAI, llama.cpp and vLLM offer comprehensive solutions for running Large Language Models (LLMs) locally or in self-hosted environments. Their full-stack capabilities include:

  • Model Management: Dynamically pull or download LLMs from a model repository through an API at runtime (exclusive to Ollama and LocalAI; vLLM provides seamless integration with Hugging Face models)
  • Running LLMs efficiently with GPU acceleration in SAP AI Core using open-source backends such as llama.cpp, vLLM, transformers, exllama, etc.
  • Serving with OpenAI-compatible chat completions and embedding APIs (see the sketch after this list)
  • Easy deployment and setup, without the need for custom code deployment in SAP AI Core
  • Commercial viability: they are all under the MIT or Apache 2.0 license
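As a concrete illustration of the OpenAI-compatible APIs mentioned above, here is a minimal Python sketch of a chat completions call against such an inference server deployed in SAP AI Core. The deployment URL, access token, model name, and resource group are placeholders or assumptions; replace them with your own values, and note that the exact path prefix may differ per backend.

import requests

deployment_url = "<YOUR_DEPLOYMENT_URL>"  # inference URL of your SAP AI Core deployment
token = "<YOUR_ACCESS_TOKEN>"             # OAuth token obtained with your SAP AI Core service key

# Hedged sketch: path prefix and model name depend on the chosen backend and deployment
response = requests.post(
    f"{deployment_url}/v1/chat/completions",
    headers={
        "Authorization": f"Bearer {token}",
        "AI-Resource-Group": "default",   # resource group of your deployment
        "Content-Type": "application/json",
    },
    json={
        "model": "mistral",               # a model already available in the server
        "messages": [{"role": "user", "content": "Hello, who are you?"}],
    },
)
print(response.json()["choices"][0]["message"]["content"])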


Ollama vs LocalAI in context of SAP AI Core

| | Ollama | LocalAI |
| --- | --- | --- |
| Description | "Ollama: Get up and running with Llama 2, Mistral, Gemma, and other large language models." | "LocalAI is the free, Open Source OpenAI alternative. LocalAI acts as a drop-in replacement REST API that's compatible with OpenAI API specifications for local inference..." |
| Recommendation | Recommended if you only need to inference LLMs/LMMs in SAP AI Core. See its AI capabilities below for details. | Recommended if speech recognition, speech generation, and image generation are also required in addition to LLMs/LMMs. |
| AI Capabilities | Text generation<br>Vision<br>Text embedding | Text generation<br>Vision<br>Text embedding<br>Speech to text<br>Text to speech<br>Image generation |
| Installation & Setup | Easy installation and setup | Be sure to use the corresponding Docker image, or build with the right variables, for GPU acceleration |
| GPU Acceleration | Automatically detects and applies GPU acceleration | Supported; requires configuration per model |
| Model Management | Easy built-in model management through CLI commands or APIs | Experimental model gallery<br>May require additional configuration for GPU acceleration per model |
| Supported Backends | llama.cpp | Multi-backend support and backend agnostic. Default backend is llama.cpp; also supports extra backends such as vLLM, rwkv, Hugging Face transformers, bert, whisper.cpp, etc. Please check its model compatibility table for details |
| Supported Models | Built-in Model Library | Experimental Model Gallery |
| Model Switching | Seamless model switching with automatic memory management | Supported |
| APIs | Model Management API<br>OpenAI-compatible chat completions API<br>Embedding API | Model Management API<br>Text Generation API<br>OpenAI-compatible chat completions API<br>Embedding API |
| Model Customization | Supported | Supported |
| License | MIT | MIT |

llama.cpp vs vLLM in context of SAP AI Core

| | llama.cpp | vLLM |
| --- | --- | --- |
| Description | "The main goal of llama.cpp is to enable LLM inference with minimal setup and state-of-the-art performance on a wide variety of hardware - locally and in the cloud." | "A high-throughput and memory-efficient inference and serving engine for LLMs" |
| Recommendation | Recommended for private custom LLMs or fine-tuned models. | Recommended for private custom LLMs or fine-tuned models. |
| AI Capabilities | Text generation<br>Vision<br>Text embedding | Text generation<br>Vision<br>Text embedding |
| Deployment & Setup | Easy deployment via Docker. Many arguments to explore when starting the llama.cpp server | Easy deployment via Docker. Many engine arguments when starting vllm.entrypoints.openai.api_server |
| GPU Acceleration | Supported | Supported |
| Model Management | Not supported. An external tool (wget, etc.) is needed to download models from Hugging Face | Seamless integration with popular Hugging Face models |
| Supported Quantization | 1.5-bit, 2-bit, 3-bit, 4-bit, 5-bit, 6-bit, and 8-bit integer quantization | GPTQ, AWQ, SqueezeLLM, FP8 KV Cache |
| Supported Models | Supported models listed at https://github.com/ggerganov/llama.cpp | Supported models |
| Model Switching | Not supported. One deployment serves one model. | Not supported. One deployment serves one model. |
| APIs | OpenAI-compatible chat completions API<br>Embedding API | OpenAI-compatible chat completions API<br>Embedding API |
| License | MIT | Apache-2.0 |

How to bring open-sourced LLMs into SAP AI Core

The following sections show how to bring open-sourced LLMs into SAP AI Core with Ollama, LocalAI, llama.cpp, and vLLM.

Prerequisites

The following software is required to serve an AI model in SAP AI Core. Please follow this tutorial to provision and set up your own SAP AI Core if it is new to you; it covers the list below.

Important: Please be sure to entitle the Standard Plan or Extended Plan of SAP AI Core, which requires a BTPEA or Pay-As-You-Go contract. Please refer to the pricing of SAP AI Core for details. Due to the restrictions of the Free Tier service plan, the open-source LLMs cannot be run with the Free Tier plan. Please refer to the official documentation about Resource Plans in SAP AI Core for details.

For the Free Tier service plan, only the Starter resource plan is available. Specifying other plans will result in error. For the Standard service plan, all resource plans are available. For more information, see Free Tier and Service Plans.

In this sample, SAP AI Launchpad is optional and only used to show and check the results. All the configurations, such as creating the resource group, docker registry secret, GitHub repository onboarding, application, configuration, and deployment, are automated through the SAP AI Core SDK. However, it is still recommended to have SAP AI Launchpad as a more user-friendly graphical cockpit for administration tasks, especially if you are new to SAP AI Core.

Please skip the following steps if you have previously completed the initial configurations for your SAP AI Core.

3. Generate a GitHub personal access token

Please skip if you have done it before. Only take the steps about generating a GitHub personal access token, which will be used to onboard the GitHub repository into SAP AI Core afterwards.

4. Install Docker Desktop and create a personal Docker Registry

Please skip if you have done it before.
Instructions can be found here, Steps 1 to 4. We recommend you create an access token to be used in place of your password. Instructions on how to generate a token can be found here.

5. Install Git and Visual Studio Code (optional)

  • Install Git by following the instructions here.
  • Download and install Visual Studio Code by following the instructions here.

6. Fork this repository

Fork this repository into your own GitHub account using this URL. Set your forked repository to private to prevent public access.

7. Clone your forked repository

git clone <YOUR_FORKED_REPOSITORY_URL> 

8. Set up a local Python3 environment

  • Download and install Python3 (>=3.7) in your local environment from here or through other approaches.
  • Create a virtual environment and install the dependencies
# Create a virtual env and install the dependencies 
cd btp-generative-ai-hub-use-cases/10-byom-oss-llm-ai-core
python3 -m venv oss-llm-env
source oss-llm-env/bin/activate
pip3 install -r byom-oss-llm-code/requirements.txt

Perform the initial configurations for byom-oss-llm-ai-core application in SAP AI Core

Please follow and run the Jupyter notebook 00-init-config.ipynb to perform the initial configurations for the byom-oss-llm-ai-core application in SAP AI Core. To run the notebook, you can open it in Visual Studio Code or start JupyterLab:

# Start the JupyterLab
jupyter lab
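For orientation, the sketch below shows, in plain Python, the kind of AI API calls the notebook automates: obtaining an OAuth token with the client credentials from your SAP AI Core service key and listing the deployments in a resource group. The environment variable names and the "default" resource group are assumptions for illustration.

import os
import requests

# Assumed environment variables holding values from your SAP AI Core service key
auth_url = os.environ["AICORE_AUTH_URL"]        # XSUAA URL from the service key
client_id = os.environ["AICORE_CLIENT_ID"]
client_secret = os.environ["AICORE_CLIENT_SECRET"]
ai_api_url = os.environ["AICORE_BASE_URL"]      # AI API URL of your SAP AI Core instance

# Obtain an OAuth token via the client-credentials flow
token = requests.post(
    f"{auth_url}/oauth/token",
    data={"grant_type": "client_credentials"},
    auth=(client_id, client_secret),
).json()["access_token"]

# List deployments in the target resource group
deployments = requests.get(
    f"{ai_api_url}/v2/lm/deployments",
    headers={"Authorization": f"Bearer {token}", "AI-Resource-Group": "default"},
).json()
for d in deployments.get("resources", []):
    print(d["id"], d["status"], d.get("deploymentUrl"))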

Option 1: (Recommended) Bring open-sourced LLMs into SAP AI Core with Ollama

Please refer to this blog post about Bring Open-Source LLMs into SAP AI Core with Ollama for more details.

Please follow the jupyter notebooks below to deploy and test Ollama in SAP AI Core.
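To give a flavor of what the notebooks do, here is a hedged sketch of Ollama's built-in model management: pulling a model from the Ollama Model Library through its API at runtime, then chatting with it. The deployment URL and token are placeholders, and the exact path prefix in this sample may differ.

import requests

deployment_url = "<YOUR_OLLAMA_DEPLOYMENT_URL>"
headers = {"Authorization": "Bearer <YOUR_ACCESS_TOKEN>", "AI-Resource-Group": "default"}

# Pull a model from the Ollama Model Library at runtime
requests.post(f"{deployment_url}/api/pull", headers=headers, json={"name": "mistral"})

# Chat with the pulled model through Ollama's native chat API
resp = requests.post(
    f"{deployment_url}/api/chat",
    headers=headers,
    json={
        "model": "mistral",
        "messages": [{"role": "user", "content": "Why is the sky blue?"}],
        "stream": False,
    },
)
print(resp.json()["message"]["content"])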

Option 2: Bring open-sourced LLMs into SAP AI Core with LocalAI

Please follow the jupyter notebooks below to deploy and test LocalAI in SAP AI Core.
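As a rough illustration (the notebooks are the authoritative reference), LocalAI's experimental model gallery lets you install a model through its API at runtime. A minimal sketch, assuming the gallery endpoint is exposed at the deployment URL and with the gallery id as a placeholder:

import requests

deployment_url = "<YOUR_LOCALAI_DEPLOYMENT_URL>"
headers = {"Authorization": "Bearer <YOUR_ACCESS_TOKEN>", "AI-Resource-Group": "default"}

# Install a model from LocalAI's experimental model gallery at runtime;
# the gallery id below is a placeholder
resp = requests.post(
    f"{deployment_url}/models/apply",
    headers=headers,
    json={"id": "<GALLERY>@<MODEL_NAME>"},
)
print(resp.json())  # returns a job id that can be polled for installation progress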

Option 3: Bring open-sourced LLMs into SAP AI Core with llama.cpp

Please follow the jupyter notebooks below to deploy and test llama.cpp in SAP AI Core.
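Since llama.cpp serves exactly one model per deployment, calling it is straightforward. Below is a hedged sketch against the llama.cpp server's native completion endpoint; placeholders assumed, and the OpenAI-compatible chat completions route shown earlier works as well.

import requests

deployment_url = "<YOUR_LLAMA_CPP_DEPLOYMENT_URL>"
headers = {"Authorization": "Bearer <YOUR_ACCESS_TOKEN>", "AI-Resource-Group": "default"}

# llama.cpp's native completion endpoint; the model is fixed at deployment time
resp = requests.post(
    f"{deployment_url}/completion",
    headers=headers,
    json={"prompt": "Building a website can be done in 10 simple steps:", "n_predict": 64},
)
print(resp.json()["content"])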

Option 4: Bring open-sourced LLMs into SAP AI Core with vLLM

Please follow the jupyter notebooks below to deploy and test vLLM in SAP AI Core.
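With vLLM, the served model is the Hugging Face model chosen at deployment time. A hedged sketch (placeholders assumed) of discovering its id through the OpenAI-compatible /v1/models endpoint and using it in a completion request:

import requests

deployment_url = "<YOUR_VLLM_DEPLOYMENT_URL>"
headers = {"Authorization": "Bearer <YOUR_ACCESS_TOKEN>", "AI-Resource-Group": "default"}

# Discover the id of the Hugging Face model being served
model_id = requests.get(f"{deployment_url}/v1/models", headers=headers).json()["data"][0]["id"]

# Use the discovered id in an OpenAI-compatible completion request
resp = requests.post(
    f"{deployment_url}/v1/completions",
    headers=headers,
    json={"model": model_id, "prompt": "San Francisco is a", "max_tokens": 32},
)
print(resp.json()["choices"][0]["text"])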

Option 5: Bring open-sourced LLMs into SAP AI Core with a Custom Inference Server using the Hugging Face Transformers Library

Please follow the jupyter notebooks below to deploy and test Custom Transformer Server in SAP AI Core.
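For intuition, here is a minimal sketch of such a custom inference server built with FastAPI and the Hugging Face Transformers library; the model choice and route are illustrative assumptions, not the sample's actual code. Packaged into a Docker image and referenced from a serving template, a server like this runs in SAP AI Core the same way as the prebuilt backends.

from fastapi import FastAPI
from pydantic import BaseModel
from transformers import pipeline

app = FastAPI()
# Illustrative model choice; the sample may serve a different model
generator = pipeline("text-generation", model="microsoft/phi-2")

class GenerateRequest(BaseModel):
    prompt: str
    max_new_tokens: int = 128

@app.post("/v1/generate")  # hypothetical route; the sample's API may differ
def generate(req: GenerateRequest):
    outputs = generator(req.prompt, max_new_tokens=req.max_new_tokens)
    return {"text": outputs[0]["generated_text"]}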

License

Copyright (c) 2024 SAP SE or an SAP affiliate company. All rights reserved. This project is licensed under the Apache Software License, version 2.0 except as noted otherwise in the LICENSE file.