Release v3.0.0
This release marks the public availability of the SHARK AI project, with a focus on serving the Stable Diffusion XL model on AMD Instinct™ MI300X Accelerators.
Highlights
shark-ai
The shark-ai package is the recommended entry point for using the project. This meta-package includes compatible versions of all relevant sub-projects.
shortfin
The shortfin sub-project is SHARK's high-performance inference library and serving engine.
Key features:
- Fast inference using ahead-of-time model compilation, powered by IREE
- Throughput optimization via request batching and support for flexible device topologies
- Asynchronous execution and efficient threading
- Example applications for supported models
- APIs available in Python and C
- Detailed profiling support
For this release, shortfin uses precompiled programs built by the SHARK team using the sharktank sub-project. Future releases will streamline the model conversion process, add user guides, and enable adventurous users to bring their own custom models.
Current shortfin system requirements:
- Python 3.11+
- An AMD Instinct™ MI300X Accelerator
- A compatible version of Linux and ROCm (see the ROCm compatibility matrix)
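Before installing, the requirements above can be checked from a shell. The snippet below is a minimal sketch and is not a tool shipped with shark-ai: the function names `python_version_ok` and `check_shortfin_prereqs` are hypothetical, the interpreter defaults to `python3` unless `PYTHON` is set, and the presence of `/dev/kfd` is used only as a rough sign that the ROCm kernel driver is loaded. It does not verify that the GPU is an MI300X or that your Linux and ROCm versions match the compatibility matrix.

```shell
# Hypothetical pre-flight checks for the shortfin requirements (not part of shark-ai).

# Succeeds if a version string such as "3.11.4" is at least 3.11.
python_version_ok() {
  pv_major=${1%%.*}
  pv_rest=${1#*.}
  pv_minor=${pv_rest%%.*}
  [ "$pv_major" -gt 3 ] || { [ "$pv_major" -eq 3 ] && [ "$pv_minor" -ge 11 ]; }
}

# Prints one line per check; returns non-zero if any check fails.
check_shortfin_prereqs() {
  failures=0

  # Assumption: PYTHON (default python3) is the interpreter you will use.
  py_version=$("${PYTHON:-python3}" -c 'import platform; print(platform.python_version())' 2>/dev/null)
  if [ -n "$py_version" ] && python_version_ok "$py_version"; then
    echo "OK      Python $py_version"
  else
    echo "MISSING Python 3.11+ (found: ${py_version:-none})"
    failures=$((failures + 1))
  fi

  if [ "$(uname -s)" = "Linux" ]; then
    echo "OK      Linux"
  else
    echo "MISSING Linux (found: $(uname -s))"
    failures=$((failures + 1))
  fi

  # Heuristic only: /dev/kfd exists when the amdgpu/ROCm kernel driver is loaded.
  if [ -e /dev/kfd ]; then
    echo "OK      /dev/kfd present"
  else
    echo "MISSING /dev/kfd (is the ROCm kernel driver installed?)"
    failures=$((failures + 1))
  fi

  [ "$failures" -eq 0 ]
}
```

Source the snippet and run `check_shortfin_prereqs`; it prints one line per check and returns a non-zero status if any check fails. To confirm the accelerator model itself, ROCm's own tools (such as `rocm-smi`) are the more reliable check.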
Serving Stable Diffusion XL (SDXL) on MI300X
See the user guide for the latest instructions.
To serve the Stable Diffusion XL model, which generates output images given input text prompts:
# Set up a Python virtual environment.
python -m venv .venv
source .venv/bin/activate
# Optional: install a CPU-only build of torch, which downloads faster.
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu
# Install shark-ai, including extra dependencies for apps.
pip install "shark-ai[apps]"
# Start the server, then wait for it to download artifacts.
python -m shortfin_apps.sd.server \
--device=amdgpu --device_ids=0 --topology="spx_single" \
--build_preference=precompiled
# (wait for setup to complete)
# INFO - Application startup complete.
# INFO - Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
# Run the interactive client, sending text prompts and receiving generated images back.
python -m shortfin_apps.sd.simple_client --interactive
# Enter a prompt: a single cybernetic shark jumping out of the waves set against a technicolor sunset
# Sending request with prompt: ['a single cybernetic shark jumping out of the waves set against a technicolor sunset']
# Sending request batch # 0
# Saving response as image...
# Saved to gen_imgs/shortfin_sd_output_2024-11-15_16-30-30_0.png
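The walkthrough above waits for the server by watching its log. When scripting the same steps, one option is to poll the server's port until it accepts connections, assuming the port only opens once setup finishes (the log order above suggests this, since Uvicorn reports that it is running only after application startup completes). The helper below, `wait_for_port`, is hypothetical and not part of shortfin; it calls Python's standard `socket.create_connection` through `${PYTHON:-python3}`, so it needs no extra tools.

```shell
# Hypothetical helper (not part of shortfin): poll HOST:PORT until a TCP
# connection succeeds. Returns 0 once the port is open, or 1 after
# MAX_TRIES failed attempts spaced INTERVAL seconds apart.
wait_for_port() {
  wfp_host=$1
  wfp_port=$2
  wfp_max_tries=${3:-60}
  wfp_interval=${4:-5}
  wfp_tries=0
  while [ "$wfp_tries" -lt "$wfp_max_tries" ]; do
    # A successful connect means something is listening; no HTTP status is checked.
    if "${PYTHON:-python3}" -c \
      'import socket, sys; socket.create_connection((sys.argv[1], int(sys.argv[2])), timeout=2).close()' \
      "$wfp_host" "$wfp_port" 2>/dev/null; then
      return 0
    fi
    wfp_tries=$((wfp_tries + 1))
    sleep "$wfp_interval"
  done
  return 1
}
```

For example, run the server command above with a trailing `&` so it starts in the background, then run `wait_for_port 127.0.0.1 8000 120 5 && python -m shortfin_apps.sd.simple_client --interactive`. The retry budget here is an assumption; the first launch also downloads artifacts, which can take a while.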
Roadmap
This release is just the start of a longer journey. The SHARK platform is fully open source, so stay tuned for future developments. Here is a taste of what we have planned:
- Support for a wider range of ML models, including popular LLMs
- Performance improvements and optimized implementations for supported models across a wider range of devices
- Integrations with other popular frameworks and APIs
- General availability and user guides for the sharktank model development toolkit