inferencemax

inferencemax is a flexible library for text generation built on the MAX engine. It provides a streamlined workflow for loading, exporting, and running inference on a variety of model architectures.

Project Structure

inferencemax/
├── data/
│   ├── __init__.py
│   ├── export.py
│   ├── hf.py
│   ├── load.py
│   └── onnx.py
├── utils/
│   ├── __init__.py
│   ├── decorators.py
│   └── logger.py
├── __init__.py
├── generator.py
├── initializer.py
├── kv_cache.py
├── sampler.py
├── text_generation.py
└── tokenizer.py

Key Features

  • Support for loading models from HuggingFace and ONNX formats
  • Efficient model export to ONNX format
  • Customizable text generation pipeline
  • KV-cache support for improved inference speed
  • Flexible sampling strategies such as temperature scaling and top-k filtering (see the sketch after this list)
  • Comprehensive logging and timing decorators
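
As a minimal, library-agnostic sketch of the temperature and top-k strategies listed above (the function name and NumPy implementation are illustrative assumptions, not sampler.py's actual code):

import numpy as np

def sample_next_token(logits, temperature=1.0, top_k=None, rng=None):
    # Illustrative only; inferencemax's sampler.py may differ.
    rng = rng or np.random.default_rng()
    # Temperature scaling: values below 1.0 sharpen the distribution,
    # values above 1.0 flatten it.
    scaled = np.asarray(logits, dtype=np.float64) / max(temperature, 1e-8)
    if top_k is not None:
        # Keep only the k highest-scoring tokens; mask the rest out.
        kth_largest = np.sort(scaled)[-top_k]
        scaled = np.where(scaled < kth_largest, -np.inf, scaled)
    # Softmax over the (possibly masked) logits, then draw one token id.
    probs = np.exp(scaled - scaled.max())
    probs /= probs.sum()
    return int(rng.choice(len(probs), p=probs))

# Example: next_id = sample_next_token(logits, temperature=0.8, top_k=40)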

Usage

Here's a basic example of how to use InferenceMax:

from inferencemax.data.load import load_model, load_tokenizer
from inferencemax.text_generation import generate_text

# Load model and tokenizer
model_path = "path/to/your/model"
model = load_model(model_path)
tokenizer = load_tokenizer(model_path)

# Generate text
input_text = "Once upon a time"
generated_text = generate_text(model, tokenizer, input_text)

print(generated_text)
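
The export helpers in data/export.py and data/onnx.py are not documented in this README, so the sketch below only illustrates the kind of ONNX export such a module could perform using plain torch.onnx.export; the export_to_onnx name and its signature are assumptions, not inferencemax's API:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def export_to_onnx(model_path: str, output_path: str = "model.onnx") -> None:
    # Hypothetical stand-in for data/export.py.
    tokenizer = AutoTokenizer.from_pretrained(model_path)
    model = AutoModelForCausalLM.from_pretrained(model_path)
    model.eval()
    # Trace the model with a small dummy prompt.
    dummy_ids = tokenizer("Once upon a time", return_tensors="pt")["input_ids"]
    torch.onnx.export(
        model,
        (dummy_ids,),
        output_path,
        input_names=["input_ids"],
        output_names=["logits"],
        # Allow batch size and sequence length to vary at inference time.
        dynamic_axes={"input_ids": {0: "batch", 1: "sequence"},
                      "logits": {0: "batch", 1: "sequence"}},
    )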

CLI Usage

InferenceMax also provides a command-line interface for easy text generation:

python cli.py --model_path "path/to/your/model" --input_text "Once upon a time" --max_new_tokens 50

Configuration

You can customize the generation parameters using a YAML configuration file:

max_new_tokens: 50
temperature: 0.8
top_k: 40

Then use it with the CLI:

python cli.py --model_path "path/to/your/model" --input_text "Once upon a time" --config_path "path/to/config.yaml"
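
cli.py itself is not reproduced in this README, but a hedged sketch of how a CLI could merge the YAML file with its flag defaults looks like this (the argument names mirror the commands above; the merge logic is an assumption):

import argparse
import yaml  # requires PyYAML

parser = argparse.ArgumentParser()
parser.add_argument("--model_path", required=True)
parser.add_argument("--input_text", required=True)
parser.add_argument("--max_new_tokens", type=int, default=50)
parser.add_argument("--temperature", type=float, default=1.0)
parser.add_argument("--top_k", type=int, default=None)
parser.add_argument("--config_path", default=None)
args = parser.parse_args()

if args.config_path:
    # Values from the YAML file overwrite the argparse defaults above.
    with open(args.config_path) as f:
        for key, value in (yaml.safe_load(f) or {}).items():
            setattr(args, key, value)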

Contributing

Contributions are welcome! Please feel free to submit a Pull Request. This project does not aim to replace vLLM or similar engines; it is a place to learn and test things.

License

This project is licensed under the terms of the LICENSE file in the root directory.
