Rathe

A tiny library for working with language model prompts.

Contains routines for formatting instructional and chat-based prompts into the formats expected by a variety of large language models, including but not limited to:

- Alpaca
- Open Assistant
- Vicuna
- Pygmalion
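
To give a rough sense of what "formatting a prompt" means here, the sketch below hand-rolls the widely used Alpaca-style template for an instruction with no extra input. It is only an illustration of the target format and is not how Rathe implements it; the names `ALPACA_NO_INPUT` and `format_alpaca` are hypothetical and not part of the library.

```python
from typing import Optional

# Illustrative only: a hand-written Alpaca-style template (no-input variant).
# This is NOT Rathe's implementation; names here are hypothetical.
ALPACA_NO_INPUT = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Response:\n"
)


def format_alpaca(
    instruction: str,
    output: Optional[str] = None,
    eos_token: str = "</s>",
) -> str:
    """Render an instruction (and optionally its answer) as one string."""
    prompt = ALPACA_NO_INPUT.format(instruction=instruction)
    if output is None:
        # Inference: stop after the response header so the model continues.
        return prompt
    # Training: append the target text and an end-of-sequence marker.
    return prompt + output + eos_token


print(format_alpaca("Name three primary colors."))
```

Leaving `output` as `None` yields a prompt that ends at the response header, which is the shape wanted for inference; passing an `output` yields a full training example.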

Also contains code for transforming various LLM datasets into a common representation.
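
As a minimal sketch of what a "common representation" could look like, the example below maps rows shaped like two public instruction datasets onto one record type. The field names for the rows follow the published Dolly (`instruction`, `context`, `response`) and Alpaca (`instruction`, `input`, `output`) schemas; the `InstructRecord` class and the `from_dolly` / `from_alpaca` helpers are hypothetical and are not Rathe's own types or parsers.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class InstructRecord:
    """Hypothetical common shape for one instruction example."""
    instruction: str
    input: Optional[str]
    output: Optional[str]


def from_dolly(row: dict) -> InstructRecord:
    # Dolly rows carry optional supporting text in "context".
    return InstructRecord(
        instruction=row["instruction"],
        input=row.get("context") or None,  # empty string becomes None
        output=row.get("response") or None,
    )


def from_alpaca(row: dict) -> InstructRecord:
    # Alpaca rows already use instruction/input/output.
    return InstructRecord(
        instruction=row["instruction"],
        input=row.get("input") or None,
        output=row.get("output") or None,
    )


dolly_row = {"instruction": "Summarize.", "context": "", "response": "Done."}
alpaca_row = {"instruction": "Summarize.", "input": "", "output": "Done."}
print(from_dolly(dolly_row) == from_alpaca(alpaca_row))
```

Once every dataset is reduced to the same record type, a single formatter can be applied to all of them regardless of their original column names.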

Examples

For training

from rathe import ChatPrompt, ChatMessage, MessageSender, ChatPromptFormatter, TokenizationOptions

chat = ChatPrompt(
    messages=[
        ChatMessage(MessageSender.human, "hello it is me the user, i seek a boon"),
        ChatMessage(MessageSender.model, "name your desire, fleshling"),
    ]
)

formatter = ChatPromptFormatter.vicuna()
result = formatter.format(chat, special_tokens={"eos_token": "</s>"})
tokenized = result.to_tokens(tokenizer=..., options=TokenizationOptions(generate_labels=False))

For inference

from rathe import AlpacaPromptFormatter, InstructPrompt

instruction = InstructPrompt(
    "In a few lines of concise proof, demonstrate that the non-trivial "
    "zeros of the Riemann zeta function have real part 1/2.",
    output=None,
)
formatter = AlpacaPromptFormatter()
result = formatter.format(instruction, special_tokens={"eos_token": "</s>"})
print(result.to_string())

For use with `transformers` and `datasets` libraries

import transformers
import datasets
from rathe import GenericInstructParser, ChatPromptFormatter
from rathe.pipeline import DataPipeline

parser = GenericInstructParser.dolly()
formatter = ChatPromptFormatter.vicuna()
tokenizer = transformers.LlamaTokenizer.from_pretrained("huggyllama/llama-7b")
pipeline = DataPipeline(parser, formatter, tokenizer)

dataset = datasets.load_dataset("databricks/databricks-dolly-15k")
tokenized = dataset.map(pipeline, remove_columns=dataset['train'].column_names)
