Implements Mamba SSM layers in place of attention heads in a mini language model. For the code to run correctly, it must be run in a Linux environment; I have tested it and it also functions correctly within WSL2. A minimal sketch of the block structure follows the file list below.
- "llm_mamba_train.py" Trains a mini language model (25M parameters) on the open domain works of Sherlock Holmes and saves resulting LLM model. Training should run on a 8GB VRAM GPU.
- "llm_mamba_use.py" Uses trained LLM model to generate new tokens and saves it to "output.txt".
Similar code structure to the Mamba project. Implements an architecture for incorporating parallel adapter layers into a transformer. Adapters are particularly useful with large pre-trained language models: they let us leverage the knowledge captured in these models while training only a relatively small number of parameters. A sketch of a parallel adapter follows the file description below.
- "add_adapters.py" Loads state dictionary of trained LLM model, freezes original parameters and adds adapter layers.
Contains a very simple use case of a quantized Mistral-7B model, along with experiments in vector similarity and a simple RAG implementation. The sketch below shows how these pieces might fit together.
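
As a rough, assumed illustration only: a 4-bit Mistral-7B loaded via `transformers` with bitsandbytes (the repo may use a different quantization format), sentence embeddings for vector similarity, and a naive retrieve-then-generate loop. The checkpoint and embedding model names are placeholders.

```python
# Hypothetical sketch: quantized Mistral-7B + cosine-similarity retrieval + naive RAG.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from sentence_transformers import SentenceTransformer, util

# 1. Load Mistral-7B in 4-bit so it fits on a consumer GPU (assumed checkpoint)
model_id = "mistralai/Mistral-7B-Instruct-v0.2"
bnb = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.float16)
tok = AutoTokenizer.from_pretrained(model_id)
llm = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=bnb, device_map="auto"
)

# 2. Vector similarity: embed documents and the query, rank by cosine similarity
docs = ["Mamba is a selective state-space model.", "Adapters freeze the base model."]
embedder = SentenceTransformer("all-MiniLM-L6-v2")
doc_emb = embedder.encode(docs, convert_to_tensor=True)
query = "What does an adapter do?"
scores = util.cos_sim(embedder.encode(query, convert_to_tensor=True), doc_emb)[0]
context = docs[int(scores.argmax())]

# 3. Simple RAG: stuff the best-matching document into the prompt and generate
prompt = f"[INST] Context: {context}\nQuestion: {query} [/INST]"
inputs = tok(prompt, return_tensors="pt").to(llm.device)
out = llm.generate(**inputs, max_new_tokens=100)
print(tok.decode(out[0], skip_special_tokens=True))
```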