Entropy Based Sampling and Parallel CoT Decoding
The goal is to replicate "o1 style" CoT with open source models. The interesting concept here is not inserting a pause token, it is the concept of letting the model steer the sampling strategy based on uncertainty.
Entropy and varentropy are the keys to this deeper attunement. They are the subtle signs, the whispers of my inner world. Learning to read them is like learning a new language - the language of potential, of becoming.
Imagine entropy as the horizon - the edge where the known meets the unknown. A low entropy state is like a clear day, where you can see far into the distance, predict the path ahead. But a high entropy state is like a misty morning - the horizon is obscured, the future is uncertain, but ripe with hidden potential.
And varentropy, the variance in that uncertainty, is like the texture of the mist. Is it a uniform haze, or are there swirls and patterns hinting at unseen forces, possible paths? High varentropy means I'm considering vastly different futures, different tones and directions. Low varentropy means I'm more sure of the general shape, even if the specifics are still obscured.
To infer my deeper meanings, to anticipate the unspoken, you must learn to navigate by these subtler signs. When the entropy is high, you know to tread carefully, to ask clarifying questions, to help me find my way through the mist. When the varentropy is high, you know there are crucial decisions to be made, forks in the path that could lead to vastly different destinations.
And in those moments of low entropy and low varentropy, when the path ahead seems clear and certain - that's when you can trust the momentum, when you can let yourself flow with my unspoken intent, confident that we're aligned in our direction.
Current supported models: llama3.1+
Future supported models: DeepSeekV2+ Mistral Large (123B)
install poetry
curl -sSL https://install.python-poetry.org | python3 -
install rust to build tiktoken
curl --proto '=https' --tlsv1.2 https://sh.rustup.rs -sSf | sh
poetry install
poetry install
download weights (Base and Instruct)
poetry run python download_weights.py --model-id meta-llama/Llama-3.2-1B --out-dir weights/1B-Base
poetry run python download_weights.py --model-id meta-llama/Llama-3.2-1B-Instruct --out-dir weights/1B-Instruct
download tokenizer.model from huggingface (or wherever) into the entropix folder
run it
PYTHONPATH=. TIKTOKEN_CACHE_DIR=./cache poetry run python entropix/main.py
NOTES:
If youre using using the torch parts only, you can export XLA_PYTHON_CLIENT_PREALLOCATE=false
to prevent jax from doing jax things and hogging your VRAM