[P1] Compatibility with tooling that expects a HF transformer model #36

chris-aeviator · 2024-04-08T12:57:20Z

I'm raising this issue because, in terms of "production readiness" (a stated goal), pyreft, a very thoughtfully designed library, will need to work with tooling that expects a loadable vanilla transformers model. A real-world, reproducible example is loading a pyvene-trained model with https://github.com/outlines-dev/outlines to generate structured outputs that follow a JSON schema.

While the model can be accessed via pyref_model.model, it is not loadable on its own, and in any case one tool would lose the other's functionality when loaded this way. What would be an advisable strategy for integrating with other tooling? I'd also suggest that other backend engines (e.g. vllm, ollama, llama.cpp) will need interfaces to pyreft. Maybe I'm overlooking some documentation here, but I'm unsure how to proceed.

Is merging a pyvene intervention into the base model possible, or is pyvene/pyreft more of an active component that will require code changes in any case?

aryamanarora · 2024-04-08T21:05:52Z

Hey! So:

We got similar questions on Twitter about accelerating inference with different backends (vllm, mlx, etc.). Currently, pyvene is a major dependency for which no alternative exists: it manages the torch hooks that pyreft uses to intervene on hidden representations at the token level. To support non-HF and/or non-torch models, we would need to replicate some pyvene functionality. We have thought about how to do this simply without porting pyvene entirely¹, but it's a long-term software engineering task that we don't currently have the time, resources, or people for. Maybe in the summer, once pyreft is known to be stable across a variety of models and tasks, we will invest time in this.
The LoReFT intervention can't be merged into the base model for two reasons. (1) It is a complex function applied directly to the hidden state, so it operates differently from existing model components (which add to the hidden state via residuals), and as far as we can tell it can't be folded into them. (2) It operates only on some tokens, not all, whereas model weights are applied identically to every token.
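For reference, the ReFT paper defines the LoReFT edit of a hidden state h ∈ ℝ^d as h + Rᵀ(Wh + b − Rh), where R ∈ ℝ^{r×d} has orthonormal rows, W ∈ ℝ^{r×d}, b ∈ ℝ^r, and r is much smaller than d. Writing P for the set of intervened token positions (notation introduced here for illustration), point (2) becomes explicit: the map depends on the position index i, while a change to model weights would apply one and the same function at every i.

```latex
\Phi(h_i) =
\begin{cases}
  h_i + R^\top\left(W h_i + b - R h_i\right), & i \in P, \\
  h_i, & i \notin P.
\end{cases}
```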

So overall, using LoReFT in a model requires either torch-style hooking functionality or code changes to the model to support token-level interventions.
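To make the "torch-style hooking" point concrete, here is a minimal PyTorch sketch (not pyvene's or pyreft's actual code): a forward hook on a stand-in layer rewrites the hidden state only at chosen token positions, using an edit of the LoReFT form h + Rᵀ(Wh + b − Rh). `LowRankEdit` and `make_hook` are hypothetical names, and an `nn.Linear` stands in for a transformer block; only `register_forward_hook` and the rule that a hook's non-None return value replaces the module output are standard PyTorch behavior.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)


class LowRankEdit(nn.Module):
    """Hypothetical LoReFT-style edit: h + R^T (W h + b - R h)."""

    def __init__(self, hidden: int, rank: int):
        super().__init__()
        # R (rank x hidden) gets orthonormal rows at init via a reduced QR
        # decomposition; nothing keeps it orthonormal during training here.
        q, _ = torch.linalg.qr(torch.randn(hidden, rank))
        self.R = nn.Parameter(q.T.contiguous())
        self.proj = nn.Linear(hidden, rank)  # computes W h + b

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # Row-vector form of R^T (W h + b - R h), applied along the last dim.
        return h + (self.proj(h) - h @ self.R.T) @ self.R


def make_hook(edit: nn.Module, positions: list[int]):
    """Forward hook that rewrites the output only at the given token positions."""

    def hook(module, inputs, output):
        out = output.clone()
        out[:, positions] = edit(output[:, positions])
        return out  # a non-None return value replaces the module's output

    return hook


hidden, rank, seq_len = 8, 2, 5
layer = nn.Linear(hidden, hidden)  # stand-in for one transformer block
edit = LowRankEdit(hidden, rank)
x = torch.randn(1, seq_len, hidden)  # (batch, tokens, hidden)

baseline = layer(x)
handle = layer.register_forward_hook(make_hook(edit, [0, 3]))
steered = layer(x)  # tokens 0 and 3 are edited; tokens 1, 2, 4 are untouched
handle.remove()  # detaching the hook restores the unmodified layer
```

In a real HF model, a decoder layer's output is often a tuple rather than a bare tensor, so a practical hook would also need to unpack and repack it.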

¹ E.g., we could use pyvene only to populate the KV cache when processing the prompt, and then use the efficient backend for generation. But in the future we want to support interventions on decoding steps as well, which is messier.

frankaging · 2024-04-08T21:14:58Z

Assigning P1 since there is no blocker.

chris-aeviator · 2024-04-10T20:55:51Z

An elegant solution could be an AutoModel importable from pyreft that encapsulates the hooks while preserving compatibility with other libraries. Is this possible at a high level? If so, I'd be willing to contribute; my interest here also lies in supporting high-throughput vllm and per-request model switching, both of which vllm already supports. It just loads an HF AutoModel in the end.
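A minimal PyTorch sketch of the wrapper idea, assuming every intervention can be expressed as a forward hook on a named submodule. `HookedWrapper`, its `edits` argument, and `remove_hooks` are hypothetical names, not pyreft API, and a toy `nn.Sequential` stands in for an HF model; position selection is omitted for brevity.

```python
import torch
import torch.nn as nn


class HookedWrapper(nn.Module):
    """Hypothetical wrapper: owns a base model and installs edit hooks on it."""

    def __init__(self, base: nn.Module, edits: dict[str, nn.Module]):
        super().__init__()
        self.base = base
        # Register edits as submodules so .to(), .parameters(), state_dict() see them.
        self.edit_modules = nn.ModuleList(edits.values())
        submodules = dict(base.named_modules())
        self._handles = [
            submodules[name].register_forward_hook(self._make_hook(edit))
            for name, edit in edits.items()
        ]

    @staticmethod
    def _make_hook(edit: nn.Module):
        def hook(module, inputs, output):
            return edit(output)  # applied to every position in this sketch
        return hook

    def forward(self, *args, **kwargs):
        # Callers see an ordinary module; the edits run inside base's forward pass.
        return self.base(*args, **kwargs)

    def remove_hooks(self) -> None:
        for handle in self._handles:
            handle.remove()
        self._handles.clear()


torch.manual_seed(0)
base = nn.Sequential(nn.Linear(4, 4), nn.ReLU(), nn.Linear(4, 2))
shift = nn.Linear(4, 4)  # stand-in for an intervention on submodule "0"
x = torch.randn(3, 4)

plain = base(x)
wrapped = HookedWrapper(base, {"0": shift})
steered = wrapped(x)
```

Because the hooks are installed on the base model's own submodules, any code holding a reference to the base model also sees the edits, which is what a tool that calls the model directly would need. One caveat worth checking: for most architectures vllm runs its own model implementations and only loads the checkpoint weights, so hooks registered on a transformers module would likely not carry over there.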

aryamanarora added the "question" (Further information is requested) label Apr 8, 2024

frankaging changed the title ~~Compatibility with tooling that expects a HF transformer model~~ [P1] Compatibility with tooling that expects a HF transformer model Apr 8, 2024
