Request for Hidden States Access in Phi-3 with ONNX Runtime #20969

Closed
ajliouat opened this issue Jun 7, 2024 · 7 comments
Labels
documentation improvements or additions to documentation; typically submitted using template

Comments

@ajliouat

ajliouat commented Jun 7, 2024

Describe the documentation issue

Hi,

I am using the Phi-3 LLM with ONNX Runtime and noticed that the API lacks a method for accessing the hidden states. Could you tell me how to access them, or whether an update will add this functionality?

Thank you for your help.

Best,
Abdeljalil

Page / URL

No response

@ajliouat ajliouat added the documentation improvements or additions to documentation; typically submitted using template label Jun 7, 2024
@tianleiwu
Contributor

@ajliouat, you can use the ONNX API to modify the graph so that the hidden state is added as a graph output. @kunal-vaishnavi, is it possible to add an option to the model builder to output the hidden state?

@kunal-vaishnavi
Contributor

You can generate an ONNX model that outputs the hidden states using ONNX Runtime GenAI's model builder with `--extra_options exclude_lm_head=true`.

If you already have the PyTorch model saved on disk:

```bash
# From wheel:
python3 -m onnxruntime_genai.models.builder -i path_to_local_folder_on_disk -o path_to_output_folder -p precision -e execution_provider -c cache_dir_to_store_temp_files --extra_options exclude_lm_head=true

# From source:
python3 builder.py -i path_to_local_folder_on_disk -o path_to_output_folder -p precision -e execution_provider -c cache_dir_to_store_temp_files --extra_options exclude_lm_head=true
```

If you do not have the PyTorch model saved on disk:

```bash
# From wheel:
python3 -m onnxruntime_genai.models.builder -m model_name -o path_to_output_folder -p precision -e execution_provider -c cache_dir_to_store_temp_files --extra_options exclude_lm_head=true

# From source:
python3 builder.py -m model_name -o path_to_output_folder -p precision -e execution_provider -c cache_dir_to_store_temp_files --extra_options exclude_lm_head=true
```
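If you prefer to drive the conversion from a Python script, the wheel invocation above can be assembled programmatically. A minimal sketch: `builder_command` and `run_builder` are hypothetical helpers (not part of ONNX Runtime GenAI) that only reproduce the flags shown in this comment and run the module with `subprocess`.

```python
import subprocess
import sys
from typing import List, Mapping, Optional, Union


def builder_command(
    output_dir: str,
    precision: str,
    execution_provider: str,
    cache_dir: str,
    *,
    model_name: Optional[str] = None,
    input_dir: Optional[str] = None,
    extra_options: Optional[Mapping[str, Union[str, bool, int]]] = None,
) -> List[str]:
    """Build the argv for `python3 -m onnxruntime_genai.models.builder`."""
    if (model_name is None) == (input_dir is None):
        raise ValueError("pass exactly one of model_name (-m) or input_dir (-i)")
    cmd = [sys.executable, "-m", "onnxruntime_genai.models.builder"]
    # -m names a model to fetch; -i points at a PyTorch model already on disk.
    cmd += ["-m", model_name] if model_name is not None else ["-i", input_dir]
    cmd += ["-o", output_dir, "-p", precision, "-e", execution_provider, "-c", cache_dir]
    if extra_options:
        cmd.append("--extra_options")
        for key, value in extra_options.items():
            # Booleans are written in lowercase, e.g. exclude_lm_head=true.
            text = str(value).lower() if isinstance(value, bool) else str(value)
            cmd.append(f"{key}={text}")
    return cmd


def run_builder(*args, **kwargs) -> None:
    """Run the builder; raises CalledProcessError if the conversion fails."""
    subprocess.run(builder_command(*args, **kwargs), check=True)
```

For example, `run_builder("phi_onnx_embed", "int4", "cpu", "cache", model_name="microsoft/Phi-3.5-mini-instruct", extra_options={"exclude_lm_head": True})` corresponds to the "From wheel" command without a local PyTorch copy.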

@tianleiwu
Contributor

@kunal-vaishnavi, the option `--extra_options exclude_lm_head=true` outputs only the last hidden state, not the logits. Is there an option to output both the logits and the last hidden state?

@pacman100

pacman100 commented Aug 23, 2024

Hello @kunal-vaishnavi, I followed the above steps but it doesn't work:

1. Convert the model with `--extra_options exclude_lm_head=true`:
   ```bash
   python3 -m onnxruntime_genai.models.builder -m microsoft/Phi-3.5-mini-instruct -o phi_onnx_embed -p int4 -e cpu -c cache --extra_options exclude_lm_head=true
   ```
2. I don't find `hidden_states` in `genai_config.json`:
   ```json
   {
       "model": {
           "bos_token_id": 1,
           "context_length": 131072,
           "decoder": {
               "session_options": {
                   "log_id": "onnxruntime-genai",
                   "provider_options": []
               },
               "filename": "model.onnx",
               "head_size": 96,
               "hidden_size": 3072,
               "inputs": {
                   "input_ids": "input_ids",
                   "attention_mask": "attention_mask",
                   "past_key_names": "past_key_values.%d.key",
                   "past_value_names": "past_key_values.%d.value"
               },
               "outputs": {
                   "logits": "logits",
                   "present_key_names": "present.%d.key",
                   "present_value_names": "present.%d.value"
               },
               "num_attention_heads": 32,
               "num_hidden_layers": 32,
               "num_key_value_heads": 32
           },
           "eos_token_id": [
               32007,
               32001,
               32000
           ],
           "pad_token_id": 32000,
           "type": "phi3",
           "vocab_size": 32064
       },
       "search": {
           "diversity_penalty": 0.0,
           "do_sample": false,
           "early_stopping": true,
           "length_penalty": 1.0,
           "max_length": 131072,
           "min_length": 0,
           "no_repeat_ngram_size": 0,
           "num_beams": 1,
           "num_return_sequences": 1,
           "past_present_share_buffer": true,
           "repetition_penalty": 1.0,
           "temperature": 1.0,
           "top_k": 1,
           "top_p": 1.0
       }
   }
   ```
3. If I rename `logits` to `hidden_states` in the `outputs` section, `og.Model(folder)` fails with `RuntimeError: Error encountered while parsing 'phi_onnx_embed/genai_config.json' JSON Error: Unknown value: hidden_states at line 20 index 49`.
4. After this, I am still unsure what to run to get the `hidden_states`.

Expected Behaviour:

1. Get the final-layer hidden states for Phi-3 models.

@kunal-vaishnavi
Contributor

> @kunal-vaishnavi, the option `--extra_options exclude_lm_head=true` only outputs last hidden state but not logits. Is there option to output both logits and last hidden state?

The option doesn't currently exist but we can add it.

@kunal-vaishnavi
Contributor

> Hello @kunal-vaishnavi, I followed the above steps but it doesn't work:

If you open the ONNX model saved to disk, you will see `hidden_states` as the first output instead of `logits`. The ONNX model can be run with plain ONNX Runtime to get the `hidden_states`, but it won't run in ONNX Runtime GenAI for two reasons:

1. As you mentioned, the `genai_config.json` file does not currently have `hidden_states` in the `outputs` dictionary. This gap is intentional because of the second reason.
2. ONNX Runtime GenAI assumes that the model outputs `logits`, because the generation loop it implements (sampling, search, etc.) operates on logits, not on `hidden_states`.
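Running the exported model with plain ONNX Runtime can look roughly like this. A minimal sketch for a single prompt pass on CPU, assuming the input names from the `genai_config.json` posted earlier in this thread, an output named `hidden_states`, a KV cache laid out as `[batch, num_key_value_heads, past_sequence_length, head_size]`, and that the model accepts an empty (length-0) past cache on the first run. `empty_past_shapes` and `prompt_hidden_states` are hypothetical helpers, and the token IDs must come from the model's own tokenizer.

```python
import json
import os
from typing import Dict, List, Sequence

# ONNX Runtime input type strings -> numpy dtype names (only the types needed here).
_ORT_TO_NUMPY = {
    "tensor(int64)": "int64",
    "tensor(int32)": "int32",
    "tensor(float)": "float32",
    "tensor(float16)": "float16",
}


def empty_past_shapes(genai_config: dict, batch_size: int = 1) -> Dict[str, List[int]]:
    """Names and shapes of the past KV-cache inputs for a first run with no history."""
    decoder = genai_config["model"]["decoder"]
    # Assumed layout: [batch, num_key_value_heads, past_sequence_length=0, head_size].
    shape = [batch_size, decoder["num_key_value_heads"], 0, decoder["head_size"]]
    shapes = {}
    for layer in range(decoder["num_hidden_layers"]):
        for pattern_key in ("past_key_names", "past_value_names"):
            # e.g. "past_key_values.%d.key" -> "past_key_values.0.key"
            shapes[decoder["inputs"][pattern_key] % layer] = list(shape)
    return shapes


def prompt_hidden_states(model_dir: str, input_ids: Sequence[int]):
    """Run one forward pass with plain ONNX Runtime and return `hidden_states`."""
    # Third-party imports stay local so the config helper above needs only the stdlib.
    import numpy as np
    import onnxruntime as ort

    with open(os.path.join(model_dir, "genai_config.json"), encoding="utf-8") as f:
        config = json.load(f)
    decoder = config["model"]["decoder"]
    session = ort.InferenceSession(
        os.path.join(model_dir, decoder["filename"]),
        providers=["CPUExecutionProvider"],
    )
    # Read each input's element type from the model instead of hard-coding it.
    input_types = {i.name: i.type for i in session.get_inputs()}

    def dtype_of(name: str):
        return np.dtype(_ORT_TO_NUMPY[input_types[name]])

    ids_name = decoder["inputs"]["input_ids"]
    mask_name = decoder["inputs"]["attention_mask"]
    feeds = {
        ids_name: np.asarray([input_ids], dtype=dtype_of(ids_name)),
        mask_name: np.ones((1, len(input_ids)), dtype=dtype_of(mask_name)),
    }
    for name, shape in empty_past_shapes(config).items():
        feeds[name] = np.zeros(shape, dtype=dtype_of(name))

    (hidden_states,) = session.run(["hidden_states"], feeds)
    return hidden_states  # should be [1, len(input_ids), hidden_size]
```

Reading the dtypes from `session.get_inputs()` avoids guessing whether a given export uses int32 or int64 IDs, or float16 or float32 caches.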

This can be fixed in ONNX Runtime GenAI so that the ONNX model outputs both the logits and the last hidden states. You could then call `hidden_states = generator.get_output('hidden_states')` during your generation loop.

kunal-vaishnavi added a commit to microsoft/onnxruntime-genai that referenced this issue Dec 11, 2024
### Description

This PR adds support for outputting the last hidden state in addition to
the logits in ONNX models. Users can run their models with ONNX Runtime
GenAI and use the generator's `GetOutput` API to obtain the hidden
states.

C/C++:
```cpp
std::unique_ptr<OgaTensor> embeddings = generator->GetOutput("hidden_states");
```

C#:
```csharp
using var embeddings = generator.GetOutput("hidden_states");
```

Java:
```java
Tensor embeddings = generator.getOutput("hidden_states");
```

Python:
```python
embeddings = generator.get_output("hidden_states")
```

### Motivation and Context

In SLMs and LLMs, the last hidden state represents a model's embeddings
for a particular input before the language modeling head is applied.
Generating embeddings with a model is a popular task. These embeddings
can be used for many scenarios such as text classification, sequence
labeling, information retrieval using [retrieval-augmented generation
(RAG)](https://en.wikipedia.org/wiki/Retrieval-augmented_generation),
and more.

This PR addresses the following issues:
- microsoft/onnxruntime#20969
- #442
- #474
- #713
@kunal-vaishnavi
Contributor

The hidden states are now accessible via ONNX Runtime GenAI. You can create an ONNX model that outputs the hidden states in addition to the logits using ONNX Runtime GenAI's model builder with `--extra_options include_hidden_states=true`.
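After rebuilding the model with that option, the hidden states for a prompt can be read with the Python `get_output` call from the PR description. A minimal sketch, assuming onnxruntime-genai 0.6 or later (where `Generator.append_tokens` exists and runs the model on the appended tokens) and that `get_output("hidden_states")` then returns a numpy array shaped `[batch, seq_len, hidden_size]`. `prompt_embedding` and `mean_pool` are hypothetical helpers; mean pooling is only one common way to reduce per-token states to a single embedding (taking the last token's state is another).

```python
from typing import List, Sequence


def mean_pool(token_states: Sequence[Sequence[float]]) -> List[float]:
    """Average per-token vectors into a single vector."""
    if not token_states:
        raise ValueError("need at least one token vector")
    count = len(token_states)
    return [sum(column) / count for column in zip(*token_states)]


def prompt_embedding(model_dir: str, prompt: str) -> List[float]:
    """Feed a prompt through ONNX Runtime GenAI and pool its last hidden states."""
    # Imported here so mean_pool can be used without onnxruntime-genai installed.
    import onnxruntime_genai as og

    model = og.Model(model_dir)
    tokenizer = og.Tokenizer(model)
    tokens = tokenizer.encode(prompt)

    params = og.GeneratorParams(model)
    # Keep the KV cache small: only the prompt is processed, nothing is generated.
    params.set_search_options(max_length=len(tokens) + 1)
    generator = og.Generator(model, params)
    generator.append_tokens(tokens)  # assumed to run the model on the prompt

    hidden_states = generator.get_output("hidden_states")  # expected [1, seq_len, hidden_size]
    return mean_pool(hidden_states[0].tolist())
```

Capping `max_length` matters here because the `genai_config.json` shown earlier in this thread sets it to 131072 with `past_present_share_buffer` enabled, which would otherwise size the KV cache for the full context.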


4 participants