Dynamic output types #60

Open
marshallpierce opened this issue Feb 22, 2021 · 6 comments
Labels: enhancement (New feature or request), help wanted (Extra attention is needed)

Comments

@marshallpierce
Contributor

In practice I'm encountering models with different types on different outputs. As an example of the problem, a trivial TensorFlow model that takes string input and returns the unique elements of the input tensor produces this ONNX structure:

Inputs:
  0:
    name = input_1:0
    type = String
    dimensions = [None]
Outputs:
  0:
    name = Identity:0
    type = Int32
    dimensions = [None]
  1:
    name = Identity_1:0
    type = String
    dimensions = [None]

The two outputs are of different types, so the current type structure for retrieving output, which assumes a single type for all outputs, won't work.

One way to go about it would be to add a trait analogous to the one on the input side:

pub trait OwnedTensorDataToType: Sized {
    /// The tensor element type that this type can extract from
    fn tensor_element_data_type() -> TensorElementDataType;

    /// Extract an `ArrayView` from the ort-owned tensor.
    fn extract_array<'a, D: ndarray::Dimension>(
        shape: D,
        tensor: *mut sys::OrtValue,
    ) -> Result<ndarray::ArrayView<'a, Self, D>>;
}

We could provide implementations of that trait for all the common types that map to ONNX types, as on the input side. In the String case the data would have to be copied, as far as I can tell. Output could then take the form of a new dynamic tensor type that exposes the TensorElementDataType of each output, so that the user can pick an appropriate type whose OwnedTensorDataToType implementation matches.
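
To make that concrete, here is a minimal sketch of what the caller side could look like; the DynOutput enum and its variants are hypothetical illustrations built on ndarray, not an existing API:

// Hypothetical sketch: a dynamically typed output value that carries its
// element type, so callers can match on it per output.
enum DynOutput<'a> {
    Float(ndarray::ArrayViewD<'a, f32>),
    Int32(ndarray::ArrayViewD<'a, i32>),
    // Strings need an owned variant, since the data has to be copied out.
    String(ndarray::ArrayD<String>),
}

fn handle(outputs: Vec<DynOutput>) {
    for out in outputs {
        match out {
            DynOutput::Int32(view) => println!("i32 output, shape {:?}", view.shape()),
            DynOutput::Float(view) => println!("f32 output, shape {:?}", view.shape()),
            DynOutput::String(arr) => println!("string output, shape {:?}", arr.shape()),
        }
    }
}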

@nbigaouette added the enhancement (New feature or request) and help wanted (Extra attention is needed) labels on Feb 24, 2021
@Narsil

Narsil commented Mar 9, 2021

Hi,

I wanted to give this library a spin, but this is blocking me, as is the fact that input types can have the same problem. In NLP, some graphs consume both IDs and masks (which are integers) AND some cached attention layers, which do NOT have the same dimension AND are floats.

The same goes for the outputs: different shapes and different data types (new values to be used for the cache, plus some logits or other heads).

I'm not sure if it's currently feasible on the input side. I've seen https://docs.rs/ndarray/0.14.0/ndarray/type.IxDyn.html, which could help with the dimensions, but it does not seem to solve the i32-vs-f32 element type problem.
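
For illustration, a minimal sketch (using ndarray) of why IxDyn alone doesn't cover this: it makes the shape dynamic, but the element type stays a compile-time parameter:

use ndarray::{ArrayD, IxDyn};

fn main() {
    // IxDyn makes the *shape* dynamic: the rank is chosen at runtime.
    let ids: ArrayD<i64> = ArrayD::zeros(IxDyn(&[1, 128]));
    let cache: ArrayD<f32> = ArrayD::zeros(IxDyn(&[1, 12, 0, 64]));
    // ...but the element type is still a compile-time parameter, so these
    // two arrays cannot share a Vec<ArrayD<T>> without an enum or trait object.
    println!("{:?} {:?}", ids.shape(), cache.shape());
}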

Thanks for this lib!

@ahirner

ahirner commented Aug 28, 2022

Another very common use case is indices alongside float values (e.g. torch.topk). When int64 is inferred as f32, all the values come out as near-zero floats, so casting them back won't work either.
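
As an aside, a minimal sketch of why the values come out near zero, assuming the raw int64 buffer is being reinterpreted as f32 rather than converted: small integer bit patterns decode as denormal floats:

fn main() {
    // The index 5 stored as an i64, then its bytes read back as two f32s
    // (little-endian): the low word 0x00000005 decodes as a denormal.
    let index: i64 = 5;
    let bytes = index.to_le_bytes();
    let low = f32::from_le_bytes(bytes[0..4].try_into().unwrap());
    let high = f32::from_le_bytes(bytes[4..8].try_into().unwrap());
    println!("{low:e} {high:e}"); // ~7e-45 and 0e0; casting back recovers nothing
}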

Are you still interested in working on #65, @marshallpierce?

@marshallpierce
Contributor Author

Sadly I don't have time. I hope someone picks it up though.

@Narsil

Narsil commented Aug 29, 2022

Is this something that would work: #69?

@ahirner

ahirner commented Aug 29, 2022

Is this something that would work: #69?

This PR is about inputs. If by "something" you mean the reverse, like a generic .next(), I think that could be an interesting solution. I haven't quite gotten the gist of #65 yet. I had tuples of finite length in mind; to use such a feature, the caller has to know their target types anyway.
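
For illustration, a minimal sketch of the tuple idea; the Output enum and extract_pair are hypothetical names, with ndarray standing in for the tensor types:

use ndarray::ArrayD;

// Hypothetical dynamically typed output value, as it might come back from a run.
enum Output {
    I64(ArrayD<i64>),
    F32(ArrayD<f32>),
}

// The tuple idea: the caller states the expected type of each output position
// up front, and extraction fails if any position mismatches.
fn extract_pair(mut outputs: Vec<Output>) -> Option<(ArrayD<i64>, ArrayD<f32>)> {
    let second = outputs.pop()?;
    let first = outputs.pop()?;
    match (first, second) {
        (Output::I64(indices), Output::F32(scores)) => Some((indices, scores)),
        _ => None,
    }
}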

It's probably a good idea to study the tch-rs API as well.

@Narsil

Narsil commented Aug 30, 2022

tch-rs uses Tensor as its backbone type, which is completely opaque to the Rust type system, so there is no issue putting tensors of different types and shapes in a Vec<Tensor>.
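
For contrast, a small sketch of that with the tch crate (assuming tch as a dependency):

use tch::{Device, Kind, Tensor};

fn main() {
    // Tensor carries its dtype and shape at runtime, so a heterogeneous Vec works.
    let tensors: Vec<Tensor> = vec![
        Tensor::zeros(&[2, 3], (Kind::Float, Device::Cpu)),
        Tensor::zeros(&[5], (Kind::Int64, Device::Cpu)),
    ];
    for t in &tensors {
        println!("{:?} {:?}", t.kind(), t.size());
    }
}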

If by "something" you mean the reverse, like a generic .next(),

That is what I meant.
If you could have a fully statically typed call signature with all the necessary tensors and types, that would be ideal (100% typed, named tensors, so that callers cannot miscall the forward pass). However, I don't think that is really realistic: ONNX works through session objects, and unwrapping the signature at compile time is not really feasible, right? (At least that's why I went with .next() in my tentative PR.)
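
A minimal sketch of that .next() style, reusing the same kind of hypothetical dynamic value as in the earlier sketch:

use ndarray::ArrayD;

// Hypothetical dynamic output value, as above.
enum Output {
    F32(ArrayD<f32>),
    I64(ArrayD<i64>),
}

// Pull outputs in order, asserting a type at each step, since the real
// signature is only known at runtime.
fn read_outputs(outputs: Vec<Output>) {
    let mut it = outputs.into_iter();
    if let Some(Output::F32(logits)) = it.next() {
        println!("logits: {:?}", logits.shape());
    }
    if let Some(Output::F32(cache)) = it.next() {
        println!("cache: {:?}", cache.shape());
    }
}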

A bit more context on the PR: I was attempting to remove Python entirely for inference on GPT2 + ORT quantized (in a webserver context). The end result is that we were still winning something like 2x (I remember 3x, but I don't want to boast something wrong, I don't really remember) over a relatively naïve Python webserver + GPT2 + ORT quantized setup.
