C++ ONNX parser performance issues #754

Open
MatthewDaggitt opened this issue Feb 16, 2024 · 18 comments

Comments
@MatthewDaggitt
Collaborator

Follow-up from #744 (see the example network).

The C++ parser is currently slower than the Python one because my crude implementation of multi-dimensional tensors is not as elegant as numpy's. In particular, the two offending operations are the transpose:

// Build the transposed shape, then copy every element of the input
// tensor into its transposed position in the output.
TensorShape transposedShape = transposeVector( shape, permutation );
Vector<T> result;
for ( PackedTensorIndices rawOutputIndex = 0; rawOutputIndex < tensor.size(); rawOutputIndex++ )
{
    // Unpack the flat output index into multi-dimensional indices,
    // permute them back to input indices, and look up the value.
    TensorIndices outputIndex = unpackIndex( transposedShape, rawOutputIndex );
    TensorIndices inputIndex = transposeVector( outputIndex, permutation );
    T value = tensorLookup( tensor, shape, inputIndex );
    result.append( value );
}
return result;

and all the lookups and broadcasting done here:

// Add the weighted addends for output entry ( i, j ) of the Gemm node
for ( TensorIndex k = 0; k < finalInput1Shape[1]; k++ )
{
    double coefficient = alpha * tensorLookup( matrix, finalInput2Shape, { k, j } );
    Variable inputVariable = tensorLookup( inputVariables, finalInput1Shape, { i, k } );
    e.addAddend( coefficient, inputVariable );
}
// Set the bias, broadcasting the bias tensor's indices to the output shape
TensorIndices biasIndices = broadcastIndex( biasShape, outputShape, { i, j } );
double bias = beta * tensorLookup( biases, biasShape, biasIndices );
e.setScalar( -bias );

I'm a bit torn on how to address this. I can either continue to refine the tensor representation to incorporate strides, which would allow us to perform these operations much more cheaply, or start to depend on an existing tensor library. For the latter case I'm not quite sure what I would use...
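
For context, a minimal sketch of what the stride-based option might look like (the type and helper names below are hypothetical, not Marabou's current API): a transpose becomes a permutation of the shape and stride vectors, and a lookup is a dot product of indices with strides, so no elements are ever copied.

// Hypothetical sketch of a strided tensor view. With row-major strides,
// transposing only permutes the shape and stride vectors; no data moves.
#include <cstddef>
#include <vector>

struct StridedTensor
{
    const double *data;
    std::vector<size_t> shape;
    std::vector<size_t> strides;

    // Row-major strides: the last dimension has stride 1.
    static std::vector<size_t> computeStrides( const std::vector<size_t> &shape )
    {
        std::vector<size_t> strides( shape.size(), 1 );
        for ( int i = (int)shape.size() - 2; i >= 0; --i )
            strides[i] = strides[i + 1] * shape[i + 1];
        return strides;
    }

    // O(1) transpose: permute shape and strides instead of copying elements.
    StridedTensor transpose( const std::vector<size_t> &permutation ) const
    {
        StridedTensor result = *this;
        for ( size_t i = 0; i < permutation.size(); ++i )
        {
            result.shape[i] = shape[permutation[i]];
            result.strides[i] = strides[permutation[i]];
        }
        return result;
    }

    // A lookup is just the dot product of the indices with the strides.
    double lookup( const std::vector<size_t> &indices ) const
    {
        size_t offset = 0;
        for ( size_t i = 0; i < indices.size(); ++i )
            offset += indices[i] * strides[i];
        return data[offset];
    }
};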

@wu-haoze
Collaborator

wu-haoze commented Feb 17, 2024

Hi @MatthewDaggitt , I think we do want to fix this performance issue. If parsing is already taking 20+ seconds on the small network I sent, that's not a good sign. I would be in favor of depending on an existing tensor library. Do you think it'd be helpful to go with PyTorch or TensorFlow? While this might be overkill for parsing, it would lay a good foundation for our future agenda of incorporating gradient-based methods.

We also need to look at the license of the library we choose. Maybe something to discuss next week.

@MatthewDaggitt
Collaborator Author

MatthewDaggitt commented Feb 19, 2024

I'm not sure whether PyTorch or TensorFlow would help. We need efficient array operations over arbitrary-dimensional C++ arrays, but I suspect the overhead of setting up an entire PyTorch/TensorFlow neural network just to compute them won't be worth it. Maybe I've misunderstood their capabilities though?

I have found that mdspan (multi-dimensional span), introduced in C++23, looks promising:

If we implement it ourselves this looks useful:

Otherwise a third-party library might be:
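
For the mdspan route, here's a rough sketch of how a transposed view of the same data could be expressed without copying, assuming a C++23 toolchain with <mdspan> support (which is still patchy in current standard libraries):

// Sketch: a transposed view is just the same buffer with a strided layout.
#include <array>
#include <iostream>
#include <mdspan>
#include <vector>

int main()
{
    // A 2x3 tensor stored row-major.
    std::vector<double> data = { 1, 2, 3, 4, 5, 6 };
    using Ext = std::dextents<size_t, 2>;
    std::mdspan<double, Ext> view( data.data(), 2, 3 );

    // The transpose is a 3x2 view with the strides swapped: no copy.
    std::layout_stride::mapping<Ext> mapping( Ext( 3, 2 ), std::array<size_t, 2>{ 1, 3 } );
    std::mdspan<double, Ext, std::layout_stride> transposed( data.data(), mapping );

    // Both print element ( 0, 1 ) of the original tensor, i.e. 2.
    std::cout << view[0, 1] << " " << transposed[1, 0] << std::endl;
    return 0;
}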

@wu-haoze
Collaborator

I was suggesting using PyTorch/TensorFlow just for tensor computation. From PyTorch's documentation, they support general operations over tensors independent of machine learning, though I haven't looked closely at it.

https://pytorch.org/tutorials/beginner/basics/tensorqs_tutorial.html
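
As a rough, untested sketch (the exact build and link setup is a separate question), the tensor side through libtorch's C++ API might look like this:

// Wrap existing row-major data without copying, then transpose, broadcast
// and multiply using libtorch tensors.
#include <torch/torch.h>
#include <iostream>
#include <vector>

int main()
{
    std::vector<double> data = { 1, 2, 3, 4, 5, 6 };
    torch::Tensor t = torch::from_blob( data.data(), { 2, 3 }, torch::kFloat64 );

    // Transpose by permuting dimensions, then materialise the result.
    torch::Tensor transposed = t.permute( { 1, 0 } ).contiguous();

    // Broadcasting and matrix multiplication come for free.
    torch::Tensor bias = torch::ones( { 3 }, torch::kFloat64 );
    torch::Tensor result = torch::matmul( transposed, t ) + bias;
    std::cout << result << std::endl;
    return 0;
}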

@MatthewDaggitt
Collaborator Author

Ah, I didn't realise they supported pure tensor operations as well, although it makes perfect sense when you think about it. Having said that, the C++ PyTorch documentation is very minimal (https://pytorch.org/cppdocs/api/namespace_at.html#namespace-at), but I guess it's workable.

@wu-haoze
Collaborator

@MatthewDaggitt , it'd be great if PyTorch is workable. That would lay a great foundation for our future development work. Their license also seems quite permissive: https://github.com/pytorch/pytorch/blob/main/LICENSE

@MatthewDaggitt
Collaborator Author

Hmm, so PyTorch is now being included fine, but I'm running into the problem that it also defines a LOG macro which clashes with our LOG macro. Do we have any preferences in the project for how to deal with this?

@wu-haoze
Collaborator

wu-haoze commented Feb 28, 2024

Hmm, so PyTorch is now being included fine, but I'm running into the problem that it also defines a LOG macro which clashes with our LOG macro. Do we have any preferences in the project for how to deal with this?

I'm not sure we have a protocol for this... The solution below seems sensible?
https://stackoverflow.com/questions/7109795/macro-definition-conflict

@guykatzz , do you have any suggestions about resolving conflict of macro definition?

@wu-haoze
Collaborator

wu-haoze commented Feb 28, 2024

@MatthewDaggitt actually I just recalled that we have encountered similar issues before, and undefining the clashing macros seems to be the solution:

#ifdef _WIN32
#undef max
#undef min
#endif
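
An analogous approach for the LOG clash might be to save and restore our macro around the Torch include, for example (a sketch only, not verified against the actual headers):

// Preserve Marabou's LOG macro across the Torch headers.
#pragma push_macro( "LOG" )
#undef LOG
#include <torch/torch.h>
#pragma pop_macro( "LOG" )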

@MatthewDaggitt
Collaborator Author

Okay, after much suffering I think I've finally got libtorch included in our build process without breaking everything 🎉 It can be found on the branch use-pytorch.

The whole experience has really made me appreciate the sane package managers and hygienic scoping of more modern languages. The next step is now to try and actually use Torch to handle the tensors...

@MatthewDaggitt
Collaborator Author

MatthewDaggitt commented Mar 5, 2024

I was over-optimistic in my last comment. Unfortunately it seems that simply adding the include #include <torch/torch.h> to the header file degrades the performance of the Marabou tests by a factor of about 100x on my machine.

@wu-haoze any chance you could confirm it's the same on your machine? If you pull the branch use-pytorch and build, then uncomment lines 23 and 25 in OnnxParser.h and build again?

@MatthewDaggitt
Collaborator Author

MatthewDaggitt commented Mar 8, 2024

Hmm, you don't even have to include it in the header to trigger the problem. Simply linking against it is enough to degrade performance. I've asked a question on Stack Overflow here:

https://stackoverflow.com/questions/78125867/why-might-the-single-act-of-linking-against-a-library-drastically-degrade-the-ru

🤞 we get a useful answer

@MatthewDaggitt
Collaborator Author

My personal contact also says:

Statically linked libraries may have some initialization code, and that may cause the issues that you were seeing. You could investigate a little bit with a debugger, but you could also try to circumvent, or at least mitigate, it by simply linking to the library at runtime.
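
If we wanted to try that, lazily loading the library at runtime on POSIX would look roughly like the sketch below (the library name is a placeholder, and since C++ symbols are name-mangled we would realistically need a small C-linkage shim to resolve through dlsym):

// Load the shared library only when first needed, so that its static
// initialisers do not run at program start-up.
#include <dlfcn.h>
#include <iostream>

void *loadTorchLazily()
{
    void *handle = dlopen( "libtorch.so", RTLD_LAZY | RTLD_LOCAL );
    if ( !handle )
        std::cerr << "dlopen failed: " << dlerror() << std::endl;
    return handle;
}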

@wu-haoze
Collaborator

wu-haoze commented Mar 9, 2024

I was over-optimistic in my last comment. Unfortunately it seems that simply adding the include #include <torch/torch.h> to the header file degrades the performance of the Marabou tests by a factor of about 100x on my machine.

@wu-haoze any chance you could confirm it's the same on your machine? If you pull the branch use-pytorch and build, then uncomment lines 23 and 25 in OnnxParser.h and build again?

I tried it on my end and observed the same phenomenon.

@wu-haoze
Collaborator

wu-haoze commented Mar 9, 2024

My personal contact also says:

Statically linked libraries may have some initialization code, and that may cause the issues that you were seeing. You could investigate a little bit with a debugger, but you could also try to circumvent, or at least mitigate, it by simply linking to the library at runtime.

Does linking to the library at runtime require building a dynamically linked PyTorch library?

@MatthewDaggitt
Collaborator Author

I suspect so, but I'm unsure whether that would solve the problem. Surely we'd pay a similar boot-up cost whenever we loaded the module instead, which we always will thanks to the proposed dependency on ONNX...

@MatthewDaggitt
Collaborator Author

Okay, I've been playing around with this some more, and I haven't found a way of avoiding the start-up cost. Having said that, I've also failed to get PyTorch to build from source. What are your opinions @wu-haoze? Is increasing the flat start-up time of Marabou by 2 seconds acceptable?

@barrettcw
Collaborator

I don't think we want to increase the start-up time by 2 seconds.

@wu-haoze
Collaborator

wu-haoze commented Mar 21, 2024 via email
