[Training] [Windows] #19965

Open
Positronx opened this issue Mar 18, 2024 · 7 comments
Labels
platform:windows: issues related to the Windows platform
stale: issues that have not been addressed in a while; categorized by a bot
training: issues related to ONNX Runtime training; typically submitted using template

Comments

@Positronx

Describe the issue

OS: Windows 10
Is there a way to generate training artifacts in C++ without having to use the Python utilities? I took a look at the source code and I think it is possible. I'm just having a hard time including and linking the headers related to generating the artifacts.

To reproduce

#include "orttraining/training_api/checkpoint.h"

I can't even compile an empty program that includes the header above, even though I linked the .lib files and added the required header directories. The error says that the file 'onnx/onnx..pb.h' can't be opened.

Urgency

No response

ONNX Runtime Installation

Built from Source

ONNX Runtime Version or Commit ID

1.18.0

PyTorch Version

2.2.0

Execution Provider

Default CPU

Execution Provider Library Version

No response

@Positronx Positronx added the training issues related to ONNX Runtime training; typically submitted using template label Mar 18, 2024
@github-actions github-actions bot added the platform:windows issues related to the Windows platform label Mar 18, 2024
@baijumeswani
Contributor

Is there a way to generate training artifacts in C++ without having to use the Python utilities? I took a look at the source code and I think it is possible. I'm just having a hard time including and linking the headers related to generating the artifacts.

Generating the training artifacts is currently not supported through C++ and requires our Python utilities.

May I ask why you would like to generate the training artifacts from C++?

@Positronx
Author

I have a graphical framework written in C that loads functions compiled into DLL files. I want to be able to generate the artifacts directly from the graphical framework, without needing to call a third-party function from Python. Unfortunately, I can't communicate with Python from the framework, and C++ is the best I can do since it is the closest thing to C.
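For illustration, a DLL entry point that a C host can call might look roughly like the sketch below. It only shows the `extern "C"` boundary and the error-code convention; the names `generate_artifacts_c` and `write_artifacts` are hypothetical, and the worker is a placeholder that does not call ONNX Runtime or produce any artifacts.

```cpp
#include <cstddef>
#include <cstring>
#include <exception>
#include <stdexcept>
#include <string>

// Export with C linkage so a C host can resolve the symbol by its plain name.
#if defined(_WIN32)
#define ARTIFACT_API extern "C" __declspec(dllexport)
#else
#define ARTIFACT_API extern "C"
#endif

namespace {
// Hypothetical C++ worker. A real implementation would do the actual work here;
// this placeholder only validates its arguments.
void write_artifacts(const std::string& model_path, const std::string& out_dir) {
    if (model_path.empty() || out_dir.empty()) {
        throw std::invalid_argument("model path and output directory must be non-empty");
    }
}

// Copy a message into a caller-supplied buffer, always NUL-terminating it.
void copy_error(const char* msg, char* buf, std::size_t len) {
    if (buf != nullptr && len > 0) {
        std::strncpy(buf, msg, len - 1);
        buf[len - 1] = '\0';
    }
}
}  // namespace

// C-callable entry point (hypothetical name). Returns 0 on success and a
// non-zero code on failure. C++ exceptions must not cross the C boundary,
// so they are converted into an error code plus an optional message.
ARTIFACT_API int generate_artifacts_c(const char* model_path, const char* out_dir,
                                      char* error_buf, std::size_t error_buf_len) {
    try {
        if (model_path == nullptr || out_dir == nullptr) {
            throw std::invalid_argument("null path argument");
        }
        write_artifacts(model_path, out_dir);
        return 0;
    } catch (const std::exception& e) {
        copy_error(e.what(), error_buf, error_buf_len);
        return 1;
    } catch (...) {
        copy_error("unknown error", error_buf, error_buf_len);
        return 2;
    }
}
```

On the C side, the host would look the symbol up by name after loading the DLL and check the returned code, reading the message buffer only when the code is non-zero.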

@eric-vision-e

Hi @Positronx

I'm looking for the same thing. Have you found a way to generate artifacts with C++?

Thanks

@Positronx
Author

Positronx commented Mar 21, 2024

Hello @eric-vision-e
I managed to generate a checkpoint file using the function SaveCheckpoint defined in orttraining/training_api/checkpoint.h. However, I'm struggling to use the class OrtModuleGraphBuilder, defined in orttraining/core/framework/ortmodule_graph_builder.h, to generate the gradient graph (which I assume corresponds to training_model.onnx).

@eric-vision-e

Hi @Positronx,

Thanks for your answer. As far as I understand, though, a checkpoint only holds the weight values. For example, suppose I have an already-deployed classification model with 2 classes and, for some reason, I want to add a third class. The only way seems to be to export the model again in Python and then retrain it in C++ with new data (old classes + the new one), because in this case the model architecture has changed.

Am I correct? Have you understood how to handle this scenario using only C++?

Thanks

@Positronx
Author

Hi @eric-vision-e

The checkpoint is only for weight values (and other metadata such as optimizer momentum, but that doesn't concern me yet). If I understand your problem correctly, you want to change the training_model architecture without resorting to Python. That aligns with what I'm looking for. As far as I can tell, this will require (partially) rewriting the Python libraries that call onnxruntime_pybind11_state. Unfortunately, the training model isn't as straightforward as the checkpoint file.
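To make the 2-class versus 3-class point concrete, here is a small sketch of why a weights-only checkpoint cannot absorb an architecture change. It does not use any ONNX Runtime types: `ShapeMap` and `incompatible_params` are hypothetical stand-ins, and the parameter names and shapes are made up for illustration.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for a checkpoint or a model graph:
// parameter name -> tensor shape.
using ShapeMap = std::map<std::string, std::vector<std::size_t>>;

// Returns the names of parameters that the model expects but that the
// checkpoint either lacks or stores with a different shape.
std::vector<std::string> incompatible_params(const ShapeMap& checkpoint,
                                             const ShapeMap& model) {
    std::vector<std::string> mismatched;
    for (const auto& entry : model) {
        const auto it = checkpoint.find(entry.first);
        if (it == checkpoint.end() || it->second != entry.second) {
            mismatched.push_back(entry.first);
        }
    }
    return mismatched;
}
```

With a checkpoint saved from a 2-class head (say `fc.weight` of shape {2, 128} and `fc.bias` of shape {2}) and a model whose head now has 3 classes, only the head parameters come back as mismatched. The backbone weights still fit, but the graph itself has to be regenerated, which is the part that currently needs the Python utilities.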

Contributor

This issue has been automatically marked as stale due to inactivity and will be closed in 30 days if no further activity occurs. If further support is needed, please provide an update and/or more details.

@github-actions github-actions bot added the stale issues that have not been addressed in a while; categorized by a bot label Apr 21, 2024