
Updated installation instructions for on-device training package #192

Open: wants to merge 2 commits into master


carzh (Contributor) commented Jun 25, 2024

Reflects the updated instructions for installing ONNX Runtime for training. Also adds a requirements.txt for the MobileBERT example.

Addresses ONNXRuntime issue #21149

GeorgeS2019 commented Jun 29, 2024

@carzh

carzh (Contributor, Author) commented Jun 29, 2024

What do you mean by updating the notebook to the latest 1.18.1 version? Are you running into issues when running the notebook with 1.18.1?

The README for the masked language modeling example already includes the updated installation instructions for ONNXRuntime, and this example was written for the CPU EP only.

GeorgeS2019 commented Jun 29, 2024


Try this on the latest 1.18.* onnxruntime-training:
import onnxruntime.training.onnxblock as onnxblock

GeorgeS2019 commented Jun 29, 2024

The whole problem with onnxruntime-training is the lack of specific information about the requirements for CUDA 12.*, and the lack of testing when using C#.

microsoft/onnxruntime#21212
microsoft/onnxruntime#21197

carzh (Contributor, Author) commented Jul 1, 2024

I'll update the notebook & add a requirements.txt file for the on-device training example.

We can add a CUDA C# example to the backlog. I understand your frustration with the lack of documentation, and we do need to improve the ONNXRuntime documentation, but creating good documentation and examples also takes time.

GeorgeS2019 commented Jul 1, 2024

@carzh
Thanks for updating the documentation and training code. ONNX Runtime training will become increasingly important because it can be combined with generative AI, e.g. Phi-3, through AI orchestration using Semantic Kernel.

carzh (Contributor, Author) commented Jul 1, 2024

> Try this on the latest 1.18.* onnxruntime-training
> import onnxruntime.training.onnxblock as onnxblock

Ah, I gave it a try and did not run into any issues with that import line. What error do you run into, and with which version of the ONNXRuntime package? (i.e., are you using onnxruntime-training-cpu?)
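To report the error and package version asked about here, a small standard-library diagnostic like the one below may help. It is a minimal sketch: the list of distribution names is an assumption based on the packages mentioned in this thread, and the script only reports what it finds, it does not change the environment.

```python
# Diagnostic sketch: which ONNX Runtime wheels are installed, and does the
# on-device training module import?
import importlib
import importlib.metadata

# Assumed distribution names, taken from the packages discussed in this thread.
CANDIDATE_DISTS = ["onnxruntime", "onnxruntime-training", "onnxruntime-training-cpu"]


def installed_ort_versions():
    """Map each candidate distribution name to its installed version, or None."""
    versions = {}
    for name in CANDIDATE_DISTS:
        try:
            versions[name] = importlib.metadata.version(name)
        except importlib.metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def can_import_onnxblock():
    """Try `import onnxruntime.training.onnxblock`; return (ok, message)."""
    try:
        importlib.import_module("onnxruntime.training.onnxblock")
        return True, "onnxruntime.training.onnxblock imported successfully"
    except Exception as exc:  # ImportError, or a failure while loading native libraries
        return False, f"{type(exc).__name__}: {exc}"


if __name__ == "__main__":
    for dist, version in installed_ort_versions().items():
        print(f"{dist}: {version or 'not installed'}")
    ok, message = can_import_onnxblock()
    print(message)
```

Running it with the same Python interpreter the notebook uses matters, since pip may have installed packages into a different environment.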

GeorgeS2019 commented Jul 2, 2024

@carzh

I cannot use Python to upgrade onnxruntime-training from 1.15.1 to 1.18.1: no Windows version is available.

The link to download the Windows version is hard to find:
microsoft/onnxruntime#21149 (comment)
pip install -i https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/ORT/pypi/simple/ onnxruntime-training-cpu

mobilebert-uncased.ckpt is not created; only a file named checkpoint is written.

https://github.com/carzh/onnxruntime-training-examples/blob/cebaf5cb5077007163d16720f139c5c847df66e6/on_device_training/desktop/csharp/masked_language_modeling/csharp_console_app/Program.cs#L36C84-L36C96

carzh (Contributor, Author) commented Jul 2, 2024

@GeorgeS2019 The link to download the Windows version is available at onnxruntime.ai. All the documentation should point to that installation table.

Try the following:

pip uninstall onnxruntime-training -y
python -m pip install cerberus flatbuffers h5py "numpy>=1.16.6" onnx packaging protobuf sympy "setuptools>=41.4.0"
pip install -i https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/ORT/pypi/simple/ onnxruntime-training-cpu --no-cache-dir

I appended the --no-cache-dir flag to ensure that pip doesn't pick up any locally cached onnxruntime-training Python packages.

GeorgeS2019 commented Jul 2, 2024

@carzh

I managed to download the onnxruntime-training Python package for Windows (1.18.0).
I could run the notebook, but the generated checkpoint file is named just "checkpoint".

Problem 1:
mobilebert-uncased.ckpt is not created; only a file named checkpoint is written.
Problem 2:
All attempts to load the created file fail. I am not sure which Onnxruntime and Onnxruntime-Training NuGet packages to use.

string checkpointPath = Path.Combine(parentDir, "training_artifacts", "mobilebert-uncased.ckpt");

https://github.com/carzh/onnxruntime-training-examples/blob/cebaf5cb5077007163d16720f139c5c847df66e6/on_device_training/desktop/csharp/masked_language_modeling/csharp_console_app/Program.cs#L36C84-L36C96
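One way to check which file the notebook actually produced, before pointing `checkpointPath` in Program.cs at it, is a helper like the one below. `find_checkpoint` is hypothetical and not part of ONNX Runtime: it prefers the `mobilebert-uncased.ckpt` name the C# example expects and falls back to the `checkpoint` name observed in this thread.

```python
from pathlib import Path

# Names to try, in order: the one Program.cs expects, then the one the
# notebook was observed to write in this thread.
CHECKPOINT_NAMES = ("mobilebert-uncased.ckpt", "checkpoint")


def find_checkpoint(artifacts_dir):
    """Return the first existing checkpoint path in artifacts_dir (hypothetical helper)."""
    artifacts = Path(artifacts_dir)
    for name in CHECKPOINT_NAMES:
        candidate = artifacts / name
        if candidate.exists():
            return candidate
    raise FileNotFoundError(
        f"no checkpoint in {artifacts}; tried: {', '.join(CHECKPOINT_NAMES)}"
    )
```

For example, `find_checkpoint("training_artifacts")` returns the path of whichever name exists, and the file name it reports is what `checkpointPath` would need to use.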

GeorgeS2019

There is confusion about CUDA 12 support.
Which versions of onnxruntime and onnxruntime-training can I use for CUDA 12?

When running the Python notebook, which version of the Python package do I need for CUDA 12? 1.18.1?


GeorgeS2019

For CUDA 12, it seems onnxruntime-training on Windows is not supported, for both Python and Windows.


carzh (Contributor, Author) commented Jul 2, 2024

> There are confusion with Cuda 12 support. Which version of onnxruntime and onnxruntime-training I could use for Cuda 12?


Use the most up-to-date package unless the example specifies otherwise. The latest release includes CUDA 12 support, but CUDA 12 is not supported for all configurations of ORT, as it looks like you've discovered.

For example, on-device training with Python for Linux supports CUDA 12, but on-device training with Python for Windows does not. The easiest way to check if a configuration supports CUDA 12 is with the installation table on onnxruntime.ai.

> When running the notebook python, which version of python package I need for cuda 12? 1.18.1?

The README specifies to use the python package for CPU. The C# example was written for CPU EP, and not CUDA.
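Because the example targets the CPU EP, it can help to confirm which execution providers the installed Python wheel was built with before trying CUDA. A minimal sketch, assuming the standard provider names `CUDAExecutionProvider` and `CPUExecutionProvider` and using `onnxruntime.get_available_providers()`. This list reflects how the wheel was built, not whether a compatible CUDA 12 runtime is installed, so the installation table on onnxruntime.ai remains the reference for supported configurations.

```python
def installed_providers():
    """Return the EPs compiled into the installed onnxruntime wheel, or [] if it is missing."""
    try:
        import onnxruntime
    except ImportError:
        return []
    return list(onnxruntime.get_available_providers())


def choose_providers(available, prefer_cuda=False):
    """CPU by default (what the example targets); CUDA first only if requested and present."""
    if prefer_cuda and "CUDAExecutionProvider" in available:
        return ["CUDAExecutionProvider", "CPUExecutionProvider"]
    return ["CPUExecutionProvider"]


if __name__ == "__main__":
    available = installed_providers()
    print("available:", available or "onnxruntime not installed")
    print("chosen:", choose_providers(available, prefer_cuda=True))
```

The returned list has the shape accepted by the `providers` argument of `onnxruntime.InferenceSession`; the on-device training API may select its device differently, so treat this only as a check of what the wheel contains.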

I'll try reproducing the issues you are running into later today and I'll push some updates to make it clearer that the masked language modeling example is for CPU.

GeorgeS2019

> I'll try reproducing the issues you are running into later today

Any update?
