Query/Issue with Custom YOLOv5 Model and ONNX Export #13473
Comments
👋 Hello @AbhirupSinha1811, thank you for your detailed report and for using YOLOv5 🚀! Your observations and debugging steps are very thorough, which is highly appreciated. If this is indeed a 🐛 Bug Report, we kindly request a minimum reproducible example (MRE) to better assist in debugging this issue. An MRE would ideally contain simplified, complete code snippets and/or instructions to reproduce the ONNX export and the tensor shape discrepancy. From the context provided, here are a few steps you can double-check:
Requirements
Ensure you are using Python>=3.8 with all dependencies installed correctly. Install requirements using: pip install -r requirements.txt
Verified Environments
The ONNX export process is generally supported on environments such as notebooks, cloud platforms, or Docker. Make sure your training and export environments meet the dependencies, including PyTorch, CUDA, and ONNX runtime versions. Additionally, it's worth confirming if the issue persists when running the export script on different setups or versions.
This is an automated response, but don't worry! An Ultralytics engineer will review your issue promptly to provide further assistance. In the meantime, feel free to share any additional findings or code snippets that could help us debug further 🚀. |
@AbhirupSinha1811 thank you for providing a detailed explanation of the issue. Based on your observations, it seems the problem stems from a misconfigured detection head in the
To resolve this issue definitively, it is recommended to retrain the model with the correct class configuration (4 classes). If you suspect a training script issue, ensure you are using the latest YOLOv5 version and verify the Feel free to share further observations or questions. The YOLO community and Ultralytics team are here to help! |
Hello, after checking the detection head of the YOLO .pt model, this is what I get:
Why does the detection head of my custom YOLOv5s model have 68 output channels when it was trained on 4 classes? Shouldn’t it be 27 (3 × (5 + 4) for 4 classes and 3 anchors)?
How does it handle the extra channels?
Key Observations to Share
Behavior: The model detects all 4 classes correctly during inference with the .pt file but shows unexpected behavior during ONNX export. |
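One possible reading of the 68 figure, offered as a sketch rather than a diagnosis: the two Ultralytics head designs size their outputs differently. The anchor-based YOLOv5 head uses na × (5 + nc) channels per scale, which gives the 27 computed in the question. The anchor-free head used by YOLOv8-style models in the ultralytics package uses nc + 4 × reg_max channels, and with reg_max = 16 (assumed here to be that head's default) a 4-class model comes out at 68. The function names below are hypothetical and exist only for illustration.

```python
def yolov5_head_channels(nc: int, na: int = 3) -> int:
    """Channels per scale for an anchor-based head.

    Each of the na anchors predicts 4 box values, 1 objectness score
    and nc class scores.
    """
    return na * (5 + nc)


def anchor_free_head_channels(nc: int, reg_max: int = 16) -> int:
    """Channels per scale for an anchor-free head.

    nc class scores plus 4 box sides, each described by reg_max bins.
    reg_max=16 is an assumption; check the attribute on the loaded model.
    """
    return nc + 4 * reg_max


if __name__ == "__main__":
    nc = 4  # bird, drone, helicopter, jetplane
    print("anchor-based head channels:", yolov5_head_channels(nc))
    print("anchor-free head channels:", anchor_free_head_channels(nc))
```

If the second number matches what the checkpoint shows, the head may simply be of the anchor-free kind rather than misconfigured; that would need to be confirmed against the actual model.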
@AbhirupSinha1811 thank you for the detailed observations. Here's a concise response addressing your queries:
For further details on ONNX export, refer to the YOLOv5 Export Tutorial. Feel free to follow up with additional questions! |
Hello @pderrenger, I have reviewed the training script and data.yaml file thoroughly, and there have been no modifications. The script is standard and directly references data.yaml with nc=4 and class names: ["bird", "drone", "helicopter", "jetplane"]. No customizations or deviations have been made.
Training sample code:
Training starts here
model = YOLO(data="data.yaml", epochs=100) # Initiates training with 100 epochs
YOLOv5 Version:
Observed Issue:
Training Script Behavior: The training script appears standard and passes data="data.yaml" with nc=4. Is there any additional step required to ensure that the detection head is correctly initialized with the number of classes (4) during training? When resuming training with resume=True, does the detection head automatically align with the nc value in data.yaml, or could it retain the configuration from the checkpoint (last.pt)?
Detection Head Configuration: Does the model automatically reconfigure the detection head when nc changes, or does it require manual intervention (e.g., reinitializing layers)?
Data.yaml Verification: The data.yaml file has nc=4 and lists four classes. Are there any other factors (e.g., anchor settings or dataset labels) that could lead to a mismatch in detection head outputs? Does the order or format of the class names in data.yaml impact the detection head configuration during training? When resuming training with last.pt, could the detection head's configuration (e.g., no and anchors) differ from the new dataset's nc? If so, what steps are needed to realign the detection head?
Model Export and Compatibility: TensorRT
What is the best way to inspect the detection head during training or inference to verify its nc and no configuration? Are there specific checkpoints or logging steps recommended to avoid such mismatches? |
Hello, @AbhirupSinha1811, and thank you for the detailed explanation and observations. Based on your description, here are some points to address your concerns:
Let us know if you need further clarification! For more export-related guidance, refer to the YOLOv5 Export Tutorial. |
Hello, I opened the custom .pt model in Netron and got this:
Detection Head and Outputs
Why is the number of outputs (no) from the detection head 68? Which factors are responsible for this value, and if we retrain, what should we keep in mind beforehand? The anchors tensor has a shape of float16[2,7497]. Is this correct for my custom-trained model, or does it indicate an issue? How does the detection head configuration relate to the number of classes (nc=4)?
Anchors
How can I confirm if the anchors used during training were correct for my dataset?
Training and Configuration |
Hello, thank you for your observations. Here's a concise breakdown addressing your concerns:
To avoid such issues, verify the |
Hello @pderrenger , |
Hello @AbhirupSinha1811, To trace and understand the calculations resulting in
This will help identify where the configuration might deviate from expectations. Let me know if you need further clarification! |
Hello @pderrenger , |
Hello @AbhirupSinha1811, to avoid issues like incorrect anchor tensor shapes (
For more on anchor generation, review the YOLOv5 Architecture Documentation. Let me know if further details are needed! |
Hello, I am retraining a YOLOv5 model from the last.pt checkpoint and would like to continue training for additional epochs. I am considering using the --resume argument in the train.py script for this purpose. Could you please confirm whether --resume is the correct approach for continuing training from the last.pt checkpoint with additional epochs? |
YOLOv5 Component
Detection, Export
Bug
I am working with a custom-trained YOLOv5 model that was trained on a dataset with 4 classes. After exporting the model to ONNX format, I am facing discrepancies in the output tensor shape and class configurations, which are creating confusion and potential issues in downstream tasks. Below, I outline the details of my observations, potential root causes, and attempts to resolve the issue.
Environment
yolov5s.pt, Ubuntu 22.04, on my own machine.
Minimal Reproducible Example
Standard detection code from https://github.com/arindal1/yolov5-onnx-object-recognition/blob/main/yolov5.py
Additional
Observations:
Custom Model Details:
The .pt model was trained on a dataset with 4 classes (bird, drone, helicopter, jetplane).
When inspecting the .pt model, the number of classes is confirmed as 4 both in the names field and in the nc parameter from the data.yaml.
The .pt model performs as expected, detecting all 4 classes correctly during inference.
ONNX Export Details:
After exporting the model to ONNX, the output tensor shape is reported as [1, 8, 8400].
The 8 indicates the number of output channels in the detection head, which suggests it is configured for only 3 classes (5 + 3 = 8 instead of 5 + 4 = 9).
This is inconsistent with the .pt model, which was trained on 4 classes.
When checking the ONNX model metadata, the class names (bird, drone, helicopter, jetplane) are correctly stored, indicating 4 classes in the metadata.
Comparison with Default COCO Model:
For reference, the output tensor shape of a YOLOv5 model trained on the COCO dataset (80 classes) is [1, 25200, 85].
Here, 85 = 5 + 80 (5 for bounding box attributes + 80 for classes).
This format aligns with the expected configuration for YOLO models.
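The 25200 and 8400 figures can both be derived from the input size and the detection strides. The sketch below assumes a 640×640 input and strides of 8, 16 and 32, neither of which is stated in this report, and the function name is hypothetical. With three anchors per grid cell it gives 25200; with one prediction per cell, as in an anchor-free head, it gives 8400.

```python
def num_predictions(img_size: int = 640, strides=(8, 16, 32), per_cell: int = 1) -> int:
    """Total predictions across all detection scales.

    Each stride s yields a grid of (img_size // s) x (img_size // s) cells,
    and every cell emits per_cell predictions.
    img_size and strides are assumed defaults, not values from the report.
    """
    return per_cell * sum((img_size // s) ** 2 for s in strides)


if __name__ == "__main__":
    # 80*80 + 40*40 + 20*20 grid cells at 640x640
    print("one prediction per cell:", num_predictions())
    print("three anchors per cell:", num_predictions(per_cell=3))
```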
Key Issues:
Mismatch in Output Tensor Shape:
The ONNX model’s output tensor shape suggests it is configured for only 3 classes ([1, 8, 8400]), despite the .pt model being trained on 4 classes.
This raises concerns about whether the ONNX model will correctly detect all 4 classes.
Potential Causes of the Issue:
The detection head in the .pt model might have been misconfigured during training or export.
For 4 classes, the detection head’s out_channels should be 5 + 4 = 9, but it appears to be set to 8.
The ONNX export process might not be correctly handling the model’s class configuration.
Implications for Object Detection:
If the ONNX model is truly configured for only 3 classes, it may fail to detect one of the classes or produce incorrect predictions.
Steps Taken to Debug:
Inspected Detection Head of .pt Model:
Verified the out_channels of the detection head (last layer).
The .pt model’s detection head is confirmed to have out_channels = 8, indicating a configuration for 3 classes.
This discrepancy persists despite the model being trained on 4 classes.
Verified ONNX Model Metadata:
Extracted metadata from the ONNX model, which correctly lists 4 class names (bird, drone, helicopter, jetplane).
Tried Re-exporting the Model:
Re-exported the .pt model to ONNX using the official YOLOv5 export script.
The issue with the output tensor shape ([1, 8, 8400]) remains.
Request for Assistance:
Clarification on Detection Head Configuration:
Could this issue arise from a misconfiguration of the detection head during training? If so, how can I fix it without retraining the model?
Is there a way to manually adjust the detection head’s out_channels in the .pt model and re-export it to ONNX?
ONNX Export Process:
Are there known issues with the YOLOv5 ONNX export script that could cause this mismatch?
How can I ensure the ONNX model’s detection head is correctly configured for 4 classes?
General Guidance:
What steps can I take to verify that the ONNX model will correctly detect all 4 classes?
Are there tools or scripts you recommend for validating the ONNX model’s outputs?
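One way to sanity-check an exported model is to compare its output shape against the layout each head style would produce. The helper below is a hypothetical sketch and not part of YOLOv5. It takes a shape, such as the one onnxruntime reports via `InferenceSession.get_outputs()[0].shape`, together with the class count, and reports which layout fits. The layout labels, and the assumption that anchor-free exports are channel-first with 4 + nc channels and no objectness score, are mine.

```python
def describe_output_layout(shape, nc):
    """Guess which head layout a 3-D detection output matches.

    shape: (batch, a, b) as reported for the model's first output.
    nc:    number of classes the model was trained on.
    Returns a short description, or "unknown" if neither layout fits.
    """
    if len(shape) != 3:
        return "unknown"
    _, a, b = shape
    if b == 5 + nc:
        # Anchor-based export: last axis holds box, objectness and classes.
        return "anchor-based: [batch, predictions, 4 box + 1 objectness + nc]"
    if a == 4 + nc:
        # Anchor-free export: channel axis holds box and classes only.
        return "anchor-free: [batch, 4 box + nc, predictions]"
    return "unknown"


if __name__ == "__main__":
    print(describe_output_layout((1, 8, 8400), nc=4))
    print(describe_output_layout((1, 25200, 85), nc=80))
```

Under these assumptions, a shape of [1, 8, 8400] with nc=4 would be consistent with an anchor-free head carrying all four classes, which is worth ruling in or out before retraining.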
Additional Context:
ultralytics - 2.4.1
PyTorch Version: 2.4.1
ONNX Runtime Version: 1.16.3
Thank you for your assistance in resolving this issue!