Question about training on my own dataset #38

Open
GiftNovice opened this issue Oct 27, 2024 · 3 comments

Comments

@GiftNovice

[Screenshot 2024-10-27 163058]

I ran the training script, but training never started. No errors were reported; the script simply exited, and I don't know why. Also, can two RTX 3090 (24 GB) GPUs complete the training? Thanks.

@jingyirubyli

Hi, I have the same problem. Have you solved it?

@pILLOW-1

pILLOW-1 commented Nov 2, 2024

I have the same problem.

@frank-xwang
Owner

Hi, sorry for the late reply. Submitit is a tool for submitting Python functions for computation on a Slurm cluster. If you are not using a Slurm cluster, you may need to run main_submitit.py directly (some small modifications may be needed) to launch experiments locally.
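A minimal sketch of the kind of small modification described above, under stated assumptions: the launcher only goes through Submitit when Slurm actually appears to be present, and otherwise calls the training function directly in the current process, so any local failure surfaces in the foreground instead of the script silently exiting. The helper names `slurm_available` and `launch`, the `sbatch`/`SLURM_JOB_ID` detection heuristic, and the `submitit_logs` folder are hypothetical and not taken from the repository; Slurm resource settings (GPUs, nodes, time limit) are deliberately omitted. The only Submitit calls used are `submitit.AutoExecutor(folder=...)`, `executor.submit(...)` and `job.job_id`.

```python
import os
import shutil
from typing import Any, Callable


def slurm_available() -> bool:
    """Hypothetical heuristic: True if the Slurm `sbatch` command is on PATH
    or we are already running inside a Slurm allocation (SLURM_JOB_ID set)."""
    return shutil.which("sbatch") is not None or "SLURM_JOB_ID" in os.environ


def launch(train_fn: Callable[..., Any], *args: Any, log_dir: str = "submitit_logs") -> Any:
    """Hypothetical launcher (not the repository's code).

    On a Slurm cluster, submit `train_fn(*args)` as a job through Submitit.
    Otherwise, call `train_fn(*args)` directly so training runs (or fails
    with a visible traceback) in this process.
    """
    if slurm_available():
        import submitit  # only required on the cluster path

        executor = submitit.AutoExecutor(folder=log_dir)
        # Resource parameters (GPUs per node, nodes, timeout) omitted here.
        job = executor.submit(train_fn, *args)
        print(f"Submitted Slurm job {job.job_id}")
        return job
    # Local path: run the training entry point in the foreground.
    return train_fn(*args)
```

For example, `launch(train, config)`, with a hypothetical `train(config)` entry point, would run training in-process on a single workstation and submit a Slurm job on a cluster.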

We used 64 A100-80G GPUs for model training. We have never tried training on 3090 GPUs, but I expect you may run into out-of-memory (OOM) errors with two 24 GB 3090 GPUs.
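A back-of-the-envelope comparison of the two setups mentioned in this thread, using only the numbers given in the reply (64 × 80 GB versus 2 × 24 GB). With plain data-parallel training, each GPU typically holds a full copy of the model plus its own activations, so the per-GPU figure is what decides whether a given per-GPU batch fits; the aggregate figure mainly indicates how much total batch the setup can hold at once. The actual memory footprint depends on the model and batch size, which the thread does not state, so this is only an illustration of the scale of the gap.

```python
def total_memory_gb(num_gpus: int, mem_per_gpu_gb: int) -> int:
    """Aggregate GPU memory of a setup, in GB."""
    return num_gpus * mem_per_gpu_gb


# Figures from the thread: the authors' 64 x A100 80 GB vs. 2 x RTX 3090 24 GB.
authors = total_memory_gb(64, 80)
user = total_memory_gb(2, 24)

print(f"64 x A100 80 GB: {authors} GB total, 80 GB per GPU")
print(f"2 x RTX 3090 24 GB: {user} GB total, 24 GB per GPU")
print(f"Aggregate ratio: {authors / user:.1f}x")  # 5120 / 48
print(f"Per-GPU ratio: {80 / 24:.1f}x")
```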
