async branch: error if I don't add the project account (which is not the default that daint picks up) #56

Open
jcanton opened this issue Jan 26, 2024 · 4 comments · Fixed by #54

Comments

@jcanton
Collaborator

jcanton commented Jan 26, 2024

Terribly explained, just a Friday afternoon reminder for next week.

Starting chain for case icon-seq-test and workflow icon-seq-test
Running the Processing Chain in sequential mode.
Using built-in model restarts.
└── Starting chunk with startdate 2018-01-01 00:00:00+00:00
    └── Process "prepare_icon" for chunk "2018010100_2018010106"
Traceback (most recent call last):
  File "/scratch/snx3000/jcanton/processing-chain.async/./run_chain.py", line 298, in run_chunk
    to_call.main(cfg)
  File "/scratch/snx3000/jcanton/processing-chain.async/jobs/prepare_icon.py", line 97, in main
    cfg.submit('prepare_icon', script)
  File "/scratch/snx3000/jcanton/processing-chain.async/config.py", line 451, in submit
    job_id = int(result.stdout)
             ^^^^^^^^^^^^^^^^^^
ValueError: invalid literal for int() with base 10: b''

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/scratch/snx3000/jcanton/processing-chain.async/./run_chain.py", line 531, in <module>
    main()
  File "/scratch/snx3000/jcanton/processing-chain.async/./run_chain.py", line 517, in main
    restart_runs(cfg=cfg, force=args.force, resume=args.resume)
  File "/scratch/snx3000/jcanton/processing-chain.async/./run_chain.py", line 375, in restart_runs
    run_chunk(cfg=cfg, force=force, resume=resume)
  File "/scratch/snx3000/jcanton/processing-chain.async/./run_chain.py", line 315, in run_chunk
    raise RuntimeError(subject)
RuntimeError: ERROR or TIMEOUT in job 'prepare_icon' for chain '2018010100_2018010106'
@leclairm
Contributor

Looks like the async submission process is getting an empty string as input for a job id. Will check that.
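The traceback is consistent with `int()` being called on empty `sbatch` output. Below is a minimal sketch of a more defensive parse, assuming the job is submitted with `sbatch --parsable` through `subprocess.run`. `SubmissionError`, `parse_job_id`, and this version of `submit` are hypothetical names for illustration, not the chain's actual code.

```python
import subprocess


class SubmissionError(RuntimeError):
    """Hypothetical error type: sbatch did not return a usable job id."""


def parse_job_id(returncode: int, stdout: bytes, stderr: bytes) -> int:
    """Extract the job id from `sbatch --parsable` output, or fail loudly."""
    out = stdout.decode().strip()
    err = stderr.decode().strip()
    if returncode != 0 or not out:
        # Surface whatever sbatch reported instead of failing later in int().
        raise SubmissionError(
            f"sbatch failed (exit code {returncode}): {err or 'no output'}"
        )
    # With --parsable, sbatch prints "jobid" or "jobid;cluster".
    return int(out.split(";")[0])


def submit(script: str) -> int:
    """Submit a job script and return its job id (assumes sbatch is on PATH)."""
    result = subprocess.run(["sbatch", "--parsable", script], capture_output=True)
    return parse_job_id(result.returncode, result.stdout, result.stderr)
```

With a check like this, an invalid account would show up as the `sbatch` message itself rather than as `ValueError: invalid literal for int()`.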

@mjaehn
Contributor

mjaehn commented Jan 29, 2024

For me, this works. What did you do before this and have you changed any settings? What is your project account and where did you set it? @jcanton

@jcanton
Collaborator Author

jcanton commented Jan 29, 2024

I didn't explain it well (Friday reminder).
For some reason, the default account for my user on CSCS is hymet, which no longer exists or at least has zero compute hours available.
If I run the chain tests as they are, in either seq or async mode, hymet is picked up as the account, and sbatch then returns an error that the chain does not handle correctly either:

This is prepare_icon.sh, with the "incorrect" --account

    #SBATCH --job-name="prepare_icon_2018010100_2018010106"
    #SBATCH --nodes=1
    #SBATCH --time=00:10:00
    #SBATCH --output=/scratch/snx3000/jcanton/processing-chain.async/work/icon-test/2018010100_2018
    #SBATCH --open-mode=append
    #SBATCH --account=hymet
    #SBATCH --partition=normal
    #SBATCH --constraint=gpu

    cd /scratch/snx3000/jcanton/processing-chain.async
    ./run_chain.py icon-test -j prepare_icon -c 2018010100_2018010106 -f -s --no-logging

and this is the error that sbatch throws if I try to run it manually:

[jcanton@daint105 processing-chain.async]$ sbatch --parsable ./work/icon-test/2018010100_2018010106/job_scripts/prepare_icon.sh

ERROR: invalid account specified (hymet)

sbatch: error: cli_filter plugin terminated with error

So the issue is actually with the default values used for the cfg.
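One way to avoid depending on the user's default account is to let an explicitly configured value take precedence when the job script is written. This is only a sketch of that idea: `resolve_account`, `account_directive`, and the account name `my_project` are hypothetical, and how the chain actually determines its default account is not shown in this thread.

```python
from typing import Optional


def resolve_account(configured: Optional[str], user_default: str) -> str:
    """Prefer an explicitly configured account over the user's default one."""
    if configured and configured.strip():
        return configured.strip()
    return user_default


def account_directive(account: str) -> str:
    """Format the account as an sbatch header line for a job script."""
    return f"#SBATCH --account={account}"


# No account configured: the (possibly stale) user default is used.
print(account_directive(resolve_account(None, "hymet")))
# Account configured explicitly: it overrides the user default.
print(account_directive(resolve_account("my_project", "hymet")))
```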

@mjaehn
Contributor

mjaehn commented Feb 7, 2024

Should be solved in #54

@mjaehn mjaehn linked a pull request Feb 7, 2024 that will close this issue
18 tasks
3 participants