SIPNET swallows all memory when met is missing in the middle of a time series #2156

Closed
ashiklom opened this issue Oct 19, 2018 · 6 comments

Comments

@ashiklom
Member

Describe the bug
Long (multi-decadal) runs of SIPNET can exceed system memory limitations, at least on Docker.
This causes the model run process to be killed by the kernel OOM killer.
In the logfile, this shows up only as the cryptic message "Killed".
In the system logs (log command on Mac), this looks like:

2018-10-19 10:56:16.133675-0400 0x20cf5    Default     0x0                  11671  com.docker.hyperkit: [51322.605158] [ pid ]   uid  tgid total_vm      rss nr_ptes nr_pmds swapents oom_score_adj name
...
2018-10-19 10:56:16.191227-0400 0x20cf5    Default     0x0                  11671  com.docker.hyperkit: [51322.662809] [75007]     0 75007  1968408  1845966    3849      11   121487             0 sipnet.136
2018-10-19 10:56:16.191775-0400 0x20cf5    Default     0x0                  11671  com.docker.hyperkit: [51322.663345] Out of memory: Kill process 75007 (sipnet.136) score 855 or sacrifice child
2018-10-19 10:56:16.192534-0400 0x20cf5    Default     0x0                  11671  com.docker.hyperkit: [51322.663901] Killed process 75007 (sipnet.136) total-vm:7873632kB, anon-rss:7383864kB, file-rss:0kB, shmem-rss:0kB
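
As a sanity check on the log above, the page counts in the process table can be converted to kB and compared with the "Killed process" line. This is a minimal sketch that assumes 4 kB pages (the usual size on x86-64 Linux, but not stated anywhere in the log); the constant and function names are illustrative only.

```python
# Convert the page counts from the OOM-killer process table into kB.
# Assumption: 4 kB pages (typical for x86-64 Linux; not confirmed by the log).
PAGE_SIZE_KB = 4


def pages_to_kb(pages: int) -> int:
    """Return the size in kB of the given number of memory pages."""
    return pages * PAGE_SIZE_KB


# Values copied from the sipnet.136 row (columns total_vm and rss).
total_vm_kb = pages_to_kb(1968408)
rss_kb = pages_to_kb(1845966)

print(total_vm_kb)  # compare with total-vm in the "Killed process" line
print(rss_kb)       # compare with anon-rss in the "Killed process" line
```

Under that assumption both values match the total-vm and anon-rss figures reported by the kernel, i.e. roughly 7 GB resident for a single SIPNET process at the time it was killed.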

To Reproduce
Run SIPNET (r136) for 100 years.

Expected behavior
The model should run.

Machine (please complete the following information):

  • Server: Docker
  • OS: Mac OS High Sierra
  • Machine itself has tons of memory (32 GB total; >10 GB free at time of execution), so this looks like an artificial bottleneck imposed by Docker or the system.
@ashiklom
Member Author

An extended Kernel log is available here.

@robkooper
Member

Does the same apply if you try to run this on the machine (outside of the container)?

@ashiklom
Member Author

Haven't tried yet, but I'll give it a whirl.

@ashiklom
Member Author

So the problem here turned out to be that I was missing a year of met (2004, for a run from 1902 to 2008) in the middle of the time series. That apparently tripped up SIPNET. Running 1900 to 2003 worked fine and, profiling via Valgrind, only used about 20-30 MB of RAM.

That missing met would kill it this way is insidious. I'm leaving this issue open but changing the name to reflect the real problem.

(The missing met was partially user error -- I had already downloaded year 2004 for this site, but because I did it before my land-sea mask fix, it was all NA.)
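
A pre-run check for this kind of gap could look something like the sketch below. It is not part of SIPNET or PEcAn; `find_met_gaps` and the `met_by_year` mapping (year to a list of met values) are hypothetical names, and reading the actual met files into that shape is left out. It flags any year in the run period that is absent, empty, or entirely NA.

```python
import math
from typing import Dict, List, Optional, Sequence


def find_met_gaps(
    start_year: int,
    end_year: int,
    met_by_year: Dict[int, Sequence[Optional[float]]],
) -> List[int]:
    """Return years in [start_year, end_year] that have no usable met values.

    A year counts as a gap if it is absent from met_by_year, has no
    values, or every value is None/NaN. (Hypothetical helper.)
    """
    gaps = []
    for year in range(start_year, end_year + 1):
        values = met_by_year.get(year)
        if not values or all(v is None or math.isnan(v) for v in values):
            gaps.append(year)
    return gaps


# Toy data mirroring the situation above: every year present,
# but 2004 contains only NA values.
met = {year: [1.0, 2.0] for year in range(1902, 2009)}
met[2004] = [float("nan"), float("nan")]

print(find_met_gaps(1902, 2008, met))
```

Running a check like this before launching the model would turn a silent out-of-memory kill into an explicit list of problem years.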

@ashiklom changed the title from "Dockerized SIPNET runs out of memory for long runs" to "SIPNET swallows all memory when met is missing in the middle of a time series" on Oct 19, 2018
@github-actions

This issue is stale because it has been open 365 days with no activity.

@ashiklom
Member Author

This is a SIPNET issue, not a PEcAn issue, so transferring to PecanProject/sipnet#7.
