Repeating serialization error #33
Comments
Something possibly related from the It also occurs to me that we are using separate multi-threading tools in Python and R. It's possible there are some unexpected side effects from doing that.
Oof. Regarding the use of separate multi-threading tools, I suppose we could have each Python thread spawn its own instance of R and then distribute jobs among those instances. A kernel score outside the prescribed bounds could conceivably be due to numerical overflow (see the large diagonal problem in kernel matrices).
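To illustrate the overflow idea, here is a minimal sketch. It assumes the score is a cosine-style normalization k(x,y) / sqrt(k(x,x) * k(y,y)), which may not match what the project actually computes; `naive_normalize` and `stable_normalize` are hypothetical names, not functions from the code base.

```python
import math

def naive_normalize(k_xy, k_xx, k_yy):
    # The product k_xx * k_yy can overflow to inf in double precision
    # even when each factor is representable on its own.
    return k_xy / math.sqrt(k_xx * k_yy)

def stable_normalize(k_xy, k_xx, k_yy):
    # Taking the square roots first keeps the intermediate values in range.
    return k_xy / (math.sqrt(k_xx) * math.sqrt(k_yy))

big = 1e200
naive = naive_normalize(big, big, big)      # product overflows, score collapses to 0.0
stable = stable_normalize(big, big, big)    # close to 1.0, as expected for x == y

# If the numerator has also overflowed, inf / inf gives NaN,
# and NaN fails every bounds comparison.
broken = naive_normalize(float("inf"), big, big)
```

With a large diagonal, the naive form silently returns the wrong score, and once the numerator overflows too the result is NaN, which would fall outside any interval check.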
I just ran into this problem on my workstation. Delving into my system logs, I found this report:
I suspect there is some exception being thrown at the level of deSolve that is not being handled correctly by rcolgem/Python. Will try to find the offending line in dQAL.
…hing time units in tree. Fixed typo in rcolgem.py, PANGEA model specification. Successfully running PANGEA model, but encountered issue #33.
This might be fixed by rcolgem svn 126. I integrated these changes as of commit 27ec24c.
I think this one's finally squished by commit 9fe8f34.
ARGH. Still no good.
I believe this is fixed with 9509795. The problem was negative node heights, which would lead to an attempt to access a negative array index in dQAL (before the first time point at which the population sizes are calculated). I put in a check for this so that it uses the values from time zero if the time is negative. 30 steps so far, no problems. This is not a coherent fix, more of a stopgap just to get it to run. If the node heights are negative, that means the nodes exist before time zero. Since the first infected individual appears at time zero, this scenario is impossible, and those trees should probably be thrown away instead.
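For reference, the stopgap and the alternative of discarding bad trees can be sketched as below. This is only an illustration in Python, not the actual change to dQAL; `population_at`, `has_valid_heights`, and the time grid are hypothetical.

```python
import bisect

def population_at(times, sizes, t):
    """Return the population sizes at the last grid point at or before t.

    Stopgap behaviour: a time before the first grid point (for example a
    negative time when the grid starts at zero) falls back to the first entry
    instead of producing a negative index.
    """
    if t < times[0]:
        return sizes[0]
    index = bisect.bisect_right(times, t) - 1
    return sizes[index]

def has_valid_heights(node_heights):
    """Alternative to the stopgap: flag trees with nodes before time zero."""
    return all(height >= 0 for height in node_heights)

times = [0.0, 1.0, 2.0]
sizes = [[1], [5], [9]]
clamped = population_at(times, sizes, -0.5)   # falls back to time zero
inside = population_at(times, sizes, 1.5)     # uses the grid point at 1.0
```

Filtering with something like `has_valid_heights` before scoring would reject the impossible trees rather than scoring them against time-zero values.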
I ran the command from issue #29 overnight. In the morning, the log indicated that 6196 steps had completed. The message below was printing to the console about every 2 seconds.
Some googling indicates this is probably an R problem, which I had thought was resolved before in #29.
Python had crashed at some point. Here are some relevant lines from the crash report. I saved the rest of the report.
I killed the program with Control+C. Here's the traceback to show where we were.
So it appears that next_score was outside the allowed interval for some reason. We tried to dump the proposal and quit, but that failed.
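One way to make the dump-and-quit path more robust is sketched below. It assumes the proposal can be serialized as JSON and that the bounds are passed in explicitly; `check_score` and the file name are hypothetical, not the project's own code.

```python
import json
import math
import os
import tempfile

def check_score(score, lower, upper, proposal, dump_path):
    """Return score if it is finite and within [lower, upper].

    Otherwise try to write the proposal to dump_path and raise ValueError.
    A failure while dumping is reported but does not hide the original problem.
    """
    if math.isfinite(score) and lower <= score <= upper:
        return score
    try:
        with open(dump_path, "w") as handle:
            json.dump({"score": repr(score), "proposal": proposal}, handle)
    except (OSError, TypeError) as err:
        print("could not dump proposal:", err)
    raise ValueError(f"score {score!r} outside [{lower}, {upper}]")

# Usage: a score above the upper bound triggers the dump and the exception.
dump_path = os.path.join(tempfile.mkdtemp(), "bad_proposal.json")
try:
    check_score(1.7, 0.0, 1.0, {"beta": 0.3}, dump_path)
except ValueError as err:
    message = str(err)
```

Raising an exception instead of exiting inside the check leaves the decision about how to shut down threads and R to the caller.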
One possibility is that sys.exit() didn't cleanly shut down all the threads and the R instance (instances?) they were attached to. So we still had R trying to pass stuff back to Python when it was shut down.
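If unclean thread shutdown is the cause, a cooperative shutdown might look roughly like this. It is a sketch under the assumption that workers pull jobs from a queue; the doubling step stands in for the call into R, and `worker`, `shutdown`, and `run_demo` are hypothetical names.

```python
import queue
import threading

def worker(jobs, done, stop):
    # Poll for work until the main thread asks everyone to stop.
    while not stop.is_set():
        try:
            job = jobs.get(timeout=0.05)
        except queue.Empty:
            continue
        done.append(job * 2)  # placeholder for the real call into R
        jobs.task_done()

def shutdown(threads, stop, timeout=1.0):
    """Signal all workers to stop, wait for them, and report whether all exited."""
    stop.set()
    for thread in threads:
        thread.join(timeout)
    return all(not thread.is_alive() for thread in threads)

def run_demo():
    jobs, done, stop = queue.Queue(), [], threading.Event()
    threads = [threading.Thread(target=worker, args=(jobs, done, stop)) for _ in range(2)]
    for thread in threads:
        thread.start()
    for job in range(3):
        jobs.put(job)
    jobs.join()                      # wait until every job has been processed
    clean = shutdown(threads, stop)  # only exit after this returns True
    return sorted(done), clean
```

The point is that the main thread calls sys.exit() only after `shutdown` confirms every worker has returned, so nothing is still talking to R when the interpreter goes away.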