File size increase in serialising interrogated agents #567
Comments
Thanks for the report - and just to be clear, this is a regression somewhere between the 0.12.1 and the most recent commit on main a48b998? For the record, here is the full diff between 0.12.1 and now: v0.12.1...main I don't have a good guess at the moment, but would you be open to helping us narrow it down with a crude bisect between 0.12.1 and now?
Also if you could provide any more information about the file size explosion that'd be great. For example, can you tell whether the memory is due to a larger |
That's correct. I can confirm that the issue seems to be in I will try to do the bisect, but am on a work machine without Rtools so may prove difficult. |
Thanks - that helps narrow things down. Could you confirm whether this is just And by "bisect" I just meant "run |
|
I'm puzzled as well! How about further back in time to
|
Right, apologies, something must not have refreshed properly between installs. I can confirm now that the regression actually occurred ~ 1-4 months ago ( |
Got it! So I think that narrows down the problem space to: 374e9f8...51968d8 That's helpful. I'll try to come up with a reprex. Thanks! Sorry 1 more quick question @jl5000: Do you use a |
There are several calls to |
Another bisection...c54b035 works ok. So the problem space is now c54b035...51968d8 |
362b706 also works. Problem space: 362b706...51968d8 |
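For anyone repeating this kind of crude bisect, here is a minimal sketch of the bookkeeping. It assumes the repository lives at `rstudio/pointblank` and that {remotes} is available; `saved_size()` is a hypothetical helper, not part of {pointblank}. Each install is best followed by a fresh R session so that a previously loaded version is not picked up by mistake.

```r
# Assumption: GitHub location of the package; adjust if it differs.
repo <- "rstudio/pointblank"

# Commits named in this thread (two that work, one that shows the regression).
commits <- c("c54b035", "362b706", "51968d8")
specs <- paste0(repo, "@", commits)

# Build the install command for each candidate; run one per fresh session.
install_cmds <- sprintf('remotes::install_github("%s")', specs)
cat(install_cmds, sep = "\n")

# Hypothetical helper: size in bytes of an object once written with saveRDS().
saved_size <- function(obj) {
  f <- tempfile(fileext = ".rds")
  on.exit(unlink(f))
  saveRDS(obj, f)
  file.size(f)
}
```

After installing a candidate, calling `saved_size()` on the same interrogated agent gives one number per commit, which is enough to tell on which side of the regression that commit falls.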
Great sleuthing! I think I may have the answer, traceable to #543, where we started to track a
# In memory
pb_call <- lapply(agent$validation_set$capture_stack, `[[`, "pb_call")
scales::label_bytes()(
as.integer(object.size(pb_call))
)
#> [1] "13 kB"
# Serialized
f <- tempfile()
saveRDS(pb_call, f)
scales::label_bytes()(
file.size(f)
)
#> [1] "208 kB" |
I hadn't considered the memory implications of serializing the call language object, but this makes sense! I'll run a few more things to double check, and if this is correct I'll start working on a (soft) rollback. We definitely don't need to track the full language object in there. |
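To illustrate how an in-memory size can understate the serialized size, here is a small sketch using base R only. It is not taken from {pointblank}: `make_closure()` and `captured_call` are hypothetical stand-ins, and the mechanism shown (a call carrying a closure whose environment holds a large object) is just one way such a gap can arise, not necessarily what happened in #543. The last lines show one possible lighter alternative, keeping a text label instead of the language object.

```r
# Serialized size in bytes, computed in memory (no temp file needed).
serialized_size <- function(x) length(serialize(x, connection = NULL))

# Hypothetical stand-in for a captured call: a closure created inside a
# function whose environment also holds a large vector.
make_closure <- function() {
  big <- runif(1e5)   # about 800 kB of doubles kept in the closure's environment
  function() NULL
}

# A call object with the closure inlined as the function being called.
captured_call <- as.call(list(make_closure()))

# object.size() does not count the environment attached to a function...
in_memory <- as.numeric(object.size(captured_call))
# ...but serialize() writes that environment out along with the call.
on_disk <- serialized_size(captured_call)

# One possible lighter alternative (an assumption, not the actual fix):
# store only a deparsed text label of the call.
label <- paste(deparse(quote(col_vals_gt(tbl, x, 0))), collapse = " ")
label_size <- serialized_size(label)
```

Comparing `in_memory`, `on_disk`, and `label_size` shows the same pattern as the numbers above: the language object looks small under `object.size()` but is much larger once serialized, while a plain character label stays small in both forms.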
@jl5000 Can you see if the open PR fixes the issue for you?
|
Yes, that seems to work! |
Awesome! Thanks for your help debugging this :) |
There seems to be a regression in the dev version of {pointblank} that causes interrogated agents saved to disk to be over an order of magnitude larger than those produced by the version on CRAN.
I am using {pointblank} with {targets} and saving interrogated agents to disk using {qs}. I have a large validation pipeline (which I unfortunately cannot share) that creates a target on disk with a size of ~14 MB. When using the dev version, this file size explodes to ~500 MB. In both cases I am ensuring tbl_checked is not extracted (either through the non-public API or the new argument in interrogate()).
Has there been anything added to the validation object which could account for the explosion in file size?