speed up loading of namespaces: return shallow copy in build_const_args #1103
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@ Coverage Diff @@
## dev #1103 +/- ##
==========================================
- Coverage 88.88% 88.88% -0.01%
==========================================
Files 45 45
Lines 9836 9835 -1
Branches 2795 2795
==========================================
- Hits 8743 8742 -1
Misses 776 776
Partials 317 317
The change to a shallow copy should be fine. The only reason I would suspect a deepcopy is needed is if we wanted to modify an independent copy.
See lines 276-282, and also namespace.py.
I would need to dig deeper to check whether the order in which we call these methods conflicts with using a shallow copy. I'll tackle this next week when I am back.
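To make the distinction concrete, here is a minimal sketch of what each kind of copy isolates. The `spec` dict is a hypothetical stand-in, not the actual object handled in build_const_args.

```python
import copy

# Hypothetical stand-in for a spec: a dict holding a nested list.
spec = {"name": "TimeSeries", "attributes": [{"name": "unit"}]}

shallow = copy.copy(spec)
deep = copy.deepcopy(spec)

# Rebinding a top-level key on a copy never reaches the original.
shallow["name"] = "Changed"

# Nested objects are shared by the shallow copy but not by the deep copy.
shallow["attributes"].append({"name": "added_via_shallow"})
deep["attributes"].append({"name": "added_via_deep"})

attribute_names = [a["name"] for a in spec["attributes"]]
print(spec["name"])
print(attribute_names)
```

So a shallow copy is enough if callers only add, remove, or rebind top-level entries; it is not enough if they mutate nested values in place.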
I think this will require careful testing. We should check if and where the spec object is actually being modified, and why. If the spec is modified downstream, I'd suspect this could cause undesirable side effects when reading multiple files: the spec is modified when reading file A, and then reading file B sees the modifications made while reading A. I'm not sure whether that is actually the case or whether using a
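A rough sketch of the kind of cross-read leak worth testing for follows. `CACHED_SPEC` and `read_file` are made-up names for illustration only and do not reflect how HDMF actually stores or reads specs.

```python
import copy

# Hypothetical cached spec shared across reads (not HDMF's real structure).
CACHED_SPEC = {"dtype": "float", "attributes": {"unit": "volts"}}


def read_file(override_unit, copier):
    """Simulate a read that copies the cached spec, then mutates the copy."""
    spec = copier(CACHED_SPEC)
    spec["attributes"]["unit"] = override_unit  # downstream in-place mutation
    return spec


# With deepcopy, reading "file A" leaves the cache untouched.
read_file("amperes", copy.deepcopy)
unit_after_deep = CACHED_SPEC["attributes"]["unit"]

# With a shallow copy, the nested dict is shared, so the mutation leaks
# into the cache and a later read of "file B" would see it.
read_file("amperes", copy.copy)
unit_after_shallow = CACHED_SPEC["attributes"]["unit"]

print(unit_after_deep, unit_after_shallow)
```

A test along these lines only fails if something downstream mutates nested values in place, which is exactly the open question.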
Agreed.
I checked this out over here: #1152 (comment). tl;dr: the deepcopy doesn't protect from mutation anyway because of when it is called and what calls it. The main thing deepcopy seems to be doing is giving derived objects a new
Motivation and description
I am trying to speed up the loading of namespaces in pynwb. Sometimes it takes up to 6 seconds on initial load. While tracing through the code to see what could be causing the slowness, I came across a deepcopy in a low-level function, build_const_args, that gets called a lot during namespace loading. I replaced this with a shallow copy and noticed a significant improvement in load time.
IMPORTANT: I am not familiar enough with the code to know whether this change is going to break anything.
This is one of two PRs I am submitting to try and speed things up.
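For a sense of why the deepcopy can dominate, here is a small timing sketch on a synthetic nested dict. The structure and sizes are assumptions chosen for illustration, not a measurement of real namespace loading.

```python
import copy
import timeit

# Hypothetical nested structure standing in for a large spec.
spec = {
    f"type_{i}": {"attributes": [{"name": f"attr_{j}"} for j in range(20)]}
    for i in range(200)
}

# copy.copy duplicates only the outer dict; copy.deepcopy walks every
# nested dict and list, so its cost grows with the total object count.
shallow_seconds = timeit.timeit(lambda: copy.copy(spec), number=50)
deep_seconds = timeit.timeit(lambda: copy.deepcopy(spec), number=50)

print(f"copy.copy:     {shallow_seconds:.4f} s")
print(f"copy.deepcopy: {deep_seconds:.4f} s")
```

Absolute numbers will vary by machine; the point is only the relative gap between the two calls when the copy sits in a frequently called function.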
How to test the behavior?
Run this script twice before the change and once after the change. The first run downloads the needed data and saves the loaded file segments to a cache directory, so the second and third runs do not include the download time. On my machine it takes around 4 sec to load before the change and around 1.5 sec after the change.
Checklist
Did you update CHANGELOG.md with your changes?
@oruebel @rly