Faster parallel processing pipeline #5
base: master
Critical to get the spots to match up correctly in reciprocal space; once again shows that we have some broken data models here. Though could I have used scan slicing for this?
faster parallel processing pipeline
Also fix a bug where total range does not divide nicely into 5° chunks (e.g. 0.15° images)
Does involve some shuffling around between computer numbers and people numbers
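As a rough illustration of the two commits above (not the code in this PR): a minimal sketch that splits a scan into blocks of about 5° using whole image counts, so that a width such as 0.15° (where 5° is not a whole number of images) still tiles the full range, and then converts the 0-based half-open "computer" ranges into 1-based inclusive "people" ranges. The function names `image_blocks` and `to_people_numbers` are hypothetical.

```python
def image_blocks(n_images, osc_width, block_deg=5.0):
    """Split n_images into 0-based, half-open (start, end) blocks of ~block_deg."""
    # Whole number of images per block; 5 / 0.15 = 33.3 is rounded to 33.
    per_block = max(1, round(block_deg / osc_width))
    # The final block is simply shorter when the range does not divide evenly.
    return [
        (start, min(start + per_block, n_images))
        for start in range(0, n_images, per_block)
    ]


def to_people_numbers(block):
    """Convert a 0-based half-open block to a 1-based inclusive image range."""
    start, end = block
    return (start + 1, end)


blocks = image_blocks(100, 0.15)
print(blocks)
print([to_people_numbers(b) for b in blocks])
```

A short final block may index poorly; merging it into the previous block would be one option, which this sketch does not attempt.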
Tool to extract a subset of a phil scope which has been parsed back to text
Includes scopes for every DIALS program which is used. For each step will extract relevant PHIL parameters to a file and pass on to the program in question if not empty.
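The per-program extraction described above might look something like the following sketch. It assumes, purely for illustration, that the parsed-back PHIL text is a flat list of `program.path.to.parameter=value` lines; the real tool works on PHIL scopes, and `extract_scope` is a hypothetical name rather than a DIALS or libtbx function.

```python
def extract_scope(flat_phil, scope):
    """Return the parameters under `scope`, with the scope prefix stripped."""
    prefix = scope + "."
    selected = []
    for line in flat_phil.splitlines():
        line = line.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        if line.startswith(prefix):
            selected.append(line[len(prefix):])
    return "\n".join(selected)


# Illustrative input only; the parameter names are examples.
text = """
find_spots.spotfinder.filter.min_spot_size=3
index.indexing.method=fft1d
"""

for program in ("find_spots", "index", "integrate"):
    params = extract_scope(text, program)
    if params:  # only pass a parameter file on if it is not empty
        print(program, "->", params)
```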
OK, think this is in a state now where it's kinda sorta ready for wider input: two outstanding issues are:
I do however believe that this is a useful starting point and is surprisingly un-gorilla-like, in that there are even a handful of tests. Am approaching this as a two-stage PR: part 1 is fixing up the code for squash merge as a new feature here, then part 2 is grabbing those diffs across to DIALS proper for inclusion in a future release once we've shaken it down in real life.
Merge reflection files and experiments manually, with local implementation
Would particularly welcome opinions on
Add future parallels wrapper to run each processing task on a given number of cores for a given number of workers - needs cluster addition but gives a framework for parallel operation
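A minimal sketch of what such a wrapper could look like, using the standard-library `concurrent.futures`. It assumes each task is a callable that takes the number of cores it may use (for example, to launch an external program with that `nproc`), which is why a thread pool is used here; `run_parallel` is a hypothetical name and not the function in this PR.

```python
from concurrent.futures import ThreadPoolExecutor


def run_parallel(tasks, workers, cores_per_worker):
    """Run each task(cores_per_worker) on a pool of `workers`; keep task order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(task, cores_per_worker) for task in tasks]
        # result() re-raises any exception from the task in the caller.
        return [future.result() for future in futures]


def make_task(block):
    # Stand-in for "process this block with nproc cores".
    def task(nproc):
        return (block, nproc)

    return task


results = run_parallel([make_task(b) for b in range(4)], workers=2, cores_per_worker=4)
print(results)
```

Keeping the executor behind one function is one way to leave room for the cluster addition mentioned above: a cluster-backed executor with the same `submit` interface could be swapped in without touching the callers.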
pass

return number_of_processors(return_value_if_unknown=-1)
Since we have psutil:
  pass
- return number_of_processors(return_value_if_unknown=-1)
+ return len(psutil.Process().cpu_affinity())
This gets the number of CPUs that are available to the process, rather than the number of CPUs in the system. This is the Windows-compatible version of len(os.sched_getaffinity(0)). NSLOTS still takes priority though.
Computer says no:
>>> import psutil
>>> psutil.Process().cpu_affinity()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: 'Process' object has no attribute 'cpu_affinity'
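For reference, a hedged sketch of the fallback order under discussion: NSLOTS first, then the CPUs available to the process, then the CPU count of the machine. It guards against `cpu_affinity` being absent, as in the traceback above (psutil does not provide it on every platform). This is an illustration only, not the `nproc()` in this PR, and the `environ` argument is added here just to make it testable.

```python
import os


def nproc(environ=None):
    """Best-effort count of the cores this process should use."""
    environ = os.environ if environ is None else environ

    # 1. A scheduler-provided slot count takes priority.
    try:
        slots = int(environ.get("NSLOTS", ""))
        if slots > 0:
            return slots
    except ValueError:
        pass

    # 2. CPUs available to this process, where psutil supports it.
    try:
        import psutil

        affinity = getattr(psutil.Process(), "cpu_affinity", None)
        if affinity is not None:
            return len(affinity())
    except ImportError:
        pass

    # 3. Standard-library equivalents.
    if hasattr(os, "sched_getaffinity"):
        return len(os.sched_getaffinity(0))
    return os.cpu_count() or 1


print(nproc())
```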
dials.util.mp.available_cores should handle this correctly
At the risk of repeating myself, this is not in a release:
>>> dials.util.mp.available_cores
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: module 'dials.util.mp' has no attribute 'available_cores'
#5 (comment) for reference
I did not see the earlier comment because it was hidden by being resolved.
This was 25 days ago. These days you should use dials.util.mp.available_cores() from dials/dials#1430 instead of your nproc() implementation.
Sorry, it was not clear to me that fp3 is already set against a DIALS release, given that it's in a PR into a scratch space and not laid out as a package yet.
Evidently it is not yet a package etc.; however, we did discuss making it one the other day, so I was kinda trying to avoid picking up too much baggage from master. If we are copying the new bits over to the 3.1 series then that is valid, I guess.
TODO: modify to use the DIALS 3.2 feature set now released; reformat as package
@jbeilstenedmands I have asked for your review for "combine_reflections" in
@Anthchirp would welcome input on how to close this off as a Python package now, I guess... I think it is close to OK for a 0th order implementation (though obviously with scope for future improvement)
TODO: remove duff reflections before manipulations; use vector expressions for calculating weighted sums
Do you want to keep the development history in $newpackage?
I tried out fp3 on the beta lactamase dataset, however I got an assertion error as part of the combine step:
Not sure at the moment how to address this but I'll have a think.
@jbeilstenedmands excellent spot, thank you - this needs some attention upstream of this then.
Ah, wait - this is in the matching process, so what that means (I guess) is we need a smarter way of matching reflections which are found in both halves of the data set - if I don't do this (as was before) you end up with two "half-observations" which mess things up...
I think the matching is correct, i.e. you have found the two halves of the same spot, but one is misindexed, due to the difficulty in indexing along a certain direction for a thin slice of data? So does there need to be an extra step in this case to determine which index is correct and use that for both halves?
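To make the idea concrete, here is a sketch (not the `combine_reflections` code in this PR) of matching the two halves of a spot by detector position, summing the two "half-observations" with an intensity-weighted centroid, and reporting pairs whose Miller indices disagree rather than silently picking one. The dictionary layout, the `max_dist` tolerance in pixels and the name `merge_halves` are all assumptions made for the example.

```python
import math


def merge_halves(first, second, max_dist=2.0):
    """Pair reflections across a block boundary by position and sum them."""
    merged, conflicts, used = [], [], set()
    for a in first:
        # Nearest unused partner in the second half, within max_dist pixels.
        best, best_d = None, max_dist
        for j, b in enumerate(second):
            if j in used:
                continue
            d = math.dist(a["xy"], b["xy"])
            if d <= best_d:
                best, best_d = j, d
        if best is None:
            merged.append(dict(a))
            continue
        used.add(best)
        b = second[best]
        total = a["intensity"] + b["intensity"]
        # Intensity-weighted centroid; fall back to the midpoint if total <= 0.
        wa = a["intensity"] / total if total > 0 else 0.5
        xy = tuple(wa * p + (1 - wa) * q for p, q in zip(a["xy"], b["xy"]))
        if a["hkl"] != b["hkl"]:
            conflicts.append((a["hkl"], b["hkl"]))  # one half is misindexed
        merged.append({"hkl": a["hkl"], "xy": xy, "intensity": total,
                       "variance": a["variance"] + b["variance"]})
    merged.extend(dict(b) for j, b in enumerate(second) if j not in used)
    return merged, conflicts
```

Which index to trust for a conflicting pair is left open here; that is the extra step asked about above.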
Looking at these now and remembered that this data was artificially processed to give it 0.5° images which could have some interesting effects...
This fails going into block 16 (i.e. the 17th block); however, the orientation matrix does not seem that different?
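One way to put a number on "not that different" is the rotation angle between the two orientation matrices. The sketch below assumes both inputs are pure rotation matrices (U rather than UB) given as 3x3 nested lists in the same setting, and it ignores symmetry-equivalent orientations; `misorientation_deg` is a hypothetical helper, not a DIALS function.

```python
import math


def misorientation_deg(u1, u2):
    """Angle in degrees of the rotation taking orientation u1 to u2."""
    # trace(U2 * U1^T) is the element-wise product summed over all entries.
    trace = sum(u1[i][j] * u2[i][j] for i in range(3) for j in range(3))
    # Clamp against rounding error before taking the arccosine.
    cos_angle = max(-1.0, min(1.0, (trace - 1.0) / 2.0))
    return math.degrees(math.acos(cos_angle))


identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
t = math.radians(1.0)
rot_z = [[math.cos(t), -math.sin(t), 0.0],
         [math.sin(t), math.cos(t), 0.0],
         [0.0, 0.0, 1.0]]
print(misorientation_deg(identity, rot_z))
```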
Co-authored-by: jbeilstenedmands <[email protected]>
WIP
Parallel processing pipeline - making a PR so work visible.