Faster parallel processing pipeline #5

Status: Open. Wants to merge 66 commits into base: master.

Conversation

graeme-winter (Contributor):

WIP

Parallel processing pipeline - making a PR so the work is visible.

It is critical to get the spots to match up correctly in reciprocal space; once again this shows that we have some broken data models here. Though could I have used scan slicing for this?

Commits include:

  • faster parallel processing pipeline
  • fix a bug where the total range does not divide nicely into 5° chunks (e.g. with 0.15° images)
  • some shuffling around between computer numbers (0-based) and people numbers (1-based)
  • a tool to extract a subset of a PHIL scope which has been parsed back to text; this includes scopes for every DIALS program which is used, and for each step it will extract the relevant PHIL parameters to a file and pass them on to the program in question, if not empty
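The 5° chunking fix mentioned above can be sketched as follows. This is an illustrative reconstruction under stated assumptions (whole-image chunk boundaries), not the PR's actual code, and `chunk_scan` is a hypothetical name:

```python
def chunk_scan(image_width, n_images, target_deg=5.0):
    """Split a scan of n_images images, each image_width degrees wide,
    into roughly target_deg-sized chunks of whole images.

    Returns a list of (start, stop) image-index pairs, 0-based and
    half-open, covering the full scan even when the total range does
    not divide evenly (e.g. 0.15 degree images).
    """
    total = image_width * n_images
    # Round to the nearest whole number of chunks rather than truncating,
    # so awkward image widths do not leave a sliver at the end.
    n_chunks = max(1, round(total / target_deg))
    bounds = [round(j * n_images / n_chunks) for j in range(n_chunks + 1)]
    return list(zip(bounds[:-1], bounds[1:]))

# 360 images of 0.15 deg is 54 deg in total, which does not divide by 5:
chunks = chunk_scan(0.15, 360)
print(len(chunks), chunks[0], chunks[-1])  # 11 (0, 33) (327, 360)
```

Rounding the boundaries (rather than flooring) keeps all chunks within one image of the same size, so no block ends up pathologically thin.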
graeme-winter (Contributor, Author) commented Jun 30, 2020:

OK, I think this is in a state now where it's kinda sorta ready for wider input. Two outstanding issues:

  • needing to usefully "sum" the integrated data back into a single scan
  • running the integration blocks in parallel, rather than sequentially as they are here

I do however believe that this is a useful starting point and is surprisingly un-gorilla-like, in that there are even a handful of tests.

I am approaching this as a two-stage PR: part 1 is fixing up the code for a squash merge as a new feature here; part 2 is grabbing those diffs across to DIALS proper, for inclusion in a future release once we've shaken it down in real life.
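The second outstanding issue above (running the integration blocks in parallel) could be addressed along these lines. This is a hedged sketch using a thread pool, since in practice each block shells out to external DIALS programs; `integrate_block` and `integrate_all` are hypothetical names, not fp3's actual functions:

```python
from concurrent.futures import ThreadPoolExecutor

def integrate_block(block_id):
    # Stand-in for processing one chunk (in practice this would drive
    # the per-block spot finding / indexing / integration steps).
    return block_id, f"integrate{block_id:04d}"

def integrate_all(n_blocks, max_workers=4):
    # The blocks are independent of one another, so submit them all to
    # a pool instead of looping over them sequentially.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(integrate_block, range(n_blocks)))
    # Sort by block id so downstream merging stays deterministic.
    return [name for _, name in sorted(results)]

print(integrate_all(3))  # ['integrate0000', 'integrate0001', 'integrate0002']
```

The sort at the end matters: results may complete out of order, but the later "sum back into a single scan" step presumably wants the blocks in scan order.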

Merge reflection files and experiments manually, with local implementation
graeme-winter (Contributor, Author):

Would particularly welcome opinions on tools.py::combine_{experiments,reflections}

Comment on lines +76 to +78
pass

return number_of_processors(return_value_if_unknown=-1)
Member:

since we have psutil:

Suggested change:

- pass
- return number_of_processors(return_value_if_unknown=-1)
+ return len(psutil.Process().cpu_affinity())

This gets the number of CPUs that are available to the process, rather than the number of CPUs in the system. It is the Windows-compatible version of len(os.sched_getaffinity(0)).

NSLOTS still takes priority though.
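Putting this sub-thread together, the intended priority order (NSLOTS first, then per-process affinity where the platform implements it, then the system CPU count) might be sketched as follows. `get_nproc` is an illustrative name, not fp3's `nproc()` nor DIALS' `available_cores()`:

```python
import os

def get_nproc():
    """Best-effort count of CPUs usable by this process.

    NSLOTS (set by SGE-style schedulers) takes priority; then the
    process's CPU affinity via psutil where implemented; finally the
    total system count as a fallback.
    """
    nslots = os.environ.get("NSLOTS")
    if nslots:
        return int(nslots)
    try:
        import psutil
        # Process.cpu_affinity() is not implemented on every platform
        # (e.g. macOS) - exactly the AttributeError seen below.
        return len(psutil.Process().cpu_affinity())
    except (ImportError, AttributeError):
        pass
    return os.cpu_count() or 1

print(get_nproc())
```

Catching AttributeError as well as ImportError is the key point: psutil being importable does not guarantee cpu_affinity() exists on that platform.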

graeme-winter (Contributor, Author):

Computer says no:

>>> import psutil
>>> psutil.Process().cpu_affinity()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'Process' object has no attribute 'cpu_affinity'

Member:

dials.util.mp.available_cores should handle this correctly

graeme-winter (Contributor, Author) commented Oct 2, 2020:

At the risk of repeating myself, this is not in a release:

>>> dials.util.mp.available_cores
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: module 'dials.util.mp' has no attribute 'available_cores'

graeme-winter (Contributor, Author):

#5 (comment) for reference

Member:

I did not see the earlier comment because it was hidden by being resolved.

Anthchirp (Member) commented Oct 2, 2020:

This was 25 days ago. These days you should use dials.util.mp.available_cores() from dials/dials#1430 instead of your nproc() implementation.

Anthchirp (Member) commented Oct 2, 2020 via email.

graeme-winter (Contributor, Author):

Sorry, it was not clear to me that fp3 is already set against a DIALS release, given that it's in a PR into a scratch space and not laid out as a package yet.

Evidently it is not yet a package, etc.; however, we did discuss making it one the other day, so I was kinda trying to avoid picking up too much baggage from master. If we are copying the new bits over to the 3.1 series, then valid, I guess.

graeme-winter (Contributor, Author):

TODO: modify to use DIALS 3.2 feature set now released; reformat as package

@jbeilstenedmands I have asked for your review for "combine_reflections" in fp3/tools.py

@Anthchirp I would welcome input on how to close this off as a Python package now, I guess... I think it is close to OK for a 0th-order implementation (though obviously there is scope for future improvement).

graeme-winter (Contributor, Author):

TODO: remove duff reflections before manipulations; use vector expressions for calculating weighted sums
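The second TODO, "vector expressions for calculating weighted sums", might look like this over whole arrays rather than a per-reflection Python loop. This is a NumPy sketch only, assuming inverse-variance weighting; fp3's actual data live in DIALS reflection tables rather than bare arrays:

```python
import numpy as np

def weighted_merge(i1, v1, i2, v2):
    """Inverse-variance weighted mean of paired half-observations.

    i1, i2 are intensity arrays and v1, v2 their variances; every step
    is a vector expression, with no per-reflection loop.
    """
    i1, v1, i2, v2 = map(np.asarray, (i1, v1, i2, v2))
    w1, w2 = 1.0 / v1, 1.0 / v2
    intensity = (w1 * i1 + w2 * i2) / (w1 + w2)
    variance = 1.0 / (w1 + w2)
    return intensity, variance

i, v = weighted_merge([10.0, 20.0], [1.0, 4.0], [12.0, 20.0], [1.0, 4.0])
print(i, v)  # equal weights per pair: means 11 and 20, variances 0.5 and 2.0
```

Removing "duff" reflections first matters here too: a zero or negative variance would poison the whole vector expression rather than one loop iteration.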

Anthchirp (Member):

do you want to keep the development history in $newpackage?

graeme-winter (Contributor, Author):

do you want to keep the development history in $newpackage?

Nope, not a chance. Without a decent rewrite the history is close to:

[image: git_commit]

jbeilstenedmands (Contributor):

I tried out fp3 on the beta-lactamase dataset; however, I got an assertion error as part of the combine step:
assert d0["miller_index"][i] == d1["miller_index"][j]
Printing out the details of the assertion failure and allowing it to continue, it seems that there are cases where spots split across a processing boundary can be misindexed. In this case they are off by 1 in l for some low values of l; the spot positions indicate that each pair is the same spot split across a processing boundary:

Mismatched indices: (-1, 26, 1) (-1, 26, 2)  Spot positions(px): (2113.23,1888.61,159.50), (2113.77,1889.74,160.50)
Mismatched indices: (15, -18, -1) (15, -18, -2)  Spot positions(px): (1197.50,424.83,269.50), (1196.50,420.00,270.50)
Mismatched indices: (19, -13, -1) (19, -13, -2)  Spot positions(px): (1460.98,453.74,269.50), (1460.77,454.77,270.50)
Mismatched indices: (15, -25, 2) (15, -25, 1)  Spot positions(px): (945.68,112.66,289.50), (945.86,112.69,290.50)
Mismatched indices: (7, -29, -3) (7, -29, -4)  Spot positions(px): (464.87,214.76,289.50), (462.00,212.50,290.50)
Mismatched indices: (9, -31, -1) (9, -31, -2)  Spot positions(px): (381.01,3.73,299.50), (380.17,3.60,300.50)
Mismatched indices: (10, -30, 1) (10, -30, 0)  Spot positions(px): (501.12,28.31,299.50), (500.00,28.86,300.50)
Mismatched indices: (7, -33, 1) (7, -33, 0)  Spot positions(px): (99.94,2522.17,519.50), (97.36,2521.79,520.50)
Mismatched indices: (22, -18, -2) (22, -18, -3)  Spot positions(px): (1443.29,2312.19,539.50), (1441.99,2312.68,540.50)
Mismatched indices: (25, -10, 0) (25, -10, -1)  Spot positions(px): (1809.73,2122.88,539.50), (1808.48,2122.59,540.50)
Mismatched indices: (19, -21, -2) (19, -21, -3)  Spot positions(px): (1233.82,2312.61,539.50), (1233.33,2313.42,540.50)
Mismatched indices: (13, -22, 4) (13, -22, 3)  Spot positions(px): (1018.01,2122.39,559.50), (1016.85,2121.21,560.50)
Mismatched indices: (18, -22, 6) (18, -22, 5)  Spot positions(px): (1175.98,2334.86,559.50), (1176.00,2333.00,560.50)
Mismatched indices: (-16, 23, 1) (-16, 23, 2)  Spot positions(px): (1472.50,165.90,639.50), (1472.71,165.03,640.50)
Mismatched indices: (-24, 17, -2) (-24, 17, -1)  Spot positions(px): (978.33,89.52,649.50), (979.07,89.07,650.50)

Not sure at the moment how to address this but I'll have a think.
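For reference, the pattern in the log (each mismatch off by ±1 in a single index) can be confirmed with a tiny helper like this; it is purely illustrative and not part of fp3:

```python
def classify_mismatch(hkl1, hkl2):
    """Return the axes on which two Miller indices disagree, with the offset."""
    return {axis: b - a for axis, a, b in zip("hkl", hkl1, hkl2) if a != b}

# The first two mismatches from the log above:
print(classify_mismatch((-1, 26, 1), (-1, 26, 2)))      # {'l': 1}
print(classify_mismatch((15, -18, -1), (15, -18, -2)))  # {'l': -1}
```

Every pair in the log differs in exactly one axis by ±1, which supports the thin-slice misindexing explanation discussed below rather than a matching error.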

graeme-winter (Contributor, Author):

@jbeilstenedmands excellent spot, thank you - this needs some attention upstream of this, then.

Ah, wait - this is in the matching process, so what that means (I guess) is that we need a smarter way of matching reflections which are found in both halves of the data set - if I don't do this (as was the case before) you end up with two "half-observations" which mess things up...

jbeilstenedmands (Contributor):

I think the matching is correct, i.e. you have found the two halves of the same spot, but one is misindexed, due to the difficulty of indexing along a certain direction for a thin slice of data? So does there need to be an extra step in this case to determine which index is correct and use that for both halves?

graeme-winter (Contributor, Author):

I think the matching is correct, i.e. you have found the two halves of the same spot, but one is misindexed, due to the difficulty of indexing along a certain direction for a thin slice of data? So does there need to be an extra step in this case to determine which index is correct and use that for both halves?

Looking at these now, I remembered that this data was artificially processed to give it 0.5° images, which could have some interesting effects...

graeme-winter (Contributor, Author):

This fails going into block 16 (i.e. the 17th block); however, the orientation matrix does not seem that different?

Grey-Area fp3 :( $ dials.compare_orientation_matrices integrate0015/indexed.expt integrate0016/indexed.expt 
The following parameters have been modified:

input {
  experiments = integrate0015/indexed.expt
  experiments = integrate0016/indexed.expt
}

Change of basis op: a,b,c
Rotation matrix to transform crystal 1 to crystal 2:
{{1.000, 0.001, -0.000},
 {-0.001, 1.000, 0.000},
 {0.000, -0.000, 1.000}}
Rotation of -0.033 degrees about axis (0.127, 0.299, 0.946)
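As a sanity check on output like the above, the rotation angle can be recovered from the matrix trace via theta = arccos((tr R - 1) / 2); note that a matrix printed to three decimal places mostly loses an angle as small as 0.033°, so this only works on the unrounded matrix. A minimal sketch:

```python
import math

def rotation_angle_deg(R):
    """Rotation angle (degrees) of a 3x3 rotation matrix, from its trace."""
    tr = R[0][0] + R[1][1] + R[2][2]
    # Clamp against rounding pushing the cosine slightly outside [-1, 1].
    c = max(-1.0, min(1.0, (tr - 1.0) / 2.0))
    return math.degrees(math.acos(c))

# A 90 degree rotation about z, as a sanity check:
Rz90 = [[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
print(rotation_angle_deg(Rz90))  # 90.0
```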

5 participants