Faster parallel processing pipeline #5

Status: Open. Wants to merge 66 commits into base: master.

Conversation

graeme-winter (Contributor):

WIP

Parallel processing pipeline - making a PR so the work is visible.

It is critical to get the spots to match up correctly in reciprocal space; once again this shows that we have some broken data models here. Though could I have used scan slicing for this?

Commits include:

  • faster parallel processing pipeline
  • fix a bug where the total range does not divide nicely into 5° chunks (e.g. with 0.15° images)
  • some shuffling around between computer numbers (0-based) and people numbers (1-based)
  • a tool to extract a subset of a PHIL scope which has been parsed back to text; this includes scopes for every DIALS program which is used, and for each step it will extract the relevant PHIL parameters to a file and pass them on to the program in question, if not empty
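The 5° chunking fix mentioned above can be sketched as follows. This is an illustrative reconstruction under stated assumptions (whole-image chunk boundaries), not the PR's actual code, and `chunk_scan` is a hypothetical name:

```python
def chunk_scan(image_width, n_images, target_deg=5.0):
    """Split a scan of n_images images, each image_width degrees wide,
    into roughly target_deg-sized chunks of whole images.

    Returns a list of (start, stop) image-index pairs, 0-based and
    half-open, covering the full scan even when the total range does
    not divide evenly (e.g. 0.15 degree images).
    """
    total = image_width * n_images
    # Round to the nearest whole number of chunks rather than truncating,
    # so awkward image widths do not leave a sliver at the end.
    n_chunks = max(1, round(total / target_deg))
    bounds = [round(j * n_images / n_chunks) for j in range(n_chunks + 1)]
    return list(zip(bounds[:-1], bounds[1:]))

# 360 images of 0.15 deg is 54 deg in total, which does not divide by 5:
chunks = chunk_scan(0.15, 360)
print(len(chunks), chunks[0], chunks[-1])  # 11 (0, 33) (327, 360)
```

Rounding the boundaries (rather than flooring) keeps all chunks within one image of the same size, so no block ends up pathologically thin.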
graeme-winter (Contributor, Author) commented Jun 30, 2020:

OK, I think this is in a state now where it's kinda sorta ready for wider input. Two outstanding issues:

  • needing to usefully "sum" the integrated data back into a single scan
  • running the integration blocks in parallel, rather than sequentially as they are here

I do however believe that this is a useful starting point and is surprisingly un-gorilla-like, in that there are even a handful of tests.

I am approaching this as a two-stage PR: part 1 is fixing up the code for a squash merge as a new feature here; part 2 is grabbing those diffs across to DIALS proper, for inclusion in a future release once we've shaken it down in real life.
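The second outstanding issue above (running the integration blocks in parallel) could be addressed along these lines. This is a hedged sketch using a thread pool, since in practice each block shells out to external DIALS programs; `integrate_block` and `integrate_all` are hypothetical names, not fp3's actual functions:

```python
from concurrent.futures import ThreadPoolExecutor

def integrate_block(block_id):
    # Stand-in for processing one chunk (in practice this would drive
    # the per-block spot finding / indexing / integration steps).
    return block_id, f"integrate{block_id:04d}"

def integrate_all(n_blocks, max_workers=4):
    # The blocks are independent of one another, so submit them all to
    # a pool instead of looping over them sequentially.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(integrate_block, range(n_blocks)))
    # Sort by block id so downstream merging stays deterministic.
    return [name for _, name in sorted(results)]

print(integrate_all(3))  # ['integrate0000', 'integrate0001', 'integrate0002']
```

The sort at the end matters: results may complete out of order, but the later "sum back into a single scan" step presumably wants the blocks in scan order.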

Merge reflection files and experiments manually, with local implementation
graeme-winter (Contributor, Author):

Would particularly welcome opinions on tools.py::combine_{experiments,reflections}

Comment on lines +76 to +78
pass

return number_of_processors(return_value_if_unknown=-1)
Member:

since we have psutil:

Suggested change:

- pass
- return number_of_processors(return_value_if_unknown=-1)
+ return len(psutil.Process().cpu_affinity())

This gets the number of CPUs that are available to the process, rather than the number of CPUs in the system. It is the Windows-compatible version of len(os.sched_getaffinity(0)).

NSLOTS still takes priority though.
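Putting this sub-thread together, the intended priority order (NSLOTS first, then per-process affinity where the platform implements it, then the system CPU count) might be sketched as follows. `get_nproc` is an illustrative name, not fp3's `nproc()` nor DIALS' `available_cores()`:

```python
import os

def get_nproc():
    """Best-effort count of CPUs usable by this process.

    NSLOTS (set by SGE-style schedulers) takes priority; then the
    process's CPU affinity via psutil where implemented; finally the
    total system count as a fallback.
    """
    nslots = os.environ.get("NSLOTS")
    if nslots:
        return int(nslots)
    try:
        import psutil
        # Process.cpu_affinity() is not implemented on every platform
        # (e.g. macOS) - exactly the AttributeError seen below.
        return len(psutil.Process().cpu_affinity())
    except (ImportError, AttributeError):
        pass
    return os.cpu_count() or 1

print(get_nproc())
```

Catching AttributeError as well as ImportError is the key point: psutil being importable does not guarantee cpu_affinity() exists on that platform.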

graeme-winter (Contributor, Author):

Computer says no:

>>> import psutil
>>> psutil.Process().cpu_affinity()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'Process' object has no attribute 'cpu_affinity'

Member:

dials.util.mp.available_cores should handle this correctly

graeme-winter (Contributor, Author) commented Oct 2, 2020:

At the risk of repeating myself, this is not in a release:

>>> dials.util.mp.available_cores
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: module 'dials.util.mp' has no attribute 'available_cores'

graeme-winter (Contributor, Author):

#5 (comment) for reference

Member:

I did not see the earlier comment because it was hidden by being resolved.

Anthchirp (Member) commented Oct 2, 2020:

This was 25 days ago. These days you should use dials.util.mp.available_cores() from dials/dials#1430 instead of your nproc() implementation.

Anthchirp (Member) commented Oct 2, 2020 via email.

graeme-winter (Contributor, Author):

Sorry, it was not clear to me that fp3 is already set against a DIALS release, given that it's in a PR into a scratch space and not laid out as a package yet.

Evidently it is not yet a package, etc.; however, we did discuss making it one the other day, so I was kinda trying to avoid picking up too much baggage from master. If we are copying the new bits over to the 3.1 series, then valid, I guess.

graeme-winter (Contributor, Author):

TODO: modify to use DIALS 3.2 feature set now released; reformat as package

@jbeilstenedmands I have asked for your review for "combine_reflections" in fp3/tools.py

@Anthchirp I would welcome input on how to close this off as a Python package now, I guess... I think it is close to OK for a 0th-order implementation (though obviously there is scope for future improvement).

graeme-winter (Contributor, Author):

TODO: remove duff reflections before manipulations; use vector expressions for calculating weighted sums
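The second TODO, "vector expressions for calculating weighted sums", might look like this over whole arrays rather than a per-reflection Python loop. This is a NumPy sketch only, assuming inverse-variance weighting; fp3's actual data live in DIALS reflection tables rather than bare arrays:

```python
import numpy as np

def weighted_merge(i1, v1, i2, v2):
    """Inverse-variance weighted mean of paired half-observations.

    i1, i2 are intensity arrays and v1, v2 their variances; every step
    is a vector expression, with no per-reflection loop.
    """
    i1, v1, i2, v2 = map(np.asarray, (i1, v1, i2, v2))
    w1, w2 = 1.0 / v1, 1.0 / v2
    intensity = (w1 * i1 + w2 * i2) / (w1 + w2)
    variance = 1.0 / (w1 + w2)
    return intensity, variance

i, v = weighted_merge([10.0, 20.0], [1.0, 4.0], [12.0, 20.0], [1.0, 4.0])
print(i, v)  # equal weights per pair: means 11 and 20, variances 0.5 and 2.0
```

Removing "duff" reflections first matters here too: a zero or negative variance would poison the whole vector expression rather than one loop iteration.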

Anthchirp (Member):

do you want to keep the development history in $newpackage?

graeme-winter (Contributor, Author):

do you want to keep the development history in $newpackage?

Nope, not a chance. Without a decent rewrite the history is close to:

[image: git_commit]

jbeilstenedmands (Contributor):

I tried out fp3 on the beta-lactamase dataset; however, I got an assertion error as part of the combine step:
assert d0["miller_index"][i] == d1["miller_index"][j]
Printing out the details of the assertion failure and allowing it to continue, it seems that there are cases where spots split across a processing boundary can be misindexed. In this case they are off by 1 in l for some low values of l; the spot positions indicate that each pair is the same spot split across a processing boundary:

Mismatched indices: (-1, 26, 1) (-1, 26, 2)  Spot positions(px): (2113.23,1888.61,159.50), (2113.77,1889.74,160.50)
Mismatched indices: (15, -18, -1) (15, -18, -2)  Spot positions(px): (1197.50,424.83,269.50), (1196.50,420.00,270.50)
Mismatched indices: (19, -13, -1) (19, -13, -2)  Spot positions(px): (1460.98,453.74,269.50), (1460.77,454.77,270.50)
Mismatched indices: (15, -25, 2) (15, -25, 1)  Spot positions(px): (945.68,112.66,289.50), (945.86,112.69,290.50)
Mismatched indices: (7, -29, -3) (7, -29, -4)  Spot positions(px): (464.87,214.76,289.50), (462.00,212.50,290.50)
Mismatched indices: (9, -31, -1) (9, -31, -2)  Spot positions(px): (381.01,3.73,299.50), (380.17,3.60,300.50)
Mismatched indices: (10, -30, 1) (10, -30, 0)  Spot positions(px): (501.12,28.31,299.50), (500.00,28.86,300.50)
Mismatched indices: (7, -33, 1) (7, -33, 0)  Spot positions(px): (99.94,2522.17,519.50), (97.36,2521.79,520.50)
Mismatched indices: (22, -18, -2) (22, -18, -3)  Spot positions(px): (1443.29,2312.19,539.50), (1441.99,2312.68,540.50)
Mismatched indices: (25, -10, 0) (25, -10, -1)  Spot positions(px): (1809.73,2122.88,539.50), (1808.48,2122.59,540.50)
Mismatched indices: (19, -21, -2) (19, -21, -3)  Spot positions(px): (1233.82,2312.61,539.50), (1233.33,2313.42,540.50)
Mismatched indices: (13, -22, 4) (13, -22, 3)  Spot positions(px): (1018.01,2122.39,559.50), (1016.85,2121.21,560.50)
Mismatched indices: (18, -22, 6) (18, -22, 5)  Spot positions(px): (1175.98,2334.86,559.50), (1176.00,2333.00,560.50)
Mismatched indices: (-16, 23, 1) (-16, 23, 2)  Spot positions(px): (1472.50,165.90,639.50), (1472.71,165.03,640.50)
Mismatched indices: (-24, 17, -2) (-24, 17, -1)  Spot positions(px): (978.33,89.52,649.50), (979.07,89.07,650.50)

Not sure at the moment how to address this but I'll have a think.
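For reference, the pattern in the log (each mismatch off by ±1 in a single index) can be confirmed with a tiny helper like this; it is purely illustrative and not part of fp3:

```python
def classify_mismatch(hkl1, hkl2):
    """Return the axes on which two Miller indices disagree, with the offset."""
    return {axis: b - a for axis, a, b in zip("hkl", hkl1, hkl2) if a != b}

# The first two mismatches from the log above:
print(classify_mismatch((-1, 26, 1), (-1, 26, 2)))      # {'l': 1}
print(classify_mismatch((15, -18, -1), (15, -18, -2)))  # {'l': -1}
```

Every pair in the log differs in exactly one axis by ±1, which supports the thin-slice misindexing explanation discussed below rather than a matching error.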

graeme-winter (Contributor, Author):

@jbeilstenedmands excellent spot, thank you - this needs some attention upstream of this, then.

Ah, wait - this is in the matching process, so what that means (I guess) is that we need a smarter way of matching reflections which are found in both halves of the data set - if I don't do this (as was the case before) you end up with two "half-observations" which mess things up...

jbeilstenedmands (Contributor):

I think the matching is correct, i.e. you have found the two halves of the same spot, but one is misindexed, due to the difficulty of indexing along a certain direction for a thin slice of data? So does there need to be an extra step in this case to determine which index is correct and use that for both halves?

graeme-winter (Contributor, Author):

I think the matching is correct, i.e. you have found the two halves of the same spot, but one is misindexed, due to the difficulty of indexing along a certain direction for a thin slice of data? So does there need to be an extra step in this case to determine which index is correct and use that for both halves?

Looking at these now, I remembered that this data was artificially processed to give it 0.5° images, which could have some interesting effects...

graeme-winter (Contributor, Author):

This fails going into block 16 (i.e. the 17th block); however, the orientation matrix does not seem that different?

Grey-Area fp3 :( $ dials.compare_orientation_matrices integrate0015/indexed.expt integrate0016/indexed.expt 
The following parameters have been modified:

input {
  experiments = integrate0015/indexed.expt
  experiments = integrate0016/indexed.expt
}

Change of basis op: a,b,c
Rotation matrix to transform crystal 1 to crystal 2:
{{1.000, 0.001, -0.000},
 {-0.001, 1.000, 0.000},
 {0.000, -0.000, 1.000}}
Rotation of -0.033 degrees about axis (0.127, 0.299, 0.946)
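As a sanity check on output like the above, the rotation angle can be recovered from the matrix trace via theta = arccos((tr R - 1) / 2); note that a matrix printed to three decimal places mostly loses an angle as small as 0.033°, so this only works on the unrounded matrix. A minimal sketch:

```python
import math

def rotation_angle_deg(R):
    """Rotation angle (degrees) of a 3x3 rotation matrix, from its trace."""
    tr = R[0][0] + R[1][1] + R[2][2]
    # Clamp against rounding pushing the cosine slightly outside [-1, 1].
    c = max(-1.0, min(1.0, (tr - 1.0) / 2.0))
    return math.degrees(math.acos(c))

# A 90 degree rotation about z, as a sanity check:
Rz90 = [[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
print(rotation_angle_deg(Rz90))  # 90.0
```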

5 participants