No progress in struct.run #584

Zombiekotze · 2024-11-29T07:21:42Z

Hello,

I have another problem that may be very easy to solve. After running the ipyrad pipeline, I tried to run the structure analysis. Everything seems to work without error messages, but after struct.run nothing happens for days.
When running the structure pipeline I needed to restrict the number of CPUs used to 10. With more CPUs there was also no progress. Is there an option to do this for the structure toolkit? Or are there other solutions I could try? Maybe I used the wrong parameters?

Best wishes

Robert

>>> import ipyrad.analysis as ipa
>>> import toyplot
>>> data = '/media/workstation/4D1D8B8A4D2AF398/Projects/maturna/E_maturna_projekt/Structure/maturna2.snps.hdf5'
>>> imap = {
... "saxony": ["A11", "A12", "A29", "A30", "A27", "A28", "A33", "A34", "A37", "A38", "A39", "A47", "A48", "A45", "A46", "A13", "A14", "A03", "A04", "A40", "A41", "A42", "A49", "A50", "A71", "A72", "A69", "A70", "A09", "A10", "A53", "A54", "A65", "A66", "A63", "A64", "A59", "A60", "A73"],
... "saxony-anhalt": ["A07", "A08", "A21", "A22", "A23", "A24", "A17", "A18", "A67", "A68", "A31", "A32", "A51",  "A52", "A05", "A06", "A25", "A26", "A19", "A20", "A35", "A36", "A55", "A56", "A61", "A62",  "A57", "A58", "A15", "A16"],
... }
>>> minmap = {i: 0.5 for i in imap}
>>> struct = ipa.structure(
...     name="test",
...     data=data,
...     imap=imap,
...     minmap=minmap,
...     mincov=0.9,
... )
Samples: 69
Sites before filtering: 171460
Filtered (indels): 8619
Filtered (bi-allel): 1962
Filtered (mincov): 170802
Filtered (minmap): 157619
Filtered (subsample invariant): 24
Filtered (minor allele frequency): 0
Filtered (combined): 170873
Sites after filtering: 606
Sites containing missing values: 593 (97.85%)
Missing values in SNP matrix: 2380 (5.69%)
SNPs (total): 606
SNPs (unlinked): 159
>>> struct.mainparams.burnin = 5000
>>> struct.mainparams.numreps = 10000
>>> struct.run(nreps=3, kpop=[2, 3, 4, 5], auto=True)

isaacovercast · 2024-11-29T14:43:34Z

Hello,
Thanks for sharing all the notebook code you're using; it's helpful to see what you're doing. By setting auto=True in the struct.run call you are asking it to automatically launch an ipyparallel cluster using all cores (the default). In some cases this can cause the process to hang, because the timing can be off when using all cores and it can deadlock. The solution is to start an ipyparallel cluster by hand and then pass that cluster to the struct.run call, like this:

On the command line on the machine you intend to run structure on, open a terminal and activate the ipyrad environment, then launch an ipcluster instance:

ipcluster start --n=10 --cluster-id='structure' --daemonize

Then inside your notebook, at the top when you load modules do this:

import ipyparallel as ipp

ipyclient = ipp.Client(cluster_id="structure")

In a new cell you can verify it worked by asking how many engines the ipyclient has (this should say ten; if it doesn't, run it again, because it sometimes takes a second for all the engines to launch):

len(ipyclient)
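
If you'd rather not re-run that cell by hand, the check can be wrapped in a small polling loop. This is only a sketch: wait_for_engines is a hypothetical helper written for this example (not part of ipyrad or ipyparallel), and the timeout and poll values are arbitrary assumptions.

```python
import time


def wait_for_engines(count_engines, expected, timeout=60.0, poll=1.0):
    """Poll count_engines() until it reports at least `expected` engines.

    Hypothetical helper for illustration. Returns the last observed count,
    or raises TimeoutError if the deadline passes first.
    """
    deadline = time.monotonic() + timeout
    while True:
        n = count_engines()
        if n >= expected:
            return n
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"only {n} of {expected} engines connected after {timeout} s"
            )
        time.sleep(poll)


# Usage with the client created above (requires the running ipcluster):
# wait_for_engines(lambda: len(ipyclient), expected=10)
```

Passing the count as a callable keeps the helper independent of ipyparallel, so it asks the client for the engine count again on every pass.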

Then when you call struct.run, modify the call to tell structure to use the cluster you just launched, like this:

struct.run(nreps=3, kpop=[2, 3, 4, 5], ipyclient=ipyclient)

Give that a shot and let me know if it doesn't fix it.

Good luck!
