No progress in struct.run #584

Open
Zombiekotze opened this issue Nov 29, 2024 · 1 comment

Comments

@Zombiekotze

Hello,

I have another problem, which may be very easy to solve. After running the ipyrad pipeline, I tried to run the structure analysis. Everything seems to work without error messages, but after calling struct.run nothing happens for days.
When running the structure pipeline I needed to restrict the number of CPUs used to 10. With more CPUs there was also no progress. Is there an option to do this for the structure toolkit? Or are there other solutions I could try? Maybe I used the wrong parameters?

Best wishes

Robert

>>> import ipyrad.analysis as ipa
>>> import toyplot
>>> data = '/media/workstation/4D1D8B8A4D2AF398/Projects/maturna/E_maturna_projekt/Structure/maturna2.snps.hdf5'
>>> imap = {
... "saxony": ["A11", "A12", "A29", "A30", "A27", "A28", "A33", "A34", "A37", "A38", "A39", "A47", "A48", "A45", "A46", "A13", "A14", "A03", "A04", "A40", "A41", "A42", "A49", "A50", "A71", "A72", "A69", "A70", "A09", "A10", "A53", "A54", "A65", "A66", "A63", "A64", "A59", "A60", "A73"],
... "saxony-anhalt": ["A07", "A08", "A21", "A22", "A23", "A24", "A17", "A18", "A67", "A68", "A31", "A32", "A51",  "A52", "A05", "A06", "A25", "A26", "A19", "A20", "A35", "A36", "A55", "A56", "A61", "A62",  "A57", "A58", "A15", "A16"],
... }
>>> minmap = {i: 0.5 for i in imap}
>>> struct = ipa.structure(
...     name="test",
...     data=data,
...     imap=imap,
...     minmap=minmap,
...     mincov=0.9,
... )
Samples: 69
Sites before filtering: 171460
Filtered (indels): 8619
Filtered (bi-allel): 1962
Filtered (mincov): 170802
Filtered (minmap): 157619
Filtered (subsample invariant): 24
Filtered (minor allele frequency): 0
Filtered (combined): 170873
Sites after filtering: 606
Sites containing missing values: 593 (97.85%)
Missing values in SNP matrix: 2380 (5.69%)
SNPs (total): 606
SNPs (unlinked): 159
>>> struct.mainparams.burnin = 5000
>>> struct.mainparams.numreps = 10000
>>> struct.run(nreps=3, kpop=[2, 3, 4, 5], auto=True)
@isaacovercast
Collaborator

Hello,
Thanks for sharing all the notebook code you're using, it's helpful to see what you're doing. What I see here is that by setting auto=True in the struct.run call you are actually asking it to automatically launch an ipyparallel cluster using all cores (the default). In some cases this can cause the process to hang, because the timing can sometimes be off when using all cores and it can deadlock. The solution to this is to start an ipyparallel cluster by hand, and then pass in the cluster to the struct.run call, like this:

On the command line on the machine you intend to run structure on, open a terminal and activate the ipyrad environment, then launch an ipcluster instance:

ipcluster start --n=10 --cluster-id='structure' --daemonize

Then inside your notebook, at the top when you load modules do this:

import ipyparallel as ipp

ipyclient = ipp.Client(cluster_id="structure")

In a new cell you can verify it worked by asking how many engines the ipyclient has (this should say ten; if it doesn't, run it again, because sometimes it takes a second for all the engines to launch):

len(ipyclient)
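
If you'd rather not re-run that cell by hand, the check can be wrapped in a small polling loop. This is only a sketch: wait_for_engine_count is a hypothetical helper (not part of ipyrad or ipyparallel), and it assumes nothing more than that len() on the client returns the number of connected engines, as in the cell above.

```python
import time


def wait_for_engine_count(client, expected, timeout=60.0, poll=1.0):
    """Poll len(client) until `expected` engines are connected.

    Hypothetical helper for illustration. `client` is any object that
    supports len(); with ipyparallel that would be the ipp.Client above.
    Returns the engine count, or raises TimeoutError after `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        n = len(client)
        if n >= expected:
            return n
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"only {n} of {expected} engines connected after {timeout} s"
            )
        time.sleep(poll)
```

With the cluster started as above, calling wait_for_engine_count(ipyclient, 10) before struct.run would block until all ten engines have registered.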

Then when you call struct.run, modify the call to tell structure to use the cluster you just launched, like this:

struct.run(nreps=3, kpop=[2,3,4,5], ipyclient=ipyclient)

Give that a shot and let me know if it doesn't fix it.

Good luck!
