Issue with compute gene-gene distances #4

Open
chen-peng-1874 opened this issue Apr 19, 2024 · 15 comments

@chen-peng-1874

chen-peng-1874 commented Apr 19, 2024

I tried to set up a virtualenv using reticulate, but the module cannot be found.
Here is the output:
> cal_ot_mat_from_numpy <- reticulate::import('gene_trajectory.compute_gene_distance_cmd')$cal_ot_mat_from_numpy
Error: C:/Users/Public/miniconda3/python310.dll - The specified module could not be found.

Should I use Python instead of R?

@Fufu-Hu

Fufu-Hu commented Apr 19, 2024

Did you install the gene-trajectory module?

You can try the code below in R.
reticulate::py_install("gene-trajectory")

@chen-peng-1874
Author

> Did you install the gene-trajectory module?
> You can try the code below in R. reticulate::py_install("gene-trajectory")

Yes, I installed it, but the error is still present.
I wonder whether something is wrong with python.dll, although I do have python310.dll.

@fra-pcmgf
Collaborator

Hi, I'm not sure what the issue is, but can you try to run reticulate::py_list_packages() and check the output? You should have a line like

14     gene-trajectory    1.0.0     gene-trajectory=1.0.0        pypi

If gene-trajectory is not there, can you try installing it with reticulate::py_install("gene-trajectory", pip = TRUE)? The pip = TRUE option may be needed since we do not have a conda package for gene-trajectory.
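
As a complementary check, the same question can be asked from the Python side. The sketch below is an assumption about how one might do it, not part of the package: run it with the interpreter that reticulate is configured to use, and it reports whether the gene_trajectory module (the name used in the import above) is importable and which version of the gene-trajectory distribution is installed. The helper name check_module is hypothetical.

```python
import importlib.metadata
import importlib.util
import sys


def check_module(module_name="gene_trajectory", dist_name="gene-trajectory"):
    """Report whether a module is importable from the running interpreter."""
    # find_spec returns None when a top-level module cannot be located.
    spec = importlib.util.find_spec(module_name)
    try:
        version = importlib.metadata.version(dist_name)
    except importlib.metadata.PackageNotFoundError:
        version = None
    return {
        "python": sys.executable,       # which interpreter is actually running
        "importable": spec is not None,
        "version": version,
    }


if __name__ == "__main__":
    print(check_module())
```

If "importable" is False here while py_list_packages() lists the package, the two are probably looking at different Python environments.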

@DAOl44732

data_S <- GeneTrajectory::RunDM(data_S)
cell.graph.dist <- GetGraphDistance(data_S, K = 10)
cg_output <- CoarseGrain(data_S, cell.graph.dist, genes, N = 1000)
Hello, because the data is too big to run in R, can the gene distance computation above be run in Python instead?

@fra-pcmgf
Collaborator

Yes, it's possible to export the data to a folder and run the computation in Python, as described in #3 (comment).

It may also be worth reducing the data size as explained in https://klugerlab.github.io/GeneTrajectory/articles/fast_computation.html

@DAOl44732

data_S <- GeneTrajectory::RunDM(data_S)
Thank you for your reply.
However, the problem occurs at this step: the error says that the data is larger than 1000 GiB. Is there a good solution?

@fra-pcmgf
Collaborator

I see, it's possible to do the whole analysis in Python (see e.g. https://github.com/KlugerLab/GeneTrajectory-python and https://genetrajectory-python.readthedocs.io/latest/notebooks/tutorial_mouse_dermal.html for a tutorial).

However, I am afraid you will encounter similar issues. Computing the diffusion map in RunDM creates a cell-cell distance matrix, whose size is quadratic in the number of cells and which requires a lot of memory and time to compute.
How many cells do you have?
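
To make the quadratic growth concrete, here is a back-of-the-envelope sketch. It assumes a dense matrix of 8-byte floats; the actual allocation inside RunDM may differ, so treat the numbers as an order-of-magnitude estimate only. The function name dense_matrix_gib is hypothetical, and 340,000 is the cell count reported further down the thread.

```python
def dense_matrix_gib(n_cells: int, bytes_per_value: int = 8) -> float:
    """Memory needed for a dense n_cells x n_cells matrix, in GiB."""
    return n_cells * n_cells * bytes_per_value / 2**30


# Assumed example sizes; only the last one comes from this thread.
for n in (10_000, 100_000, 340_000):
    print(f"{n:>7,} cells -> {dense_matrix_gib(n):,.1f} GiB")
```

Going from 10,000 to 340,000 cells multiplies the cell count by 34 but the matrix size by 34 squared, which is why subsampling or coarse-graining helps so much.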

@DAOl44732

I see, I will try Python first.
However, we have about 340,000 cells. Do you have any more suggestions?

@fra-pcmgf
Collaborator

fra-pcmgf commented May 6, 2024

I would try randomly subsampling cells to a smaller number (~10k should be manageable, but you can probably do more) or partitioning the data if you have some meaningful metadata. You can then run RunDM and follow the rest of the pipeline (which will use CoarseGrain with N of 1000-2000, or a procedure like https://klugerlab.github.io/GeneTrajectory/articles/fast_computation.html, to coarse-grain further).

Python and R should have similar performance, so use whichever makes the most sense for you.

For large datasets it should be possible to subsample in a better way than at random, but we haven't investigated that yet. The method we use to coarse-grain cells, CoarseGrain, relies on having a cell-cell distance matrix. One could probably try a similar kNN-based approach on a simpler gene embedding that could handle data of your size, but we haven't tested it and it's hard to predict whether it would behave correctly.
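
The random subsampling step can be sketched as follows. This is a minimal illustration using only the standard library, not code from GeneTrajectory: the helper name subsample_indices and the seed are assumptions, and the returned indices would still need to be applied to your own data object before running the diffusion map.

```python
import random


def subsample_indices(n_cells: int, n_keep: int, seed: int = 0) -> list[int]:
    """Pick n_keep distinct cell indices uniformly at random, in sorted order."""
    if n_keep >= n_cells:
        # Nothing to drop: keep every cell.
        return list(range(n_cells))
    rng = random.Random(seed)  # fixed seed so the subsample is reproducible
    return sorted(rng.sample(range(n_cells), n_keep))


# Example with the sizes mentioned in this thread: 340,000 cells down to 10,000.
keep = subsample_indices(340_000, 10_000, seed=42)
print(len(keep), keep[:5])
```

Sorting the indices keeps the cells in their original order, which tends to make it easier to line the subsample up with existing metadata.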

@DAOl44732

Thank you.
I'll try your advice.

@DAOl44732

Can I use dm_res = palantir.utils.run_diffusion_maps(ad, n_components=5) instead of run_dm(adata) to calculate the cell-cell distance?

@fs-ravenbiosciences

fs-ravenbiosciences commented May 6, 2024

I don't have experience with that package, but the implementation looks similar. I think you can try it as an alternative; just make sure to refer to the layer where the result is stored (our package uses "X_dm", so change it accordingly).

@OceanLyu

> > Did you install the gene-trajectory module?
> > You can try the code below in R. reticulate::py_install("gene-trajectory")
>
> Yes, I installed it, but the error is still present. I wonder whether something is wrong with python.dll, although I do have python310.dll.

I'm having the same issue and haven't found a solution yet.

@fra-pcmgf
Collaborator

Hi @OceanLyu,

That seems to be a reticulate installation issue, and I don't really have any experience with Windows.
I saw a similar issue on StackOverflow: https://stackoverflow.com/questions/78571615/reticulate-python311-dll-not-found-but-definitely-exists. Can you see if any of the suggestions in that thread help?

@OceanLyu

OceanLyu commented Aug 16, 2024

Thanks for your timely reply!
I worked it out by installing everything on a Linux machine. I guess there are some problems with Windows usage.
