Padelpy GPU version? #36

Open
AnjaliSetiya opened this issue Apr 26, 2022 · 4 comments
Labels
enhancement (New feature or request), help wanted (Extra attention is needed)

Comments

@AnjaliSetiya

Hello
I would like to know whether the library can run on a GPU. PaDELPy on CPU is quite slow: generating fingerprints for about 10,000 molecules takes me 3-4 hours, sometimes more. If a GPU version isn't available, how can the process be sped up?
Please let me know
Thanks
Anjali

@tjkessler
Member

Hi @AnjaliSetiya,

The source code for PaDEL-Descriptor, while open source, is written in Java which, I will not lie, is not a language I have much experience with.

I'm going to leave this issue open; hopefully someone with more familiarity with Java and/or PaDEL-Descriptor's source code can chime in (and let us know if this is possible!).

Best,
Travis

@tjkessler added the enhancement (New feature or request) and help wanted (Extra attention is needed) labels on Sep 8, 2022
@JacksonBurns
Contributor

I don't know enough about GPU programming to accelerate this, and I think that work would need to be done upstream in the actual PaDEL-Descriptor source code. What could be done here is to use Python's multiprocessing to divide the list of molecules across as many processes as possible. It won't get anywhere near the speedup of a true GPU implementation of the actual fingerprint calculation algorithm, but it would hopefully cut execution times down quite substantially. There will be very little communication overhead, so I expect the speedup to scale roughly linearly with the number of processes.
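A minimal sketch of the chunk-and-distribute idea described above, not an existing padelpy feature. The helper names (`split_into_chunks`, `describe_chunk`, `describe_parallel`) are hypothetical. It assumes `from_smiles` accepts a list of SMILES strings, and that running several padelpy calls in separate processes at the same time is safe, which this thread does not confirm.

```python
from concurrent.futures import ProcessPoolExecutor
from typing import Callable, List, Sequence


def split_into_chunks(items: Sequence[str], n_chunks: int) -> List[List[str]]:
    """Split items into at most n_chunks contiguous, near-equal chunks."""
    if n_chunks < 1:
        raise ValueError("n_chunks must be at least 1")
    # Never create more chunks than there are items.
    n_chunks = min(n_chunks, len(items)) or 1
    size, extra = divmod(len(items), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        # The first `extra` chunks take one additional item each.
        end = start + size + (1 if i < extra else 0)
        chunks.append(list(items[start:end]))
        start = end
    return [chunk for chunk in chunks if chunk]


def describe_chunk(smiles_chunk: List[str]) -> List[dict]:
    """Hypothetical worker: compute descriptors for one chunk of SMILES."""
    # Imported inside the worker so each process loads padelpy itself.
    from padelpy import from_smiles
    return from_smiles(smiles_chunk)


def describe_parallel(
    smiles: Sequence[str],
    n_procs: int,
    worker: Callable[[List[str]], List[dict]] = describe_chunk,
) -> List[dict]:
    """Run `worker` on each chunk in its own process, preserving input order."""
    chunks = split_into_chunks(smiles, n_procs)
    if not chunks:
        return []
    with ProcessPoolExecutor(max_workers=len(chunks)) as pool:
        # pool.map yields results in the same order as the chunks.
        per_chunk = pool.map(worker, chunks)
        return [row for rows in per_chunk for row in rows]
```

A call such as `describe_parallel(smiles, n_procs=4)` would belong under an `if __name__ == "__main__":` guard, since process pools re-import the calling module on some platforms.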

Please let me know if this is of any interest and I can open a PR @tjkessler @AnjaliSetiya

@AnjaliSetiya
Author

Hi @JacksonBurns, please let me know what the PR would involve.

@JacksonBurns
Contributor

@AnjaliSetiya after further investigation I realized that padelpy already has a passthrough to PaDEL-Descriptor that takes advantage of multiprocessing. This should buy you some huge speedups if you aren't using it already. See this example:

This code snippet takes about 3.5 minutes to run:

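from padelpy import from_smiles

# 100 copies of a 50-carbon chain, used as a simple benchmark
smiles = ['C'*50]*100

# One call per molecule: every call runs PaDEL-Descriptor separately
for smi in smiles:
    from_smiles(smi)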

whereas this takes only 11 seconds:

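from padelpy import from_smiles

# Same 100-molecule benchmark as above
smiles = ['C'*50]*100

# One call for the whole list: PaDEL-Descriptor runs once over all molecules
from_smiles(smiles)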
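To check the difference on your own molecules, the two snippets above could be timed with a small harness like the one below. This is only a sketch: `time_call` and `compare_loop_vs_batch` are hypothetical helper names, and `describe` stands for any function that accepts either a single SMILES string or a list of them, as `from_smiles` does in the snippets above.

```python
import time
from typing import Callable, Sequence, Tuple


def time_call(fn: Callable, *args, **kwargs) -> Tuple[float, object]:
    """Return (elapsed seconds, result) for a single call to fn."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return time.perf_counter() - start, result


def compare_loop_vs_batch(
    describe: Callable, smiles: Sequence[str]
) -> Tuple[float, float]:
    """Time one call per molecule against a single call on the whole list."""
    # Per-molecule loop, as in the slow snippet.
    loop_seconds, _ = time_call(lambda: [describe(smi) for smi in smiles])
    # Single batched call, as in the fast snippet.
    batch_seconds, _ = time_call(describe, smiles)
    return loop_seconds, batch_seconds
```

With padelpy installed, `compare_loop_vs_batch(from_smiles, ['C'*50]*100)` would repeat the comparison above. The actual timings will depend on the machine and the molecules.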

As for a GPU version, I'm not sure that's really possible. I can't even find the source code to begin with, and on top of that, descriptor calculation consists of many short, 'bursty' computations that probably wouldn't benefit much from a GPU. You could also consider looking at this reimplementation that seems to be much faster. Another compelling option would be to use PaDEL-Descriptor directly, rather than through this Python wrapper, and save the output file to read into Python later.
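For the last option, reading a saved output file back into Python could look roughly like the sketch below. It assumes the output is a CSV with a header row and a molecule-name column called `Name`. That column name is an assumption, so check it against your own file, and note that `load_descriptor_csv` is a hypothetical helper rather than part of padelpy.

```python
import csv
from typing import Dict


def load_descriptor_csv(
    path: str, name_column: str = "Name"
) -> Dict[str, Dict[str, float]]:
    """Read a descriptor CSV into {molecule name: {descriptor: value}}."""
    table: Dict[str, Dict[str, float]] = {}
    with open(path, newline="") as handle:
        for row in csv.DictReader(handle):
            # Assumed layout: one name column, every other column numeric.
            name = row.pop(name_column)
            values = {}
            for key, raw in row.items():
                try:
                    values[key] = float(raw)
                except (TypeError, ValueError):
                    # Empty or non-numeric cells become NaN.
                    values[key] = float("nan")
            table[name] = values
    return table
```

Returning plain dictionaries keeps the sketch free of third-party dependencies. The same file could also be loaded with a dataframe library if one is already in use.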


3 participants