Is there a preferred way to store drfps? #1

Open
cthoyt opened this issue Aug 26, 2021 · 1 comment

cthoyt commented Aug 26, 2021

I've just spent some time automating the generation of drfps from the Rhea database (see here) and would like to save a pre-cached version for later use. The word2vec world has a specific format for its embedding vectors - do you have a standard way of storing a set of drfps where each one has an associated string key?

I suppose I could just store a pickle of my Pandas dataframe, or use some other serialization format, but that seems a bit space-inefficient.

@daenuprobst
Member

Hey @cthoyt, sorry, I completely missed this issue. I guess you have solved it by now, but in case someone has the same question: I usually go with gzip and pickle. Numpy's serialization might be an option as well.
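For anyone landing here later, a minimal sketch of both options. It assumes the fingerprints are held in a plain dict mapping string keys to equal-length NumPy arrays; the helper names (`save_pickle_gz`, `load_pickle_gz`, `save_npz`, `load_npz`) are made up for illustration and are not part of drfp.

```python
import gzip
import pickle

import numpy as np


def save_pickle_gz(fingerprints, path):
    """Write a {key: fingerprint} dict to a gzip-compressed pickle."""
    with gzip.open(path, "wb") as file:
        pickle.dump(fingerprints, file, protocol=pickle.HIGHEST_PROTOCOL)


def load_pickle_gz(path):
    """Read back a dict written by save_pickle_gz."""
    with gzip.open(path, "rb") as file:
        return pickle.load(file)


def save_npz(fingerprints, path):
    """Store keys and fingerprints as two aligned arrays in a compressed .npz.

    Assumes every fingerprint has the same length, so they stack into a matrix.
    """
    keys = list(fingerprints)
    matrix = np.stack([fingerprints[key] for key in keys])
    np.savez_compressed(path, keys=np.array(keys), fingerprints=matrix)


def load_npz(path):
    """Rebuild the {key: fingerprint} dict from a file written by save_npz."""
    with np.load(path) as data:
        return dict(zip(data["keys"].tolist(), data["fingerprints"]))
```

Usage would look like `save_npz(my_fps, "rhea_drfps.npz")` followed by `load_npz("rhea_drfps.npz")`, where the file name is just an example. The `.npz` route avoids unpickling arbitrary objects when loading, which may matter if the cached file is shared with others.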
