-
Notifications
You must be signed in to change notification settings - Fork 703
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added class gzFileLoader to read models in gzipped files #1403
base: master
Are you sure you want to change the base?
Conversation
Thanks for the contribution! For a while now DyNet has not relied on boost for any of its core functionality, as boost has been a source of compile problems, etc. Because the DyNet model format is actually quite simple, I think it might just be better to create a binary file format if you want faster saving/loading. |
I saw that it already depends on boost (mp.h uses boost algorithm and boost-interprocess), so I though one more library would not make a difference. My concern was not on faster loading, but on saving disk space and bandwidth (which is relevant when you want to distribute the models inside some other software, and don't want to lose users for being too space-greedy ;) In any case, I could make this feature conditional, so it is only built if "ENABLE_BOOST" is set (the same that happens now with mp.cc). Would that suit you? |
Hi @lluisp thanks for the pull request! Sorry for the late reply. I think your suggestion of only building this if ENABLE_BOOST is set is a good compromise.This might be related to the CI errors as well. |
great! It is already fixed in my fork. |
Cool, I fixed that. It is compiling now, let's see how it goes. |
It seems that appveyor can not find boost::iostreams yet. |
Damn this is annoying. My last bet is to put the |
El 17/09/18 a les 16:59, Paul Michel ha escrit:
Damn this is annoying. My last bet is to put the |- cmd: set
PATH=%BOOST_LIBRARYDIR%;%PATH%| line right after the |build:| at line
48. If you don't have time to play around I can probably do it if you
give me push access to this branch:
https://help.github.com/articles/committing-changes-to-a-pull-request-branch-created-from-a-fork/
That did not work, I got a syntax error. It seems it does not like
the "- cmd: ..." inside "build:"
Error parsing appveyor.yml: (Line: 51, Col: 3, Idx: 1806) - (Line: 51,
Col: 3, Idx: 1806): While parsing a block collection, did not find
expected '-' indicator.
May it be that the problem is not the path, but that iostreams is not
installed?
I gave you push acces to my fork, I hope that is enough.
|
OK thanks I will take a look |
You may or may not want to look at https://github.com/kwalcock/dynet/blob/kwalcock-fatdynet/contrib/clulab/src/main/cpp/clulab-zip.cc. This implementation is based on zlib. There is additional access to the functionality through https://github.com/clulab/fatdynet. |
Cool! |
I don't know about master potential. That's probably up to clab and it would take some coordination to get it integrated completely. We've been using the code for about a year and a half in order to access the DyNet models through jar files and it seems to work. |
I added to io.h/io.cc a class gzFileLoader that behaves as textFileLoader but on gzipped model files.
The class uses boost::iostreams
It is not possible to perform tellg and seekg on gziped files, so I had to create a new class (that just reads and ignores the non-relevant sections).
If using tellg and seekg is not a must, both classes could be merged in a single one, since the only difference would be how the stream is opened. (I understand that skipping all the non-needed sections is slower, but loading a model is slower anyway so maybe it is not a big deal).
A single class could just open the stream depending on the file type, and then read the model transparently, regardless of whether it is gzipped or not. (or the derived classes could just open the stream, and all the loading work could be done in the parent class, avoindin code duplication)
Also, since boost::iostreams has other filters apart from gzip, this should allow to easily support other compression formats for model files.
The same idea could be used to textModelSaver to save directly in compressed formats.
Let me know if you are interested in developing this further, and how.
I'll be glad to contribute.