Added class gzFileLoader to read models in gzipped files #1403

lluisp · 2018-06-21T10:31:49Z

I added a class gzFileLoader to io.h/io.cc that behaves like textFileLoader but reads gzipped model files.
The class uses boost::iostreams.

It is not possible to perform tellg and seekg on gzipped streams, so I had to create a new class that reads and discards the non-relevant sections instead of seeking past them.
If using tellg and seekg is not a must, both classes could be merged into a single one, since the only difference would be how the stream is opened. (I understand that skipping all the unneeded sections sequentially is slower, but loading a model is slow anyway, so it may not be a big deal.)
A single class could open the stream depending on the file type and then read the model transparently, whether or not it is gzipped. (Alternatively, the derived classes could just open the stream and all the loading work could be done in the parent class, avoiding code duplication.)
Also, since boost::iostreams has filters other than gzip, this should make it easy to support other compression formats for model files.
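A minimal sketch of the merged design described above, under stated assumptions: it uses a simplified, hypothetical text format (a "#Parameter# <name> <count>" header followed by one line of values), not DyNet's exact format, and the class names StreamModelLoader, PlainFileLoader and StringLoader are made up for illustration. The parent class does all the parsing and reaches a parameter by reading and discarding earlier sections, so it never needs tellg or seekg; subclasses only decide how the stream is opened.

```cpp
#include <cstddef>
#include <fstream>
#include <istream>
#include <memory>
#include <sstream>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical simplified model format (NOT DyNet's exact on-disk format):
//   #Parameter# <name> <count>
//   <count whitespace-separated floats on the next line>
class StreamModelLoader {
 public:
  virtual ~StreamModelLoader() = default;

  // Reads the values of parameter `key`, scanning the stream front to back.
  // Non-matching sections are read and discarded, so no tellg/seekg is needed,
  // which is what makes this work on non-seekable (e.g. gzip) streams.
  std::vector<float> load_param(const std::string& key) {
    std::unique_ptr<std::istream> in = open();
    std::string header;
    while (std::getline(*in, header)) {
      std::istringstream hs(header);
      std::string tag, name;
      std::size_t count = 0;
      if (!(hs >> tag >> name >> count) || tag != "#Parameter#") continue;
      std::string values;
      if (!std::getline(*in, values)) break;  // header without a values line
      if (name != key) continue;              // skip: values already consumed
      std::istringstream vs(values);
      std::vector<float> out(count);
      for (float& v : out)
        if (!(vs >> v)) throw std::runtime_error("truncated values for " + key);
      return out;
    }
    throw std::runtime_error("parameter not found: " + key);
  }

 protected:
  // The only thing subclasses differ in: how the stream is opened.
  virtual std::unique_ptr<std::istream> open() = 0;
};

// Plain-text model file on disk.
class PlainFileLoader : public StreamModelLoader {
 public:
  explicit PlainFileLoader(std::string path) : path_(std::move(path)) {}

 protected:
  std::unique_ptr<std::istream> open() override {
    auto f = std::make_unique<std::ifstream>(path_);
    if (!*f) throw std::runtime_error("cannot open " + path_);
    return std::unique_ptr<std::istream>(std::move(f));
  }

 private:
  std::string path_;
};

// A gzip subclass would instead return a boost::iostreams::filtering_istream
// with a gzip_decompressor pushed in front of a file_source (this requires
// linking boost_iostreams), and would inherit load_param unchanged.

// In-memory "model", handy for tests.
class StringLoader : public StreamModelLoader {
 public:
  explicit StringLoader(std::string text) : text_(std::move(text)) {}

 protected:
  std::unique_ptr<std::istream> open() override {
    return std::make_unique<std::istringstream>(text_);
  }

 private:
  std::string text_;
};
```

With this split, PlainFileLoader("model.txt").load_param("/W") scans the file until it reaches /W, and supporting another compression format means overriding only open().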

The same idea could be applied to textModelSaver to save directly in compressed formats.
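On the saving side, a minimal sketch under the same hypothetical format: if the writer depends only on std::ostream, the destination stream alone decides whether the output is compressed. write_param is a made-up helper for illustration, not textModelSaver's actual API.

```cpp
#include <cstddef>
#include <ios>
#include <limits>
#include <ostream>
#include <string>
#include <vector>

// Writes one parameter in the hypothetical text format:
//   #Parameter# <name> <count>
//   <values separated by single spaces>
// Because it only needs a std::ostream, the caller can pass a plain
// std::ofstream or a compressing stream (e.g. a boost::iostreams
// filtering_ostream with a gzip_compressor in front of a file_sink).
void write_param(std::ostream& os, const std::string& name,
                 const std::vector<float>& values) {
  // max_digits10 digits are enough for a float to round-trip through text;
  // the caller's previous precision is restored afterwards.
  std::streamsize old = os.precision(std::numeric_limits<float>::max_digits10);
  os << "#Parameter# " << name << ' ' << values.size() << '\n';
  for (std::size_t i = 0; i < values.size(); ++i)
    os << (i ? " " : "") << values[i];
  os << '\n';
  os.precision(old);
}
```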

Let me know if you are interested in developing this further, and how.
I'll be glad to contribute.


neubig · 2018-06-22T14:18:57Z

Thanks for the contribution! For a while now DyNet has not relied on boost for any of its core functionality, as boost has been a source of compile problems, etc.

Because the DyNet model format is actually quite simple, I think it might just be better to create a binary file format if you want faster saving/loading.

lluisp · 2018-06-22T14:54:18Z

I saw that it already depends on boost (mp.h uses boost::algorithm and boost::interprocess), so I thought one more library would not make a difference.

My concern was not faster loading, but saving disk space and bandwidth (which matters when you want to distribute the models inside other software and don't want to lose users for being too space-greedy ;)
In my experience, binary formats are only slightly faster but cannot be inspected, while gzipped text files are a bit smaller, not much slower, and can be inspected with zless or the like.

In any case, I could make this feature conditional, so it is only built if "ENABLE_BOOST" is set (as is currently done for mp.cc). Would that suit you?
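One way the conditional build could look, sketched under assumptions: the build system is assumed to pass a preprocessor define when ENABLE_BOOST is set (DYNET_HAVE_GZIP is a hypothetical name), and has_suffix and open_model_stream are illustrative helpers. The sketch also shows the "open the stream depending on the file type" idea: .gz paths go through Boost's gzip_decompressor, everything else through std::ifstream, and builds without Boost fail loudly instead of parsing compressed bytes as text.

```cpp
#include <fstream>
#include <istream>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

#ifdef DYNET_HAVE_GZIP  // hypothetical define, assumed set when ENABLE_BOOST is on
#include <boost/iostreams/device/file.hpp>
#include <boost/iostreams/filter/gzip.hpp>
#include <boost/iostreams/filtering_stream.hpp>
#endif

// True if `s` ends with `suffix`.
bool has_suffix(const std::string& s, const std::string& suffix) {
  return s.size() >= suffix.size() &&
         s.compare(s.size() - suffix.size(), suffix.size(), suffix) == 0;
}

// Opens a model file, decompressing transparently when the path ends in .gz.
std::unique_ptr<std::istream> open_model_stream(const std::string& path) {
  if (has_suffix(path, ".gz")) {
#ifdef DYNET_HAVE_GZIP
    namespace io = boost::iostreams;
    auto in = std::make_unique<io::filtering_istream>();
    in->push(io::gzip_decompressor());
    in->push(io::file_source(path, std::ios_base::in | std::ios_base::binary));
    return std::unique_ptr<std::istream>(std::move(in));
#else
    throw std::runtime_error("gzipped models need a build with ENABLE_BOOST: " + path);
#endif
  }
  auto f = std::make_unique<std::ifstream>(path);
  if (!*f) throw std::runtime_error("cannot open " + path);
  return std::unique_ptr<std::istream>(std::move(f));
}
```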


pmichel31415 · 2018-09-07T14:11:50Z

Hi @lluisp, thanks for the pull request! Sorry for the late reply. I think your suggestion of only building this if ENABLE_BOOST is set is a good compromise. This might also be related to the CI errors.

lluisp · 2018-09-09T08:12:30Z

Great! It is already fixed in my fork.
Yes, CI fails because boost_iostreams is not installed in the environment.

pmichel31415 · 2018-09-14T21:42:12Z

OK, I think I figured out how this can be fixed:

For travis: add libboost-iostreams1.55-dev to this line
For appveyor (this was harder to spot): move (or copy?) this line up to after this line

Would you mind adding this? I think we'll be good to merge after that once the tests pass.

lluisp · 2018-09-15T00:41:42Z

Cool, I fixed that. It is compiling now, let's see how it goes.
Thanks!

lluisp · 2018-09-17T10:18:35Z

It seems that AppVeyor still cannot find boost::iostreams.

pmichel31415 · 2018-09-17T14:55:38Z

Damn, this is annoying. My last bet is to put the - cmd: set PATH=%BOOST_LIBRARYDIR%;%PATH% line right after the build: at line 48. If you don't have time to play around with it, I can probably do it if you give me push access to this branch: https://help.github.com/articles/committing-changes-to-a-pull-request-branch-created-from-a-fork/

lluisp · 2018-09-17T15:12:20Z

On 17/09/18 at 16:59, Paul Michel wrote:

> Damn this is annoying. My last bet is to put the - cmd: set PATH=%BOOST_LIBRARYDIR%;%PATH% line right after the build: at line 48. If you don't have time to play around I can probably do it if you give me push access to this branch: https://help.github.com/articles/committing-changes-to-a-pull-request-branch-created-from-a-fork/

That did not work; I got a syntax error. It seems it does not like the "- cmd: ..." inside "build:":

Error parsing appveyor.yml: (Line: 51, Col: 3, Idx: 1806) - (Line: 51, Col: 3, Idx: 1806): While parsing a block collection, did not find expected '-' indicator.

Could it be that the problem is not the path, but that iostreams is not installed? I gave you push access to my fork; I hope that is enough.

pmichel31415 · 2018-09-17T15:27:18Z

OK, thanks, I will take a look.

kwalcock · 2021-01-27T20:37:22Z

You may or may not want to look at https://github.com/kwalcock/dynet/blob/kwalcock-fatdynet/contrib/clulab/src/main/cpp/clulab-zip.cc. This implementation is based on zlib. There is additional access to the functionality through https://github.com/clulab/fatdynet.

lluisp · 2021-01-28T07:48:04Z

Cool!
Is it going to be merged into master at some point?

kwalcock · 2021-01-29T06:21:57Z

I don't know about master potential. That's probably up to clab and it would take some coordination to get it integrated completely. We've been using the code for about a year and a half in order to access the DyNet models through jar files and it seems to work.

lluisp added 3 commits June 21, 2018 12:21

Added class gzFileLoader to read models in gzipped files (using boost::iostreams)

0673a64

fixed to compile in msvc

704ee1e

link with boost::iostreams

bf01d4a

compilation of boost depending features is now conditioned to ENABLE_BOOST

7bacc73

test scripts fixed

0de9492

appveyor fixing

2788900

lluisp added 2 commits September 17, 2018 17:01

trying again for appveyor

f1337fe

trying again for appveyor

1d07eef
