
Compress final output files #139

Open · wants to merge 2 commits into main
Conversation

@klamike (Collaborator) commented Oct 24, 2024

  • Uses the deflate/zlib integration in HDF5.jl to compress the datasets in the final output files. The highest compression level is enabled by default (see the sketch after this list).
  • Uses gzip compression via OPFGenerator.save_json by default for the reference case.json.gz file.
  • Run some benchmarks with various compression options
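
For reference, a minimal sketch of what the HDF5 side looks like with HDF5.jl's deflate filter. The file name, dataset name, and chunk shape below are illustrative only, not the actual output layout:

```julia
using HDF5

# Minimal sketch of writing a deflate-compressed dataset with HDF5.jl.
# "example.h5" and "pg" are hypothetical names; a real chunk shape would be
# chosen to match the access pattern rather than the whole array.
function save_compressed(path::AbstractString, data::Matrix{Float64}; level::Int=9)
    h5open(path, "w") do fid
        # The deflate filter requires chunked storage, so a chunk shape is given.
        dset = create_dataset(fid, "pg", Float64, size(data);
                              chunk=size(data), deflate=level)
        write(dset, data)
    end
    return path
end

save_compressed("example.h5", rand(89, 1024))
```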

@klamike requested a review from mtanneau on October 24, 2024 at 20:02
@mtanneau (Contributor)

Do you have a sense of how much space we save with this?


codecov bot commented Oct 24, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

@klamike (Collaborator, Author) commented Oct 25, 2024

Unfortunately the current approach does not work for the Vector{String} datasets (termination/result status codes). Since these are really enums, we can store them as integers instead, and then compression works well. We can store the mapping from Integer(instance) to String(instance) in the HDF5 dataset's attributes for readers to be able to convert back even if MOI changes the mapping. We also currently store the formulation name for each sample, which can be dropped.
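
Roughly what I have in mind (a sketch assuming HDF5.jl and MathOptInterface; the dataset and attribute names here are illustrative, not what the PR actually writes):

```julia
using HDF5
import MathOptInterface as MOI

# Store termination statuses as integer enum codes (which compress well), and
# record the code -> string mapping as attributes so readers can decode the
# values even if MOI renumbers the enum in a future release.
function write_status_codes(fid, statuses::Vector{MOI.TerminationStatusCode})
    codes = Int.(statuses)
    dset = create_dataset(fid, "termination_status", Int, size(codes);
                          chunk=size(codes), deflate=9)
    write(dset, codes)
    all_statuses = collect(instances(MOI.TerminationStatusCode))
    attrs(dset)["status_codes"]   = Int.(all_statuses)       # 0, 1, 2, ...
    attrs(dset)["status_strings"] = string.(all_statuses)    # "OPTIMIZE_NOT_CALLED", ...
    return dset
end
```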

Storing a Vector{Enum} feels like something HDF5 would support natively, so I looked into it... there does seem to be a relevant datatype in HDF5 itself, but neither the Julia nor the Python interface has nice support for it.

With the enum -> Integer change and level 9 compression, an 89_pegase dataset gets about 30% smaller.

@mtanneau (Contributor)

> Unfortunately the current approach does not work for the Vector{String} datasets (termination/result status codes).

Is it (technically) possible to compress only the numerical datasets? E.g., doing an eltype check and compressing only if the element type is numerical.
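
Something like the following sketch, perhaps (the helper name is hypothetical):

```julia
using HDF5

# Compress only numeric datasets; fall back to plain storage otherwise
# (e.g. for Vector{String} status codes).
function write_maybe_compressed(fid, name::String, data::AbstractArray; level::Int=9)
    if eltype(data) <: Number
        dset = create_dataset(fid, name, eltype(data), size(data);
                              chunk=size(data), deflate=level)
        write(dset, data)
    else
        fid[name] = data   # uncompressed write for non-numeric element types
    end
    return nothing
end
```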

> We can store the mapping from Integer(instance) to String(instance) in the HDF5 dataset's attributes for readers to be able to convert back even if MOI changes the mapping.

I'm not against storing the integer codes instead of the strings; it will also save a (tiny) bit of space. We can ask the JuMP devs whether changing the integer codes of an enum would be considered a breaking change.

> With the enum -> Integer change and level 9 compression, an 89_pegase dataset gets about 30% smaller.

That's not bad! For the record: this 30% should be compared with (and combined with) the savings from merging some fields (e.g., merging the duals of lower/upper bound constraints).
