Pure dec enc with gz #143

Merged
merged 26 commits into from
Aug 4, 2024


@dinosaure dinosaure commented Feb 7, 2024

/cc @hannesm on top of #140

This is an attempt to implement Tar_gz without requiring I/O. The idea is to describe fold as a GADT value instead of calling I/O functions directly. Tar_gz then maps over these values to add the compression layer. Finally, Tar_unix and Tar_lwt_unix reuse this GADT, each evaluating it in a way specific to its backend.
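As a rough illustration of the idea (not the code in this PR), the sketch below describes reads and seeks as constructors of a GADT, builds a pure program with `let*`, and evaluates it with one possible backend that reads from an in-memory string. The `Return` constructor, the meaning given to `Seek` (a relative move that returns the new offset), the `` `Eof `` error, and the names `read_at` and `run_string` are all assumptions made for the example.

```ocaml
(* Simplified, hypothetical description type: values of [t] only describe
   I/O, they never perform it. *)
type ('a, 'err) t =
  | Really_read : int -> (string, 'err) t
  | Seek : int -> (int, 'err) t
  | Return : 'a -> ('a, 'err) t
  | Bind : ('b, 'err) t * ('b -> ('a, 'err) t) -> ('a, 'err) t

let ( let* ) x f = Bind (x, f)

(* A pure program: move forward by [off] bytes, read [len] bytes,
   and upper-case them. Nothing is executed here. *)
let read_at ~off ~len =
  let* _pos = Seek off in
  let* s = Really_read len in
  Return (String.uppercase_ascii s)

(* One possible backend: evaluate the description against a string.
   A Unix or Lwt backend would interpret the same constructors differently. *)
let run_string src prog =
  let pos = ref 0 in
  let rec go : type a. (a, [ `Eof ]) t -> (a, [ `Eof ]) result = function
    | Really_read n ->
      if n < 0 || !pos < 0 || !pos + n > String.length src then Error `Eof
      else begin
        let s = String.sub src !pos n in
        pos := !pos + n;
        Ok s
      end
    | Seek n -> pos := !pos + n; Ok !pos
    | Return v -> Ok v
    | Bind (x, f) ->
      (match go x with
       | Ok v -> go (f v)
       | Error e -> Error e)
  in
  go prog
```

With these definitions, `run_string "hello world" (read_at ~off:6 ~len:5)` evaluates the description and yields `Ok "WORLD"`, while the program `read_at` itself stays free of any I/O dependency.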

With Tar_lwt_unix, however, the user function cannot currently return an Lwt value. There are several ways around this. The one I've used for now follows awa-ssh: keep a list of threads that grows as the fold computation proceeds (and that we then have to resolve). Another is to make the GADT more expressive with the Yallop trick (lightweight higher-kinded types), as I've done here.
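For readers unfamiliar with the Yallop trick, here is a minimal sketch of how it could let a scheduler-agnostic GADT carry backend values. It is not the PR's implementation: the `io` type, the `Lift` constructor, and the `Identity` backend are hypothetical names, and an Lwt backend would provide its own `inj`/`prj` pair over `'a Lwt.t` in the same style.

```ocaml
(* [('a, 't) io] stands for "an ['a] wrapped in the monad branded by ['t]".
   The type is abstract: only a backend knows what it really is. *)
type ('a, 't) io

(* A trivial backend whose "monad" is the identity. [t] is the brand. *)
module Identity : sig
  type t
  val inj : 'a -> ('a, t) io
  val prj : ('a, t) io -> 'a
end = struct
  type t
  external inj : 'a -> ('a, t) io = "%identity"
  external prj : ('a, t) io -> 'a = "%identity"
end

(* The description type gains a parameter ['t] and a constructor that
   embeds a backend value, so user code can return such values. *)
type ('a, 't) prog =
  | Return : 'a -> ('a, 't) prog
  | Lift : ('a, 't) io -> ('a, 't) prog
  | Bind : ('b, 't) prog * ('b -> ('a, 't) prog) -> ('a, 't) prog

let ( let* ) x f = Bind (x, f)

(* A program that is generic in the backend: [lift] is supplied later. *)
let program lift =
  let* a = Lift (lift 20) in
  let* b = Lift (lift 22) in
  Return (a + b)

(* The evaluator for the identity backend projects embedded values out. *)
let rec run : type a. (a, Identity.t) prog -> a = function
  | Return v -> v
  | Lift io -> Identity.prj io
  | Bind (x, f) -> run (f (run x))
```

Here `run (program Identity.inj)` evaluates to `42`. The point of the design is that `program` never mentions a concrete scheduler; only the evaluator and the `inj` function passed in do.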

lib/tar.ml Outdated
| Really_read : int -> (string, 'err) t
| Read : int -> (string, 'err) t
| Seek : int -> (int, 'err) t
| Bind : ('a, 'err) t * ('a -> ('b, 'err) t) -> ('b, 'err) t

In Bind (x, f), the values x and f share the same 'err. This is likely just fine; it could perhaps be generalized, though I don't think it is worth jumping through those hoops.

else until_full_or_end (Gz.Inf.flush state.gz) (res, len, Bytes.length res - len)

let really_read_through_gz decoder len =
let open Tar in

Do we open Tar locally here only to get let*? If so, maybe:

Suggested change
- let open Tar in
+ let ( let* ) = Tar.( let* ) in
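To illustrate why binding only the operator can be preferable to a local open, here is a small self-contained sketch. The `Demo` module and both functions are hypothetical and unrelated to `Tar`; they only show that `let open` brings every name of the module into scope, which can silently shadow other identifiers.

```ocaml
(* A hypothetical module that exports [let*] plus an unrelated name. *)
module Demo = struct
  let ( let* ) = Result.bind
  let compare _ _ = 0 (* shadows Stdlib.compare once Demo is opened *)
end

(* With a local open, [compare] silently resolves to [Demo.compare]. *)
let with_open a b =
  let open Demo in
  let* x = a in
  Ok (compare x b)

(* Binding only the operator leaves every other name untouched. *)
let with_binding a b =
  let ( let* ) = Demo.( let* ) in
  let* x = a in
  Ok (compare x b)
```

`with_open (Ok 1) 2` returns `Ok 0` because `Demo.compare` is used, whereas `with_binding (Ok 1) 2` uses `Stdlib.compare` and returns a non-zero result.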

reynir and others added 10 commits February 7, 2024 14:23
We can list .tar.gz archives that consist of directories and empty
files \o/ Files with content are not possible /o\
It is not always possible to keep track of the position, and it is not
very useful to begin with.

The documentation better explains the lightweight higher kinded types
trick.

Co-authored-by: Calascibetta Romain <[email protected]>
Co-authored-by: Reynir Björnsson <[email protected]>
Co-authored-by: Calascibetta Romain <[email protected]>
Co-authored-by: Reynir Björnsson <[email protected]>
@dinosaure dinosaure mentioned this pull request May 15, 2024
@dinosaure dinosaure merged commit 890c1fe into main Aug 4, 2024
0 of 3 checks passed
@hannesm hannesm deleted the pure-dec-enc-with-gz branch August 5, 2024 06:33
dinosaure added a commit to dinosaure/opam-repository that referenced this pull request Aug 5, 2024
CHANGES:

- Fix `Header.marshal` and the checksum and the length (@reynir, mirage/ocaml-tar#145)
- Delete a mutable field about the level into the header (@hannesm, mirage/ocaml-tar#141)
- **BREAKING**: de-functorize the package (@hannesm, @reynir, @dinosaure, mirage/ocaml-tar#140, mirage/ocaml-tar#143, mirage/ocaml-tar#146)

  These PRs attempt to de-functorize `Tar` so that users can implement I/O
  themselves, combining `Tar`'s own serialization/deserialization functions
  with their own read/write methods. This avoids forcing users to implement a
  module that is too rigid for their use case (which could have performance
  implications).

  `Tar` offers functions for serializing/deserializing tar-specific elements
  to and from `string`. It is then up to the user to know how to obtain or
  write these strings.

  On top of this, these PRs add "logics" (see `'a Tar.t`) that require read
  and/or write implementations and describe how to extract all entries from a
  tar file, or how to write a tar file from a "dispenser" of entries (like
  `Seq.to_dispenser`).

  These logics do not depend on a particular "scheduler", and these PRs provide
  derivations of them for `tar-unix`, `tar-eio` and `tar-mirage`. Because of
  these derivations, the API of those packages has only been extended, and
  there are no breaking changes as such.

  These logics also make it easy to offer a compression/decompression layer with
  `decompress`, so you can easily manipulate and/or create a .tar.gz file.
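For context on the "dispenser" mentioned above: in the OCaml standard library (4.14 and later), `Seq.to_dispenser` turns a sequence into a function of type `unit -> 'a option` that hands out one element per call and then `None`. The sketch below only demonstrates that shape; the entry names are plain strings chosen for the example, and `collect` is a hypothetical consumer, not a function of `Tar`.

```ocaml
(* A hypothetical consumer: pull from a dispenser until it is exhausted. *)
let collect (next : unit -> 'a option) : 'a list =
  let rec go acc =
    match next () with
    | Some entry -> go (entry :: acc)
    | None -> List.rev acc
  in
  go []

(* Example entries; a real tar writer would dispense headers and contents. *)
let entries = List.to_seq [ "a.txt"; "dir/"; "dir/b.txt" ]

(* Each call to [next_entry ()] yields the next element, then [None]. *)
let next_entry = Seq.to_dispenser entries
```

A writer built on this shape can ask for entries one at a time, without knowing whether they come from a list, a directory walk, or a network stream.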
avsm pushed a commit to avsm/opam-repository that referenced this pull request Sep 5, 2024
4 participants