Add Serialisation Support #31

Merged
merged 25 commits into from
Sep 23, 2024

Adamtaranto
Collaborator

Add serialisation support for KmerCountTable objects using Serde.

New methods:

  • .serialize_json() serialises a count table to JSON and returns the raw JSON string. Mostly useful for testing.
  • .save("counts.json.gz") converts a count table to JSON, compresses it with gzip, and writes it to the target file.
  • .load() is a static method that loads a count table saved in json.gz format and returns a new KmerCountTable with the loaded properties. It raises a warning if the Oxli version of the saved object differs from the current version.
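
The save/load round trip described above can be sketched in a few lines of Python using only the standard library. This is an illustration of the pattern, not Oxli's implementation (which is Rust with Serde): the function names `save_table` and `load_table`, the plain-dict table layout, the `version` key, and the `CURRENT_VERSION` value are all assumptions made for the example.

```python
import gzip
import json
import warnings

# Hypothetical placeholder; stands in for the library version at load time.
CURRENT_VERSION = "0.0.0"


def save_table(table: dict, path: str) -> None:
    """Serialise a table (a plain dict here) to JSON and gzip it to `path`."""
    with gzip.open(path, "wt", encoding="utf-8") as fh:
        json.dump(table, fh)


def load_table(path: str, current_version: str = CURRENT_VERSION) -> dict:
    """Load a gzipped JSON table; warn if its saved version differs."""
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        table = json.load(fh)
    saved_version = table.get("version")
    if saved_version != current_version:
        # A mismatch is reported but does not stop the load.
        warnings.warn(
            f"Table was saved with version {saved_version!r}, "
            f"but the current version is {current_version!r}."
        )
    return table
```

Warning rather than failing on a version mismatch keeps older files loadable while still flagging that the saved layout may have changed.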

Closes #25

@Adamtaranto Adamtaranto added the enhancement New feature or request label Sep 16, 2024
@Adamtaranto Adamtaranto requested a review from ctb September 16, 2024 09:51
Cargo.toml Outdated
@@ -14,3 +14,10 @@ sourmash = "0.15.1"
anyhow = "1.0.89"
log = "0.4.22"
env_logger = "0.11.5"

# For JSON serialization/deserialization
serde = { version = "1.0", features = ["derive"] }
Contributor

Why version 1? The latest is 1.0.210. (Presumably Dependabot will upgrade it; just curious.)

Collaborator Author

No good reason, just inherited from example code.

Cargo.toml Outdated
serde_json = "1.0"

# For Gzip compression/decompression
flate2 = "1.0"
Contributor

Suggest niffler, which allows sniffing/auto-determination of file formats, but we can add that in later if we need it.

(I'm using it over in #10)
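
The sniffing idea can be sketched in Python: peek at the first two bytes of the file and check for the gzip magic number (0x1f 0x8b). This shows the general technique only; it is not niffler's API, and `open_maybe_gzip` is a hypothetical helper name.

```python
import gzip

# Every gzip stream starts with these two bytes.
GZIP_MAGIC = b"\x1f\x8b"


def open_maybe_gzip(path: str):
    """Return a text handle for `path`, decompressing if it looks gzipped."""
    with open(path, "rb") as fh:
        magic = fh.read(2)
    if magic == GZIP_MAGIC:
        return gzip.open(path, "rt", encoding="utf-8")
    return open(path, "rt", encoding="utf-8")
```

With a helper like this, the caller reads plain and gzipped JSON the same way, regardless of the file extension.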

Collaborator Author

Will look into niffler 👍

Only ever expecting gzipped JSON as input.

@Adamtaranto Adamtaranto requested a review from ctb September 20, 2024 13:48
@Adamtaranto
Collaborator Author

Hey @ctb, can you look over the niffler changes? It passes all my tests, but the code is a bit rough, especially the error handling.

Any additional test case suggestions are welcome.

Contributor

@ctb ctb left a comment

Looks good to me! Great work! I updated a few things and added tests here: #49.

ctb and others added 2 commits September 23, 2024 12:47
* Do format updates on top of current commit rather than head. (#46)

* update serde_json version

* switch to using built-in temp_path

* write two additional tests

* Style fixes by Ruff

* clean up docstrings viz cargo doc --document-private-items

---------

Co-authored-by: Adam Taranto <[email protected]>
Co-authored-by: ctb <[email protected]>
@Adamtaranto Adamtaranto merged commit 7b49c89 into main Sep 23, 2024
15 checks passed
@Adamtaranto Adamtaranto deleted the dev_serialisation branch September 23, 2024 02:54
@ctb ctb mentioned this pull request Sep 23, 2024
Labels
enhancement New feature or request
Development

Successfully merging this pull request may close these issues.

Serialising KmerCountTables
2 participants