Add Parquet reading and writing for efficient storage of PSM lists #81

Merged: 6 commits into main from add-parquet, May 1, 2024

RalfG commented May 1, 2024

Added

  • io: Read and write support for Apache Parquet, for efficient storage of PSM lists.
  • io.sage: Support for Sage results in Parquet format (new SageParquetReader; SageReader renamed to SageTSVReader).

Changed

  • Upgrade Pydantic dependency to v2. The PSM spectrum_id field is now always coerced to a string.
  • io.proteoscape: Use PyArrow to iteratively read from Parquet instead of first reading an entire dataframe with Pandas.
  • io.sage: Update compatibility to Sage v0.14.

codecov bot commented May 1, 2024

Codecov Report

Attention: Patch coverage is 83.96226%, with 17 lines in your changes missing coverage. Please review.

Project coverage is 64.19%. Comparing base (2973b67) to head (6a8b51f).

Files                          Patch %   Lines
psm_utils/io/proteoscape.py    37.50%    10 Missing ⚠️
psm_utils/io/parquet.py        94.64%    3 Missing ⚠️
psm_utils/io/__init__.py       50.00%    2 Missing ⚠️
psm_utils/io/sage.py           92.30%    2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main      #81      +/-   ##
==========================================
+ Coverage   63.27%   64.19%   +0.91%     
==========================================
  Files          25       26       +1     
  Lines        2421     2497      +76     
==========================================
+ Hits         1532     1603      +71     
- Misses        889      894       +5     
Flag Coverage Δ
unittests 64.19% <83.96%> (+0.91%) ⬆️

RalfG merged commit dfada94 into main May 1, 2024
7 checks passed
RalfG deleted the add-parquet branch May 1, 2024 12:40
RalfG added this to the v0.9.0 milestone May 1, 2024