PIPELINE-2297 Added process to extract reported vessel info on normalization #23

rdgfuentes
Contributor

https://globalfishingwatch.atlassian.net/browse/PIPELINE-2297

This PR changes the normalization process for VMS positions so that it also extracts the vessel data that providers report alongside the position records being processed.

New normalization processing option to generate vessel_info

  • Added a new argument, --affected_entities, to the normalization pipeline. It specifies which entities the pipeline produces as output. Available options are positions and vessel_info. Default: positions,vessel_info (both entities are produced and stored in the provided BQ tables).
  • Ensures the output table for vessel_info is created if it does not exist.
  • Clears the vessel_info records for the given date(s) and source_tenant.
  • Generates one vessel_info record per ssvid for the period processed. When multiple records are found for an ssvid, the most recent one takes precedence.
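
For reviewers, here is a rough sketch of the two behaviors described above: parsing a comma-separated --affected_entities value and keeping only the most recent vessel_info record per ssvid. It is not the code in this PR. It uses plain argparse and dictionaries instead of the pipeline's own options class and transforms, and the helper names (affected_entities, build_parser, latest_vessel_info_per_ssvid) and the timestamp field are assumptions made for illustration.

```python
import argparse
from datetime import datetime, timezone

# Entity names taken from the PR description.
VALID_ENTITIES = ("positions", "vessel_info")


def affected_entities(value):
    """Parse a comma-separated list such as 'positions,vessel_info'."""
    entities = [item.strip() for item in value.split(",") if item.strip()]
    unknown = [name for name in entities if name not in VALID_ENTITIES]
    if not entities or unknown:
        raise argparse.ArgumentTypeError(
            f"expected a comma-separated subset of {VALID_ENTITIES}, got {value!r}"
        )
    return entities


def build_parser():
    """Hypothetical stand-in for the pipeline's option parsing."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--affected_entities",
        type=affected_entities,
        default=list(VALID_ENTITIES),  # both entities are produced by default
        help="Comma-separated entities to produce: positions, vessel_info",
    )
    return parser


def latest_vessel_info_per_ssvid(records):
    """Keep one record per ssvid; the most recent timestamp wins.

    Assumes each record is a dict with 'ssvid' and 'timestamp' keys;
    'timestamp' is an assumed field name, not one confirmed by the PR.
    """
    latest = {}
    for record in records:
        current = latest.get(record["ssvid"])
        if current is None or record["timestamp"] > current["timestamp"]:
            latest[record["ssvid"]] = record
    return list(latest.values())


if __name__ == "__main__":
    options = build_parser().parse_args(["--affected_entities", "vessel_info"])
    sample = [
        {"ssvid": "A", "timestamp": datetime(2024, 11, 1, tzinfo=timezone.utc), "name": "old"},
        {"ssvid": "A", "timestamp": datetime(2024, 11, 2, tzinfo=timezone.utc), "name": "new"},
    ]
    if "vessel_info" in options.affected_entities:
        print(latest_vessel_info_per_ssvid(sample))
```

Passing only vessel_info would skip the positions output in a pipeline wired this way. An invalid entity name makes argparse exit with a usage error rather than silently producing nothing.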

Improvements to existing normalization

  • Added flag and updated_at fields to normalized position records.

nx-cloud bot commented Nov 13, 2024

☁️ Nx Cloud Report

CI is running/has finished running commands for commit 43afc22. As they complete they will appear below. Click to see the status, the terminal output, and the build insights.

✅ Successfully ran 2 targets

codecov bot commented Nov 13, 2024

Codecov Report

Attention: Patch coverage is 72.22222% with 30 lines in your changes missing coverage. Please review.

Project coverage is 75.82%. Comparing base (a93cffe) to head (43afc22).
Report is 20 commits behind head on develop.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...-ingestion/vms_ingestion/normalization/pipeline.py | 37.50% | 15 Missing ⚠️ |
| ...tion/transforms/write_sink_reported_vessel_info.py | 52.00% | 12 Missing ⚠️ |
| ...on/vms_ingestion/normalization/pipeline_options.py | 81.81% | 2 Missing ⚠️ |
| packages/libs/utils/utils/datetime.py | 80.00% | 1 Missing ⚠️ |

Additional details and impacted files
@@             Coverage Diff             @@
##           develop      #23      +/-   ##
===========================================
- Coverage    76.58%   75.82%   -0.76%     
===========================================
  Files           46       50       +4     
  Lines          756      844      +88     
  Branches        69       78       +9     
===========================================
+ Hits           579      640      +61     
- Misses         159      186      +27     
  Partials        18       18              


andres-arana left a comment

Approved, but there were a couple of things I think we should change if we have time before merging this. I know I'm coming very late to the party, so feel free to ignore this if you don't have time for this.

Sorry for the delay.

@rdgfuentes rdgfuentes merged commit 307935e into develop Nov 29, 2024
8 checks passed
@rdgfuentes rdgfuentes deleted the feature/PIPELINE-2297-extract-reported-vessel-info-from-positions branch November 29, 2024 16:52