[8pt] Improved HUC processing duration system #1323

Open
RobHanna-NOAA opened this issue Oct 19, 2024 · 0 comments
Labels
enhancement New feature or request FIM4


RobHanna-NOAA commented Oct 19, 2024

Currently, run_by_unit.sh has a HUC processing duration system in which each run_unit.sh calculates the duration for a HUC and appends it to a shared file in the logs folder. It currently looks like this:

total_duration_display="$hucNumber,$(Calc_Time $huc_start_time),$(Calc_Time_Minutes_in_Percent $huc_start_time)"
echo "$total_duration_display" >> "$outputDestDir/logs/unit/total_duration_run_by_unit_all_HUCs.csv"

However, because multiple processes all try to write to the same file, collisions occur and the data is always somewhat incomplete. I also need more performance data to find bottlenecks.

I would like to replace / upgrade this system to:

  1. Have run_by_unit.sh continue to track the duration for the entire HUC (including branches), but also add new values for the total number of branches and the total processing time for the branches as a whole. The branch processing time covers branch zero and all of its other branches as one number.
  2. Count the number of branches, including branch zero.
  3. Write those values to a new output text file in the HUC output directory itself.
  4. Add a new column to this output for the HUC number, taking care with zero padding. Make this the first column.
  5. At the start of post processing, run a new tool that simply collects all of the HUC files and concatenates them.
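The per-HUC file and the collection step could be sketched roughly as below. This is only an illustration under assumptions: the function names, the file name huc_duration.csv, the column names, and the layout of one sub-directory per HUC under the run directory are all hypothetical and not taken from the existing scripts. Because each process writes only to its own HUC directory, no two processes share a file.

```shell
# Hypothetical sketch: names and file layout are assumptions, not project code.

# Write one CSV row for a single HUC into that HUC's own output directory.
# Args: huc_dir huc huc_clock huc_decimal num_branches branch_clock branch_decimal
write_huc_duration_file() {
    local huc_dir="$1" huc="$2" huc_clock="$3" huc_dec="$4"
    local num_branches="$5" br_clock="$6" br_dec="$7"
    # The HUC number is printed with %s (as a string) so leading zeros survive.
    printf '%s,%s,%s,%s,%s,%s\n' \
        "$huc" "$huc_clock" "$huc_dec" "$num_branches" "$br_clock" "$br_dec" \
        > "$huc_dir/huc_duration.csv"
}

# Concatenate every per-HUC file under run_dir into one CSV with a header row.
# Args: run_dir out_file
collect_huc_duration_files() {
    local run_dir="$1" out_file="$2" f
    echo "huc,huc_duration,huc_duration_dec,num_branches,branch_duration,branch_duration_dec" > "$out_file"
    for f in "$run_dir"/*/huc_duration.csv; do
        # Skip the unexpanded pattern when no HUC files exist.
        [ -e "$f" ] || continue
        cat "$f" >> "$out_file"
    done
}
```

For example, `write_huc_duration_file "$outputDestDir/$hucNumber" "$hucNumber" ...` would run at the end of each HUC, and `collect_huc_duration_files` would run once, single-threaded, at the start of post processing.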

It should continue to be a CSV output and continue with the semi-weird pattern for durations. It currently shows the entire HUC processing time in two columns. The first is the total duration in minutes in normal time format, e.g. 2:55. The second is the same duration in base 10 (percent), e.g. 2.91 (two decimals). That second column makes it easier to average and sum later. Ensure this two-column pattern is also applied to the branch duration columns.
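To make the two-column pattern concrete, here is a minimal sketch of the two formats computed from a duration in whole seconds. These helpers are hypothetical stand-ins, not the existing Calc_Time and Calc_Time_Minutes_in_Percent functions, whose exact behavior is not shown here. The sketch assumes the clock format is minutes:seconds and that the decimal value is truncated rather than rounded, which is what reproduces the 2:55 and 2.91 pair in the example above.

```shell
# Hypothetical helpers: assumptions for illustration, not the project's functions.

# Duration in whole seconds -> M:SS clock format, e.g. 175 -> 2:55.
format_duration_clock() {
    local total_sec="$1"
    printf '%d:%02d' $(( total_sec / 60 )) $(( total_sec % 60 ))
}

# Duration in whole seconds -> decimal minutes with two decimals, e.g. 175 -> 2.91.
# Integer arithmetic truncates to hundredths of a minute (assumed behavior).
format_duration_decimal() {
    local total_sec="$1"
    local hundredths=$(( total_sec * 100 / 60 ))
    printf '%d.%02d' $(( hundredths / 100 )) $(( hundredths % 100 ))
}
```

The same pair of helpers could be applied to the combined branch duration so that the HUC and branch columns follow one pattern.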

Optional, but very nice to have:

  • In post processing, when we concatenate all independent HUC duration files, add new rows for:
    • total number of hucs processed
    • sum of the total overall huc processing time
    • sum of branch processing time.
    • sum of the number of branches processed.
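One possible shape for those summary rows, as a sketch only: it assumes a combined CSV with a header row and six columns in the order HUC, HUC duration, HUC decimal duration, number of branches, branch duration, branch decimal duration. The function name, the column order, and the summary row labels are all hypothetical. The decimal columns are the ones summed, since the clock-format columns are not directly addable.

```shell
# Hypothetical sketch: column order and row labels are assumptions.

# Append summary rows to a combined HUC duration CSV.
# Args: csv_file (header row first, then one row per HUC)
append_duration_summary() {
    local csv_file="$1" summary
    # Compute the summary fully before appending, so awk never reads
    # the file while it is being written.
    summary=$(awk -F, 'NR > 1 {
            hucs++              # one data row per HUC
            huc_dec += $3       # HUC duration, decimal column
            branches += $4      # number of branches
            branch_dec += $6    # branch duration, decimal column
        }
        END {
            printf "total_hucs,%d\n", hucs
            printf "sum_huc_duration_dec,%.2f\n", huc_dec
            printf "sum_branch_duration_dec,%.2f\n", branch_dec
            printf "sum_branches,%d\n", branches
        }' "$csv_file")
    printf '%s\n' "$summary" >> "$csv_file"
}
```

Keeping the summary in clearly labeled rows at the end of the file makes it easy to strip them off again if the per-HUC rows are loaded into another tool.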
@RobHanna-NOAA RobHanna-NOAA self-assigned this Oct 19, 2024
@RobHanna-NOAA RobHanna-NOAA added enhancement New feature or request FIM4 labels Oct 19, 2024
@RobHanna-NOAA RobHanna-NOAA changed the title [5pt] Improved HUC processing duration system [8pt] Improved HUC processing duration system Oct 20, 2024
@RobHanna-NOAA RobHanna-NOAA removed their assignment Oct 20, 2024