tools: block-generator locked table retry and additional metrics #5653

Merged: 12 commits into algorand:master on Aug 15, 2023

Conversation

@winder (Contributor) commented Aug 10, 2023

Summary

Add a retry to block-generator when a table lock error is detected.
Add additional block-level metrics and timing information to report.

Test Plan

New unit tests + manual testing.

@codecov (bot) commented Aug 10, 2023

Codecov Report

Merging #5653 (205a607) into master (d92a3bf) will increase coverage by 0.00%.
Report is 3 commits behind head on master.
The diff coverage is n/a.

@@           Coverage Diff           @@
##           master    #5653   +/-   ##
=======================================
  Coverage   55.11%   55.11%           
=======================================
  Files         464      464           
  Lines       64973    64973           
=======================================
+ Hits        35810    35813    +3     
+ Misses      26774    26773    -1     
+ Partials     2389     2387    -2     

see 11 files with indirect coverage changes


)

// ---- templates ----
@winder (Contributor Author) commented:

moved to generate_apps.go

Comment on lines +360 to +361
g.reportData.Counters[CommitWaitTimeMS] += uint64(commitWaitTime.Milliseconds())
g.reportData.Counters[LedgerValidateTimeMS] += uint64((validated.Sub(evaluated) - commitWaitTime).Milliseconds())
@winder (Contributor Author) commented:

Collecting some additional metrics in the reportData object.
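For context, a rough sketch of how these counter keys and the accumulation might be declared elsewhere in the generator. Only the constant names CommitWaitTimeMS and LedgerValidateTimeMS appear in the diff; the string values and the helper below are illustrative assumptions, not the PR's actual code.

// Hypothetical declarations of the counter keys used above; the actual
// string values in the PR may differ.
const (
	CommitWaitTimeMS     = "commit_wait_time_ms"
	LedgerValidateTimeMS = "ledger_validate_time_ms"
)

// addDurationMS is an illustrative helper that accumulates a duration,
// rounded down to whole milliseconds, into a named counter.
func addDurationMS(counters map[string]uint64, key string, d time.Duration) {
	counters[key] += uint64(d.Milliseconds())
}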

Comment on lines +181 to +192
for {
	err := eval.TransactionGroup(txGroup)
	if err != nil {
		if strings.Contains(err.Error(), "database table is locked") {
			time.Sleep(waitDelay)
			commitWaitTime += waitDelay
			// sometimes the database is locked, so we retry
			continue
		}
		return nil, 0, 0, fmt.Errorf("could not evaluate transaction group %d: %w", i, err)
	}
	break
@winder (Contributor Author) commented:

This is the "bug fix". The error always seems to happen on the first transaction group. Simply retrying until it works seems to allow the tests to complete.

@tzaffi (Contributor) commented Aug 14, 2023:

Consider reporting a retry count as well (though I guess this is roughly commitWaitTime / (10 * time.Millisecond))

@tzaffi (Contributor) commented:

Or maybe more informative would be a retriedBlocks counter. So a contribution of 1 for blocks which had to retry, and 0 for those without any retries.
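A minimal sketch of what that could look like, layered onto the retry loop from the diff above. The blockRetried flag, the outer loop variable names, and the "BlockRetries" counter key are assumptions for illustration, not part of the PR.

// Hypothetical sketch: contribute 1 to a counter for each block that needed
// at least one retry, regardless of how many retries occurred.
blockRetried := false
for i, txGroup := range txGroups {
	for {
		err := eval.TransactionGroup(txGroup)
		if err != nil {
			if strings.Contains(err.Error(), "database table is locked") {
				blockRetried = true
				time.Sleep(waitDelay)
				commitWaitTime += waitDelay
				continue
			}
			return nil, 0, 0, fmt.Errorf("could not evaluate transaction group %d: %w", i, err)
		}
		break
	}
}
if blockRetried {
	g.reportData.Counters["BlockRetries"]++ // assumed counter key
}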

@winder (Contributor Author) replied:

I was mainly looking for magnitude with this. In my first test the importer was waiting for ~5 seconds over the course of a 1h test, so I don't think we need to worry about the wait time too much.

@@ -149,7 +150,10 @@ type assetHolding struct {
 }

 // Report is the generation report.
-type Report map[TxTypeID]TxData
+type Report struct {
+	Counters map[string]uint64 `json:"counters"`
@winder (Contributor Author) commented:

More data for the report.
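For readers skimming the diff, a sketch of how the reshaped type plausibly looks end to end. The old type was a bare map[TxTypeID]TxData, so something like a Transactions field presumably carries that data forward; the field name and its JSON tag below are assumptions.

// Hypothetical sketch of the reshaped report type.
type Report struct {
	// Counters holds block-level metrics such as commit wait time and
	// ledger validation time, keyed by metric name.
	Counters map[string]uint64 `json:"counters"`
	// Transactions carries the per-TxTypeID data that Report used to be
	// a bare map of. (Field name is an assumption.)
	Transactions map[TxTypeID]TxData `json:"transactions"`
}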

Comment on lines +476 to +481
// write report to file
err = writeReport(w, scenario, start, r.RunDuration, generatorReport, collector)
if err != nil {
	return fmt.Errorf("problem writing report: %w", err)
}
return nil
@winder (Contributor Author) commented:

This function isn't actually new. I split out the reporting function so that I could test it. Everything above is practically unchanged.

@@ -253,12 +254,12 @@ const (
 )

 // Helper to record metrics. Supports rates (sum/count) and counters.
-func recordDataToFile(start time.Time, entry Entry, prefix string, out *os.File) error {
+func recordDataToWriter(start time.Time, entry Entry, prefix string, out io.Writer) error {
@winder (Contributor Author) commented:

Switched to io.Writer for testability.
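A minimal sketch of the kind of unit test this change enables, assuming the test lives in the same package; the zero-value Entry and the "blockgen" prefix are placeholder assumptions.

// Hypothetical test sketch: with io.Writer, output can be captured in a
// bytes.Buffer instead of a real file on disk.
func TestRecordDataToWriter(t *testing.T) {
	var buf bytes.Buffer
	start := time.Now()
	var entry Entry // populate fields as the real type requires
	if err := recordDataToWriter(start, entry, "blockgen", &buf); err != nil {
		t.Fatalf("recordDataToWriter: %v", err)
	}
	if buf.Len() == 0 {
		t.Error("expected metrics output, got an empty buffer")
	}
}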

@winder winder self-assigned this Aug 14, 2023
@winder winder requested review from shiqizng and tzaffi August 14, 2023 17:01
@winder winder marked this pull request as ready for review August 14, 2023 17:01
@winder winder changed the title from "WIP: block-generator wait until ledger has committed" to "tools: block-generator locked table retry and additional metrics" Aug 14, 2023
@tzaffi (Contributor) left a review comment:

approving with a couple of comments/questions.

@winder winder merged commit 2aac402 into algorand:master Aug 15, 2023
@winder winder deleted the will/blockgen-ledger-level branch August 18, 2023 15:02
3 participants