-
Notifications
You must be signed in to change notification settings - Fork 476
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
tools: block-generator locked table retry and additional metrics #5653
Conversation
Codecov Report
@@ Coverage Diff @@
## master #5653 +/- ##
=======================================
Coverage 55.11% 55.11%
=======================================
Files 464 464
Lines 64973 64973
=======================================
+ Hits 35810 35813 +3
+ Misses 26774 26773 -1
+ Partials 2389 2387 -2 see 11 files with indirect coverage changes 📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
) | ||
|
||
// ---- templates ---- |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
moved to generate_apps.go
g.reportData.Counters[CommitWaitTimeMS] += uint64(commitWaitTime.Milliseconds()) | ||
g.reportData.Counters[LedgerValidateTimeMS] += uint64((validated.Sub(evaluated) - commitWaitTime).Milliseconds()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Collecting some additional metrics in the reportData
object.
for { | ||
err := eval.TransactionGroup(txGroup) | ||
if err != nil { | ||
if strings.Contains(err.Error(), "database table is locked") { | ||
time.Sleep(waitDelay) | ||
commitWaitTime += waitDelay | ||
// sometimes the database is locked, so we retry | ||
continue | ||
} | ||
return nil, 0, 0, fmt.Errorf("could not evaluate transaction group %d: %w", i, err) | ||
} | ||
break |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the "bug fix". The error always seems to happen on the first transaction group. Simply retrying until it works seems to allow the tests to complete.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider reporting a retry count as well (though I guess this is roughly commitWaitTime / 10 * time.Millisecond
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or maybe more informative would be a retriedBlocks
counter. So a contribution of 1 for blocks which had to retry, and 0 for those without any retries.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was mainly looking for magnitude with this. In my first test the importer was waiting for ~5 seconds over the course of a 1h test, so I don't think we need to worry about the wait time too much.
@@ -149,7 +150,10 @@ type assetHolding struct { | |||
} | |||
|
|||
// Report is the generation report. | |||
type Report map[TxTypeID]TxData | |||
type Report struct { | |||
Counters map[string]uint64 `json:"counters"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
More data for the report.
// write report to file | ||
err = writeReport(w, scenario, start, r.RunDuration, generatorReport, collector) | ||
if err != nil { | ||
return fmt.Errorf("problem writing report: %w", err) | ||
} | ||
return nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function isn't actually new. I split out the reporting function so that I could test it. Everything above is practically unchanged.
@@ -253,12 +254,12 @@ const ( | |||
) | |||
|
|||
// Helper to record metrics. Supports rates (sum/count) and counters. | |||
func recordDataToFile(start time.Time, entry Entry, prefix string, out *os.File) error { | |||
func recordDataToWriter(start time.Time, entry Entry, prefix string, out io.Writer) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Switched to io.Writer for testability.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
approving with a couple of comments/questions.
Summary
Add a retry to block-generator when a table lock error is detected.
Add additional block-level metrics and timing information to report.
Test Plan
New unit tests + manual testing.