Add support for sampling rate in streamlog #15919

Merged (12 commits) on Jun 2, 2024

timvaillancourt
Contributor

@timvaillancourt timvaillancourt commented May 11, 2024

Description

This PR implements RFC #15909 by adding a --querylog-sample-rate float flag to vtcombo, vtgate and vttablet, so that queries can be sampled randomly without the client having to trigger query logging through a query directive/comment (i.e. --querylog-filter-tag string).

This flag supports values between 0.0 (no logging) and 1.0 (log all queries) to match other 0.0-1.0 float "sample" flags

math/rand/v2 was used because it sounds better
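
As an illustration of the sampling described above, the sketch below shows one way a 0.0-1.0 rate could gate logging using math/rand/v2. The function name shouldSample and its signature are assumptions made for this example, not the code in this PR.

```go
package main

import (
	"fmt"
	"math/rand/v2"
)

// shouldSample is a hypothetical helper (not the PR's function). It reports
// whether a single query should be logged for a rate between 0.0 and 1.0.
func shouldSample(rate float64) bool {
	switch {
	case rate <= 0:
		return false // 0.0: never sample
	case rate >= 1:
		return true // 1.0: sample every query
	default:
		// rand.Float64 returns a value in [0.0, 1.0).
		return rand.Float64() < rate
	}
}

func main() {
	const queries = 10000
	for _, rate := range []float64{0.0, 0.1, 0.5, 1.0} {
		logged := 0
		for i := 0; i < queries; i++ {
			if shouldSample(rate) {
				logged++
			}
		}
		fmt.Printf("rate=%.1f logged %d of %d queries\n", rate, logged, queries)
	}
}
```

The two boundary cases are handled before drawing a random number, so a rate of 0.0 or 1.0 behaves deterministically.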

Related Issue(s)

Resolves #15909

Checklist

  • "Backport to:" labels have been added if this change should be back-ported to release branches
  • If this change is to be back-ported to previous releases, a justification is included in the PR description
  • Tests were added or are not required
  • Did the new or modified tests pass consistently locally and on CI?
  • Documentation was added or is not required

Deployment Notes

Signed-off-by: Tim Vaillancourt <[email protected]>
Contributor

vitess-bot bot commented May 11, 2024

Review Checklist

Hello reviewers! 👋 Please follow this checklist when reviewing this Pull Request.

General

  • Ensure that the Pull Request has a descriptive title.
  • Ensure there is a link to an issue (except for internal cleanup and flaky test fixes), new features should have an RFC that documents use cases and test cases.

Tests

  • Bug fixes should have at least one unit or end-to-end test, enhancement and new features should have a sufficient number of tests.

Documentation

  • Apply the release notes (needs details) label if users need to know about this change.
  • New features should be documented.
  • There should be some code comments as to why things are implemented the way they are.
  • There should be a comment at the top of each new or modified test to explain what the test does.

New flags

  • Is this flag really necessary?
  • Flag names must be clear and intuitive, use dashes (-), and have a clear help text.

If a workflow is added or modified:

  • Each item in Jobs should be named in order to mark it as required.
  • If the workflow needs to be marked as required, the maintainer team must be notified.

Backward compatibility

  • Protobuf changes should be wire-compatible.
  • Changes to _vt tables and RPCs need to be backward compatible.
  • RPC changes should be compatible with vitess-operator
  • If a flag is removed, then it should also be removed from vitess-operator and arewefastyet, if used there.
  • vtctl command output order should be stable and awk-able.

@vitess-bot vitess-bot bot added the NeedsBackportReason, NeedsDescriptionUpdate, NeedsIssue and NeedsWebsiteDocsUpdate labels May 11, 2024
@github-actions github-actions bot added this to the v20.0.0 milestone May 11, 2024
@timvaillancourt timvaillancourt marked this pull request as ready for review May 11, 2024 00:21
@timvaillancourt timvaillancourt requested a review from deepthi as a code owner May 11, 2024 00:21

codecov bot commented May 11, 2024

Codecov Report

Attention: Patch coverage is 90.90909%, with 1 line in your changes missing coverage. Please review.

Project coverage is 68.24%. Comparing base (f118ba2) to head (8f1dcd2).
Report is 134 commits behind head on main.

Files | Patch % | Lines
go/streamlog/streamlog.go | 90.90% | 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #15919      +/-   ##
==========================================
- Coverage   68.40%   68.24%   -0.17%     
==========================================
  Files        1556     1541      -15     
  Lines      195121   197179    +2058     
==========================================
+ Hits       133479   134571    +1092     
- Misses      61642    62608     +966     


@timvaillancourt timvaillancourt removed the NeedsDescriptionUpdate, NeedsIssue and NeedsBackportReason labels May 11, 2024
Signed-off-by: Tim Vaillancourt <[email protected]>
Contributor

vitess-bot bot commented May 29, 2024

Hello! 👋

This Pull Request is now handled by arewefastyet. The current HEAD and future commits will be benchmarked.

You can find the performance comparison on the arewefastyet website.

@deepthi deepthi requested a review from frouioui May 30, 2024 14:54
@deepthi
Member

deepthi commented May 30, 2024

Flag docs are auto-generated, so we don't need any manual changes on the website for this. @frouioui can you confirm that I'm getting this right?

@frouioui
Member

> Flag docs are auto-generated, so we don't need any manual changes on the website for this. @frouioui can you confirm that I'm getting this right?

@deepthi according to https://github.com/vitessio/vitess-bot/blob/main/README.md#vitess-bot it should generate a PR upon merge to main and upon the next release. If it does not, I will investigate, and in the meantime the flag docs can be generated manually using the tool.

@frouioui
Member

frouioui commented May 30, 2024

After talking with @mattlord, I realized that actually the automation does not work. We should update the website docs manually at the same time as this PR.

@timvaillancourt
Contributor Author

> Flag docs are auto-generated, so we don't need any manual changes on the website for this. @frouioui can you confirm that I'm getting this right?

Right, I forgot this is automatic now, thanks 👍

Also, since arewefastyet will not set the flag that enables this, I will remove the Benchmark me label for now and add a Go benchmark instead.

@timvaillancourt timvaillancourt removed the Benchmark me label May 31, 2024
@deepthi
Member

deepthi commented May 31, 2024

> After talking with @mattlord, I realized that actually the automation does not work. We should update the website docs manually at the same time as this PR.

@frouioui can you provide the instructions for that? Or should we simply plan to do that manually for v20 before release? I suspect there may be other PRs which are missing website documentation as well. One example: #16021
cc @shlomi-noach for v20 release decision.

EDIT: There's a Makefile target in the website repo to generate the docs. Do the following in a website branch, and make sure you have the main vitess repo checked out and accessible from there.

export COBRADOC_VERSION_PAIRS="<this_pr_branch_name>:20.0"
export VITESS_DIR=~/go/src/vitess.io/vitess
make generated-docs

This becomes easier if done after the merge into main, because we can then use main:20.0

@frouioui
Member

Yeah, exactly. Running VITESS_DIR=../vitess ./tools/sync_cobradocs.sh from inside the website repo also works fine to fill in all the missed changes.

@deepthi deepthi requested a review from a team May 31, 2024 19:56
@frouioui frouioui added the Component: Query Serving and Type: Enhancement labels May 31, 2024
Member

@frouioui frouioui left a comment

LGTM

@timvaillancourt
Contributor Author

timvaillancourt commented Jun 1, 2024

EDIT: replied to wrong PR/browser tab 🤦

Signed-off-by: Tim Vaillancourt <[email protected]>
@timvaillancourt
Contributor Author

timvaillancourt commented Jun 2, 2024

To address my own paranoia regarding this thread, I added a Go benchmark:

BenchmarkShouldEmitLog
BenchmarkShouldEmitLog/default
BenchmarkShouldEmitLog/default-12         	327598167	         3.739 ns/op
BenchmarkShouldEmitLog/filter_tag
BenchmarkShouldEmitLog/filter_tag-12      	100000000	        11.67 ns/op
BenchmarkShouldEmitLog/50%_sample_rate
BenchmarkShouldEmitLog/50%_sample_rate-12 	79597881	        15.05 ns/op

Where default = no filter tag or random sampling, filter_tag = query-comment filter tag, and 50%_sample_rate = --querylog-sample-rate 0.5

I think I'm most surprised by filter_tag, actually, which essentially adds a strings.Contains call on the SQL 😄
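
For readers who want to reproduce a comparison like the one above, here is a rough sketch that times the three configurations with testing.Benchmark. The emitConfig type, its shouldEmit method, and the decision order inside it are hypothetical stand-ins, not the streamlog implementation, so the numbers it prints will not match those quoted above.

```go
package main

import (
	"fmt"
	"math/rand/v2"
	"strings"
	"testing"
)

// emitConfig is a hypothetical bundle of the two settings being compared.
type emitConfig struct {
	filterTag  string  // stands in for --querylog-filter-tag
	sampleRate float64 // stands in for --querylog-sample-rate
}

// shouldEmit is an assumed decision function: with nothing configured every
// query is emitted; otherwise a query is emitted if it is randomly sampled
// or if it contains the filter tag.
func (c emitConfig) shouldEmit(sql string) bool {
	if c.filterTag == "" && c.sampleRate <= 0 {
		return true
	}
	if c.sampleRate > 0 && rand.Float64() < c.sampleRate {
		return true
	}
	return c.filterTag != "" && strings.Contains(sql, c.filterTag)
}

// sink keeps the compiler from discarding the benchmarked call.
var sink bool

// bench times shouldEmit for one configuration.
func bench(c emitConfig) testing.BenchmarkResult {
	const sql = "select /* LOG_THIS_QUERY */ * from t where id = 1"
	return testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			sink = c.shouldEmit(sql)
		}
	})
}

func main() {
	cases := []struct {
		name string
		cfg  emitConfig
	}{
		{"default", emitConfig{}},
		{"filter_tag", emitConfig{filterTag: "LOG_THIS_QUERY"}},
		{"50%_sample_rate", emitConfig{sampleRate: 0.5}},
	}
	for _, tc := range cases {
		fmt.Printf("%-16s %s\n", tc.name, bench(tc.cfg))
	}
}
```

Each case runs for roughly the default benchmark time, so the program takes a few seconds to finish.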

@shlomi-noach
Contributor

Re docs, good points. My suggestion: after code freeze, I'll generate vtctldclient docs manually. In any case, we can decouple the docs discussion from this PR.

queryLogSampleRate = 0
assert.False(t, shouldSampleQuery())

// for test coverage, can't test a random result
Contributor

You can run 100 consecutive randoms and expect some to pass and some to fail. The chance of a false negative, on the order of 1 in 2^100, is lower than that of any computational fault on modern hardware.
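
The suggestion above can be sketched as a check that repeated draws at a 50% rate yield a mix of outcomes. countSampled and mixedOutcome are hypothetical helpers, not functions from this PR. At a rate of 0.5, 100 independent draws all agree with probability 2 × (1/2)^100, so a spurious failure is not a practical concern.

```go
package main

import (
	"fmt"
	"math/rand/v2"
)

// countSampled (hypothetical) performs n independent draws at the given
// rate and returns how many of them were sampled.
func countSampled(rate float64, n int) int {
	hits := 0
	for i := 0; i < n; i++ {
		// rand.Float64 returns a value in [0.0, 1.0), so a rate of 0
		// never hits and a rate of 1 always hits.
		if rand.Float64() < rate {
			hits++
		}
	}
	return hits
}

// mixedOutcome (hypothetical) reports whether n draws produced at least
// one sampled and at least one unsampled result.
func mixedOutcome(rate float64, n int) bool {
	hits := countSampled(rate, n)
	return hits > 0 && hits < n
}

func main() {
	fmt.Println("rate 0.0 sampled:", countSampled(0, 100))
	fmt.Println("rate 1.0 sampled:", countSampled(1, 100))
	fmt.Println("rate 0.5 mixed:", mixedOutcome(0.5, 100))
}
```

A unit test could assert mixedOutcome(0.5, 100) in place of the coverage-only call, while keeping the deterministic 0 and 1 cases as exact assertions.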

@shlomi-noach shlomi-noach removed the NeedsWebsiteDocsUpdate What it says label Jun 2, 2024
@shlomi-noach shlomi-noach merged commit 377e1dd into vitessio:main Jun 2, 2024
93 of 94 checks passed
@timvaillancourt timvaillancourt deleted the querylog-sampling-rate branch June 2, 2024 22:54
timvaillancourt added a commit to timvaillancourt/vitess that referenced this pull request Jun 4, 2024
timvaillancourt added a commit to slackhq/vitess that referenced this pull request Jun 4, 2024
shlomi-noach pushed a commit that referenced this pull request Jun 4, 2024
timvaillancourt added a commit to slackhq/vitess that referenced this pull request Jun 4, 2024
* `vtgate`: support filtering tablets by tablet-tags  (vitessio#15911)

Signed-off-by: Tim Vaillancourt <[email protected]>
Co-authored-by: Shlomi Noach <[email protected]>

* Add support for sampling rate in `streamlog` (vitessio#15919)

Signed-off-by: Tim Vaillancourt <[email protected]>

* Fix merge conflict resolution

Signed-off-by: Tim Vaillancourt <[email protected]>

* update rand import

Signed-off-by: Tim Vaillancourt <[email protected]>

* Add sql text counts stats to `vtcombo`,`vtgate`+`vttablet` (vitessio#15897)

Signed-off-by: Tim Vaillancourt <[email protected]>
Signed-off-by: Harshit Gangal <[email protected]>
Co-authored-by: Harshit Gangal <[email protected]>
Co-authored-by: Deepthi Sigireddi <[email protected]>

* missing rename

Signed-off-by: Tim Vaillancourt <[email protected]>

* missing rename again

Signed-off-by: Tim Vaillancourt <[email protected]>

---------

Signed-off-by: Tim Vaillancourt <[email protected]>
Signed-off-by: Harshit Gangal <[email protected]>
Co-authored-by: Shlomi Noach <[email protected]>
Co-authored-by: Harshit Gangal <[email protected]>
Co-authored-by: Deepthi Sigireddi <[email protected]>
timvaillancourt added a commit to slackhq/vitess that referenced this pull request Jul 18, 2024
timvaillancourt added a commit to slackhq/vitess that referenced this pull request Jul 18, 2024
* `vtgate`: support filtering tablets by tablet-tags  (vitessio#15911)

Signed-off-by: Tim Vaillancourt <[email protected]>
Co-authored-by: Shlomi Noach <[email protected]>

* Add support for sampling rate in `streamlog` (vitessio#15919)

Signed-off-by: Tim Vaillancourt <[email protected]>

* Add sql text counts stats to `vtcombo`,`vtgate`+`vttablet` (vitessio#15897)

Signed-off-by: Tim Vaillancourt <[email protected]>
Signed-off-by: Harshit Gangal <[email protected]>
Co-authored-by: Harshit Gangal <[email protected]>
Co-authored-by: Deepthi Sigireddi <[email protected]>

---------

Signed-off-by: Tim Vaillancourt <[email protected]>
Signed-off-by: Harshit Gangal <[email protected]>
Co-authored-by: Shlomi Noach <[email protected]>
Co-authored-by: Harshit Gangal <[email protected]>
Co-authored-by: Deepthi Sigireddi <[email protected]>
Development

Successfully merging this pull request may close these issues.

RFC: support sampling rate for querylog
4 participants