Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ApplySchema: log selected flags #14798

Merged
merged 3 commits into from
Jan 24, 2024
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 2 additions & 0 deletions go/vt/vtctl/grpcvtctldserver/server.go
Original file line number Diff line number Diff line change
Expand Up @@ -221,6 +221,8 @@ func (s *VtctldServer) ApplyShardRoutingRules(ctx context.Context, req *vtctldat

// ApplySchema is part of the vtctlservicepb.VtctldServer interface.
func (s *VtctldServer) ApplySchema(ctx context.Context, req *vtctldatapb.ApplySchemaRequest) (resp *vtctldatapb.ApplySchemaResponse, err error) {
log.Infof("VtctldServer.ApplySchema: keyspace=%s, migrationContext=%v, ddlStrategy=%v, batchSize=%v", req.Keyspace, req.MigrationContext, req.DdlStrategy, req.BatchSize)

span, ctx := trace.NewSpan(ctx, "VtctldServer.ApplySchema")
Copy link
Contributor

@mattlord mattlord Dec 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO it's better to add these things to the trace span and add stats for specific cases that you want to be able to track.

Our logs are already getting noisy and it's causing practical problems in some cases.

Is a new log message on each invocation really the best option? If so, then of course we can add it. I just wanted to ask.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ApplySchema invocations are sparse; those are normally human generated per schema change. So on most production systems this would amount to a single digit number per day, or per week. It's also possible to use ApplySchema to control migrations (e.g. ALTER VITESS_MIGRATION ... COMPLETE etc.). But again those are a handful of potential extra calls. So in terms of overhead I think the price is insignificant.

Let me look into trace span though, I don't have a good grasp of how that works and how that differs from logging.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The trace span would be useful when tracing, a stat would be useful if we want to track X over the course of a process lifespan (e.g. ApplySchema statements with option X). I'm OK with adding the log message, just wondering if it's really the best option in this case. Let me know what you think.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be honest, I don't dig how tracing or stats would be useful here. I understand that our logging is sometimes bloated, but in this particular case I don't find it to be an issue, as schema changes are relatively rare and human generated.

Copy link
Contributor

@mattlord mattlord Jan 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we think this log message will be helpful/useful, and to whom? What do we hope to get from it? I don't see that stated clearly, but the subject says: log selected flags

So I'm guessing we want to know how many times given flags are specified per keyspace? That's the kind of thing a multi-label stat can provide. But I also don't know if this is thought to be useful in debugging a known issue and may soon be removed (i.e. it's only useful to Vitess developers) or if we think this kind of information is generally usable (useful to Vitess users/administrators). If it's the former then the log makes sense to me, if it's the latter then I think stats are better.

I know we don't think that ApplySchema is called very often, but we've thought this about various things in the past and been wrong -- at least when you consider aggregated logging with up to 1,000s of tablets involved.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, I added no context to this change. The purpose is not to do a weekly summary of how many this and that calls were made. The purpose is to be able to debug migrations that go wrong/fail, and to see what the exact invocation command was used, so we can try and reproduce them.

defer span.Finish()

Expand Down
Loading