Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Setting retry_initial_backoff_secs or retry_max_duration_secs causes vector panic #20890

Closed
dhable opened this issue Jul 19, 2024 · 1 comment · Fixed by #20891
Closed

Setting retry_initial_backoff_secs or retry_max_duration_secs causes vector panic #20890

dhable opened this issue Jul 19, 2024 · 1 comment · Fixed by #20891
Labels
type: bug A code related bug.

Comments

@dhable
Copy link
Contributor

dhable commented Jul 19, 2024

A note for the community

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Problem

The configuration fields for retry_initial_backoff_secs and retry_max_duration_secs in TowerRequestConfig accept any valid u64 values, including 0. Setting either of these fields to 0 causes the FibonacciRetryPolicy to panic with a divide by 0 error.

The crash when retry_initial_backoff_secs is set to 0 happens on startup with a vague error message.

thread 'main' panicked at 'attempt to calculate the remainder with a divisor of zero', src/sinks/util/retries.rs:93:22
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
thread '<unnamed>' panicked at 'index out of bounds: the len is 4096 but the index is 16960', src/internal_telemetry/allocations/mod.rs:116:13
fatal runtime error: failed to initiate panic, error 5 

The panic when retry_max_duration_secs is 0 is worse since it only happens once the sink attempts a retry, which could be some time later.

2024-07-19T15:47:17.885922Z  WARN sink{component_kind="sink" component_id=http_sink component_type=http}:request{request_id=1}:http: vector::internal_events::http_client: HTTP error. error=error trying to connect: tcp connect error: Connection refused (os error 61) error_type="request_failed" stage="processing" internal_log_rate_limit=true
2024-07-19T15:47:17.886483Z  WARN sink{component_kind="sink" component_id=http_sink component_type=http}:request{request_id=1}: vector::sinks::util::retries: Retrying after error. error=Failed to make HTTP(S) request: error trying to connect: tcp connect error: Connection refused (os error 61) internal_log_rate_limit=true
thread 'vector-worker' panicked at src/sinks/util/retries.rs:101:22:
attempt to calculate the remainder with a divisor of zero
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
2024-07-19T15:47:17.968208Z ERROR sink{component_kind="sink" component_id=http_sink component_type=http}: vector::topology: An error occurred that Vector couldn't handle: the task panicked and was aborted.
2024-07-19T15:47:17.968534Z  INFO vector: Vector has stopped.

It seems like using NonZeroU64 for these fields in TowerRequestConfig would prevent the issue when the config is parsed and provide a more helpful error message to the user.

Configuration

[sources.http_source]
type = "http_server"
address = "0.0.0.0:8088"

[sinks.http_sink]
type = "http"
inputs = ["http_source"]
uri = "http://localhost:9099/data"

[sinks.http_sink.encoding]
codec = "json"

[sinks.http_sink.request]
# uncomment to cause a panic on startup 
# retry_initial_backoff_secs = 0

# uncomment to crash but only when a HTTP request is handled by the sink
retry_max_duration_secs = 0

Version

0.35.1

Debug Output

No response

Example Data

No response

Additional Context

Zero values were tolerated in the config prior to the introduction of FibonacciRetryPolicy since those arithmetic operations didn't depend on division with the configuration values.

References

No response

@dhable dhable added the type: bug A code related bug. label Jul 19, 2024
dhable added a commit to dhable/vector that referenced this issue Jul 19, 2024
Change `retry_initial_backoff_secs` and `retry_initial_backoff_secs`
from `u64` to `NonZeroU64` to prevent divide by zero panic at runtime.

Fixes: vectordotdev#20890
dhable added a commit to dhable/vector that referenced this issue Jul 19, 2024
Change `retry_initial_backoff_secs` and `retry_initial_backoff_secs`
from `u64` to `NonZeroU64` to prevent divide by zero panic at runtime.

Fixes: vectordotdev#20890
@jszwedko
Copy link
Member

Thanks for the report and PR @dhable !

dhable added a commit to dhable/vector that referenced this issue Jul 22, 2024
Change `retry_initial_backoff_secs` and `retry_initial_backoff_secs`
from `u64` to `NonZeroU64` to prevent divide by zero panic at runtime.

Fixes: vectordotdev#20890
github-merge-queue bot pushed a commit that referenced this issue Jul 24, 2024
Change `retry_initial_backoff_secs` and `retry_initial_backoff_secs`
from `u64` to `NonZeroU64` to prevent divide by zero panic at runtime.

Fixes: #20890
ym pushed a commit to ym/vector that referenced this issue Aug 18, 2024
Change `retry_initial_backoff_secs` and `retry_initial_backoff_secs`
from `u64` to `NonZeroU64` to prevent divide by zero panic at runtime.

Fixes: vectordotdev#20890
AndrooTheChen pushed a commit to discord/vector that referenced this issue Sep 23, 2024
Change `retry_initial_backoff_secs` and `retry_initial_backoff_secs`
from `u64` to `NonZeroU64` to prevent divide by zero panic at runtime.

Fixes: vectordotdev#20890
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type: bug A code related bug.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants