fluent-bit init uses cloudwatch plugin not specified in the config #835

Open
borkod opened this issue Jun 13, 2024 · 2 comments

borkod commented Jun 13, 2024

Describe the question/issue

  • Issue 1

Fluent Bit logs show an AccessDeniedException error because Fluent Bit tries to create a log group that it is neither allowed to create nor configured to use:

time="2024-06-13T18:48:52Z" level=error msg="AccessDeniedException: User: arn:aws:sts::xxxxxxxxxxxx:assumed-role/fluentbit-task-role/xxxxxxx is not authorized to perform: logs:CreateLogGroup on resource: arn:aws:logs:us-east-1:xxxxxxxxxxxx:log-group:fluent-bit-cloudwatch:log-stream: because no identity-based policy allows the logs:CreateLogGroup action\n\tstatus code: 400, request id: xxxxxxxx"

However, our output plugin configuration is:


[OUTPUT]
  Name              cloudwatch_logs
  Match             *
  region            ca-central-1
  log_group_name    testname
  log_stream_name   teststream
  auto_create_group  false
  Retry_Limit   no_limits

During Fluent Bit startup we see the following logs:

[2024/06/13 19:22:22] [ info] cloudwatch.0
...
time="2024-06-13T19:22:22Z" level=info msg="[cloudwatch 0] plugin parameter auto_create_stream = 'true'"
time="2024-06-13T19:22:22Z" level=info msg="[cloudwatch 0] plugin parameter auto_create_group = 'true'"
...
time="2024-06-13T19:22:22Z" level=info msg="[cloudwatch 0] plugin parameter region = 'us-east-1'"
...
time="2024-06-13T19:22:22Z" level=info msg="[cloudwatch 0] plugin parameter default_log_group_name = 'fluentbit-default'"
time="2024-06-13T19:22:22Z" level=info msg="[cloudwatch 0] plugin parameter log_group_name = 'fluent-bit-cloudwatch'"

Our configuration only uses the newer cloudwatch_logs plugin. We do not specify or use the cloudwatch plugin.

It seems that the cloudwatch plugin is also being loaded for some reason, even though we never specify it. It uses a configuration with the us-east-1 region and the fluent-bit-cloudwatch log group, as shown in the logs above, and that is what triggers the AccessDeniedException.
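To confirm which output instance is running with which settings, the "plugin parameter" lines from the startup log can be grouped per instance and compared with our own [OUTPUT] section. The following is a minimal Python sketch, assuming every startup line follows the `[<plugin> <id>] plugin parameter <key> = '<value>'` format shown above; `parse_plugin_params`, `PARAM_RE` and `expected` are hypothetical names used only for illustration.

```python
import re
from collections import defaultdict

# Matches startup lines such as:
#   msg="[cloudwatch 0] plugin parameter region = 'us-east-1'"
PARAM_RE = re.compile(
    r"\[(?P<plugin>\w+) (?P<id>\d+)\] plugin parameter (?P<key>\w+) = '(?P<value>[^']*)'"
)


def parse_plugin_params(log_text):
    """Group 'plugin parameter' log lines by plugin instance, e.g. 'cloudwatch.0'."""
    params = defaultdict(dict)
    for match in PARAM_RE.finditer(log_text):
        instance = f"{match['plugin']}.{match['id']}"
        params[instance][match["key"]] = match["value"]
    return dict(params)


startup_log = """\
time="2024-06-13T19:22:22Z" level=info msg="[cloudwatch 0] plugin parameter auto_create_group = 'true'"
time="2024-06-13T19:22:22Z" level=info msg="[cloudwatch 0] plugin parameter region = 'us-east-1'"
time="2024-06-13T19:22:22Z" level=info msg="[cloudwatch 0] plugin parameter log_group_name = 'fluent-bit-cloudwatch'"
"""

# Values from our own [OUTPUT] section (the cloudwatch_logs plugin).
expected = {"region": "ca-central-1", "log_group_name": "testname", "auto_create_group": "false"}

# Print every setting of each instance that differs from our configuration.
for instance, settings in parse_plugin_params(startup_log).items():
    mismatches = {k: v for k, v in settings.items() if k in expected and v != expected[k]}
    print(instance, mismatches)
```

Every relevant setting of cloudwatch.0 differs from the values we configured, which suggests this instance is not built from our .conf files.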

As for the cloudwatch_logs plugin we did configure, logs are written to the specified log group and log stream correctly.

  • Issue 2

As shown in the output config above, we set Retry_Limit to no_limits.

However, logs show:

[2024/06/13 19:31:07] [ warn] [engine] chunk '1-1718307049.471694794.flb' cannot be retried: task_id=0, input=syslog.1 > output=cloudwatch.0
[2024/06/13 19:31:07] [debug] [task] task_id=0 reached retry-attempts limit 1/1

Earlier startup logs show:

[2024/06/13 19:30:50] [debug] [output:cloudwatch_logs:cloudwatch_logs.1] task_id=0 assigned to thread #0

It's not clear to me whether "task_id=0 reached retry-attempts limit 1/1" refers to the cloudwatch_logs plugin. If it does, why is our Retry_Limit no_limits setting not respected? (We've also tried other values, e.g. 5 instead of no_limits.) Or does it belong to the preceding error line, which references cloudwatch.0, meaning it is also related to the mysterious cloudwatch plugin?
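One way to attribute the retry-limit message is to read the output name from the "cannot be retried" warning, which names it explicitly (output=cloudwatch.0), and attach the limit from the debug line logged with it. task_id alone cannot identify the output: the logs above show task_id=0 for both cloudwatch_logs.1 and cloudwatch.0. The minimal Python sketch below therefore pairs the two lines by timestamp and task_id, which is a heuristic assumption, not documented Fluent Bit behavior. `dropped_chunks`, `DROP_RE` and `LIMIT_RE` are hypothetical names.

```python
import re

# Engine warning: names the input and the output instance that gave up on a chunk.
DROP_RE = re.compile(
    r"\[(?P<ts>[^\]]+)\] \[ warn\] \[engine\] chunk '(?P<chunk>[^']+)' cannot be retried: "
    r"task_id=(?P<task>\d+), input=(?P<input>\S+) > output=(?P<output>\S+)"
)
# Task debug line: carries the attempt count and the configured limit, but no output name.
LIMIT_RE = re.compile(
    r"\[(?P<ts>[^\]]+)\] \[debug\] \[task\] task_id=(?P<task>\d+) "
    r"reached retry-attempts limit (?P<attempts>\d+)/(?P<limit>\d+)"
)


def dropped_chunks(log_text):
    """List dropped chunks together with the output instance that dropped them.

    Heuristic (assumption): a limit line with the same timestamp and task_id
    belongs to the same failure, so its limit is attached to that drop.
    """
    limits = {
        (m["ts"], m["task"]): int(m["limit"]) for m in LIMIT_RE.finditer(log_text)
    }
    return [
        {
            "output": m["output"],
            "input": m["input"],
            "chunk": m["chunk"],
            "retry_limit": limits.get((m["ts"], m["task"])),
        }
        for m in DROP_RE.finditer(log_text)
    ]


log_text = """\
[2024/06/13 19:31:07] [ warn] [engine] chunk '1-1718307049.471694794.flb' cannot be retried: task_id=0, input=syslog.1 > output=cloudwatch.0
[2024/06/13 19:31:07] [debug] [task] task_id=0 reached retry-attempts limit 1/1
"""

for drop in dropped_chunks(log_text):
    print(drop)
```

For the lines above, this reports the drop against cloudwatch.0 with a limit of 1, not against cloudwatch_logs.1.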

Configuration

ECS Config:

resource "aws_ecs_service" "fluentbit" {
  name            = "fluentbit"
  task_definition = aws_ecs_task_definition.fluentbit.arn
  cluster = aws_ecs_cluster.fluentbit.id
  launch_type = "FARGATE"
  desired_count = 2
  enable_execute_command = true

  network_configuration {
    assign_public_ip = false

    security_groups = [
      aws_security_group.fluentbit-container-sg.id,
    ]

    subnets = [
      data.aws_ssm_parameter.subnet1.value,
      data.aws_ssm_parameter.subnet2.value,
    ]
  }

  load_balancer {
    target_group_arn = aws_lb_target_group.fluentbit_ecs_syslog_tg.arn
    container_name   = "fluentbit"
    container_port   = "5140"
  }
}

resource "aws_ecs_task_definition" "fluentbit" {
  family = "fluentbit"
  
  container_definitions = jsonencode([{
    name = "fluentbit"
    essential = true
    # readonlyRootFilesystem = true  (can't be enabled on AWS Fargate because of the S3 init config files, see https://github.com/fluent/fluent-bit/issues/7308)
    image = "${data.aws_ssm_parameter.fluent-latest-image.value}"
    entrypoint = ["/bin/sh","-c"]
    command = ["/init/fluent_bit_init_entrypoint.sh"]
    environment = [
      {
        name = "aws_fluent_bit_init_s3_1"
        value = "${aws_s3_bucket.syslog-config.arn}/fluent/syslog-fluent-base.conf"
      },
      {
        name = "aws_fluent_bit_init_s3_2"
        value = "${aws_s3_bucket.syslog-config.arn}/fluent/syslog-fluent-input.conf"
      },
      {
        name = "aws_fluent_bit_init_s3_3"
        value = "${aws_s3_bucket.syslog-config.arn}/fluent/syslog-fluent-parser.conf"
      },
      {
        name = "aws_fluent_bit_init_s3_4"
        value = "${aws_s3_bucket.syslog-config.arn}/fluent/syslog-fluent-output.conf"
      }
    ] 
    portMappings = [{
      containerPort = 5140
      hostPort = 5140
      protocol = "tcp"
    },{
      containerPort = 2020
      hostPort = 2020
      protocol = "tcp"
    }]
    healthcheck = {
      command = ["CMD-SHELL","curl -f http://localhost:2020/api/v1/health || exit 1"] 
      interval = 60
      timeout = 5
      retries = 3
      start_period = 90
    } 
    logConfiguration = {
      logDriver = "awslogs"
      options = {
        "awslogs-region" = "ca-central-1"
        "awslogs-group" = "${aws_cloudwatch_log_group.ecs_fluentbit_service.id}"
        "awslogs-stream-prefix" = "ecs"
      }
    }
  }])
}

Fluent Bit Log Output

See above.

Fluent Bit Version Info

Container: aws-for-fluent-bit:init-latest
Fluent-bit version: Fluent Bit v1.9.10

Cluster Details

Application Details

Steps to reproduce issue

Related Issues


MrHash commented Nov 5, 2024

swapneils (Contributor) commented

In addition to the output defined in your .conf files, aws-for-fluent-bit generates a default output plugin that is configured based on your task definition (see this sample JSON task definition).

To get rid of the error message, you can move the output config you shared into your task definition, in the logConfiguration section of your application container definition (see the sample task definition mentioned above).
That way you have a single, correctly configured OUTPUT plugin, rather than two of them trying to publish the same logs to different places.
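The suggestion presumes an application container in the same task whose logs are routed through FireLens. Under that assumption, the posted [OUTPUT] section maps fairly directly onto a logConfiguration that uses the awsfirelens log driver, where each key becomes an entry in "options" and "Name" selects the plugin. The minimal Python sketch below performs that conversion. `output_section_to_firelens` is a hypothetical helper. Dropping Match is an assumption: FireLens generates the Match rule for the application container itself.

```python
import json

# The [OUTPUT] section posted in the issue.
OUTPUT_SECTION = """\
[OUTPUT]
  Name              cloudwatch_logs
  Match             *
  region            ca-central-1
  log_group_name    testname
  log_stream_name   teststream
  auto_create_group  false
  Retry_Limit   no_limits
"""


def output_section_to_firelens(section_text):
    """Convert one classic-mode [OUTPUT] section into an awsfirelens logConfiguration."""
    options = {}
    for raw_line in section_text.splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments and the [OUTPUT] header itself.
        if not line or line.startswith("#") or line.startswith("["):
            continue
        # Split on the first run of whitespace: "<key> <value>".
        parts = line.split(None, 1)
        key = parts[0]
        value = parts[1].strip() if len(parts) > 1 else ""
        # Assumption: FireLens generates its own Match rule for the app container.
        if key.lower() == "match":
            continue
        options[key] = value
    return {"logDriver": "awsfirelens", "options": options}


print(json.dumps(output_section_to_firelens(OUTPUT_SECTION), indent=2))
```

The resulting object would go into the application container's logConfiguration (e.g. via jsonencode in Terraform). The Fluent Bit container itself would then act as the FireLens log router with firelensConfiguration = { type = "fluentbit" }, instead of using a custom entrypoint.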
