Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding geolocation attrs to app_logs #11699

Merged
merged 12 commits into from
Sep 26, 2023

Conversation

LikeTheSalad
Copy link
Contributor

@LikeTheSalad LikeTheSalad commented Sep 22, 2023

Motivation/summary

We need geolocation information on mobile events in order to properly display location-related charts in Kibana's mobile dashboard.

These changes aim to add the client.geo* attrs to logs that contain the attr client.ip, which at the moment is only set to mobile-related logs.

Checklist

For functional changes, consider:

  • Is it observable through the addition of either logging or metrics?
  • Is its use being published in telemetry to enable product improvement?
  • Have system tests been added to avoid regression?

How to test these changes

Related issues

@mergify
Copy link
Contributor

mergify bot commented Sep 22, 2023

This pull request does not have a backport label. Could you fix it @LikeTheSalad? 🙏
To fixup this pull request, you need to add the backport labels for the needed
branches, such as:

  • backport-7.17 is the label to automatically backport to the 7.17 branch.
  • backport-8./d is the label to automatically backport to the 8./d branch. /d is the digit.

NOTE: backport-skip has been added to this pull request.

@mergify mergify bot added the backport-skip Skip notification from the automated backport with mergify label Sep 22, 2023
@LikeTheSalad LikeTheSalad marked this pull request as ready for review September 22, 2023 14:35
@LikeTheSalad LikeTheSalad requested a review from a team as a code owner September 22, 2023 14:35
Copy link
Member

@carsonip carsonip left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • A larger problem now that I've checked apm-data is that we are never parsing/populating client.ip for logs, but only for errors. This will need to be solved before this PR works.
  • It will also be helpful to have a system test so that this doesn't break in the future.

database_file: GeoLite2-City.mmdb
on_failure:
- remove:
field: client.ip
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have to remove the field on failure?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see why we should tbh, although it seems like the same is done in other places when using the geoip processor which made me think that there might be a reason for it, so I just left it to keep consistency with all other usages, but I'm ok to remove this part if it's not needed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have a strong opinion on this. Actually, it may be helpful to just use a predefined geo ip pipeline here.

  - pipeline:
      name: client_geoip

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine with using that predefined pipeline, however, doing so would append geo attrs to all logs, instead of only the ones with the attribute event.category set to device. I'm not sure is that could cause problems to the apm server.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@LikeTheSalad you can execute the pipeline conditionally as well:

- pipeline: 
      if: ...
      name: client_geoip

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've just discussed this with Silvia. We think that unless there's a specific reason, we'd prefer running the pipeline without the if check. Since we are already running client_geoip ingest pipeline for most of the other data streams, the fact that it is missing in this app_logs data stream appears to be only an oversight. Also, client.ip is only populated for swift / android otel agents AND capture_personal_data: true, so it should be fine to run client_geoip whenever client.ip exists.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool! That makes things simpler, cheers!

Copy link
Contributor Author

@LikeTheSalad LikeTheSalad Sep 25, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I finished creating the system test for this use case and I also applied the changes to use the existing client_geoip pipeline without conditions.

I found out a couple of things while I was working on the test:

  • The capture_personal_data flag seems to be enabled by default (at least for the system tests), because when I was adding a new config inside systemtest/apmservertest/config.go, I printed some logs in here where we're reading the config and realized that the if body was always getting executed no matter if I added the new config or not.
  • There were 2 reasons why I wasn't able to see geo attributes locally: The first one is that the agent.name has to be either android/java or iOS/swift for the client.ip attr to be populated (as mentioned by @carsonip), and the second one was that the client.ip value received by the server during the tests was 127.0.0.1 which the server doesn't know how to translate into a location (which makes sense), so I had to override the local IP during testing by setting a real one using the X-Forwarded-For header so that the server could find a location from the client.ip attr.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • You're right. capture_personal_data is default to true for both apm-server standalone and integration.
  • Overriding X-Forwarded-For sounds good.

@LikeTheSalad
Copy link
Contributor Author

Thanks for taking a look @carsonip !

A larger problem now that I've checked apm-data is that we are never parsing/populating client.ip for logs, but only for errors. This will need to be solved before this PR works.

I'm a bit confused about this because I'm taking a look at this log in edge-lite that's not an error, which has the client.ip attribute. That's the kind of log I'm trying to append geo attrs with these changes so it seems like we might be able to work with those?

It will also be helpful to have a system test so that this doesn't break in the future.

I'm up for adding those kinds of tests, although I'm struggling a bit with finding similar tests (that check log's enhancements) in the current tests for ingest processes, and without some similar tests to take as a base for mine, I'm not sure how to set things up on my own in order to do the logs validations. Is there any other test file I'm missing where I could find some ingest tests that verify attrs appended to logs? Otherwise, I'm also up to jump on a call to discuss what steps I can follow to set up these kind of tests for logs.

@carsonip
Copy link
Member

That's the kind of log I'm trying to append geo attrs with these changes so it seems like we might be able to work with those?

You're right if the apm-server is appropriately configured. When the capture_personal_data is set in apm-server, Client.IP from OTLP will be parsed and recorded. Here's the relevant line of code: https://github.com/elastic/apm-server/blob/main/internal/beater/server.go#L198

I believe this is also why you're having difficulties finding such a test, because it is 1. requires a config, 2. uses otlp and only works on swift and android, 3. needs to be a system test. I did a quick search and otlp_test.go would be a good starting point. Think of a mix of TestOTLPAnonymous (the part where we override agent name), TestOTLPGRPCLogs (how we generate logs and assert them).

@LikeTheSalad
Copy link
Contributor Author

You're right if the apm-server is appropriately configured. When the capture_personal_data is set in apm-server, Client.IP from OTLP will be parsed and recorded. Here's the relevant line of code: https://github.com/elastic/apm-server/blob/main/internal/beater/server.go#L198

I believe this is also why you're having difficulties finding such a test, because it is 1. requires a config, 2. uses otlp and only works on swift and android, 3. needs to be a system test. I did a quick search and otlp_test.go would be a good starting point. Think of a mix of TestOTLPAnonymous (the part where we override agent name), TestOTLPGRPCLogs (how we generate logs and assert them).

Cool! Thanks for taking the time 👍 I'll take a look at those tests to try and create some for this use case then

@carsonip
Copy link
Member

Also remember to set capture_personal_data in config before srv.Start() like what we did in TestOTLPAnonymous. We never use this setting in system tests so there's no existing field in systemtest/apmservertest/config.go. Under Config please add something like

AugmentEnabled                       *bool      `json:"apm-server.capture_personal_data,omitempty"`

apmpackage/apm/changelog.yml Outdated Show resolved Hide resolved
systemtest/ingest_test.go Outdated Show resolved Hide resolved
Copy link
Member

@carsonip carsonip left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One nit, otherwise looks good!

@@ -300,3 +307,60 @@ func TestIngestPipelineBackwardCompatibility(t *testing.T) {
require.NoError(t, err)
}
}

func TestMobileLogsWithGeoLocation(t *testing.T) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: imo the test fits marginally better in otlp_test.go although this specific PR is related to ingest pipeline. This is just a special case of TestOTLPGRPCLogs. Maybe call it TestOTLPGRPCLogsClientIP.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean to rename it and also move it to otlp_test.go or to just rename it?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move + rename

Copy link
Member

@carsonip carsonip left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also please update PR title and description, thanks

@LikeTheSalad LikeTheSalad changed the title Adding conditional geolocation to mobile event logs Adding geolocation attrs to logs Sep 26, 2023
carsonip
carsonip previously approved these changes Sep 26, 2023
Copy link
Member

@carsonip carsonip left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks perfect! thanks for adding the missing test

@carsonip carsonip changed the title Adding geolocation attrs to logs Adding geolocation attrs to app_logs Sep 26, 2023
@carsonip carsonip enabled auto-merge (squash) September 26, 2023 12:14
@kruskall
Copy link
Member

This affects GRPC so I tried to create a deployment in elastic cloud and edit to system test to use the remote APM Server to reproduce the issue on an 8.10 deployment and validate that it's fixed on a 8.11 deployment.

After a lot of debugging, I updated the grpc.Dial call to:

conn, err := grpc.Dial( <apm_url>+":443", grpc.WithTransportCredentials(credentials.NewTLS(&tls.Config{})), grpc.WithReturnConnectionError(), grpc.FailOnNonTempDialError(true), grpc.WithBlock(), grpc.WithTimeout(10*time.Second))

The dial call fails with:

                Error:          Received unexpected error:
                                rpc error: code = Unauthenticated desc = unexpected HTTP status code received from server: 401 (Unauthorized); transport: received unexpected content-type "application/json"
                Test:           TestOTLPGRPCLogsClientIP

Note: I've updated the allowed agent by adding android/java

@axw
Copy link
Member

axw commented Oct 25, 2023

Worked for me. I added android/java to allowed anonymous agents, and used this code:

package main

import (
        "context"
        "crypto/tls"
        "log"
        "time"

        "go.opentelemetry.io/collector/pdata/pcommon"
        "go.opentelemetry.io/collector/pdata/plog"
        "go.opentelemetry.io/collector/pdata/plog/plogotlp"
        semconv "go.opentelemetry.io/collector/semconv/v1.5.0"
        "google.golang.org/grpc"
        "google.golang.org/grpc/credentials"
)

func main() {
        conn, err := grpc.Dial(
                "cc6895f8c08348a4ab785f00f15e971b.apm.us-west2.gcp.elastic-cloud.com:443",
                grpc.WithBlock(),
                grpc.WithTransportCredentials(credentials.NewTLS(&tls.Config{})),
        )
        if err != nil {
                log.Fatal(err)
        }
        defer conn.Close()

        logsClient := plogotlp.NewGRPCClient(conn)

        logs := plog.NewLogs()
        resourceLogs := logs.ResourceLogs().AppendEmpty()
        resourceAttrs := logs.ResourceLogs().At(0).Resource().Attributes()
        resourceAttrs.PutStr(semconv.AttributeTelemetrySDKLanguage, "java")
        resourceAttrs.PutStr(semconv.AttributeTelemetrySDKName, "android")

        scopeLogs := resourceLogs.ScopeLogs().AppendEmpty()
        otelLog := scopeLogs.LogRecords().AppendEmpty()
        otelLog.SetTraceID(pcommon.TraceID{1})
        otelLog.SetSpanID(pcommon.SpanID{2})
        otelLog.SetSeverityNumber(plog.SeverityNumberInfo)
        otelLog.SetSeverityText("Info")
        otelLog.SetTimestamp(pcommon.NewTimestampFromTime(time.Unix(1, 0)))
        otelLog.Attributes().PutStr("key", "value")
        otelLog.Attributes().PutDouble("numeric_key", 1234)
        otelLog.Body().SetStr("a log message")

        _, err = logsClient.Export(context.Background(), plogotlp.NewExportRequestFromLogs(logs))
        if err != nil {
                log.Fatal(err)
        }
}

@kruskall
Copy link
Member

Got the same issue with your code above:

rpc error: code = Unauthenticated desc = unexpected HTTP status code received from server: 401 (Unauthorized); transport: received unexpected content-type "application/json"
exit status 1

Further testing reveals that the behaviour is different between 8.10 and 8.11.

On a 8.10 deployment I get the above error
On a 8.11 deployment I get no error but there's no data in discover

@axw
Copy link
Member

axw commented Oct 27, 2023

@kruskall in discover, make sure you're looking back far enough. The test program creates a log in 1970.

image

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport-skip Skip notification from the automated backport with mergify test-plan test-plan-ok v8.11.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants