
Spurious "Failed to identify trace type" error messages #247

Open
jlewi opened this issue Sep 18, 2024 · 0 comments

jlewi commented Sep 18, 2024

Our logs are showing error messages like the following.

{
  "insertId": "p01ulcfteojll",
  "jsonPayload": {
    "caller": "analyze/analyzer.go:328",
    "function": "github.com/jlewi/foyle/app/pkg/analyze.(*Analyzer).combineAndCheckpoint",
    "targetTraceId": "8d588abbd5b62374306f3f664f339922",
    "severity": "error",
    "errorVerbose": "Failed to identify trace type\ngithub.com/jlewi/foyle/app/pkg/analyze.combineEntriesForTrace\n\t/Users/jlewi/git_foyle/app/pkg/analyze/analyzer.go:741\ngithub.com/jlewi/foyle/app/pkg/analyze.(*Analyzer).buildTrace\n\t/Users/jlewi/git_foyle/app/pkg/analyze/analyzer.go:523\ngithub.com/jlewi/foyle/app/pkg/analyze.(*Analyzer).combineAndCheckpoint\n\t/Users/jlewi/git_foyle/app/pkg/analyze/analyzer.go:327\ngithub.com/jlewi/foyle/app/pkg/analyze.(*Analyzer).processLogFile\n\t/Users/jlewi/git_foyle/app/pkg/analyze/analyzer.go:307\ngithub.com/jlewi/foyle/app/pkg/analyze.(*Analyzer).handleLogFileEvents.func1\n\t/Users/jlewi/git_foyle/app/pkg/analyze/analyzer.go:205\ngithub.com/jlewi/foyle/app/pkg/analyze.(*Analyzer).handleLogFileEvents\n\t/Users/jlewi/git_foyle/app/pkg/analyze/analyzer.go:218\nruntime.goexit\n\t/opt/homebrew/opt/go/libexec/src/runtime/asm_arm64.s:1222",
    "time": 1726687817.459125,
    "message": "Error building trace",
    "error": "Failed to identify trace type"
  },
  "resource": {
    "type": "project",
    "labels": {
      "project_id": "foyle-dev"
    }
  },
  "timestamp": "2024-09-18T19:30:17.459125041Z",
  "severity": "ERROR",
  "logName": "projects/foyle-dev/logs/foyle",
  "receiveTimestamp": "2024-09-18T19:30:18.112128745Z"
}

If we search the logs for that trace, we see that it has an Agent.Generate message.
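A sketch of how that search can be done against the raw log files (assuming they are newline-delimited JSON under ~/.foyle/logs/raw, as in the jq cells in the notebook below; the field names traceId and targetTraceId come from the payloads in this issue):

```shell
# Filter newline-delimited JSON logs for entries belonging to one trace.
TRACE_ID="8d588abbd5b62374306f3f664f339922"
for f in ~/.foyle/logs/raw/*.json; do
  [ -e "$f" ] || continue  # skip if the glob matched nothing
  jq --arg id "$TRACE_ID" \
    'select(.traceId == $id or .targetTraceId == $id)' "$f"
done
```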

{
  "insertId": "3weoyfeyrt2a",
  "jsonPayload": {
    "contextId": "01J7Z2GJWGJKHTKD4N2X5RJ25X",
    "severity": "info",
    "message": "Agent.Generate",
    "time": 1726545957.47473,
    "caller": "agent/agent.go:105",
    "evalMode": false,
    "streamTraceId": "f8f2b5f7c233c88c1114c0a0ad623223",
    "request": {
      "doc": {
        "blocks": [
          {
            "outputs": [],
            "trace_ids": [],
            "language": "markdown",
            "contents": "# Controlling Anthropic Costs",
            "id": "",
            "kind": "MARKUP"
          },
          {
            "language": "markdown",
            "kind": "MARKUP",
            "contents": "* Why are we spending so much on Anthropic?\n* How many LLM calls are we making?\n* We are logging data to Honeycomb but\n\n   * I'm not seeing any traces for Generate; only StreamGenerate\n   * I'm not seeing any traces for Anthropic requests\n\n* I tried setting the DefaultTransport for the client but I'm still not seeing events.\n* Interestingly we are getting events for\n   google.logging.v2.LoggingServiceV2/WriteLogEntries\n* I tried skipping the retryable client and it still didn't work.\n* I think Google Cloud Logging is using gRPC\n* The OTEL roundtripper is being invoked\n* I checked the Anthropic client and the body is being closed?\n* When I directly called generate it worked\n\n   * I think the problem is when I call GenerateCells from StreamGenerate there is no context\n\n* I take that back;\n\n   * Here's an event; https://ui.honeycomb.io/autobuilder/environments/prod/datasets/foyle/result/aFeSuUieHzs/trace/qwdytg1Mdgp?fields[]=s_name&fields[]=s_serviceName&span=12449317177fbeee\n   * It looks like I was looking at the wrong root event\n   * Here's where we generate a new trace; https://github.com/jlewi/foyle/blob/5ab52e5ed4e97437fd7e42dfe2d88ef07dfbd864/app/pkg/agent/agent.go#L471",
            "trace_ids": [],
            "outputs": [],
            "id": ""
          },
          {
            "contents": "## Generate a Honeycomb Query",
            "id": "",
            "trace_ids": [],
            "language": "markdown",
            "outputs": [],
            "kind": "MARKUP"
          },
          {
            "outputs": [],
            "trace_ids": [],
            "contents": "* Generate a honeycomb query to find the traces named CreateCompletion\n* Use the hccli tool to run the query and print the link",
            "kind": "MARKUP",
            "id": "",
            "language": "markdown"
          },
          {
            "trace_ids": [],
            "id": "01J71GB2FJS4GMAQAM5HY7DTG8",
            "contents": "# Generate the Honeycomb query\nQUERY='{\"calculations\":[{\"op\":\"COUNT\"}],\"filters\":[{\"column\":\"name\",\"op\":\"=\",\"value\":\"CreateCompletion\"}],\"time_range\":86400}'\n\n# Run the query using hccli\nhccli querytourl --dataset foyle --query \"$QUERY\" --base-url https://ui.honeycomb.io/autobuilder/environments/prod",
            "language": "bash",
            "kind": "CODE",
            "outputs": []
          },
          {
            "contents": "Generate a query to count llm.input_tokens",
            "trace_ids": [],
            "language": "markdown",
            "kind": "MARKUP",
            "outputs": [],
            "id": ""
          },
          {
            "id": "01J71HWHYNVHXN9ATDJDCT480W",
            "contents": "QUERY='{\"calculations\":[{\"op\":\"COUNT\"},{\"op\":\"SUM\",\"column\":\"llm.input_tokens\"},{\"op\":\"SUM\",\"column\":\"llm.output_tokens\"}],\"filters\":[{\"column\":\"name\",\"op\":\"=\",\"value\":\"CreateCompletion\"}],\"time_range\":86400}'\nhccli querytourl --dataset foyle --query \"$QUERY\" --base-url https://ui.honeycomb.io/autobuilder/environments/prod",
            "kind": "CODE",
            "outputs": [],
            "trace_ids": [],
            "language": "bash"
          },
          {
            "trace_ids": [],
            "contents": "* So two questions we want to answer are\n\n   * How many LLM calls are we making per cell generation?\n   * How many Input/Output tokens are we consuming per cell?\n\n* Traces have a complete ID field so we can use that to group by cell",
            "kind": "MARKUP",
            "language": "markdown",
            "id": "",
            "outputs": []
          },
          {
            "contents": "QUERY='{\"calculations\":[{\"op\":\"COUNT\", \"alias\": \"Total_LLM_Calls\"},{\"op\":\"SUM\",\"column\":\"llm.input_tokens\", \"alias\": \"Total_Input_Tokens\"},{\"op\":\"SUM\",\"column\":\"llm.output_tokens\", \"alias\": \"Total_Output_Tokens\"}],\"filters\":[{\"column\":\"name\",\"op\":\"=\",\"value\":\"CreateCompletion\"}],\"time_range\":86400}'\nhccli querytourl --dataset foyle --query \"$QUERY\" --base-url https://ui.honeycomb.io/autobuilder/environments/prod",
            "language": "bash",
            "outputs": [],
            "kind": "CODE",
            "trace_ids": [],
            "id": "01J71JP3RGMMVRP982YA47HNT7"
          },
          {
            "outputs": [],
            "language": "markdown",
            "id": "",
            "contents": "* The sum of the llm tokens is showing zero\n* I think this is because the llm tokens and output tokens are set on the Complete span\n* So we need to query for the Complete Span",
            "kind": "MARKUP",
            "trace_ids": []
          },
          {
            "id": "01J71JT2H2R4TZ0ZAVSVRH87D8",
            "trace_ids": [],
            "kind": "CODE",
            "language": "bash",
            "outputs": [
              {
                "items": [
                  {
                    "mime": "stateful.runme/terminal",
                    "text_data": "{\n\t\"type\": \"stateful.runme/terminal\",\n\t\"output\": {\n\t\t\"runme.dev/id\": \"01J71JT2H2R4TZ0ZAVSVRH87D8\",\n\t\t\"content\": \"\",\n\t\t\"initialRows\": 10,\n\t\t\"enableShareButton\": true,\n\t\t\"isAutoSaveEnabled\": false,\n\t\t\"isPlatformAuthEnabled\": false,\n\t\t\"isSessionOutputsEnabled\": true,\n\t\t\"backgroundTask\": true,\n\t\t\"nonInteractive\": false,\n\t\t\"interactive\": true,\n\t\t\"fontSize\": 16,\n\t\t\"fontFamily\": \"Menlo, Monaco, 'Courier New', monospace\",\n\t\t\"rows\": 10,\n\t\t\"cursorStyle\": \"bar\",\n\t\t\"cursorBlink\": true,\n\t\t\"cursorWidth\": 1,\n\t\t\"smoothScrollDuration\": 0,\n\t\t\"scrollback\": 1000,\n\t\t\"closeOnSuccess\": true\n\t}\n}"
                  },
                  {
                    "text_data": "",
                    "mime": "application/vnd.code.notebook.stdout"
                  }
                ]
              }
            ],
            "contents": "QUERY='{\n  \"calculations\": [\n    {\"op\": \"COUNT\", \"alias\": \"LLM_Calls_Per_Cell\"},\n    {\"op\": \"SUM\", \"column\": \"llm.input_tokens\", \"alias\": \"Input_Tokens_Per_Cell\"},\n    {\"op\": \"SUM\", \"column\": \"llm.output_tokens\", \"alias\": \"Output_Tokens_Per_Cell\"}\n  ],\n  \"filters\": [\n    {\"column\": \"name\", \"op\": \"=\", \"value\": \"Complete\"}\n  ],\n  \"breakdowns\": [\"trace.trace_id\"],\n  \"time_range\": 86400,\n  \"order_by\": [{\"op\": \"COUNT\", \"order\": \"descending\"}]\n}'\n\nhccli querytourl --dataset foyle --query \"$QUERY\" --base-url https://ui.honeycomb.io/autobuilder/environments/prod --open"
          },
          {
            "outputs": [],
            "id": "",
            "language": "markdown",
            "kind": "MARKUP",
            "trace_ids": [],
            "contents": "* So Input tokens is quite large O(10K-15K) output is O(100-500)\n* I suspect the variability of the input tokens is due to the context coming in from RAG\n* Anthropic is \\$3 per 1M input tokens or about $.03 per 10K toens\n* Output is about .0015 per 100 tokens at $15 per 1M tokens\n* Group the Complete traces by the contextID field and count them"
          },
          {
            "contents": "# Query to get the top 100 traces by input token usage\nQUERY='{\n  \"calculations\": [\n    {\"op\": \"COUNT\", \"alias\": \"LLM_Calls_Per_Cell\"}\n  ],\n  \"filters\": [\n    {\"column\": \"name\", \"op\": \"=\", \"value\": \"Complete\"}\n  ],\n  \"breakdowns\": [\"contextID\"],\n  \"time_range\": 86400,  \n  \"limit\": 100\n}'\n\nhccli querytourl --dataset foyle --query \"$QUERY\" --base-url https://ui.honeycomb.io/autobuilder/environments/prod",
            "id": "01J71K6Q383MQEQ32A592Y8S3W",
            "kind": "CODE",
            "language": "bash",
            "trace_ids": [],
            "outputs": []
          },
          {
            "kind": "MARKUP",
            "language": "markdown",
            "contents": "* Looks like we are at 4-10 calls per contextID\n* 10 calls that's about $.30 per cell\n* copilot is $10 per month\n* That's approximately 333 cells per month\n* My token usage right now is between 1million and 4million tokens per day and 20Kto50K output\n* So right now I'm roughly 10x too expensive",
            "trace_ids": [],
            "outputs": [],
            "id": ""
          },
          {
            "outputs": [],
            "kind": "MARKUP",
            "contents": "## Acceptance events",
            "language": "markdown",
            "id": "",
            "trace_ids": []
          },
          {
            "trace_ids": [],
            "id": "",
            "kind": "MARKUP",
            "contents": "* Are we logging any LOG events\n* I'm not seeing any traces for LogEvents\n* Lets query the logs using JQ to check for any LogEvent messages",
            "outputs": [],
            "language": "markdown"
          },
          {
            "trace_ids": [],
            "kind": "CODE",
            "contents": "LASTLOG=$(ls -t ~/.foyle/logs/raw/*.json | head -n 1)\necho ${LASTLOG}\njq 'select(has(\"eventType\"))' \"$LASTLOG\"",
            "id": "01J7279EYWEDWGQJPM4DW3136D",
            "language": "bash",
            "outputs": []
          },
          {
            "contents": "* So indeed we have no LOGEVENTS so maybe we need to install and build a newer version of the frontend\n* Looks like I had 3.7.3 installed; probably auto upgraded to non dev build. So I built and installed the latest from main.\n* v3.7.4-edge.1-jlewi\n* So lets rerun the query to check logs for even types",
            "language": "markdown",
            "kind": "MARKUP",
            "id": "",
            "outputs": [],
            "trace_ids": []
          },
          {
            "kind": "CODE",
            "id": "01J7281Q3QHJMGH6NH7J6XT4J5",
            "contents": "LASTLOG=$(ls -t ~/.foyle/logs/raw/*.json | head -n 1)\necho \"Using log file: ${LASTLOG}\"\njq 'select(has(\"eventType\"))' \"${LASTLOG}\"",
            "trace_ids": [],
            "outputs": [],
            "language": "bash"
          },
          {
            "contents": "* Success!\n* Now generate a honeycomb query to look for traces LogEvents RPC",
            "id": "",
            "outputs": [],
            "language": "markdown",
            "kind": "MARKUP",
            "trace_ids": []
          },
          {
            "outputs": [],
            "language": "bash",
            "contents": "QUERY='{\n  \"calculations\": [\n    {\"op\": \"COUNT\", \"alias\": \"Event_Count\"}\n  ],\n  \"filters\": [\n     {\"column\": \"name\", \"op\": \"=\", \"value\": \"LogEvents\"}\n  ],\n  \"breakdowns\": [\"eventType\"],\n  \"time_range\": 86400,\n  \"order_by\": [{\"op\": \"COUNT\", \"order\": \"descending\"}]\n}'\n\nhccli querytourl --dataset foyle --query \"$QUERY\" --base-url https://ui.honeycomb.io/autobuilder/environments/prod",
            "trace_ids": [],
            "id": "01J7282S2VFGE8DSYHY5NV0044",
            "kind": "CODE"
          },
          {
            "contents": "echo 'https://ui.honeycomb.io/autobuilder/environments/prod/datasets/foyle?query={\"breakdowns\":[\"eventType\"],\"calculations\":[{\"op\":\"COUNT\"}],\"filters\":[{\"op\":\"=\",\"column\":\"name\",\"value\":\"LogEvents\"}],\"time_range\":86400}'",
            "outputs": [],
            "kind": "CODE",
            "language": "bash",
            "trace_ids": [],
            "id": "01J7287NENGHATDN4RGN1VSA3C"
          },
          {
            "id": "",
            "trace_ids": [],
            "kind": "MARKUP",
            "language": "markdown",
            "contents": "* Now generate a honeycomb query to look at traces named LogEvent which I just added\n* We want to veriy the spans got set",
            "outputs": []
          },
          {
            "contents": "QUERY='{\n  \"calculations\": [\n    {\"op\": \"COUNT\", \"alias\": \"Event_Count\"}\n  ],\n  \"filters\": [\n    {\"column\": \"name\", \"op\": \"=\", \"value\": \"LogEvent\"}\n  ],\n  \"breakdowns\": [\"eventType\"],\n  \"time_range\": 86400,\n  \"order_by\": [{\"op\": \"COUNT\", \"order\": \"descending\"}]\n}'\n\nhccli querytourl --dataset foyle --query \"$QUERY\" --base-url https://ui.honeycomb.io/autobuilder/environments/prod",
            "kind": "CODE",
            "outputs": [],
            "id": "01J7298DV5Q01TCMV9H8SBM2Q0",
            "trace_ids": [],
            "language": "bash"
          },
          {
            "trace_ids": [],
            "outputs": [],
            "language": "markdown",
            "contents": "## Measure Efficacy",
            "id": "",
            "kind": "MARKUP"
          },
          {
            "outputs": [],
            "trace_ids": [],
            "contents": "* So to measure efficacy we want to look at the number of accepted events",
            "language": "markdown",
            "id": "",
            "kind": "MARKUP"
          },
          {
            "language": "bash",
            "kind": "CODE",
            "outputs": [
              {
                "items": [
                  {
                    "mime": "stateful.runme/terminal",
                    "text_data": "{\n\t\"type\": \"stateful.runme/terminal\",\n\t\"output\": {\n\t\t\"runme.dev/id\": \"01J729C3MV620S8ND840TZ16G2\",\n\t\t\"content\": \"\",\n\t\t\"initialRows\": 10,\n\t\t\"enableShareButton\": true,\n\t\t\"isAutoSaveEnabled\": false,\n\t\t\"isPlatformAuthEnabled\": false,\n\t\t\"isSessionOutputsEnabled\": true,\n\t\t\"backgroundTask\": true,\n\t\t\"nonInteractive\": false,\n\t\t\"interactive\": true,\n\t\t\"fontSize\": 16,\n\t\t\"fontFamily\": \"Menlo, Monaco, 'Courier New', monospace\",\n\t\t\"rows\": 10,\n\t\t\"cursorStyle\": \"bar\",\n\t\t\"cursorBlink\": true,\n\t\t\"cursorWidth\": 1,\n\t\t\"smoothScrollDuration\": 0,\n\t\t\"scrollback\": 1000,\n\t\t\"closeOnSuccess\": true\n\t}\n}"
                  },
                  {
                    "mime": "application/vnd.code.notebook.stdout",
                    "text_data": ""
                  }
                ]
              }
            ],
            "contents": "QUERY='{\n  \"calculations\": [\n    {\"op\": \"COUNT\", \"alias\": \"Event_Count\"}\n  ],\n  \"filters\": [\n    {\"column\": \"name\", \"op\": \"=\", \"value\": \"LogEvent\"},\n    {\"column\": \"eventType\", \"op\": \"=\", \"value\": \"ACCEPTED\"}\n  ],\n  \"time_range\": 86400,\n  \"order_by\": [{\"op\": \"COUNT\", \"order\": \"descending\"}]\n}'\n\nhccli querytourl --dataset foyle --query \"$QUERY\" --base-url https://ui.honeycomb.io/autobuilder/environments/prod --open",
            "id": "01J729C3MV620S8ND840TZ16G2",
            "trace_ids": []
          },
          {
            "trace_ids": [],
            "kind": "MARKUP",
            "outputs": [],
            "id": "",
            "language": "markdown",
            "contents": "* Interesting it looks like when I paste the URL into honeycomp `ACCEPTED` ends up being lowercased and as a result there are no matches to the query.\n* Is that a bug in the honeycomb UI?"
          },
          {
            "contents": "## Controlling costs\n\n* Input tokens is still very large.\n* Is that due to multiple requests per cell?\n* How would we measure the number of requests generated for each cell?\n\n",
            "id": "",
            "kind": "MARKUP",
            "trace_ids": [],
            "language": "markdown",
            "outputs": []
          },
          {
            "trace_ids": [],
            "language": "markdown",
            "kind": "MARKUP",
            "outputs": [],
            "id": "",
            "contents": "## Don't generate completions for code cells\n\n* As one way to control costs, we could avoid generating completions for code cells."
          },
          {
            "trace_ids": [],
            "contents": "QUERY='{\n  \"calculations\": [\n    {\"op\": \"COUNT\", \"alias\": \"Request_Count\"},\n    {\"op\": \"AVG\", \"column\": \"llm.input_tokens\", \"alias\": \"Avg_Input_Tokens\"},\n    {\"op\": \"P90\", \"column\": \"llm.input_tokens\", \"alias\": \"P90_Input_Tokens\"},\n    {\"op\": \"MAX\", \"column\": \"llm.input_tokens\", \"alias\": \"Max_Input_Tokens\"},\n    {\"op\": \"SUM\", \"column\": \"llm.input_tokens\", \"alias\": \"Total_Input_Tokens\"}\n  ],\n  \"filters\": [\n    {\"column\": \"name\", \"op\": \"=\", \"value\": \"Complete\"}\n  ],\n  \"breakdowns\": [\"contextID\"],\n  \"time_range\": 86400,\n  \"order_by\": [{\"column\": \"Total_Input_Tokens\", \"order\": \"descending\"}],\n  \"limit\": 100\n}'\n\nhccli querytourl --dataset foyle --query \"$QUERY\" --base-url https://ui.honeycomb.io/autobuilder/environments/prod --open",
            "kind": "CODE",
            "outputs": [],
            "id": "01J7Z2G58QE24ZZ2ZGKBFWH5TA",
            "language": "bash"
          }
        ]
      }
    },
    "function": "github.com/jlewi/foyle/app/pkg/agent.(*Agent).Generate",
    "traceId": "8d588abbd5b62374306f3f664f339922"
  },
  "resource": {
    "type": "project",
    "labels": {
      "project_id": "foyle-dev"
    }
  },
  "timestamp": "2024-09-17T04:05:57.474730014Z",
  "severity": "INFO",
  "logName": "projects/foyle-dev/logs/foyle",
  "receiveTimestamp": "2024-09-17T04:05:57.955291283Z"
}

So combineEntriesForTrace should correctly identify it as a generate trace. We periodically call combineAndCheckpoint, which does the combine:

a.combineAndCheckpoint(ctx, path, offset, traceIDs)

So I think the problem is that we are calling combine for the trace before we have received all of its log messages.
This should be fine, because when the remaining log messages for that trace do arrive they will trigger another combine.
So we should probably update the code to not report an error in this case.
