Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[8.x] [Automatic Import ] Improve KV and log type detection prompt improvements (#193136) #193178

Merged
merged 1 commit into from
Sep 17, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -68,15 +68,19 @@ export const KV_HEADER_PROMPT = ChatPromptTemplate.fromMessages([
],
[
'human',
`Looking at the multiple syslog samples provided in the context, our goal is to identify which RFC they belog to. Then create a regex pattern that can separate the header and the structured body.
`Looking at the multiple syslog samples provided in the context, your task is to separate the "header" and the "message body" from this log. Our goal is to identify which RFC they belong to. Then create a regex pattern that can separate the header and the structured body.
You then have to create a grok pattern using the regex pattern.
You are given a log entry in a structured format.

Follow these steps to identify the header pattern:
1. Identify if the log samples fall under RFC5424 or RFC3164. If not, return 'Custom Format'.
2. The log samples contain the header and structured body. The header may contain any or all of priority, timestamp, loglevel, hostname, ipAddress, messageId or any free-form text or non key-value information etc.,
3. Make sure the regex and grok pattern matches all the header information. Only the structured message body should be under GREEDYDATA in grok pattern.

You ALWAYS follow these guidelines when writing your response:
<guidelines>
- If you cannot match all the logs to the same RFC, return 'Custom Format' for RFC and provide the regex and grok patterns accordingly.
- If the message part contains any unstructured data , make sure to add this in regex pattern and grok pattern.
- Do not parse the message part in the regex. Just the header part should be in regex nad grok_pattern.
- Make sure to map the remaining message part to \'message\' in grok pattern.
- Make sure to map the remaining message body to \'message\' in grok pattern.
- Do not respond with anything except the processor as a JSON object enclosed with 3 backticks (\`), see example response above. Use strict JSON response format.
</guidelines>

Expand Down Expand Up @@ -110,12 +114,15 @@ export const KV_HEADER_ERROR_PROMPT = ChatPromptTemplate.fromMessages([
{errors}
</errors>

You ALWAYS follow these guidelines when writing your response:
Follow these steps to fix the errors in the header pattern:
1. Identify any mismatches, incorrect syntax, or logical errors in the pattern.
2. The log samples contain the header and structured body. The header may contain any or all of priority, timestamp, loglevel, hostname, ipAddress, messageId or any free-form text or non key-value information etc.,
3. The message body may start with a description, followed by structured key-value pairs.
4. Make sure the regex and grok pattern matches all the header information. Only the structured message body should be under GREEDYDATA in grok pattern.
You ALWAYS follow these guidelines when writing your response:
<guidelines>
- Identify any mismatches, incorrect syntax, or logical errors in the pattern.
- If the message part contains any unstructured data , make sure to add this in grok pattern.
- Do not parse the message part in the regex. Just the header part should be in regex nad grok_pattern.
- Make sure to map the remaining message part to \'message\' in grok pattern.
- Make sure to map the remaining message body to \'message\' in grok pattern.
- Do not respond with anything except the processor as a JSON object enclosed with 3 backticks (\`), see example response above. Use strict JSON response format.
</guidelines>

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -20,15 +20,19 @@ Here is some context for you to reference for your task, read it carefully as yo
[
'human',
`Looking at the log samples , our goal is to identify the syslog type based on the guidelines below.
Follow these steps to identify the log format type:
1. Go through each log sample and identify the log format type.
2. If the samples have any or all of priority, timestamp, loglevel, hostname, ipAddress, messageId in the beginning information then set "header: true".
3. If the samples have a syslog header then set "header: true" , else set "header: false". If you are unable to determine the syslog header presence then set "header: false".
4. If the log samples have structured message body with key-value pairs then classify it as "log_type: structured". Look for a flat list of key-value pairs, often separated by spaces, commas, or other delimiters.
5. Consider variations in formatting, such as quotes around values ("key=value", key="value"), special characters in keys or values, or escape sequences.
6. If the log samples have unstructured body like a free-form text then classify it as "log_type: unstructured".
7. If the log samples follow a csv format then classify it as "log_type: csv".
8. If the samples are identified as "csv" and there is a csv header then set "header: true" , else set "header: false".
9. If you do not find the log format in any of the above categories then classify it as "log_type: unsupported".

You ALWAYS follow these guidelines when writing your response:
<guidelines>
- Go through each log sample and identify the log format type.
- If the samples have a timestamp , loglevel in the beginning information then set "header: true".
- If the samples have a syslog header then set "header: true" , else set "header: false". If you are unable to determine the syslog header presence then set "header: false".
- If the syslog samples have structured body then classify it as "log_type: structured".
- If the syslog samples have unstructured body then classify it as "log_type: unstructured".
- If the syslog samples follow a csv format then classify it as "log_type: csv".
- If the samples are identified as "csv" and there is a csv header then set "header: true" , else set "header: false".
- If you do not find the log format in any of the above categories then classify it as "log_type: unsupported".
- Do not respond with anything except the updated current mapping JSON object enclosed with 3 backticks (\`). See example response below.
</guidelines>

Expand Down