Fix flint skipping index syntax issues #1846

Merged

Conversation

YANG-DB
Member

@YANG-DB YANG-DB commented May 22, 2024

Description

Update the Flint-related integration content for VPC Flow Logs and CloudTrail.

Issues Resolved

VPC

Minor fix for the VPC Flow Log Integration for Flint Version 1.1.0
Fix skipping index related issue
Fix table definition from json to parquet to match the VPC log based producer protocol

CloudTrail

Minor fix for the Amazon Log Integration for Flint Version 1.1.0
Fix skipping index related issue
Update table creation statement according to Athena DDL Statement
See related Athena S3 setup tutorial

Check List

  • New functionality includes testing.
    • All tests pass, including unit test, integration test and doctest
  • New functionality has been documented.
    • New functionality has javadoc added
    • New functionality has user manual doc added
  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

YANG-DB added 6 commits May 22, 2024 11:32
 - vpc flow
 - cloud trail

Signed-off-by: YANGDB <[email protected]>
 - vpc flow
 - cloud trail

Signed-off-by: YANGDB <[email protected]>
 - vpc flow
 - cloud trail

Signed-off-by: YANGDB <[email protected]>
 - vpc flow
 - cloud trail

Signed-off-by: YANGDB <[email protected]>
 - vpc flow
 - cloud trail

Signed-off-by: YANGDB <[email protected]>
 - vpc flow
 - cloud trail
 - multiple records protocol support

Signed-off-by: YANGDB <[email protected]>
},
{
"name": "dashboards-flint-records",
"label": "Dashboards & Visualizations adapted to Flint",
Contributor

Same label as above, will this be confusing to the user?

Collaborator

@RyanL1997 RyanL1997 left a comment

Generally LGTM. Just left some minor thoughts.

"version": "1.0.0",
"extension": "sql",
"type": "query",
"workflows": ["dashboards-flint"]
},
{
-"name": "create_mv_cloud-trail",
+"name": "create_mv_cloud-trail-records",
Collaborator

Just for my knowledge, what does "records" refer to here?

Member Author

extended format for multiple records shown here

Collaborator

@Swiddis Swiddis May 22, 2024

Do I understand right that we only have acceleration for single-record, and multi-record has queries without acceleration? LGTM from a technical standpoint, but I wonder how common multi-record is compared to single-record/what the impact of that is.

Member Author

Yes, this is due to an existing limitation in the skipping index creation statement.

@@ -58,5 +58,5 @@ CREATE EXTERNAL TABLE IF NOT EXISTS {table_name} (
accountid STRING,
eventday STRING
)
-USING json
+USING parquet
Collaborator

Nice. Unrelated to this change, but I do notice that there is an empty file called "create_mv_vpc-1.0.0.sql" in the vpc asset directory. Should we also remove that?

 - vpc flow
 - cloud trail
 - multiple records protocol support

Signed-off-by: YANGDB <[email protected]>
@YANG-DB YANG-DB added integrations Used to denote items related to the Integrations project ux-integration ux related integration issues labels May 22, 2024
 - vpc flow
 - cloud trail
 - multiple records protocol support

Signed-off-by: YANGDB <[email protected]>
rec.tlsDetails.clientProvidedHostHeader AS `aws.cloudtrail.tlsDetailsclient_provided_host_header`
FROM
{table_name}
LATERAL VIEW explode(Records) myTable AS rec
Collaborator

Are there issues with having the table alias default to myTable? Looks like it probably won't cause issues since it's just a virtual table but I'm not sure where the virtual table actually gets stored or if this should be more descriptively named.
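
For readers unfamiliar with the construct discussed here, the following is a minimal Python sketch of what LATERAL VIEW explode(Records) myTable AS rec does conceptually: each input row that holds an array of records becomes one output row per record, and myTable is only a query-local alias for those generated rows. The sample rows and the eventName values are made up for illustration and are not taken from the integration's data.

```python
from typing import Any, Dict, Iterator, List


def explode_records(
    rows: List[Dict[str, Any]], array_field: str = "Records"
) -> Iterator[Dict[str, Any]]:
    """Yield one record per array element, like LATERAL VIEW explode(...) AS rec.

    Rows whose array is missing or empty yield nothing, which mirrors a
    non-OUTER lateral view.
    """
    for row in rows:
        for rec in row.get(array_field) or []:
            yield rec


# Hypothetical input: each row wraps several events in a "Records" array.
rows = [
    {"Records": [{"eventName": "GetObject"}, {"eventName": "PutObject"}]},
    {"Records": []},
    {"Records": [{"eventName": "ListBuckets"}]},
]

flattened = list(explode_records(rows))
event_names = [rec["eventName"] for rec in flattened]
```

In this sketch the alias has no lasting effect: nothing named myTable exists after the generator is consumed, which matches the reviewer's expectation that it is only a virtual table.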

Collaborator

@Swiddis Swiddis left a comment

I know we discussed at one point enabling the create table query by default so it doesn't cause an error if running with only queries selected and no other flows -- would now be a good time to implement that? OTOH I don't want to rock the boat since it's not in-scope for the immediate issue.

Aside from that, there's one unescaped field still -- did a full diff with the known-working version we made yesterday and that's the only delta so I can approve after that.

`src_endpoint.ip` BLOOM_FILTER,
`dst_endpoint.ip` BLOOM_FILTER,
`src_endpoint.svc_name` VALUE_SET,
`dst_endpoint.svc_name` VALUE_SET,
traffic.bytes MIN_MAX
Collaborator

`traffic.bytes` ?

Member Author

good catch - thanks
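
The quoting issue raised in this thread can be illustrated with a small Python sketch. The helper names quote_column and skipping_index_columns are hypothetical and not part of the integration; the sketch only assumes, as the review implies, that dotted column names need to be wrapped in backticks in the skipping index column list.

```python
from typing import List, Tuple


def quote_column(name: str) -> str:
    """Wrap a column name in backticks unless it is already quoted."""
    if len(name) >= 2 and name.startswith("`") and name.endswith("`"):
        return name
    # Double any embedded backticks so the quoted name stays well-formed.
    return "`" + name.replace("`", "``") + "`"


def skipping_index_columns(columns: List[Tuple[str, str]]) -> str:
    """Render (column, index type) pairs as a comma-separated column list."""
    return ",\n".join(f"{quote_column(name)} {kind}" for name, kind in columns)


# Column names and index types copied from the snippet under review.
columns = [
    ("src_endpoint.ip", "BLOOM_FILTER"),
    ("dst_endpoint.ip", "BLOOM_FILTER"),
    ("src_endpoint.svc_name", "VALUE_SET"),
    ("dst_endpoint.svc_name", "VALUE_SET"),
    ("traffic.bytes", "MIN_MAX"),
]

column_list = skipping_index_columns(columns)
print(column_list)
```

Generating the list this way quotes every field uniformly, so a single unescaped name such as traffic.bytes cannot slip through.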

@Swiddis Swiddis added the bug Something isn't working label May 23, 2024
 - vpc flow
 - cloud trail
 - multiple records protocol support

Signed-off-by: YANGDB <[email protected]>
@YANG-DB YANG-DB requested review from amsiglan and RyanL1997 May 23, 2024 02:38
Comment on lines +5 to +9
`src_endpoint.ip` BLOOM_FILTER,
`dst_endpoint.ip` BLOOM_FILTER,
`src_endpoint.svc_name` VALUE_SET,
`dst_endpoint.svc_name` VALUE_SET,
`traffic.bytes` MIN_MAX
Member

Are we removing the field request_processing_time MIN_MAX?

Collaborator

It doesn't exist in the data schema, so it wasn't supposed to be there anyways

OPTIONS (
compression='gzip',
recursivefilelookup='true',
multiline 'true'
Member

@ps48 ps48 May 23, 2024

Question: should this be multiLine='true'?

recursivefilelookup='true'
PATH '{s3_bucket_location}',
recursivefilelookup='true',
multiline 'true'
Member

Is the multiline option case-sensitive? I've usually seen it written as multiLine, with a capital L.

Member

PATH='{s3_bucket_location}',
recursivefilelookup='true',
multiLine='true'

Member Author

I think both multiline and multiLine are accepted.
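
If option keys are matched case-insensitively, as this reply suggests, the two spellings resolve to the same setting. Below is a minimal Python sketch of that lookup behaviour; the CaseInsensitiveOptions class is a hypothetical illustration and not the actual Spark or Flint implementation.

```python
from typing import Dict, Optional


class CaseInsensitiveOptions:
    """Store option keys lower-cased so lookups ignore the caller's casing."""

    def __init__(self, options: Dict[str, str]) -> None:
        self._options = {key.lower(): value for key, value in options.items()}

    def get(self, key: str, default: Optional[str] = None) -> Optional[str]:
        return self._options.get(key.lower(), default)


# Option names taken from the statement under review.
opts = CaseInsensitiveOptions({"recursivefilelookup": "true", "multiline": "true"})

# Both spellings discussed in the thread resolve to the same value.
same = opts.get("multiLine") == opts.get("multiline") == "true"
```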

@YANG-DB YANG-DB merged commit 0d2a1c7 into opensearch-project:main May 28, 2024
13 of 19 checks passed
opensearch-trigger-bot bot pushed a commit that referenced this pull request May 28, 2024
* update flint related issues for
 - vpc flow
 - cloud trail
 - multiple records protocol support

Signed-off-by: YANGDB <[email protected]>

* update flint vega ip sankey visualization query

Signed-off-by: YANGDB <[email protected]>

* update flint vega ip sankey visualization query

Signed-off-by: YANGDB <[email protected]>

---------

Signed-off-by: YANGDB <[email protected]>
(cherry picked from commit 0d2a1c7)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
@opensearch-trigger-bot
Contributor

The backport to 2.13 failed:

The process '/usr/bin/git' failed with exit code 128

To backport manually, run these commands in your terminal:

# Navigate to the root of your repository
cd $(git rev-parse --show-toplevel)
# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add ../.worktrees/dashboards-observability/backport-2.13 2.13
# Navigate to the new working tree
pushd ../.worktrees/dashboards-observability/backport-2.13
# Create a new branch
git switch --create backport/backport-1846-to-2.13
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 0d2a1c7361520e52a4f5a014e19ee3d38bb91eeb
# Push it to GitHub
git push --set-upstream origin backport/backport-1846-to-2.13
# Go back to the original working tree
popd
# Delete the working tree
git worktree remove ../.worktrees/dashboards-observability/backport-2.13

Then, create a pull request where the base branch is 2.13 and the compare/head branch is backport/backport-1846-to-2.13.

YANG-DB pushed a commit that referenced this pull request May 29, 2024
* update flint related issues for
 - vpc flow
 - cloud trail
 - multiple records protocol support

* update flint vega ip sankey visualization query



---------


(cherry picked from commit 0d2a1c7)

Signed-off-by: YANGDB <[email protected]>
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Labels
backport 2.x backport 2.13 backport-failed bug Something isn't working integrations Used to denote items related to the Integrations project ux-integration ux related integration issues
5 participants