Feature Request: Filtering Options for Scanning (Tags, Extensions, Paths, Size) #1205

Open
pgagnidze opened this issue May 23, 2024 · 3 comments


pgagnidze commented May 23, 2024

Extend the cdk-serverless-clamscan construct with a filter property that limits scanning to S3 objects matching given tags, file extensions, S3 paths, and object sizes. The filter should support a configurable logic operator (ANY or ALL) at two levels, one for combining the filter sections and one for combining tags, and it should be settable per bucket. The same filters should also be configurable when adding buckets dynamically with the addSourceBucket method.

Proposed filter Property

The filter property will be an object applied per bucket, with the following sections:

  1. Tags: Check if the object is tagged with specific key-value pairs, with a configurable logic operator to determine matching criteria.
  2. File Extensions: Specific file types to scan.
  3. S3 Paths: Targeted S3 prefixes or paths.
  4. Object Size: Conditions to scan objects larger or smaller than specified sizes.
  5. Logic Operator: Defines the overall logic to combine the specified filters (default: ALL).
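As a rough sketch of how these sections could be typed: the field names below come from this proposal, while the type names (LogicOperator, TagFilter, ObjectSizeFilter, ScanFilter) and the withDefaults helper are hypothetical and not part of the construct today.

```typescript
// Hypothetical type names; only the field names come from the proposal.
type LogicOperator = 'ANY' | 'ALL';

interface TagFilter {
  // Tag key-value pairs to look for on the object.
  criteria: Record<string, string>;
  // How the tag pairs combine (proposed default: 'ANY').
  logicOperator?: LogicOperator;
}

interface ObjectSizeFilter {
  // Both bounds are optional and exclusive in this sketch.
  greaterThanBytes?: number;
  lessThanBytes?: number;
}

interface ScanFilter {
  tags?: TagFilter;
  extensions?: string[];
  paths?: string[];
  objectSize?: ObjectSizeFilter;
  // How the sections above combine (proposed default: 'ALL').
  logicOperator?: LogicOperator;
}

// Hypothetical helper that fills in the proposed default operators.
function withDefaults(filter: ScanFilter): ScanFilter {
  const resolved: ScanFilter = {
    ...filter,
    logicOperator: filter.logicOperator ?? 'ALL',
  };
  if (filter.tags) {
    resolved.tags = {
      ...filter.tags,
      logicOperator: filter.tags.logicOperator ?? 'ANY',
    };
  }
  return resolved;
}
```

Keeping every section optional means a bucket can opt into a single filter (for example only extensions) without having to spell out the rest.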

Configuration Example

Here’s an example showing the filter property applied per bucket:

new ServerlessClamscan(this, 'rClamscan', {
  buckets: [
    {
      bucket: bucket_1,
      filter: {
        tags: {
          criteria: { 
            "ScanRequired": "true",
            "Priority": "high"
          },
          logicOperator: 'ANY' // Can be 'ANY' or 'ALL' (default: ANY)
        },
        extensions: ['.mp4', '.jpeg', '.png'],
        paths: ['uploads/images/', 'uploads/videos/'],
        objectSize: {
          greaterThanBytes: 1024, // 1 KB, optional
          lessThanBytes: 10485760 // 10 MB, optional
        },
        logicOperator: 'ALL' // Can be 'ANY' or 'ALL' (default: ALL)
      }
    },
    {
      bucket: bucket_2,
      filter: {
        extensions: ['.exe', '.zip'],
        logicOperator: 'ALL' // Can be 'ANY' or 'ALL' (default: ALL)
      }
    }
  ]
});

// Adding a source bucket with filters dynamically
const sc = new ServerlessClamscan(this, 'rClamscan', { /* initial configuration */ });
sc.addSourceBucket(bucket_3, {
  filter: {
    tags: {
      criteria: { 
        "ScanRequired": "true"
      },
      logicOperator: 'ANY' // Can be 'ANY' or 'ALL' (default: ANY)
    },
    extensions: ['.docx', '.pdf'],
    paths: ['uploads/documents/'],
    objectSize: {
      lessThanBytes: 5242880 // 5 MB, optional
    },
    logicOperator: 'ALL' // Can be 'ANY' or 'ALL' (default: ALL)
  }
});

Scanning Behavior

  • Overall Logic Operator (default: ALL): If set to ALL, only objects meeting all specified criteria will be scanned. If set to ANY, an object meeting any of the specified criteria will be scanned.
  • Tag Logic Operator (default: ANY): If set to ANY, an object matches when at least one of the specified tag key-value pairs is present on it. If set to ALL, every specified pair must be present.
  • Object Size Conditions: Users can specify either greaterThanBytes or lessThanBytes, or both, depending on their needs.

This feature maintains backward compatibility by ensuring that if no filter is specified, all objects are scanned.
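The behavior described above could be evaluated roughly as follows. This is only a sketch: ObjectInfo, combine, and shouldScan are hypothetical names, the size bounds are treated as exclusive, and extension and path matching is assumed to be case-sensitive.

```typescript
type LogicOperator = 'ANY' | 'ALL';

interface ScanFilter {
  tags?: { criteria: Record<string, string>; logicOperator?: LogicOperator };
  extensions?: string[];
  paths?: string[];
  objectSize?: { greaterThanBytes?: number; lessThanBytes?: number };
  logicOperator?: LogicOperator;
}

// Hypothetical view of an S3 object as the scanning Lambda might see it.
interface ObjectInfo {
  key: string;
  sizeBytes: number;
  tags: Record<string, string>;
}

// Combine individual results with the given operator.
function combine(results: boolean[], op: LogicOperator): boolean {
  return op === 'ALL' ? results.every(Boolean) : results.some(Boolean);
}

function shouldScan(obj: ObjectInfo, filter?: ScanFilter): boolean {
  // No filter specified: scan everything (backward compatible).
  if (!filter) return true;

  // One entry per filter section that was actually specified.
  const results: boolean[] = [];

  if (filter.tags) {
    const tagMatches = Object.entries(filter.tags.criteria).map(
      ([key, value]) => obj.tags[key] === value,
    );
    results.push(combine(tagMatches, filter.tags.logicOperator ?? 'ANY'));
  }
  if (filter.extensions) {
    results.push(filter.extensions.some((ext) => obj.key.endsWith(ext)));
  }
  if (filter.paths) {
    results.push(filter.paths.some((prefix) => obj.key.startsWith(prefix)));
  }
  if (filter.objectSize) {
    const { greaterThanBytes, lessThanBytes } = filter.objectSize;
    results.push(
      (greaterThanBytes === undefined || obj.sizeBytes > greaterThanBytes) &&
        (lessThanBytes === undefined || obj.sizeBytes < lessThanBytes),
    );
  }

  // A filter object with no sections behaves like no filter at all.
  if (results.length === 0) return true;
  return combine(results, filter.logicOperator ?? 'ALL');
}
```

Only the sections that are present contribute to the overall result, so omitting a section never causes an object to be skipped.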

Benefits

  • Cost Efficiency: Lower Lambda invocation costs by skipping unnecessary scans.
  • Flexibility: Multiple filters to meet diverse needs, all within a single, unified configuration.
  • Targeted Security: An organization can focus on scanning only certain paths where sensitive documents are uploaded.

Looking forward to your feedback and thank you for considering this feature request!

@pgagnidze
Author

The addEventNotification method on the bucket already supports prefix and suffix filters, which can be used for S3 path and file extension filtering. This setup ensures that the Lambda function is triggered only for relevant objects. The Lambda function can then handle additional checks for object size and tags.
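A minimal sketch of how the paths and extensions from the filter could be turned into prefix/suffix pairs for that purpose. The KeyFilter shape mirrors the optional prefix and suffix fields of an S3 notification key filter; toKeyFilters is a hypothetical helper, not an existing API.

```typescript
// Mirrors the optional prefix/suffix fields of an S3 notification key filter.
interface KeyFilter {
  prefix?: string;
  suffix?: string;
}

// Hypothetical helper: expand paths and extensions into one key filter per
// (prefix, suffix) combination. An empty result means "no key filtering".
function toKeyFilters(paths: string[] = [], extensions: string[] = []): KeyFilter[] {
  if (paths.length === 0) return extensions.map((suffix) => ({ suffix }));
  if (extensions.length === 0) return paths.map((prefix) => ({ prefix }));

  const filters: KeyFilter[] = [];
  for (const prefix of paths) {
    for (const suffix of extensions) {
      filters.push({ prefix, suffix });
    }
  }
  return filters;
}
```

Each resulting pair would presumably be registered with its own addEventNotification call, since as far as I know a single notification filter accepts at most one prefix rule and one suffix rule. Overlapping pairs for the same event type may also be rejected by S3, so the construct would likely need to validate the input.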

@dontirun
Contributor

I like the idea. A few initial comments:

  • the construct will need a new filteredBuckets property to maintain backward compatibility
  • the overall ANY logic operator cannot be implemented if S3 notification filters are used for both prefix and suffix, because S3 only triggers when an object key matches both

@pgagnidze
Author
