cloudaware/aws-cur-filter

aws-cur-filter

NOTE: The dependencies include parquet-hadoop:1.13.1-ca-SNAPSHOT. This version comes from the parquet-mr fork at https://github.com/cloudaware/parquet-mr/tree/apache-parquet-1.13.1-ca

NOTE: As of version 1.0.2 the build process was changed to produce a fat JAR, so clean s3://outputFolder/aws-cur-filter/ before updating.
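
The cleanup before an update could look roughly like the sketch below. It assumes that outputFolder is the output bucket used by the user-data script and that the new JAR is available at a local path you pass in. The `replace_jar` function and the `AWS_BIN` variable are hypothetical helpers for this example (setting `AWS_BIN=echo` only prints the commands); `aws s3 rm --recursive` and `aws s3 cp` are standard AWS CLI commands.

```shell
#!/bin/bash
# Sketch: clear the old aws-cur-filter folder, then upload the new fat JAR.
# replace_jar and AWS_BIN are illustrative names, not part of aws-cur-filter
# or of the AWS CLI.
replace_jar() {
    local bucket="$1"   # bucket that holds the aws-cur-filter/ folder (assumed: the output bucket)
    local jar="$2"      # local path to the new jar-with-dependencies file (assumption)

    # Remove everything under the folder so stale pre-fat-jar files are not synced again.
    "${AWS_BIN:-aws}" s3 rm "s3://${bucket}/aws-cur-filter/" --recursive &&
    # Upload the single fat JAR into the now-empty folder.
    "${AWS_BIN:-aws}" s3 cp "${jar}" "s3://${bucket}/aws-cur-filter/"
}
```

To preview the commands without touching S3, run `AWS_BIN=echo replace_jar outputBucket path/to/aws-cur-filter-1.0.5-jar-with-dependencies.jar`.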

Usage:

java -jar aws-cur-filter/aws-cur-filter-1.0.5-jar-with-dependencies.jar "${REPORT_NAME}" "${REPORT_PREFIX}" "${INPUT_BUCKET}" "${OUTPUT_BUCKET}"  "${LINKED_ACCOUNT_IDS}" ["periodPrefix"]
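
The usage line takes five required positional arguments and one optional sixth argument. A small wrapper such as the sketch below can catch a wrong argument count before the JVM starts. The `run_filter` function and the `JAVA_BIN` and `JAR_PATH` variables are hypothetical names introduced here. How the tool itself validates its arguments is not documented in this README, so the check reflects only the usage line.

```shell
#!/bin/bash
# Sketch: call aws-cur-filter with 5 required arguments and an optional 6th.
# run_filter, JAVA_BIN and JAR_PATH are illustrative names, not part of the tool.
run_filter() {
    if [ "$#" -lt 5 ] || [ "$#" -gt 6 ]; then
        echo "usage: run_filter REPORT_NAME REPORT_PREFIX INPUT_BUCKET OUTPUT_BUCKET LINKED_ACCOUNT_IDS [periodPrefix]" >&2
        return 2
    fi
    # Pass the arguments through unchanged, in the order the usage line gives.
    "${JAVA_BIN:-java}" -jar \
        "${JAR_PATH:-aws-cur-filter/aws-cur-filter-1.0.5-jar-with-dependencies.jar}" "$@"
}
```

Setting `JAVA_BIN=echo` prints the arguments that would be passed to java instead of running the JAR.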

Sample IAM Role Policy for EC2 Instance

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "s3:*",
            "Resource": [
                "arn:aws:s3:::inputBucket",
                "arn:aws:s3:::outputBucket"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "s3:GetObject"
            ],
            "Resource": [
                "arn:aws:s3:::inputBucket/*"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "s3:PutObject",
                "s3:GetObject",
                "s3:DeleteObject"
            ],
            "Resource": [
                "arn:aws:s3:::outputBucket/*"
            ]
        }
    ]
}

Sample user-data for an EC2 instance. For this script, the JAR file aws-cur-filter-1.0.5-jar-with-dependencies.jar should be placed at s3://outputFolder/aws-cur-filter/

#! /bin/bash
INPUT_BUCKET="inputBucket"
OUTPUT_BUCKET="outputBucket"
REPORT_NAME="reportName"
REPORT_PREFIX="reportPrefix/reportName/"
# Comma-separated AWS account IDs (12 digits each)
LINKED_ACCOUNT_IDS="000000000000,111111111111"

sudo yum install -y java-1.8.0-openjdk

aws s3 sync s3://${OUTPUT_BUCKET}/aws-cur-filter aws-cur-filter

java -jar aws-cur-filter/aws-cur-filter-1.0.5-jar-with-dependencies.jar "${REPORT_NAME}" "${REPORT_PREFIX}" "${INPUT_BUCKET}" "${OUTPUT_BUCKET}"  "${LINKED_ACCOUNT_IDS}"

# If needed, make the EC2 instance self-terminating, for example by setting its Shutdown behavior to Terminate
shutdown -h now
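
One way to get the self-terminating behavior mentioned in the script comment is to set the shutdown behavior when the instance is launched. The sketch below assumes the user-data script is saved in a local file and that an instance profile carrying the sample policy already exists. The `launch_filter_instance` function, the `AWS_BIN` variable, and all argument values are hypothetical; the `aws ec2 run-instances` options shown are standard AWS CLI options.

```shell
#!/bin/bash
# Sketch: launch an instance that terminates itself when the user-data script
# runs "shutdown -h now". launch_filter_instance and AWS_BIN are illustrative names.
launch_filter_instance() {
    local ami_id="$1"          # AMI to launch (placeholder, choose your own)
    local instance_type="$2"   # instance type (placeholder)
    local profile_name="$3"    # instance profile with the sample IAM policy (assumed to exist)
    local user_data_file="$4"  # local file containing the user-data script

    "${AWS_BIN:-aws}" ec2 run-instances \
        --image-id "${ami_id}" \
        --instance-type "${instance_type}" \
        --iam-instance-profile "Name=${profile_name}" \
        --user-data "file://${user_data_file}" \
        --instance-initiated-shutdown-behavior terminate
}
```

With `--instance-initiated-shutdown-behavior terminate`, an OS-level shutdown terminates the instance instead of stopping it.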

Changelog

  • 1.0.5 - Upgraded Spark and Hadoop versions, retry logic
  • 1.0.4 - Upgraded Spark and Hadoop versions
  • 1.0.3 - Added generation of a new AssemblyId (inputAssemblyId + linkedAccountIds) in the output Manifest file. When the Linked Account IDs change, all output periods are processed again.
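
The effect of the 1.0.3 change can be illustrated with the sketch below. It is not the tool's implementation: the separator, the ID format, and the function names `derive_assembly_id` and `needs_reprocessing` are assumptions made for this example. The point it shows is that an ID derived from both the input AssemblyId and the linked account IDs stops matching the stored one as soon as the account list changes, so the period is processed again.

```shell
#!/bin/bash
# Sketch only: the real AssemblyId format used by aws-cur-filter is not documented here.
derive_assembly_id() {
    local input_assembly_id="$1"    # AssemblyId from the input manifest
    local linked_account_ids="$2"   # comma-separated linked account IDs
    # Assumed format: both parts joined with "-".
    printf '%s-%s\n' "${input_assembly_id}" "${linked_account_ids}"
}

# Succeeds (exit status 0) when the stored output AssemblyId no longer matches.
needs_reprocessing() {
    local stored_id="$1" input_assembly_id="$2" linked_account_ids="$3"
    [ "${stored_id}" != "$(derive_assembly_id "${input_assembly_id}" "${linked_account_ids}")" ]
}
```

For example, a period stored with only account 000000000000 would be reprocessed once 111111111111 is added to the list, even though the input AssemblyId is unchanged.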
