
Write the 2nd Lambda function that matches the data through Exp_ID #35

Selkubi opened this issue Oct 1, 2024 · 3 comments
Selkubi commented Oct 1, 2024

This function should read the data in a metadata format, match it with the Exp_ID that is manually entered in the metadata, and append all the new information under a dictionary called "meta". This way we can separate data and metadata when querying, which is especially helpful when we don't know the names of the attributes that will be used in the queries. A rough sketch of this is shown after the task list below.

  • write a read function
  • write an Exp_ID matching condition
  • write an example script for nested querying/scanning that will be needed with the current setup
  • write a function that appends the metadata to the right experiment~Ligand# item (with a list comprehension?)
  • Write tests that would run on an example metadata file
    - [x] csv read test from S3
    - [ ] write test to dynamodb
  • Write an ordered Exp_ID generator
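As a reference for the matching task, here is a minimal sketch of the match-and-append idea. It assumes the metadata CSV has an Exp_ID column and uses the Exp_ID/Ligand# composite key and the obelixtest_sort_key table name described in the later comments; the function and variable names are illustrative, not the attached implementation.

from boto3.dynamodb.conditions import Key
import boto3

dynamodb = boto3.resource("dynamodb", region_name="eu-central-1")
table = dynamodb.Table("obelixtest_sort_key")  # assumed table name


def attach_metadata(metadata_rows):
    """For each metadata row, find the items with the same Exp_ID and
    append all remaining columns under a single 'meta' map."""
    for row in metadata_rows:
        exp_id = row["Exp_ID"]
        meta = {k: v for k, v in row.items() if k != "Exp_ID"}

        # Every Ligand# item that belongs to this experiment.
        matches = table.query(KeyConditionExpression=Key("Exp_ID").eq(exp_id))["Items"]

        for item in matches:
            table.update_item(
                Key={"Exp_ID": exp_id, "Ligand#": item["Ligand#"]},
                # "#m" avoids a clash in case "meta" is ever a reserved word.
                UpdateExpression="SET #m = :m",
                ExpressionAttributeNames={"#m": "meta"},
                ExpressionAttributeValues={":m": meta},
            )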
Selkubi self-assigned this Oct 1, 2024

Selkubi commented Oct 10, 2024

Problems and current solutions
1. UTF-8 CSV files uploaded to the S3 bucket are not processed as reliably as plain CSV (at least the ones coming out of my Excel). Always use CSV (comma delimited) when converting, where possible. This cannot be fixed by specifying the encoding in the Lambda function with open(file, mode='r', encoding='utf-8').
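A possible workaround, assuming the culprit is the byte-order mark (BOM) that Excel writes at the start of UTF-8 CSV files: decode the S3 object body with utf-8-sig so the BOM never reaches the parser. This is an untested sketch, not part of the attached code; bucket and key names are placeholders.

import csv
import io

import boto3

s3 = boto3.client("s3")


def read_metadata_csv(bucket, key):
    """Read a CSV object from S3, tolerating Excel's UTF-8 BOM."""
    obj = s3.get_object(Bucket=bucket, Key=key)
    # 'utf-8-sig' strips a leading BOM if present, otherwise behaves like plain 'utf-8'.
    text = obj["Body"].read().decode("utf-8-sig")
    return list(csv.DictReader(io.StringIO(text)))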


Selkubi commented Oct 11, 2024

Setting up the second Lambda function that attaches the metadata

  • Below is a guide for setting up the second Lambda function with the correct read and write access. Note that for this function we won't be using the SNS service.
  • The Python code for attaching the metadata to the main data table in DynamoDB is in the attachments. Make sure to update the data model to the format indicated below, since this works better for matching the experiment IDs of the metadata to the original DB table. Note that I have mostly stuck to the data model we discussed, but you have to check whether I have assigned the correct data type to each variable (and its NA condition).
  • Note that I have changed the DynamoDB data model to include a sort key. You have to update the DynamoDB table (keep the name the same for minimal change) as well as the 1st Lambda code according to this new data model.

Step 1: Create an S3 Bucket

  1. Go to the S3 Console: Open the AWS Management Console, then search for and select S3.
  2. Create a New Bucket:
    • Click Create bucket.
    • Enter a unique bucket name (e.g., s3-metadata-obelix).
    • Choose the appropriate region (e.g., eu-central-1).
    • Click Create bucket.
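If you'd rather script this step than click through the console, a boto3 equivalent would look roughly like this (bucket name and region taken from the step above):

import boto3

s3 = boto3.client("s3", region_name="eu-central-1")

# Outside us-east-1 the region has to be passed as a location constraint.
s3.create_bucket(
    Bucket="s3-metadata-obelix",
    CreateBucketConfiguration={"LocationConstraint": "eu-central-1"},
)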

Step 2: Create a DynamoDB Table

  1. Go to the DynamoDB console:
  2. Check whether the table that will be used for appending data has the correct data model:
    • The partition key is Exp_ID and the sort key is Ligand#, as a string or number depending on your use case.
    • Make note of the table name; it will be required during role creation to give the correct write access.
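If the table still has to be (re)created with the new composite key, a boto3 sketch along these lines should do it; the table name obelixtest_sort_key and the string key types are assumptions, so adjust them to the agreed data model.

import boto3

dynamodb = boto3.client("dynamodb", region_name="eu-central-1")

dynamodb.create_table(
    TableName="obelixtest_sort_key",
    KeySchema=[
        {"AttributeName": "Exp_ID", "KeyType": "HASH"},    # partition key
        {"AttributeName": "Ligand#", "KeyType": "RANGE"},  # sort key
    ],
    AttributeDefinitions=[
        {"AttributeName": "Exp_ID", "AttributeType": "S"},
        {"AttributeName": "Ligand#", "AttributeType": "S"},
    ],
    BillingMode="PAY_PER_REQUEST",
)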

Step 3: Create an IAM Role for Lambda

  1. Go to the IAM Console.
  2. Create a New Role:
    • Select Lambda as the trusted entity type, then click Next: Permissions.
  3. Attach Policies:
    • Attach the following managed policies:
      • AmazonS3ReadOnlyAccess (for read access to the S3 bucket).
      • AmazonDynamoDBFullAccess (for read and write access to DynamoDB).
  4. Create a Custom S3 Policy for Restricted Access:
    • Open the IAM console in another tab and click Policies in the side panel.
    • Click Create policy and switch to the JSON editor.
    • Use this policy to grant the Lambda function access to the specific S3 bucket:
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "logs:CreateLogGroup",
                "logs:CreateLogStream",
                "logs:PutLogEvents"
            ],
            "Resource": "arn:aws:logs:*:*:*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "s3:GetObject",
                "s3:ListBucket"
            ],
            "Resource": [
                "arn:aws:s3:::<s3-bucket-name>/*",
                "arn:aws:s3:::<s3-bucket-name>"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "dynamodb:*"
            ],
            "Resource": "arn:aws:dynamodb:eu-central-1:058264498638:table/<db-table-name>"
        },
        {
            "Effect": "Allow",
            "Action": [
                "dynamodb:ListTables",
                "dynamodb:Scan",
                "dynamodb:UpdateItem"
            ],
            "Resource": "arn:aws:dynamodb:eu-central-1:058264498638:table/<db-table-name>/*"
        }
    ]
} 
  • Save the policy and attach it to the IAM role.
  • Make sure to update the S3 bucket names and DB table names, as well as the account ID in the ARNs, so they match the account that will run the Lambda function.
  5. Name the role (e.g., lambda_import_csv) and click Create role.

Step 4: Create the Lambda Function

  1. Go to the Lambda Console.
  2. Create a New Lambda Function:
    • Click Create function.
    • Choose Author from scratch.
    • Enter a name (e.g., s3-metadata-obelix).
    • Select Python 3.8 or later as the runtime.
  3. Set Permissions:
    • Under Permissions, select Use an existing role and choose the IAM role created in Step 3 (lambda_import_csv).
    • Click Create function.
  4. Add Lambda Code:
    • Copy the provided Lambda function code into the editor, ensuring it references the correct S3 bucket (s3-metadata-obelix) and DynamoDB table (e.g., obelixtest_sort_key).
    • Click Deploy.
  5. Publish the Function Version:
    • Go to Actions > Publish new version to ensure a versioned deployment of your Lambda function.
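The full function is in the attached zip. Purely for orientation, a bare-bones handler for an S3-triggered function of this kind usually follows the pattern below; read_metadata_csv and attach_metadata refer to the earlier sketches in this issue and are assumptions, not the attached implementation.

from urllib.parse import unquote_plus


def lambda_handler(event, context):
    # The S3 trigger delivers one record per created object.
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded (e.g. spaces as '+').
        key = unquote_plus(record["s3"]["object"]["key"])
        rows = read_metadata_csv(bucket, key)  # see the CSV-reading sketch above
        attach_metadata(rows)                  # see the match-and-append sketch above
    return {"statusCode": 200}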

Step 5: Set up the S3 Trigger

  1. In the Lambda Function Configuration, go to Add trigger.
  2. Choose S3:
    • Select S3 as the trigger source.
    • Choose the bucket name (e.g., s3-metadata-obelix).
    • Set Event type to All object create events.
  3. Save the Trigger by clicking Add.
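The same trigger can also be wired up with boto3 instead of the console. A sketch, assuming the function and bucket names from the previous steps (S3 must first be allowed to invoke the function, otherwise the notification configuration is rejected):

import boto3

lambda_client = boto3.client("lambda", region_name="eu-central-1")
s3 = boto3.client("s3", region_name="eu-central-1")

function_arn = lambda_client.get_function(
    FunctionName="s3-metadata-obelix"
)["Configuration"]["FunctionArn"]

# Allow S3 to invoke the function.
lambda_client.add_permission(
    FunctionName="s3-metadata-obelix",
    StatementId="s3-metadata-obelix-trigger",
    Action="lambda:InvokeFunction",
    Principal="s3.amazonaws.com",
    SourceArn="arn:aws:s3:::s3-metadata-obelix",
)

# Fire the function on every object-created event.
s3.put_bucket_notification_configuration(
    Bucket="s3-metadata-obelix",
    NotificationConfiguration={
        "LambdaFunctionConfigurations": [
            {"LambdaFunctionArn": function_arn, "Events": ["s3:ObjectCreated:*"]}
        ]
    },
)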

Step 6: Configure the S3 Bucket Policy

  1. Go to the S3 Bucket Permissions:
    • In the Permissions tab, select Bucket policy.
  2. Add the Bucket Policy:
    • Use the following JSON to allow the Lambda function to access the bucket. Replace the account ID in the Principal ARN with your own AWS account ID and update the bucket name if necessary:
 {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::058264498638:role/lambda_import_csv"
            },
            "Action": [
                "s3:GetObject",
                "s3:ListBucket"
            ],
            "Resource": [
                "arn:aws:s3:::s3-metadata-obelix",
                "arn:aws:s3:::s3-metadata-obelix/*"
            ]
        }
    ]
}    
  3. Save changes.

Step 7: Test the Function

  1. Upload a Test CSV File to the S3 bucket (s3-metadata-obelix).
  2. Verify Lambda Execution:
    • Check CloudWatch logs to confirm that the function processed the file.
    • Verify that the expected data has been added to or updated in DynamoDB (obelixtest_sort_key table).
  3. You can also create a test event based on an existing file in the S3 bucket:
    • Copy the JSON below as a test event into the Lambda console's test editor.
{
  "Records": [
    {
      "s3": {
        "bucket": {
          "name": "s3-metadata-obelix"
        },
        "object": {
          "key": "metadata_all.csv"
        }
      }
    }
  ]
}
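If you prefer to invoke the deployed function from your own machine instead of the console, the same payload can be sent with boto3; the function name below is an assumption, and the call requires the usual AWS credentials.

import json

import boto3

lambda_client = boto3.client("lambda", region_name="eu-central-1")

test_event = {
    "Records": [
        {"s3": {"bucket": {"name": "s3-metadata-obelix"},
                "object": {"key": "metadata_all.csv"}}}
    ]
}

response = lambda_client.invoke(
    FunctionName="s3-metadata-obelix",
    Payload=json.dumps(test_event).encode("utf-8"),
)
print(response["Payload"].read().decode("utf-8"))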

S3-to-dynamodb-sns-metadata-f37bb411-26a1-4833-909c-ee93bf358218.zip


Selkubi commented Nov 13, 2024

To test querying the DynamoDB table from your own environment, you need boto3. With that you can query by any attribute or key. I've attached an example .py script, ObelixNestedQuery.zip, that does simple queries. Make sure to set your own DynamoDB table name for the table variable, and have the right credentials configured so you can query from your IDE (for VS Code, you can use the AWS Toolkit extension for this, among others).
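For reference, a minimal sketch of what such queries look like with boto3; the table name, the Exp_ID value, and the meta.operator attribute are placeholders for illustration, not taken from the attached script.

import boto3
from boto3.dynamodb.conditions import Attr, Key

dynamodb = boto3.resource("dynamodb", region_name="eu-central-1")
table = dynamodb.Table("obelixtest_sort_key")  # put your own table name here

# Query by the composite key: every Ligand# item of one experiment.
by_experiment = table.query(KeyConditionExpression=Key("Exp_ID").eq("Exp_001"))

# Scan on a nested attribute inside the 'meta' map (attribute name is an assumption).
by_meta = table.scan(FilterExpression=Attr("meta.operator").eq("some_value"))

print(by_experiment["Items"])
print(by_meta["Items"])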
