Skip to content
This repository has been archived by the owner on Dec 17, 2021. It is now read-only.

Making the trustymail PSL cache read-only in Lambda #215

Merged
merged 5 commits into from
Mar 25, 2018

Conversation

jsf9k
Copy link
Collaborator

@jsf9k jsf9k commented Mar 22, 2018

This is necessary because the local filesystem when running in AWS Lambda is read-only.

The PSL cache file is also in a different location when running under Lambda, and this pull request takes care of that as well.

jsf9k added 2 commits March 22, 2018 14:40
The changes to trustymail don't take effect they are made after
trustymail.trustymail is imported.
@jsf9k jsf9k requested a review from a team March 22, 2018 20:52
trustymail.PublicSuffixListFilename = './public-suffix-list.txt'
# Monkey patching trustymail to make the PSL cache read-only
trustymail.PublicSuffixListReadOnly = True
import trustymail.trustymail as tmail
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why move the import statements to within the code, instead of at the top of the file?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The monkey patching can only happen here, since at the top of the file we don't know if we're running locally or in Lambda. The import trustymail.trustymail as tmail statement has to happen after the monkey patching, since otherwise it doesn't appear to have any effect on tmail.scan. (I spent quite a while debugging this very issue yesterday!)

I could put import trustymail and monkey patch for running locally at the top of the file, then re-patch inside the function if we're running in Lambda. The import trustymail.trustymail as tmail statement would still have to be inside the function though (since it has to come after the monkey patching), and I thought the code would be easiest to understand if I just kept everything together.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, got it.

This is your scanner and I'm happy to see this merged to support the change, but I left a comment over at cisagov/trustymail#74 about whether a new flag is necessary for this purpose.

We could also look at aligning the placement of the PSL to be inside cache/ for the Lambda function, which would eliminate the need for special-case code inside the trustymail scanner at all.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I responded to your comment over on cisagov/trustymail#74, but I'll include it here too:

The caching that trustymail uses (which is useful when trustymail is run standalone) also checks the PSL file to see if it's less than 24 hours old. So it would still try to update the Lambda PSL file if we didn't have the extra flag.

I agree that moving the PSL to be inside of the cache directory would be a good change to make. I can add that to this pull request or make a separate one if you like.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To copy my latest comment from that thread here too, for those reading:

With pshtt, I worked hard to ensure that we never had to download third party data, either from the internet or from S3. And what do you know, it looks like I also use some lambda/local special-case logic to do that in pshtt. So I should probably drop my complaint about having any Lambda-specific logic in the trustymail scanner.

I still think, just as advice to trustymail, that having trustymail manage a flag discussing the state of the read only file system is exposing the wrong layer of abstraction. I recommend having a flag which disables the 24-hour cache feature, rather than a flag which relates to a low-level implementation detail that conflicts with that feature.

I also filed #219 to capture some comments around repackaging the PSL and other third party data sources in scanners.

@jsf9k
Copy link
Collaborator Author

jsf9k commented Mar 25, 2018

@konklone, the CircleCI status seems to be hung up. I'm not sure what is going on with that.

@konklone konklone merged commit 1401bd1 into master Mar 25, 2018
@konklone
Copy link
Contributor

I'm fine merging this one even with the weird Circle CI build failure (next to the other intended successful ones) since this PR is earlier than #216, which migrated the repo to CircleCI 2.0. However, the same thing is happening in #217, so I think there is a problem somewhere.

@konklone konklone deleted the feature/tmail_psl_cache_readonly_in_lambda branch March 25, 2018 17:32
@konklone
Copy link
Contributor

Ah, I found the cause - when I set up protected branches, it cached any status checks it saw in the last week, and when I marked circleci as required it kept that requirement (even if they were no longer set up to be run).

I just switched the requirements to be the new ones in the protected branches screen:
screen shot 2018-03-25 at 1 33 12 pm

And #217 is now validating properly.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants