Skip to content
This repository has been archived by the owner on Dec 17, 2021. It is now read-only.

Making the trustymail PSL cache read-only in Lambda #215

Merged
merged 5 commits into from
Mar 25, 2018
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
21 changes: 16 additions & 5 deletions scanners/trustymail.py
Original file line number Diff line number Diff line change
@@ -1,12 +1,8 @@
import logging

import trustymail.trustymail as tmail
import trustymail
# Monkey patching trustymail to make it cache the PSL where we want
trustymail.PublicSuffixListFilename = 'cache/public-suffix-list.txt'

###
# Inspect a site's DNS Mail configuration using DHS NCATS' trustymail tool.
###

# default to a long timeout
default_timeout = 30
Expand Down Expand Up @@ -60,6 +56,21 @@ def scan(domain, environment, options):
'dmarc': options.get('dmarc', False)
}

import trustymail
if environment['scan_method'] == 'local':
# Local scanning
#
# Monkey patching trustymail to make it cache the PSL where we want
trustymail.PublicSuffixListFilename = 'cache/public-suffix-list.txt'
else:
# Lambda scanning
#
# Monkey patching trustymail to make it cache the PSL where we want
trustymail.PublicSuffixListFilename = './public-suffix-list.txt'
# Monkey patching trustymail to make the PSL cache read-only
trustymail.PublicSuffixListReadOnly = True
import trustymail.trustymail as tmail
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why move the import statements to within the code, instead of at the top of the file?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The monkey patching can only happen here, since at the top of the file we don't know if we're running locally or in Lambda. The import trustymail.trustymail as tmail statement has to happen after the monkey patching, since otherwise it doesn't appear to have any effect on tmail.scan. (I spent quite a while debugging this very issue yesterday!)

I could put import trustymail and monkey patch for running locally at the top of the file, then re-patch inside the function if we're running in Lambda. The import trustymail.trustymail as tmail statement would still have to be inside the function though (since it has to come after the monkey patching), and I thought the code would be easiest to understand if I just kept everything together.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, got it.

This is your scanner and I'm happy to see this merged to support the change, but I left a comment over at cisagov/trustymail#74 about whether a new flag is necessary for this purpose.

We could also look at aligning the placement of the PSL to be inside cache/ for the Lambda function, which would eliminate the need for special-case code inside the trustymail scanner at all.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I responded to your comment over on cisagov/trustymail#74, but I'll include it here too:

The caching that trustymail uses (which is useful when trustymail is run standalone) also checks the PSL file to see if it's less than 24 hours old. So it would still try to update the Lambda PSL file if we didn't have the extra flag.

I agree that moving the PSL to be inside of the cache directory would be a good change to make. I can add that to this pull request or make a separate one if you like.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To copy my latest comment from that thread here too, for those reading:

With pshtt, I worked hard to ensure that we never had to download third party data, either from the internet or from S3. And what do you know, it looks like I also use some lambda/local special-case logic to do that in pshtt. So I should probably drop my complaint about having any Lambda-specific logic in the trustymail scanner.

I still think, just as advice to trustymail, that having trustymail manage a flag discussing the state of the read only file system is exposing the wrong layer of abstraction. I recommend having a flag which disables the 24-hour cache feature, rather than a flag which relates to a low-level implementation detail that conflicts with that feature.

I also filed #219 to capture some comments around repackaging the PSL and other third party data sources in scanners.


data = tmail.scan(domain, timeout, smtp_timeout, smtp_localhost, smtp_ports, smtp_cache, scan_types, dns_hostnames).generate_results()

if not data:
Expand Down