Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BNGP-5504: Correctly check type of uploaded file #899

Draft
wants to merge 6 commits into
base: master
Choose a base branch
from

Conversation

StuAA78
Copy link
Collaborator

@StuAA78 StuAA78 commented Oct 15, 2024

https://eaflood.atlassian.net/browse/BNGP-5504

We were given advance notice that the IT health check may raise the issue that there is no checking that an uploaded file is actually the file type it says it is -- we only check the file extension. For example, if a file that isn't a spreadsheet is renamed .xlsx then it can be uploaded as a metric file and processing will be attempted. This could cause a problem if a malicious file is uploaded that isn't otherwise caught by malware scanning -- say for example a vulnerability is found in the spreadsheet processing module we use whereby attempting to process an image file causes a memory leak, then someone could rename a .jpeg file to .xlsx and upload it, with the subsequent processing potentially causing issues. This isn't a "malicious" file as such as the image file is otherwise normal, it's just that processing it as a spreadsheet results in an issue (I stress that this is just a theoretical vulnerability to illustrate that a file could be "malicious" and still not be picked up by malware scanning!)

In advance of the full health check being delivered to us, we started to look at what we could do to mitigate this. The package file-type can be used to determine the type of a file and its expected file extension. We have implemented a simple check that uses file-type to determine the file's expected extension, and compares it with the actual file extension; if they don't match then the file is rejected.

At present, we haven't yet received the full health check so don't know the scope of any recommendations. We will therefore be leaving this check part-implemented for now, with the expectation that the feature can be completed once the full health check is received and the full scope of the recommendation is known.

The checks can be enabled for a given upload page as follows:

  • In the upload page's call to buildConfig(), add checkFileType: true, ie:
    const config = buildConfig({
      sessionId: request.yar.id,
      uploadType: constants.uploadTypes.METRIC_UPLOAD_TYPE,
      fileExt: constants.metricFileExt,
      checkFileType: true, // This is new
      maxFileSize: parseInt(process.env.MAX_METRIC_UPLOAD_MB) * 1024 * 1024,
      postProcess: true
    })
  • In the upload page's call to processErrorUpload(), add invalidFileTypeErrorMessage, ie.:
      return processErrorUpload({
        err,
        h,
        route: constants.views.UPLOAD_METRIC,
        elementID: UPLOAD_METRIC_ID,
        noFileErrorMessage: 'Select a statutory biodiversity metric',
        unsupportedFileExtErrorMessage: 'The selected file must be an XLSM or XLSX',
        invalidFileTypeErrorMessage: 'The selected file must be a valid XLSM or XLSX', // This is new
        maximumFileSize: process.env.MAX_METRIC_UPLOAD_MB
      })

A couple of caveats to bear in mind with the current implementation:

  • We generated a test .doc file in Word for Mac 16.90 by saving in the format it refers to as "Word 97-2004 (.doc)". However, on attempting to upload we found that this was being detected as a file of mime type application/x-cfb, extension .cfb, aka Compound File Binary Format. It's unknown whether this is a result of saving in a legacy format from a new version of Word; or from saving in a legacy format from Word for Mac; or if it's simply a bug in file-type that cannot detect .doc files. Further testing may be required, or an exception made to allow files of this type.
  • Only .xlsx, .docx and .pdf files have been tested at present. When enabling a page with types other than this, an upload of each file type should be attempted to ensure they are correctly detected.

@StuAA78 StuAA78 self-assigned this Oct 15, 2024
Our approach of using PassThrough streams created issues elsewhere. We therefore refactor to manually collect the first chunk of data in order to detect the file type based on this.
@StuAA78 StuAA78 force-pushed the bngp-5504-check-uploaded-file-type branch from 9ea77d7 to a0be743 Compare October 17, 2024 08:47
Since the `file-type` module gives us the expected file extension, we can simply rework the check to confirm that the file's extension matches what it's expected to be.

NOTE: we tried saving out a .doc file from MS Word for Mac 16.90 and the resulting file was identified on uploading as `cfb` -- so further investigation may be required.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant