Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Validation of documents and protocols #397

Closed
JanHoefelmeyer opened this issue Jul 7, 2023 · 11 comments · Fixed by #443
Closed

Validation of documents and protocols #397

JanHoefelmeyer opened this issue Jul 7, 2023 · 11 comments · Fixed by #443

Comments

@JanHoefelmeyer
Copy link
Contributor

  • Add STRICT-mode, which only saves CSAF-documents with no validation errors.
  • Add UNSAFE-mode, which saves CSAF-documents with validation errors within a seperate folder

All validation errors need to be logged within a configurable file.
A new function should be added that counts and returns the number of found, downloaded and potentially skipped files.
These options need to be configurable via a config file and as command line parameters.

@bernhardreiter
Copy link
Member

bernhardreiter commented Aug 22, 2023

design proposal

We propose to implement the functionality as following:

Use slog for logging

  • Logging is done with the new https://pkg.go.dev/log/slog from the Go https://go.dev/doc/go1.21 standard library.
  • Go 1.21 was released this August, using it means that the build requirements is raised, which also means that a different product, using us as a library has their build requirements raised as well. It it is the best solution for maintainability.
  • Logging will be structured by a json object on a single line in a text file. This is a standard handler.
    A custom handler is possible in the future.
  • We write into one logging file. Which can be filtered by text matching or structured methods by log file monitoring.
  • Adding two config options:
    • log_location which is an absolute path or relative the base downloading dir. Default downloader.log. (config file only)
    • log_level, default INFO

Use two directories for failures

If a document file itself cannot be fully downloaded, e.g. we only get 23 bytes, it does not make sense to safe it. So this failure will only be logged, but no file safed.

If we have the main document but anything does wrong with the validation, e.g. the schema or the checksum or the openpgp signature, we consider the document UNSAFE. In case that unsafe mode is explicitly enabled, we write the files in the directory failed_download/ within the directory where it would be saved to. (The default dirname can be overridden in the config file only). The failure is logged.

If forwarding goes wrong, the document is written to the directory failed_forward (The default dirname can be overridden in the config file only). The failure is logged.

forwarding uses in-memory queue

If forwarding is configured, documents will be held in an internal, size limited queue in memory. (The size will probably be made configurable).

Downloads and validation continues concurrently and blocks when the queue is full. This is to protect against congestion and too high memory usage.

Forwarding tries each document from the queue once concurrently and saves it to disk if forwarding fails. Failure is logged.

@s-l-teichmann
Copy link
Contributor

@tschmidtb51 does this match your expectations?

@tschmidtb51
Copy link
Collaborator

tschmidtb51 commented Aug 23, 2023

@s-l-teichmann Please see my remarks inline:

design proposal

We propose to implement the functionality as following:

Use slog for logging

✔️

  • Go 1.21 was released this August, using it means that the build requirements is raised, which also means that a different product, using us as a library has their build requirements raised as well. It it is the best solution for maintainability.

👁️ Please make sure to increase the version number according to semver as our build requirements change.

  • Logging will be structured by a json object on a single line in a text file. This is a standard handler.
    A custom handler is possible in the future.

✔️

  • We write into one logging file. Which can be filtered by text matching or structured methods by log file monitoring.

✔️

  • Adding two config options:

    • log_location which is an absolute path or relative the base downloading dir.

✔️

Default `downloader.log`. 

👁️ According to the Unix log file naming (e.g. redis, nginx, etc.): Shouldn't that be csaf_downloader.log 🤔
👁️ New values are only appended (but the file is created if it does not exist), correct?

(config file only)

⚠️ According to the description this should be a command line option as well.

* log_level, default `INFO`

✔️

Use two directories for failures

If a document file itself cannot be fully downloaded, e.g. we only get 23 bytes, it does not make sense to safe it. So this failure will only be logged, but no file safed.

✔️

If we have the main document but anything does wrong with the validation, e.g. the schema or the checksum or the openpgp signature, we consider the document UNSAFE. In case that unsafe mode is explicitly enabled, we write the files in the directory failed_download/ within the directory where it would be saved to. (The default dirname can be overridden in the config file only). The failure is logged.

👁️ Generally, ok. However, the folder name failed_download/ seems a bit confusing (since we where able to download it). Shouldn't it be failed_validation/ or failed_verification/? 🤔

If forwarding goes wrong, the document is written to the directory failed_forward (The default dirname can be overridden in the config file only). The failure is logged.

✔️

forwarding uses in-memory queue

If forwarding is configured, documents will be held in an internal, size limited queue in memory. (The size will probably be made configurable).

👁️ Please make the size configurable and set it to a reasonable default 😄 IMHO this must be >50MiB as the current default limit in the validator is 50 MiB...

Downloads and validation continues concurrently and blocks when the queue is full. This is to protect against congestion and too high memory usage.

✔️

Forwarding tries each document from the queue once concurrently and saves it to disk if forwarding fails. Failure is logged.

✔️

Here is an explanation of the signs:

✔️ Ok
👁️ Ok, but keep an eye out to address the remarks
⚠️ Remarks

@tschmidtb51
Copy link
Collaborator

Also a function has to be added that outputs the number of downloaded CSAF/signature/SHA256/SHA512 files (according to the error class and in total).

@bernhardreiter
Copy link
Member

@tschmidtb51 thanks for the rewiew. A few responses:

  • Semver does not mandate a version increase for build requirements. But as we are aiming for major version 3, we could change everything.
  • Yes, we append to the log files.
  • If we are in a specific subfolder possibly prefixes like csaf_ are unnecessary I guess. Do happen to have a link to POSIX or GNU/Linux standards on log names?
  • We suggest to not expose the log_location as command line option, because we believe it is used rarely and would keep the command line options simpler. (Or what would be your use case?)
  • Yes failed_validation sounds like and improvement, thanks!
  • Maybe CSAF should consider upper limits on file sizes to prevent DOS attacks.

@tschmidtb51
Copy link
Collaborator

Answers inline:

  • Semver does not mandate a version increase for build requirements. But as we are aiming for major version 3, we could change everything.

✔️

  • Yes, we append to the log files.

✔️

  • If we are in a specific subfolder possibly prefixes like csaf_ are unnecessary I guess. Do happen to have a link to POSIX or GNU/Linux standards on log names?

👁️ I'm currently not aware of that - it was just a feeling from my look into /var/log/.

  • We suggest to not expose the log_location as command line option, because we believe it is used rarely and would keep the command line options simpler. (Or what would be your use case?)

⚠️ We do manual checks on specific domains and compare the results against each other. Having to split the log files, rename them or change the config file for each case seem an unnecessary effort to me.

  • Yes failed_validation sounds like and improvement, thanks!

✔️

  • Maybe CSAF should consider upper limits on file sizes to prevent DOS attacks.

The CSAF standard already suggested a (theoretical) soft limit, which has been overtaken by practice... 50 MiB for a single file seems to be a reasonable approach so far...

@tschmidtb51
Copy link
Collaborator

My colleagues mentioned that according to the XDG Base Directory Specification user-specific logs should default to ~/.local/share/<app-name>/. The use of downloader.log seem reasonable, if we are in a specific folder like ~/.local/share/csaf_downloader/.

We also have to consider Windows... (Nevertheless, I don't see an issue here to let Linux be the driver in this decision.)

@bernhardreiter
Copy link
Member

where to place the log file

From the proposal:

log_location which is an absolute path or relative to the base downloading dir. Default downloader.log. (config file only)

  • We suggest to not expose the log_location as command line option, because we believe it is used rarely and would keep the command line options simpler. (Or what would be your use case?)

⚠️ We do manual checks on specific domains and compare the results against each other. Having to split the log files, rename them or change the config file for each case seem an unnecessary effort to me.

Being relative to the downloading directory (as originally suggested)
would mean that if you use different downloading base dirs with --directory,
you get downloader.log files below that directory by default.
This would support your use case in so far how I have understood it.

My colleagues mentioned that according to the XDG Base Directory Specification user-specific logs should default to ~/.local/share/<app-name>/. The use of downloader.log seem reasonable, if we are in a specific folder like ~/.local/share/csaf_downloader/.

Thanks for the hint, we know about the XDG specifications in general.
Our idea is that the logs of the downloader are more like a protocol of the operation
and thus tend to be more a wanted output file that shall be placed in a visible location.
(As opposed to a classic "user-specific" log that just runs in the background and is consulted rarely if things go wrong. )

Still if you want the location you can set an absolute path to it.
(BTW: The location of config files is more interesting for XDG specs, but this is something we could discuss in a different issue.)

We also have to consider Windows... (Nevertheless, I don't see an issue here to let Linux be the driver in this decision.)

As far as I remember there are also recommendations for Windows to place files, but again if we consider it output of a specific command, it feels more adequate to put them in a subfolder as part of a created or updated output. That shall work well on Windows as well.

@tschmidtb51
Copy link
Collaborator

tschmidtb51 commented Aug 25, 2023

To quickly resolve the issue and following your suggestion:

  1. The log file will be named donwloader.log by default.
  2. There will be a command line option
  3. The log file will be relative to the downloading directory (as originally suggested).

@bernhardreiter
Copy link
Member

bernhardreiter commented Aug 25, 2023

The log file will be named donwload.log by default.

We have understood downloader.log.

  1. There will be a command line option

Will or will not be? (As we were suggesting without command line option.)

We have understood will be and act accordingly

@tschmidtb51
Copy link
Collaborator

tschmidtb51 commented Aug 25, 2023

We have understood downloader.log.

Correct. I corrected my spelling mistake.

Will or will not be? (As we were suggesting without command line option.)

will is correct.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants