Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update sectxt to 0.9.0 #1046

Closed
bwbroersma opened this issue Aug 16, 2023 · 14 comments · Fixed by #1051
Closed

Update sectxt to 0.9.0 #1046

bwbroersma opened this issue Aug 16, 2023 · 14 comments · Fixed by #1051
Assignees
Milestone

Comments

@bwbroersma
Copy link
Collaborator

DigitalTrustCenter/sectxt released 0.9.0 with has quite a few parser improvements, especially on PGP.

The only one I'm not sure about is the stripping of the BOM (DigitalTrustCenter/sectxt#57 (comment)). I interpret the RFC 9116 - File Format Description and ABNF Grammar:

The file format of the "security.txt" file MUST be plain text (MIME type "text/plain") as defined in Section 4.1.3 of [RFC2046] and MUST be encoded using UTF-8 [RFC3629] in Net-Unicode form [RFC5198].

RFC 5198 states:

  1. Net-Unicode Definition
    The Network Unicode format (Net-Unicode) is defined as follows. Parts of this definition are deliberately informal, providing guidance for specific profiles or rules in the protocols that reference this one rather than firm rules that apply globally.

    5. As suggested in Section 6 of RFC 3629, the Byte Order Mark ("BOM") signature MUST NOT appear at the beginning of these text strings.

Especially in combination with signing maybe a ⚠️ warning or ℹ️ notice should be shown. Although it's outside of the PGP block, a file with BOM is no longer recognized with file in Linux as a PGP signed file.

@bwbroersma bwbroersma added the discuss Requires further team discussion and decisions label Aug 16, 2023
@bwbroersma bwbroersma changed the title Update sectct to 0.9.0 Update sectxt to 0.9.0 Aug 16, 2023
@bwbroersma bwbroersma removed the discuss Requires further team discussion and decisions label Aug 22, 2023
@bwbroersma bwbroersma added this to the v1.8 milestone Aug 22, 2023
@mxsasha
Copy link
Collaborator

mxsasha commented Aug 22, 2023

I'll find out which new content labels we need.

@baknu baknu assigned baknu and unassigned bwbroersma Aug 22, 2023
mxsasha added a commit that referenced this issue Aug 23, 2023
@mxsasha mxsasha linked a pull request Aug 23, 2023 that will close this issue
@baknu baknu modified the milestones: v1.8, v1.9 Oct 7, 2023
@mxsasha
Copy link
Collaborator

mxsasha commented Feb 15, 2024

DigitalTrustCenter/sectxt#65 is a blocker for this

@mxsasha
Copy link
Collaborator

mxsasha commented Apr 5, 2024

Content still needs to be checked: all labels in https://github.com/DigitalTrustCenter/sectxt/ readme need to be in our content too.

@mxsasha mxsasha reopened this Apr 5, 2024
@mxsasha mxsasha removed their assignment Apr 5, 2024
@bwbroersma
Copy link
Collaborator Author

bwbroersma commented Apr 5, 2024

Crappy one-liner check (formatted on 3 lines for readability 😅):

$ diff \
   <(grep -oP '"\K[a-z0-9]+_[a-z0-9_]+(?=")' sectxt/sectxt/__init__.py | sort -u) \
   <(ls internet.nl_content/detail/tech/data/http-securitytxt/ | sed 's/_..\.md$//g' | sort -u)
1d0
< bom_in_file
5,6c4
< field_name
< invalid_cert
---
> expired
12c10
< invalid_uri_scheme
---
> location
26c24,25
< no_security_txt
---
> no_security_txt_404
> no_security_txt_other
31d29
< pgp_envelope
33a32,33
> requested-from
> retrieved-from
35a36
> utf8

At least for sure currently these are missing:

  • bom_in_file
  • invalid_cert
  • invalid_uri_scheme
  • pgp_envelope

At a manual inspection of sectxt I however see that invalid_uri_scheme and bom_in_file are in the SecurityTXT class, not in the Parser class that internet.nl uses. I'm don't see why bom_in_file is not checked in the Parser class.
Created issue upstream:

@bwbroersma
Copy link
Collaborator Author

Upstream solved it in the 0.9.3 release.

@bwbroersma
Copy link
Collaborator Author

Although this is in milestone v1.9, it is already included and deployed in the 'batch' release v1.8.7.

@baknu
Copy link
Contributor

baknu commented Nov 28, 2024

@bwbroersma I just copied the English texts from https://github.com/DigitalTrustCenter/sectxt/blob/main/README.md to new text files for:

  • bom_in_file
  • invalid_cert
  • invalid_uri_scheme

I will check these texts and translate them into Dutch.

Couldn't find pgp_envelope in the above mentioned README though. Could you check?

@baknu
Copy link
Contributor

baknu commented Nov 29, 2024

Changed and translated texts for:

  • bom_in_file
  • invalid_cert
  • invalid_uri_scheme

Made a note in the pgp_envelope text files.

@baknu baknu closed this as completed Nov 29, 2024
@bwbroersma
Copy link
Collaborator Author

pgp_envelope is not an error but a type (like comment and field).
So no translation is needed.

@baknu
Copy link
Contributor

baknu commented Nov 29, 2024

@mxsasha Double-checking with you:

  • bom_in_file is part of the sectxt parser that Internet.nl uses. Correct?
  • invalid_cert and invalid_uri_scheme are both not part of the sectxt parser but part of the sectxt retrieval code that Internet.nl does not use. Have you mimicked these checks and message labels in our code?

@baknu baknu reopened this Nov 29, 2024
@bwbroersma
Copy link
Collaborator Author

@baknu:

I think it's best to close this issue (as solved for v1.9) and fix these minor edge cases another release.

@baknu
Copy link
Contributor

baknu commented Dec 2, 2024

@bwbroersma:

Ok, thanks! Agreed.

Still two questions:

  1. So the content for both invalid_cert and invalid_uri_scheme is currently not used. Shall I leave these text labels in the content repo?
  2. Regarding invalid_cert: Internet.nl currently evaluates the validity of the domain tested under "Secure connection (HTTPS)" --> "Certificate" --> "Trust chain of certificate" and "Domain name on certificate". However, these subtests do not evaluate domains to which the tested domain redirects. So this means that in a redirect chain to a security.txt, an invalid certificate can still be used that Internet.nl does not detect. Shouldn't we also open an issue for this?

@bwbroersma
Copy link
Collaborator Author

@baknu:

  1. Yes, see comment, this issue is covered in security.txt redirect to domain with TLS issues (expire / name / CA) #1562.

@baknu
Copy link
Contributor

baknu commented Dec 2, 2024

Ad 1: I left the respective text files in the content repo, and added the following note to each of them:

[Note: this content label is currently not used. See #1046.]

@baknu baknu closed this as completed Dec 2, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging a pull request may close this issue.

3 participants