Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

dns: fix name parse/serialization and support escapes (big fix 1/4) #83

Open
wants to merge 6 commits into
base: master
Choose a base branch
from

Conversation

pinheadmz
Copy link
Member

@pinheadmz pinheadmz commented Dec 22, 2021

Refactored out of bloated #76

Implement escaped characters when reading and writing names. This means we can lose the weird internal "special bytes" 0xff and 0xfe which were used to represent \0 and . respectively, internally, as a "hack" around C strings.

So:

\! -> !
\061 -> a
\000 -> 0x00
\\ -> \

...etc

I based this patch on the writeName and readName functions in bns encoding.js

I noticed that knot resolver uses is_punc() but JJ used a switch with a list of the special characters 🤷

Another quirk is that hnsd hsk_dns_name_serialize() serves two functions, depending on if char *name is passed, which is report size, or actually write the name. In the PR right now I have basically the same code block twice which seems smelly, could use some inspiration on how or if to optimize that.

@pinheadmz pinheadmz changed the title dns: fix name parse/serialization and support escapes (big fix 1/n) dns: fix name parse/serialization and support escapes (big fix 1/4) Dec 22, 2021
@pinheadmz pinheadmz added this to the v1.1.0 milestone Dec 22, 2021
@nodech
Copy link
Contributor

nodech commented Dec 23, 2021

Unfortunately, this is not gonna work.

Internal structures that use char *name have specific lengths predefined, you can see all of those in src/dns.h and it is 256.
That is already a limit of allowed name length (dns allowed 255 octets + 0x00).
This means that any attempt to "extend" the data further will not fit in the worst case 255 octet.
E.g. punctuation/special characters will need +1 bytes where non printable characters will take +3 bytes.
So super worst case scenario (255 non-printable characters) will just take 4 times more space (1020).

Current solution (replacing 0x00 by 0xff and 0x2e('.') by 0xfe wont have these issues, even though it will not correctly serialize/deserialize any name that contains 0xff and 0xfe bytes in it.

Side notes:

  • hsk_dns_name_sanitize would have needed upgrade as well (as it depends on 0xff and 0xfe logic.
  • I was thinking if we want to add compressed message tests here well, but it's better to do it in a different PR.

knot:
yeah, they use is_punct - their defined function, instead of ispunct
from libc, I guess because they dont want to deal with locale stuff.

@buffrr
Copy link
Contributor

buffrr commented Dec 23, 2021

Internal structures that use char *name have specific lengths predefined, you can see all of those in src/dns.h and it is 256.
That is already a limit of allowed name length (dns allowed 255 octets + 0x00).
This means that any attempt to "extend" the data further will not fit in the worst case 255 octet.
E.g. punctuation/special characters will need +1 bytes where non printable characters will take +3 bytes.
So super worst case scenario (255 non-printable characters) will just take 4 times more space (1020).

Most DNS libs don't store the name in presentation format (with literal \ddd as string) internally. Knot stores it as uncompressed, dynamically allocated, uint8_t *name in wire format (predefined 256 wastes space but eh). ldns does something similar. A null terminator is a valid DNS octet and we already know the length of every label ... we should avoid C strings. Also, wire format means no issues with zero octet or 0x2e ('.'). Ideally, that's how we should do it not necessarily in this PR. Long term this is a worthwhile refactor. Current serialization in hnsd is broken for no good reason.

The current predefined size of 256 in internal structures should still fit any valid DNS name containing any byte 000 to 255 as long as names are not stored in presentation format. Presentation format foo\ddd can be converted before use. However, we lose information about the length and can't use strlen without the current 0x00 -> 0xff hack, or length must be stored somewhere.

We could also avoid supporting presentation format entirely and use C escape sequences \nnn for any proofs that use non-printable chars internally (or even just a byte array). They're in octal so DNS name name = "hello\\255" (length = 9) would use name = "hello\377" (length is only 6). We don't need presentation format except for printing human readable names in log output? still needs the hack for null terminator without storing length.

Current solution (replacing 0x00 by 0xff and 0x2e('.') by 0xfe wont have these issues, even though it will not correctly serialize/deserialize any name that contains 0xff and 0xfe bytes in it.

Hm I thought the issue with0x00 -> 0xff was only about incorrectly using C string functions for DNS names. This is unrelated to names in presentation format not fitting in the internal structures correct?

@nodech
Copy link
Contributor

nodech commented Dec 23, 2021

Most DNS libs don't store the name in presentation format (with literal \ddd as string) internally. Knot stores it as uncompressed, dynamically allocated, uint8_t *name in wire format (predefined 256 wastes space but eh). ldns does something similar. A null terminator is a valid DNS octet and we already know the length of every label ... we should avoid C strings. Also, wire format means no issues with zero octet or 0x2e ('.'). Ideally, that's how we should do it not necessarily in this PR. Long term this is a worthwhile refactor. Current serialization in hnsd is broken for no good reason.

Yeah, it makes sense. Not using wire format internally only helps with some functions (makes them easier to read/write about ? e.g. compression), but I am not 100% sure why was this decision made.

The current predefined size of 256 in internal structures should still fit any valid DNS name containing any byte 000 to 255 as long as names are not stored in presentation format. Presentation format foo\ddd can be converted before use. However, we lose information about the length and can't use strlen without the current 0x00 -> 0xff hack, or length must be stored somewhere.
We could also avoid supporting presentation format entirely and use C escape sequences \nnn for any proofs that use non-printable chars internally (or even just a byte array). They're in octal so DNS name name = "hello\255" (length = 9) would use name = "hello\377" (length is only 6). We don't need presentation format except for printing human readable names in log output? still needs the hack for null terminator without storing length.

I was wondering if we could just use first byte for size. It can store 0-255 and that's enough to measure rest of 255 bytes -- 256 bytes size of the structs wont change. With custom typedef and custom strlen usage will be the same. Unfortunately, I am not that familiar with the usage of these structs and can't say this would work. Maybe just plan for the refactor already.

Hm I thought the issue with0x00 -> 0xff was only about incorrectly using C string functions for DNS names. This is unrelated to names in presentation format not fitting in the internal structures correct?

Yeah, if original content of the wire contains 0xff and/or 0xfe on parse nothing will happen, but on serialize they will be converted to 0x00 and . incorrectly.

@pinheadmz
Copy link
Member Author

I see what you guys are saying, there's no need to use presentation format internally except maybe the log... and we should check the compression functions but I don't think there's a need for strings there either. The best solution to this is stick to wire format internally and use byte arrays instead of strings. DNS wire format has a length byte anyway so we should just stick to that convention.

@pinheadmz
Copy link
Member Author

Taking another look at this and getting a bit worried -- char *name is ubiquitous in this codebase. Everything expects names to be strings: the cache, the readers and writers, all the utilities that count labels and check for _synth and decode each record type and functions that look for '.' in strings...

I wonder what the performance cost is to just change the 256's in dns.h (in the typedefs for all the record types) to 1021... It's not like these strings stay in memory that long anyway. What really sucks up our resources in hnsd right now is 100,000 236-byte chain headers

diff --git a/src/dns.h b/src/dns.h
index 9716d15..bf241b4 100644
--- a/src/dns.h
+++ b/src/dns.h
@@ -6,8 +6,10 @@
 #include <stdlib.h>
 #include "map.h"
 
+#define HSK_DNS_MAX_STRING (255 * 4) + 1
+
 typedef struct hsk_dns_rr_s {
-  char name[256];
+  char name[HSK_DNS_MAX_STRING];
   uint16_t type;
   uint16_t class;
   uint32_t ttl;
@@ -57,7 +59,7 @@ typedef struct {
 } hsk_dns_unknown_rd_t;
 
 typedef struct {
-  char ns[256];
+  char ns[HSK_DNS_MAX_STRING];
   char mbox[256];
   uint32_t serial;
   uint32_t refresh;
@@ -85,15 +87,15 @@ typedef struct {
 } hsk_dns_loc_rd_t;
 
 typedef struct {
-  char target[256];
+  char target[HSK_DNS_MAX_STRING];
 } hsk_dns_cname_rd_t;
 
 typedef struct {
-  char target[256];
+  char target[HSK_DNS_MAX_STRING];
 } hsk_dns_dname_rd_t;
 
 typedef struct {
-  char ns[256];
+  char ns[HSK_DNS_MAX_STRING];
 } hsk_dns_ns_rd_t;
 
 typedef struct {
@@ -167,7 +169,7 @@ typedef struct {
   uint32_t expiration;
   uint32_t inception;
   uint16_t key_tag;
-  char signer_name[256];
+  char signer_name[HSK_DNS_MAX_STRING];
   size_t signature_len;
   uint8_t *signature;
 } hsk_dns_rrsig_rd_t;
@@ -185,7 +187,7 @@ typedef struct {
 } hsk_dns_rp_rd_t;
 
 typedef struct {
-  char next_domain[256];
+  char next_domain[HSK_DNS_MAX_STRING];
   size_t type_map_len;
   uint8_t *type_map;
 } hsk_dns_nsec_rd_t;

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants