RGS v2: NodeAnnouncement Delta Serialization #76

arik-so · 2024-03-19T20:18:01Z

This PR aims to start serializing changes to nodes' socket address and feature sets. Doing so comprises two main components: calculating the delta, and encoding it in a new version of the RGS protocol, while also maintaining the ability to serialize old versions.

The lookup of new node details is primarily accomplished in lookup.rs, and the main details of the new serialization format are in lib.rs.

src/config.rs

src/lib.rs

TheBlueMatt · 2024-05-14T20:00:41Z

src/lib.rs

+	let mut versioned_output = GOSSIP_PREFIX.to_vec();
+	versioned_output.append(&mut unversioned_output.clone());
+
+	let mut compressed_unversioned_output = rle::encode(&unversioned_output);


Do you have any data on how much savings the RLE will give us vs gzip? I wonder if we RLE first does gzip end up not working as well and we end up with larger gzip data vs originally.

so I just ran some tests. RLE saves about 10% over not-having it — without gzip. Post-gzip, not having RLE prior actually results in a smaller file size. So I think we should do away with RLE altogether and let gzip do its job.

Yea, that is what I was kinda figuring would happen...I think I agree we should do away with our internal RLE here. Just saving 10% isn't really huge at least compared to gzip, so we'd still kinda be reliant on gzip.

All right, I removed RLE. Still gotta massage the commit history a bit, but it's gone

Pity, I liked the RLE idea, but if gzip is better and the new default, all the better!

We're ultimately making a tradeoff - we always do gzip on the HTTP end, pre-gzip -9'ing on our reference server, even. We assume most clients will do gzip in their HTTP client (which is part of the reason the API pushes the user to fetch the blob, rather than us bundling a lightweight HTTP client ourselves) so don't want to slow that down, but on the flip side some users may use a lightweight HTTP client that doesn't do gzip and we end up slowing them down.

TheBlueMatt · 2024-05-15T17:01:18Z

We need the ability to generate both formats for some time.

src/serialization.rs

src/lookup.rs

TheBlueMatt · 2024-05-16T01:50:19Z

src/lib.rs

+		current_node_id.write(&mut current_node_delta_serialization).unwrap();
+
+		if let Some(node_delta) = node_delta_set.get(&current_node_id) {
+			if node_delta.has_feature_set_changed {


It looks like this commit is changing the v1 format only for a later commit to change the format back. Can you move this commit later and squash it into the format change PR so that we don't have fixups in the same PR?

src/lib.rs

src/lookup.rs

src/serialization.rs

src/lib.rs

TheBlueMatt

A few nits and a few real comments.

src/serialization.rs

TheBlueMatt · 2024-05-19T21:56:46Z

src/tests/mod.rs

 		receiver.send(GossipMessage::NodeAnnouncement(announcement.clone(), Some(timestamp - 10))).await.unwrap();
 		receiver.send(GossipMessage::NodeAnnouncement(announcement.clone(), Some(timestamp - 8))).await.unwrap();

+		{
+			let mut current_announcement = generate_node_announcement(Some(SecretKey::from_slice(&[2; 32]).unwrap()));


Please don't include commits that are effectively fixups for earlier commits, if we swap the ordering of this commit and the one adding this test it would be less total diff :)

what's the effective fixup here? Am I understanding correctly that the secret key parameter should be in the same commit as the generate_node_announcement method?

I rearranged and squashed some. Hope it's sufficiently non-effectively-fixuppy.

src/lib.rs

TheBlueMatt · 2024-05-19T22:05:57Z

src/lib.rs

+				}
+
+				if has_update {
+					let extension_length = current_node_delta_extension.len() as u16;


Why always use this? We set other bits which indicate we have addresses/features, which are all length-prefixed AFAICT, so we should be always leaving the top bit clear because we don't have any "new" data.

This is for future-proofing extensibility. It will allow us to push custom data absent any features or addresses that need reading.

Right, my point is that we don't have to use this currently, rather we can use the fact that the data we're writing already has lengths in it and we can just have this be a future-extensibility thing without actually using it at all today.

but the data we're writing right now is typed. We know to read it based on specific flags set beforehand, and then we read it according to the SocketAddress or NodeFeature spec. For the client to be able to read arbitrary data, or not to error when the SocketAddress or Feature is invalid, we need some higher-level length encoding. Basically, it's technically doable from the server's perspective, but decoding is where I think it would become challenging.

For the client to be able to read arbitrary data

Right, if we add new data in the future we should use the length bytes here

not to error when the SocketAddress or Feature is invalid

So don't write invalid data? I don't see how this is challenging.

ah gotcha, I see what you meant

Note that we will have to handle new gossip address types - those we don't know the length for in advance so we'll probably want to write the total gossip data address data length rather than address count.

I think we wanna do a per-address byte count prefix such that future address types don't break the deserialization of older clients if they're followed by already supported address types.

Alternatively we could sort the socket addresses by when the enum value they assume got introduced, and have the client break out of the loop as soon as it encounters an address it doesn't understand.

src/lookup.rs

src/serialization.rs

jkczyz · 2024-05-21T14:02:58Z

src/lookup.rs

+		let unsigned_node_announcement = NodeAnnouncement::read(&mut readable).unwrap().contents;
+
+		let node_id = unsigned_node_announcement.node_id;
+		let is_previously_processed_node_id = Some(node_id) == previous_node_id;


Can't we skip the rest of the loop iteration if this is true (other than updating previous_node_id) since we are iterating by descending timestamp? Could we also use SELECT DISTINCT ON then?

Nope, we need to find all pairwise mutation possibilities.

Could you add a comment why this is the case? It isn't readily obvious.

hmm, I think such a comment probably belongs in the README

src/lookup.rs

src/lib.rs

TheBlueMatt · 2024-05-28T19:01:00Z

Otherwise I think LGTM.

TheBlueMatt · 2024-05-28T19:04:58Z

src/lib.rs

+			if let Some(node_delta) = serialization_details.node_mutations.get(&current_node_id) {
+				/*
+				Bitmap:
+				7: expect extra data after the pubkey (a u16 for the count, and then that number of bytes)


so bit 6 is now unused? Should we clearly define it to mean something? Not sure what it would mean but worth mentioning.

we could use it for more feature defaults? Frankly, I think we can leave it unused on the server, and decide its interpretation on the client.

Sure, we definitely don't have to interpret it here before merging this PR, I'm just suggesting we think on it.

src/lib.rs

TheBlueMatt

LGTM, I think, I'll do a full 'nother pass tomorrow. LMK If you had any other feedback @jkczyz but I'll merge it either way, we can handle any comments you had later.

TheBlueMatt · 2024-05-29T01:53:31Z

src/lib.rs

+					let mut address_serialization = Vec::new();
+
+					// we don't know a priori how many are <= 255 bytes
+					let mut total_address_count = 0;


nit

Suggested change

let mut total_address_count = 0;

let mut total_address_count: u8 = 0;

TheBlueMatt

Ship it

jkczyz

LGTM, I think, I'll do a full 'nother pass tomorrow. LMK If you had any other feedback @jkczyz but I'll merge it either way, we can handle any comments you had later.

Just a couple nits for a follow-up.

jkczyz · 2024-05-29T14:55:08Z

src/lib.rs

+	let mut delta_set = DeltaSet::new();
+	lookup::fetch_channel_announcements(&mut delta_set, network_graph, &client, last_sync_timestamp, snapshot_reference_timestamp, logger.clone()).await;
+	log_info!(logger, "announcement channel count: {}", delta_set.len());
+	lookup::fetch_channel_updates(&mut delta_set, &client, last_sync_timestamp, logger.clone()).await;
+	log_info!(logger, "update-fetched channel count: {}", delta_set.len());
+	let node_delta_set = lookup::fetch_node_updates(&client, last_sync_timestamp, logger.clone()).await;
+	log_info!(logger, "update-fetched node count: {}", node_delta_set.len());
+	lookup::filter_delta_set(&mut delta_set, logger.clone());
+	log_info!(logger, "update-filtered channel count: {}", delta_set.len());
+	serialization::serialize_delta_set(delta_set, node_delta_set, last_sync_timestamp)


nit: Probably best to return the sets in all of these instead of passing in a &mut, but can be done in a follow-up.

jkczyz · 2024-05-29T14:59:28Z

src/serialization.rs

+pub(super) fn find_leading_histogram_entries(histogram: HashMap<&NodeFeatures, usize>, count: usize) -> Vec<NodeFeatures> {
+	let mut entry_counts: Vec<_> = histogram.iter().filter(|&(_, &count)| count > 1).collect();
+	entry_counts.sort_by(|a, b| b.1.cmp(&a.1));
+	entry_counts.into_iter().take(count).map(|(&features, _count)| features.clone()).collect()


Shouldn't need a clone if you don't use a reference?

But the reference avoids a clone to insert every key.

Oh, I just noticed the originating HashMap contains references as keys, so the clone is indeed needed. I was actually referring to &features when I said reference. Anyhow, the code could use fewer explicit references as:

let mut entry_counts: Vec<_> = histogram.into_iter().filter(|&(_, count)| count > 1).collect(); entry_counts.sort_by(|a, b| b.1.cmp(&a.1)); entry_counts.into_iter().take(count).map(|(features, _count)| features.clone()).collect();

But that's more a stylistic nit, so feel free to ignore.

arik-so mentioned this pull request Mar 19, 2024

Serialize NodeAnnouncement address data #75

Closed

4 tasks

TheBlueMatt reviewed Mar 20, 2024

View reviewed changes

src/config.rs Outdated Show resolved Hide resolved

src/config.rs Outdated Show resolved Hide resolved

arik-so force-pushed the arik/2024/02/node-gossip branch from 282b03b to 3ced9be Compare March 28, 2024 06:57

arik-so force-pushed the arik/2024/02/node-gossip branch from a43cfa7 to c813aed Compare May 13, 2024 07:21

TheBlueMatt reviewed May 14, 2024

View reviewed changes

arik-so changed the title ~~Persist NodeAnnouncement gossip~~ RGS v2: NodeAnnouncement Delta Serialization May 14, 2024

arik-so force-pushed the arik/2024/02/node-gossip branch 3 times, most recently from e336464 to a95e06d Compare May 15, 2024 16:45

arik-so force-pushed the arik/2024/02/node-gossip branch 2 times, most recently from ca193d1 to c58c88c Compare May 15, 2024 22:15

arik-so marked this pull request as ready for review May 15, 2024 22:16

TheBlueMatt reviewed May 16, 2024

View reviewed changes

arik-so force-pushed the arik/2024/02/node-gossip branch 2 times, most recently from 088f6b0 to 5245c27 Compare May 16, 2024 16:42

TheBlueMatt reviewed May 17, 2024

View reviewed changes

src/lookup.rs Outdated Show resolved Hide resolved

src/lookup.rs Outdated Show resolved Hide resolved

src/lookup.rs Outdated Show resolved Hide resolved

src/lookup.rs Show resolved Hide resolved

src/serialization.rs Outdated Show resolved Hide resolved

src/lib.rs Show resolved Hide resolved

arik-so force-pushed the arik/2024/02/node-gossip branch 2 times, most recently from a413e34 to db3fbac Compare May 17, 2024 18:10

TheBlueMatt reviewed May 19, 2024

View reviewed changes

arik-so force-pushed the arik/2024/02/node-gossip branch from db3fbac to 2ffc9e6 Compare May 20, 2024 06:22

arik-so added 2 commits May 19, 2024 23:25

Test NodeAnnouncement persistence.

1fd6c67

Track db schema version 14.

529c666

arik-so force-pushed the arik/2024/02/node-gossip branch from 2ffc9e6 to 2c96530 Compare May 20, 2024 06:26

jkczyz self-requested a review May 20, 2024 18:42

jkczyz reviewed May 20, 2024

View reviewed changes

src/lookup.rs Show resolved Hide resolved

src/lookup.rs Show resolved Hide resolved

src/lookup.rs Outdated Show resolved Hide resolved

src/lookup.rs Outdated Show resolved Hide resolved

arik-so force-pushed the arik/2024/02/node-gossip branch from 2c96530 to 7a2fa18 Compare May 21, 2024 00:27

tnull reviewed May 21, 2024

View reviewed changes

src/serialization.rs Show resolved Hide resolved

jkczyz reviewed May 21, 2024

View reviewed changes

arik-so force-pushed the arik/2024/02/node-gossip branch from 7a2fa18 to 1be317e Compare May 21, 2024 20:21

arik-so force-pushed the arik/2024/02/node-gossip branch from 1be317e to 516f143 Compare May 21, 2024 20:30

Introduce method for comparing node announcements.

74dd28d

arik-so force-pushed the arik/2024/02/node-gossip branch 2 times, most recently from 1248538 to a5fab3f Compare May 22, 2024 21:20

TheBlueMatt reviewed May 28, 2024

View reviewed changes

src/lib.rs Outdated Show resolved Hide resolved

src/lib.rs Outdated Show resolved Hide resolved

TheBlueMatt reviewed May 28, 2024

View reviewed changes

arik-so force-pushed the arik/2024/02/node-gossip branch 2 times, most recently from ae0e7f2 to 567e986 Compare May 28, 2024 21:06

TheBlueMatt reviewed May 28, 2024

View reviewed changes

src/lib.rs Outdated Show resolved Hide resolved

arik-so force-pushed the arik/2024/02/node-gossip branch from 567e986 to 91d91b7 Compare May 28, 2024 22:33

TheBlueMatt reviewed May 29, 2024

View reviewed changes

arik-so added 2 commits May 28, 2024 19:16

Version-aware serialization of node features and addresses in delta.

dcec600

Store v1 and v2 snapshots.

fe33964

arik-so force-pushed the arik/2024/02/node-gossip branch from 91d91b7 to fe33964 Compare May 29, 2024 02:19

TheBlueMatt approved these changes May 29, 2024

View reviewed changes

TheBlueMatt merged commit b13e988 into lightningdevkit:main May 29, 2024
4 checks passed

jkczyz reviewed May 29, 2024

View reviewed changes

	let mut total_address_count = 0;
	let mut total_address_count: u8 = 0;

RGS v2: NodeAnnouncement Delta Serialization #76

RGS v2: NodeAnnouncement Delta Serialization #76

Conversation

arik-so commented Mar 19, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

TheBlueMatt commented May 15, 2024

Choose a reason for hiding this comment

TheBlueMatt left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jkczyz May 21, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

TheBlueMatt commented May 28, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

TheBlueMatt left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

TheBlueMatt left a comment

Choose a reason for hiding this comment

jkczyz left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

arik-so commented Mar 19, 2024 •

edited

Loading

jkczyz May 21, 2024 •

edited

Loading