Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor: remove mDNS #2630

Open
wants to merge 10 commits into
base: main
Choose a base branch
from
Open

refactor: remove mDNS #2630

wants to merge 10 commits into from

Conversation

b-zee
Copy link
Contributor

@b-zee b-zee commented Jan 14, 2025

Builds upon #2595. (Merge that first.)

@b-zee b-zee marked this pull request as draft January 14, 2025 16:02
@b-zee b-zee force-pushed the refactor-remove-mdns branch from af8b2b2 to b4c26e6 Compare January 15, 2025 09:40
@b-zee b-zee marked this pull request as ready for review January 15, 2025 09:50
Comment on lines +977 to +981
# - name: Verify the routing tables of the nodes
# run: cargo test --release -p ant-node --test verify_routing_table -- --nocapture
# env:
# CARGO_TARGET_DIR: ${{ matrix.os == 'windows-latest' && './test-target' || '.' }}
# timeout-minutes: 5
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How bad is the error here? Maybe would it be good to add some tolerance to the test?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From what I've seen it's not too bad. Worst I've seen is one node having 21 nodes in their RT. And usually about 3-6 nodes that are missing one or two (23/24) in their RT.

@@ -330,6 +306,22 @@ impl SwarmDriver {
// all addresses are effectively external here...
// this is needed for Kad Mode::Server
self.swarm.add_external_address(address.clone());

// If we are local and the first node, add our own address(es) to cache
// if self.first {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we uncomment the if statement here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or do we write the listen addr for all local nodes, if so the code comment is misleading ig.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we gotta clone the cache before add_address(), I've pasted the code from the main node handling loop. The ergonomic is really bad here and that is my fault, I had to modify the cache entirely like 3 times and there are a lot of artifacts from previous versions.

let config = bootstrap_cache.config().clone();
                    let mut old_cache = bootstrap_cache.clone();

                    let new = match BootstrapCacheStore::new(config) {
                        Ok(new) => new,
                        Err(err) => {
                            error!("Failed to create a new empty cache: {err}");
                            continue;
                        }
                    };
                    *bootstrap_cache = new;

                    // save the cache to disk
                    spawn(async move {
                        if let Err(err) = old_cache.sync_and_flush_to_disk(true) {
                            error!("Failed to save bootstrap cache: {err}");
                        }
                    });

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I first added a first field, but later removed it again and just thought why not just write all nodes addresses. In theory it might be more realistic to let only the genesis write its address though.

Comment on lines +281 to +283
if opt.peers.first {
bootstrap_cache.write()?;
} else {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think if a node is running as a service it could restart and it would still retain the --first flag.
Thus, the entire cache would be wiped.

This is only really a problem for our genesis VM. But a wiped cache there is not really a big deal. If there is a better way to solve this, then great, else no issues tbh.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I edited the RPC restart to remove the --first flag in ff94c00

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No I mean, a --first node started using antnodemanager would run as a service.
And services preserve their args in the service definition file.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants