Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sequenced vectors with efficient replication #239

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

csibbitt
Copy link

  • Adds sp::SeqVector for tracking elements that have a sequence number
  • Tracks last sequence # to replicate
  • Provides significant performance improvements for EE ships_log

* Adds sp::SeqVector for tracking elements that have a sequence number
* Tracks last sequence # to replicate
* Provides significant performance improvements for EE ships_log
@csibbitt
Copy link
Author

csibbitt commented Oct 15, 2023

I did these tests

  • Set the EE ships_log to use a SeqVector with 10,000 entries
  • Scenario producing 10 logs/s
  • Connected 2 new clients, full back replication worked
  • Added 10,000 messages in two batches of 5000 from web UI
  • Start and end of log work as expected before and after 10,000 on all 3 sessions
  • Connected a 3rd client, full replication occurs
  • Debug meter shows 0.4 kbps per client (from the 10 logs/s)
    • Previously I was seeing numbers more like 900kbps

Copy link
Owner

@daid daid left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you are very brave that you dove into this code. It's old. It's not pretty. And it's not how I would attempt to do it now if I would rebuild it.

However, I'm not sure if I like these changes, it's quite a hack.

uint16_t count = ptr->size();
unsigned int to_send = std::min(ptr->back().seq - ptr->last_seq, static_cast<unsigned int>(count));

ptr->last_seq = ptr->back().seq;
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The process of sending shouldn't modify the vector state. The concept of this code is that the isChanged call will check if it is changed and update data a the prev_data_ptr to know there is no change anymore.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with you in principle ("The process of sending shouldn't modify the vector state"). We need somewhere to track what has been sent so that we don't resend the entire vector (say, of a sequence of log entries) every time we sync. A class that wraps the vector (whose state doesn't change) seemed like the right place. I experimented with just using a struct with a vector member and a last_seq member for clearer separation, but this was a cleaner fit for the templating. Maybe I've overlooked a simpler way.

I understand the prev_data approach of keeping a copy and comparing them, and started that way originally since that follows the pattern of the rest of the code. In addition to needlessly storing a second copy, I didn't come up with an easy solution to new client sync. Basically, instead of last_seq, I just used prev_data_ptr->back()->seq in it's place. The isChanged code was if (ptr->back().seq != prev_data_ptr->back().seq), and the send code was to_send = ... (ptr->back().seq - prev_data_ptr->back().seq)... and they worked fine, but of course would not sync the backlog to a new client connection. I agree that the last_seq-shuffle in sendWholeDataSeqVector is a bit hacky, but I think it's pretty clear at least?

This solution reduces memory footprint and makes for an easy way to sync new clients. The vector itself doesn't get modified by sending, just the super-class that is specifically FOR tracking. If there is a clearer way of expressing this within the framework we have here, I'm happy to learn it. :)

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, I don't think it has to be a huge issue to send the whole log each time it changes (high logs/sec would be unreadable no mater the log size, so not really a realistic scenario). The main issue I see with the master code is that it's CPU heavy if you increase the log size. Keeping track of the sequence number inside the prev_data would solve that part.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants