Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improved replication #116

Draft
wants to merge 4 commits into
base: master
Choose a base branch
from
Draft

Improved replication #116

wants to merge 4 commits into from

Conversation

gcask
Copy link
Contributor

@gcask gcask commented May 22, 2021

This build's on @daid 's proposal from #115 .

It introduces a new type, replication::Field<T> which people can use to mark replicated... fields.

⚠️ This is still very much a draft, with an API in flux. Currently trying to match all existing functionalities (including debug features, network stats).

class Sample : public MultiplayerObject
{
  float old_way;
  float old_way_interval;
  replication::Field<float> new_way;
  replication::Field<float> new_way_interval;
public:
  Sample()
  : new_way{this}, new_way_interval{this, 1.5f} // <- this part is likely to change slightly, but that's the concept
  {
    registerMemberReplication(&old_way);
    registerMemberReplication(&old_way_interval, 1.5f);
  }
};```

replicated fields can be used transparently in most context: a const-ref accessor is provided, and overloads are set for the most common writing operations (assignments mainly).

The replication handling has been moved into a dedicated class, the `ControlBlock` (modeling after the shared_ptr logic).

While it follow just about the same overall pattern as the current replication logic, the few improvements are:
  * It's type-safe. No more `void*` party.
  * It doesn't need to keep previous values around: items are marked dirty when they change (or are assigned for some objects where it is too expensive to do change detection).
  * It's faster at sorting out changed vs unchanged values: both from not having to check for change *each* attempt at updating and through using a bitset.
  * It's smaller: The current `MemberReplicationInfo` is about 48B on a 64b system, whereas the `Field<[trivial type]>` is only 32B. This currently includes padding, and could probably be shrunk a little more.

The API is aiming to be easy to use, hard to misuse while still being somewhat open ended to allow introducing custom Field types.

Again still WIP, still a draft, but would definitely appreciate any early insights or feedback.

Currently planned:
 * Field should take a setting-like struct at the beginning, currently it's a little confusing. So from `field{this, min_interval, field_args...}` to `field{ {this, min_interval, other_settings? }, field_args... }`
 * Multiplayer stats - still need a macro for the name injection etc. Will likely go from `: field{...}` to `: replicatedField(field, ...)`.
 * Vector support.

daid and others added 4 commits May 21, 2021 08:58
…ation values use a different method to check for changes.
Moving everything under a proper namespace.
Introducing a control block for replication, which drives the bulk of the logic.

For code facing, introducing a `Field<T>`, which handles the bulk of the replication logic in a clean typesafe fashion.
Adding support for string (still rough).
Hiding some of the details of the controller, making sure it's harder to mess up.

Changing the control block to be more SoA oriented.

Updated the send() logic to be closer to the current behavior
{
float min_delay{};
#ifdef DEBUG
string name{};
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might be a pain in the ass to fill in DEBUG?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah that's where the macros are still needed (currently reviving them).

bool isDirty(size_t) const;
void setDirty(size_t);
void resetDirty();
std::vector<size_t> dirty;
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Multiple vectors of the same size? Sounds more like thing for a single vector with a struct. Would also work better cache wise I would guess.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not quite actually: this is the structure of arrays vs array of structures (SoA vs AoS) approaches.

In this particular case, it's a lot about hot/cold data separation: we know the dirty check and 'tick time' are going to be updated very often, every frame.
The members list and other constants are cold - they're only changed when members are getting registered (which is "rare" in comparisons - only happens on object construction).

So with this approach, the cache accesses are actually better: with the dirty vector packed as tightly as 1-bit per member, a single cache line holds the status for up to 512 members to be checked (assuming x64).

If all arrays were merged in a single struct, this would drop down to only ~2.5 members that can be checked for dirtiness in a single line.

Now obviously this is all part speculation, part educated guesses before any profiling comes in, and I still need this to run at scale, but the preliminary results I have are encouraging.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When I made the comment it wasn't clear to me that the dirty is bitflags. (No idea if std::vector specialization could be used instead, but if I remember right, that one is odd and not really well defined)

As for the rest being SoA instead of AoS, I preferrer readability over optimization. Unless you can show that it makes a real bit difference on total application speed, I highly preferrer more readable code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you're referring to vector<bool> you're right it is generally frowned up.

I was initially tempted to pull everything into a DynamicBitset class, hopefully that would make things more legible.
Alternatively, I considered using a bitset<>, but that forces either to know the replicated fields at compile time (which we should) or set an upper bound (wasn't sure if that was acceptable).


if (!everything)
{
for (auto dirty_batch = 0; dirty_batch < dirty.size(); ++dirty_batch)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I like this. My method with the single direction linked list was simpler and more efficient, compared to these packed dirty bits.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure about yours being more efficient: linked lists involve a ton of pointer chasing which are rarely a prefetcher's favorite, and it required an extra pointer on the replicated members themselves, so the memory footprint was also higher as well.

However, again, measurement is king.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My point on being more efficient is, the head pointer will usually point to nullptr, as any configuration value isn't changed, meaning we check nothing at all. And doing nothing is most certainly faster then checking an array to do nothing. I agree that going over the linked list is less efficient if it is linked, but it usually isn't...

And updating the 2 pointers should be in the same ballpark as looking up the right bit to set in the dirty array.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My point on being more efficient is, the head pointer will usually point to nullptr

Interesting.
In my tests, where I converted a few of the fields on the spaceship, there's almost always 'something' that changed among all the replicated values, so from what I've it looked like the list would never really be empty...
And I figured it'd get worse once alll the fields would be converted.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, see where our difference is then, you want to convert all the fields, I just want to convert the ones that almost never change.

@gcask
Copy link
Contributor Author

gcask commented May 25, 2021

I'm going to table this for the foreseeable future - Very selfishly, it's a little too far off my intended target :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants