Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

why commit a instance on fast-path that requires all replicas to be reply equal in preAcceptReply #16

Open
pengsven opened this issue Dec 23, 2019 · 9 comments

Comments

@pengsven
Copy link

hi, guys:

Why do need all the replies inst. Lb. AllEqual in the preAcceptReply reply to go commit on the fast-path , the following code is implemented:

1050 if inst.lb.preAcceptOKs >= r.N/2 && inst.lb.allEqual && allCommitted && isInitialBallot(inst.ballot) {
1112 if inst.lb.preAcceptOKs >= r.N/2 && inst.lb.allEqual && allCommitted && isInitialBallot(inst.ballot) {

In the paper, only the fast-quorum of replies with the same instance, then committing on fast-path

Is there anything else I haven't considered?

Thanks

@imoraru
Copy link
Member

imoraru commented Dec 24, 2019

My recommendation to anyone who wants to implement EPaxos is to avoid using the "seq" attribute. Seq is useful if we want to be able to notify the client that their command is committed before the dependency graph of that command has been finalized. This makes commit faster in the presence of conflicts. However, it introduces this ugly corner case, where "seq" is difficult to recover properly following failures unless we add an extra condition for the fast path (more on this in https://www.pdl.cmu.edu/PDL-FTP/associated/CMU-PDL-13-111.pdf, page 13). This complicates the protocol and it makes it difficult to implement correctly.

So how do we avoid using "seq"? Just don't tell the client the command has been committed until its entire dependency graph is finalized (i.e., it cannot change anymore). One simple way to do this is to execute the command first, and then tell the client it's done. This ensures strict serializability without the need for seq.

@drmingdrmer
Copy link

drmingdrmer commented Dec 24, 2019

But how to resolve the ordering in an SCC without seq?
I remember you have mentioned about this issue:

image

Finally a,b,c will all be committed with the finalized dependency graph as above.
Without seq, a might be executed before c, and this is not expected by the client.
Does it mean we need another component to solve the order, such as a timestamp service?
Or did I miss something that could solve this ordering issue?

@imoraru
Copy link
Member

imoraru commented Dec 24, 2019

You can use any arbitrary criterion to break cycles, as long as you use the same criterion on every replica. For example, a timestamp chosen at the command leader, or even just the command leader’s replica ID.

@pengsven
Copy link
Author

😄Thanks for the answer, but I still have some doubts, as follows:

Means that if the implementation of seq is not used, then the restriction of inst.lb.allEqual can be removed ?

And whether the restriction of allCommitted (refer to https://www.pdl.cmu.edu/PDL-FTP/associated/CMU-PDL-13-111.pdf, page 13, FP-deps-committed) can be removed?

Finally, what can be done to simplify int the Explicit Prepare Phase

@drmingdrmer
Copy link

I believe allEqual can not be removed because it is a requirement by the original fast-paxos: only when the number of equal values achieves fast-quorum the value is safe(e.g. it is the only value that will be committed).

@imoraru
Copy link
Member

imoraru commented Dec 27, 2019

Yes, allEqual is still necessary even if you don't use seq. That's necessary for correct recovery of a command committed on the fast path. However, allCommitted is no longer necessary.

@drmingdrmer
Copy link

drmingdrmer commented Dec 30, 2019

Yes, allEqual is still necessary even if you don't use seq. That's necessary for correct recovery of a command committed on the fast path. However, allCommitted is no longer necessary.

I am still confused about the above solution to get rid of seq:

Can I commit an instance on the fast-path if the leader receives 5/7 identical PreAcceptReplies(just deps, no seq in replies) and there are uncommitted command in deps ?

Because If the leader of the command a commits and then get lost,
just like the picture below,
the recovery procedure received PrepareReply from replicas 1234,
Then it seems there is no way to tell whether a->b is committed or a->c is committed.
(a depends on b and b is in PreAccepted status. So is c).

Either of these two values could have formed a fast-quorum(5 replicas): (replicas 12567 or replicas 34567).
image

Is there anything I missed to fix this case?

Thanks and happy new year:)

@imoraru
Copy link
Member

imoraru commented Dec 30, 2019

This is covered in the second-to-last paragraph of section 4.4. in the paper https://www.cs.cmu.edu/~dga/papers/epaxos-sosp2013.pdf: exactly because of this possibility, the optimized EPaxos only works if recovery can distinguish between a->b and a->c. To do so we can (a) make the leader send PreAccepts to only a quorum, or (b), send PreAccepts to all replicas, but commit on the fast path if a quorum agrees with the leader's original deps -- and record that they did. There could be other options as well.

In this particular case, recovery will notice the disagreement and conclude that 'a' was not committed on the fast path.

Happy new year :)

@pengsven
Copy link
Author

pengsven commented Jan 4, 2020

Sorry to reopen this issue, because re-reading the paper caused confusion:
what is the proof in the paper based on a or b ?

@pengsven pengsven reopened this Jan 4, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants