Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support cluster purge #7

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

chguocloudant
Copy link

  • add 2 APIs(set_purge_seq, get_purge_seq).
  • add a test case to test the new APIs.

@rnewson
Copy link
Contributor

rnewson commented Nov 30, 2016

where is purgeSeq persisted to the index?

@@ -70,6 +70,7 @@ class IndexService(ctx: ServiceContext[IndexServiceArgs]) extends Service(ctx) w
var reader = DirectoryReader.open(ctx.args.writer, true)
var updateSeq = getCommittedSeq
var pendingSeq = updateSeq
var purgeSeq = 0L
Copy link
Contributor

@mayya-sharipova mayya-sharipova Nov 30, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why the default value is 0? Similarly to updateSeq it should be read from commitData on disk

@rnewson rnewson changed the title Support clouster purge Support cluster purge Nov 30, 2016
@@ -199,6 +206,26 @@ class IndexService(ctx: ServiceContext[IndexServiceArgs]) extends Service(ctx) w
}
}

private def commitPurgeSeq(newSeq: Long) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method does not set the committing flag and so could allow concurrent attempts to commit, leading to a broken index (one with the wrong values).

This work should instead be done in the preceding commit function.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean we can commit purgeSeq and updateSeq with commit function?
We can change commit function with 2 arguments to support it.

  private def commit(newSeq: Long, type: String) {
    if (type == "updateSeq") {
      ...
    } else {
      ...
    }

- add 2 APIs(set_purge_seq, get_purge_seq).
- keep purge_seq in clouseau.
- adjust the commit logic to update purge seq.
- update the test code.

BugzID: 68280
@chguocloudant chguocloudant force-pushed the 68280_clouseau_for_clustered_purge branch from 86f9452 to a8c3c8e Compare December 22, 2016 02:27
Copy link
Contributor

@rnewson rnewson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved, nicely done. Please don't merge until we have the matching dreyfus work at the same level in case it changes what we need here.

@@ -569,7 +580,8 @@ class IndexService(ctx: ServiceContext[IndexServiceArgs]) extends Service(ctx) w
('doc_count, reader.numDocs),
('doc_del_count, reader.numDeletedDocs),
('pending_seq, pendingSeq),
('committed_seq, getCommittedSeq)
('committed_seq, getCommittedSeq),
('purge_seq, pendingPurgeSeq)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should be purgeSeq here not pendingPurgeSeq.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I will change it later.

case SetPurgeSeqMsg(newSeq: Long) =>
pendingPurgeSeq = newSeq
logger.debug("Pending purge sequence is now %d".format(newSeq))
commit(pendingSeq, pendingPurgeSeq)
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @rnewson .
I triggered commit when the purgeSeq is updated. Will it cost too much?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And I think the var pendingPurgeSeq can be removed if the commit is allowed. Right?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hm, good question. I'd rather we always commit asynchronously and on the established interval. For users that purge a lot (and there will be a lot of them if this work is extended to allow 'hard' deletes) they'll be committing very frequently and this will hurt performance.

This means the purge seq message could be processed but not actioned, when clouseau crashes before committing, so we'll need to ensure that dreyfus will retry (it should, since it reads the purge seq from the index, but let's be sure).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @rnewson .
If we commit purgeSeq asynchronously and also the users purge a lot, the dreyfus will retry to delete the index again in clouseau.
For example, if dreyfus gets a purgeSeq(6) from couchdb and gets an IdxPurgeSeq(5) from clouseau, it will purge index from 5 and set purgeSeq to 6.
If clouseau is crashed before committing, dreyfus will get purgeSeq(5) again. And dreyfus will purge index from 5 again.
But I am not sure which is more expensive, delete index or commit?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can safely assume clouseau crashes are rare from our production experience.

commit is expensive, the most expensive operation there is in Lucene, I think.

So the choice is between an expensive commit for every purge or a less expensive delete on retry for every clouseau crash.

We should also not say "delete index" when we really mean "delete document". :)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. I think I can modify it later.
Let me make sure that we can remove the value pendingPurgeSeq, right? If the dreyfus calls get_purge_seq, clouseau will return purgeSeq directly(now this value maybe is not committed yet).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see how we can remove pending. we need to return the committed purge seq when asked and need to retain the pending value for when we commit.

    - adjust the commit logic to update purge seq.

    BugzID: 68280
@@ -117,6 +120,10 @@ class IndexService(ctx: ServiceContext[IndexServiceArgs]) extends Service(ctx) w
pendingSeq = newSeq
logger.debug("Pending sequence is now %d".format(newSeq))
'ok
case SetPurgeSeqMsg(newPurgeSeq: Long) =>
purgeSeq = newPurgeSeq
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should be pendingPurgeSeq = newPurgeSeq. purgeSeq must only change after it's been committed.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it. I will modify it.

- use pendingPurgeSeq in set_purge_seq and get_purge_seq.
- update purgeSeq when pengdingPurgeSeq is committed.

BugzId: 68280
Copy link
Contributor

@rnewson rnewson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 but let's merge once the matching dreyfus PR is ready.

@@ -117,6 +121,10 @@ class IndexService(ctx: ServiceContext[IndexServiceArgs]) extends Service(ctx) w
pendingSeq = newSeq
logger.debug("Pending sequence is now %d".format(newSeq))
'ok
case SetPurgeSeqMsg(newPurgeSeq: Long) =>
pendingPurgeSeq = newPurgeSeq
logger.debug("purge sequence is now %d".format(newPurgeSeq))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should say 'Pending purge sequence is now %d'

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, got it. I will modify it later.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants