Store ets backend data in its own process so it will survive other process crashes #12

Open
andrewjstone opened this issue Oct 30, 2013 · 4 comments

Comments

@andrewjstone
Owner

No description provided.

@d0rc

d0rc commented Jan 3, 2014

Do you mean tables created at line 49 of https://github.com/andrewjstone/rafter/blob/master/src/rafter_backend_ets.erl, or those at lines 13 and 14 as well?

I can submit a patch with a gen_server to "heir" ETS tables on process crash if you like. (http://www.erlang.org/doc/man/ets.html#heir)
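
For readers unfamiliar with the `heir` option, here is a minimal sketch of the mechanism, not the proposed patch. The module name `ets_heir_sketch`, the table name, and the `sketch_data` heir data are hypothetical; a real version would more likely use a supervised gen_server as the heir and hand the table back to the restarted owner with `ets:give_away/3`.

```erlang
-module(ets_heir_sketch).
-export([demo/0]).

%% Hypothetical demo: a short-lived owner creates a table that names the
%% calling process as heir, inserts one row, and then exits abnormally.
demo() ->
    Heir = self(),
    Owner = spawn(fun() ->
        T = ets:new(sketch_tab, [set, protected, {heir, Heir, sketch_data}]),
        true = ets:insert(T, {key, value}),
        exit(crash)
    end),
    %% When the owner terminates, the heir receives an 'ETS-TRANSFER'
    %% message and becomes the new owner, so the data is still readable.
    receive
        {'ETS-TRANSFER', Tab, Owner, sketch_data} ->
            ets:lookup(Tab, key)
    after 1000 ->
        timeout
    end.
```

Calling `ets_heir_sketch:demo()` returns the row that was inserted by the process that crashed.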

@andrewjstone
Owner Author

Hi @d0rc

Thanks for the offer! I'm actually not sure about this issue I opened.
Typically it would make sense to keep all the ets tables alive with an
heir since they represent committed values. However, this requires also
keeping track of the last committed index in ets (or a supervisor) as well
so when logs are replayed on a restart they can start from where the last
index was committed. Right now, the ets tables get lost and logs get
replayed fully.

This change has some interesting correctness consequences that I haven't
thought through completely, so for now I'd say just leave it how it is. It
will be a performance optimization to be implemented later. This issue is
just to remind me in a few months :)
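
A rough sketch of the idea described above, assuming the last applied index is stored in the same table as the data. The module name `replay_sketch`, the `'$last_applied'` key, and the `{Index, {Key, Value}}` log shape are all hypothetical and are not how rafter represents its log; the correctness questions mentioned above are not addressed here.

```erlang
-module(replay_sketch).
-export([replay/2]).

%% Log is a list of {Index, {Key, Value}} entries in ascending index order.
%% Tab is a set table that survived the crash (for example via an heir).
%% Only entries newer than the stored '$last_applied' marker are applied.
replay(Tab, Log) ->
    Last = case ets:lookup(Tab, '$last_applied') of
               [{_, I}] -> I;
               [] -> 0          % empty table: replay the whole log
           end,
    lists:foreach(
      fun({Index, {K, V}}) when Index > Last ->
              %% Inserting both objects in one call keeps the value and
              %% the marker in step with each other.
              ets:insert(Tab, [{K, V}, {'$last_applied', Index}]);
         (_) ->
              ok                % already applied before the crash
      end, Log),
    ok.
```

With a fresh table this replays everything, which matches the current behaviour; with a surviving table it skips whatever was already applied.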

@d0rc

d0rc commented Jan 8, 2014

Ok, at least now I know you are going to get to performance optimizations
in a few months :) Sounds great.
I was also thinking about using basho/bitcask/ to store logs instead of
https://github.com/andrewjstone/rafter/blob/master/src/rafter_log.erl
Not sure why you've developed your own format.

@andrewjstone
Owner Author

There are some serious, unnecessary downsides and only minor upsides to using bitcask. First of all, bitcask stores all keys in a hashtable in memory. This makes random access fast, but at a massive cost in RAM. That is unnecessary for a log that is almost always read sequentially and hence only needs to keep a few indexes cached at a time for general operation. (The current code is less than optimal, but it is relatively straightforward to improve.) By not using bitcask we free up all that space for operational data, such as the data stored in ets for the ets backend. That's the data that the clients care about. Additionally, bitcask does compactions that may be more resource intensive than just replaying the log into new ets tables and then snapshotting those. Replay can be done deterministically and is pausable.

The other reason I wrote my own is that I couldn't find anything out there that really did what I wanted as a standalone component. I could have attempted to rip a WAL out of postgres, but I've never looked at that codebase, and integrating with erlang probably would be a bitch. Additionally, Erlang's binary syntax kicks ass for writing protocols, schemas, file formats and the like. You can't beat it :)
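
To illustrate the point about binary syntax, here is a small sketch of encoding and decoding a length-prefixed log entry. The field layout (64-bit term, 64-bit index, 32-bit payload size) and the module name `entry_format_sketch` are invented for this example and are not the format used by `rafter_log.erl`.

```erlang
-module(entry_format_sketch).
-export([encode/3, decode/1]).

%% Hypothetical on-disk layout: Term (64 bits), Index (64 bits),
%% payload size in bytes (32 bits), then the payload itself.
encode(Term, Index, Payload) when is_binary(Payload) ->
    <<Term:64, Index:64, (byte_size(Payload)):32, Payload/binary>>.

%% Pattern matching pulls the fields back out and returns whatever
%% bytes follow the entry, so entries can be read back to back.
decode(<<Term:64, Index:64, Size:32, Payload:Size/binary, Rest/binary>>) ->
    {ok, {Term, Index, Payload}, Rest};
decode(_) ->
    incomplete.
```

The same pattern in the function head both validates the length prefix and splits the buffer, which is the kind of thing that takes noticeably more code in most other languages.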


2 participants