Store ets backend data in its own process so it will survive other process crashes #12

andrewjstone · 2013-10-30T20:31:01Z

No description provided.

d0rc · 2014-01-03T01:55:40Z

Do you mean tables created at line 49 of https://github.com/andrewjstone/rafter/blob/master/src/rafter_backend_ets.erl, or those at lines 13 and 14 as well?

I can submit a patch with a gen_server to "heir" ETS tables on process crash if you like. (http://www.erlang.org/doc/man/ets.html#heir)
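
For reference, a minimal sketch of what such an heir process could look like. The module name `table_heir` and the functions `new_table/2` and `claim/1` are made up for illustration and are not part of rafter; only `ets:new/2` with the `heir` option, `ets:give_away/3` and the `'ETS-TRANSFER'` message come from the ets documentation linked above.

```erlang
%% Hypothetical sketch: table_heir and its API are not part of rafter.
-module(table_heir).
-behaviour(gen_server).

-export([start_link/0, new_table/2, claim/1]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2]).

start_link() ->
    gen_server:start_link({local, ?MODULE}, ?MODULE, [], []).

%% Create a table owned by the caller. If the caller dies, the table is
%% transferred to the heir process instead of being deleted.
%% Assumes the heir process has already been started.
new_table(Name, Opts) ->
    Heir = whereis(?MODULE),
    ets:new(Name, [{heir, Heir, Name} | Opts]).

%% Ask the heir to hand an inherited table over to the caller.
claim(Name) ->
    gen_server:call(?MODULE, {claim, Name}).

init([]) ->
    {ok, #{}}.

handle_call({claim, Name}, {Pid, _Tag}, Held) ->
    case maps:take(Name, Held) of
        {Tab, Rest} ->
            %% give_away/3 leaves the heir option in place, so the table
            %% comes back here if the new owner crashes too.
            true = ets:give_away(Tab, Pid, Name),
            {reply, {ok, Tab}, Rest};
        error ->
            {reply, not_found, Held}
    end.

handle_cast(_Msg, Held) ->
    {noreply, Held}.

%% Sent by the runtime when a table owner dies and we are its heir.
handle_info({'ETS-TRANSFER', Tab, _FromPid, Name}, Held) ->
    {noreply, Held#{Name => Tab}};
handle_info(_Other, Held) ->
    {noreply, Held}.
```

A restarted backend would call `table_heir:claim(Name)` first and only fall back to `table_heir:new_table(Name, Opts)` on `not_found`. The sketch does not handle the case where the claim arrives before the transfer message does.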

andrewjstone · 2014-01-07T17:40:52Z

Hi @d0rc

Thanks for the offer! I'm actually not sure about this issue I opened.
Typically it would make sense to keep all the ets tables alive with an
heir since they represent committed values. However, this requires also
keeping track of the last committed index in ets (or a supervisor) as well
so when logs are replayed on a restart they can start from where the last
index was committed. Right now, the ets tables get lost and logs get
replayed fully.

This change has some interesting correctness consequences that I haven't
thought through completely, so for now I'd say just leave it how it is. It
will be a performance optimization to be implemented later. This issue is
just to remind me in a few months :)
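
To make the bookkeeping above concrete, here is a rough sketch of replay that resumes from a stored index. Everything in it is an assumption for illustration: the log is modeled as a plain list of `{Index, {Key, Value}}` tuples, the state lives in a single `set` table, and the marker key `'$last_applied'` is invented. It is not how rafter stores its log or its tables.

```erlang
%% Hypothetical sketch: not rafter code. The log shape and the
%% '$last_applied' marker key are assumptions.
-module(replay_sketch).
-export([replay/3, demo/0]).

%% Apply every entry with Last < Index =< CommitIndex, where Last is the
%% index recorded in the table (0 if the table is fresh).
replay(Tab, Log, CommitIndex) ->
    Last = case ets:lookup(Tab, '$last_applied') of
               [{_, I}] -> I;
               [] -> 0
           end,
    lists:foreach(
      fun({Index, {Key, Value}}) when Index > Last, Index =< CommitIndex ->
              %% Inserting a list into a set table is atomic, so the value
              %% and the marker cannot be observed out of step.
              ets:insert(Tab, [{Key, Value}, {'$last_applied', Index}]);
         (_Skipped) ->
              ok
      end, Log).

demo() ->
    Tab = ets:new(kv, [set]),
    Log = [{1, {a, 1}}, {2, {b, 2}}, {3, {a, 3}}],
    ok = replay(Tab, Log, 2),
    [{a, 1}] = ets:lookup(Tab, a),
    %% A second pass over the same log only applies index 3.
    ok = replay(Tab, Log, 3),
    [{a, 3}] = ets:lookup(Tab, a),
    [{'$last_applied', 3}] = ets:lookup(Tab, '$last_applied'),
    ok.
```

The single-call insert only keeps the marker and the data consistent within one table. With several tables, the index and the values could disagree after a crash, which is one of the correctness questions that would need thinking through.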

d0rc · 2014-01-08T02:51:00Z

OK, at least now I know you are going to get to performance optimizations
in a few months :) Sounds great.
I was also thinking about using basho/bitcask to store logs instead of
https://github.com/andrewjstone/rafter/blob/master/src/rafter_log.erl
Not sure why you've developed your own format.

andrewjstone · 2014-01-08T07:17:57Z

Using bitcask has some serious, unnecessary downsides and only minor upsides. First of all, bitcask keeps all keys in an in-memory hash table. This makes random access fast, but at a massive cost in RAM. That cost is unnecessary for a log that is almost always read sequentially and hence only needs to keep a few indexes cached at a time in general operation. (The current code is less than optimal, but it is relatively straightforward to improve.) By not using bitcask we free up all that space for operational data, such as the data stored in ets for the ets backend. That's the data that the clients care about. Additionally, bitcask does compactions that may be more resource intensive than just replaying the log into new ets tables and then snapshotting those. Replay can be done deterministically and is pausable.

The other reason I wrote my own is that I couldn't find anything out there that really did what I wanted as a standalone component. I could have attempted to rip a WAL out of Postgres, but I've never looked at that codebase, and integrating it with Erlang would probably be a bitch. Additionally, Erlang's binary syntax kicks ass for writing protocols, schemas, file formats and the like. You can't beat it :)
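
As a small illustration of the binary syntax point, here is a sketch of encoding and decoding one log entry. The layout (CRC32, index, term, payload size, payload) and the module name `entry_sketch` are invented for this example and are not the format used in rafter_log.erl.

```erlang
%% Hypothetical sketch: this entry layout is an assumption, not the
%% rafter_log.erl format.
-module(entry_sketch).
-export([encode/3, decode/1]).

%% Entry: <<Crc:32, Index:64, Term:64, Size:32, Payload:Size/binary>>
%% The CRC covers everything after the CRC field itself.
encode(Index, Term, Payload) when is_binary(Payload) ->
    Body = <<Index:64, Term:64, (byte_size(Payload)):32, Payload/binary>>,
    <<(erlang:crc32(Body)):32, Body/binary>>.

%% Decode one entry from the front of a binary and return the remainder,
%% so a caller can walk a log sequentially without any key index.
decode(<<Crc:32, Index:64, Term:64, Size:32, Payload:Size/binary,
         Rest/binary>>) ->
    Body = <<Index:64, Term:64, Size:32, Payload/binary>>,
    case erlang:crc32(Body) of
        Crc -> {ok, {Index, Term, Payload}, Rest};
        _ -> {error, bad_checksum}
    end;
decode(_) ->
    {error, truncated}.
```

The whole on-disk layout is stated once in the pattern of `decode/1`, and a single pattern match both validates the length and splits off the remaining bytes.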
