handle segment granularity changes #52

Open · wants to merge 1 commit into master

Conversation

pjain1
Member

@pjain1 pjain1 commented Nov 19, 2015

With this PR, tranquility can handle segment granularity changes without losing data, and it also prevents tasks (that span both the new and the old segment intervals) from hanging indefinitely. The basic idea is: when we see an event, if its timestamp falls in the interval of some existing beam we use that beam; otherwise we try to create a new beam. Implications of these changes -

  • If the segment granularity is decreased, then no new task with the changed segment granularity will be spawned as long as the existing tasks can handle the incoming events. For example, if at 10:12 (HH:mm) the granularity is changed from 5 minutes to 1 minute, then no new tasks will be spawned until we get events with timestamp >= 10:15.
  • If the granularity is increased, then a partial beam (a task whose segment granularity is smaller than the tuning segment granularity) will be created to handle the events that fall in the part of the new segment interval not covered by currently active tasks. For example, if there is a task handling the interval 10:00 - 10:01 and the segment granularity is changed from 1 minute to 5 minutes at 10:00:12 (HH:mm:ss), then the next task will be created to handle the interval 10:01 - 10:05 instead of 10:00 - 10:05 (see the sketch after this list).
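
As a minimal sketch (hypothetical helper name, Joda-Time types; not the actual tranquility code), the partial-beam interval from the second case above could be derived like this:

```scala
import org.joda.time.{DateTime, Interval}

// Hypothetical helper: derive the interval for the next beam after a
// granularity increase. `bucketStart`/`bucketEnd` bound the new (larger)
// segment granularity bucket containing the event; `lastCoveredEnd` is the
// end of the newest interval already covered by an active task.
def partialBeamInterval(
  bucketStart: DateTime,
  bucketEnd: DateTime,
  lastCoveredEnd: DateTime
): Interval = {
  // If an existing task already covers part of the new bucket, start the
  // new beam where that task's interval ends, yielding a partial beam.
  val start = if (lastCoveredEnd.isAfter(bucketStart)) lastCoveredEnd else bucketStart
  new Interval(start, bucketEnd)
}

// Example from above: a task covers 10:00 - 10:01 and the granularity goes
// from 1 minute to 5 minutes, so the next beam covers 10:01 - 10:05.
val interval = partialBeamInterval(
  new DateTime("2015-11-19T10:00:00Z"),
  new DateTime("2015-11-19T10:05:00Z"),
  new DateTime("2015-11-19T10:01:00Z")
)
// interval == 10:01/10:05
```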

To achieve this, the following design changes have been made -

  • The Beam trait now exposes getInterval(), which returns an optional interval, since an interval may not make sense for all implementations of Beam
  • The beams object in ClusteredBeam is now a list of the beams known to us, reverse sorted by interval start. It is used while grouping events to check whether an event's timestamp falls in the interval of a Beam. This should not degrade performance compared to a HashMap lookup: most of the time the head item in the beams list is the one that can handle the event, and truncating the event timestamp is no longer needed (a sketch follows this list).
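
A rough Scala sketch of these two changes (illustrative names and simplified signatures; the real Beam trait in tranquility differs):

```scala
import org.joda.time.{DateTime, Interval}

// Simplified sketch of the Beam trait change described above.
trait Beam[A] {
  // Optional because an interval may not make sense for every Beam impl.
  def getInterval(): Option[Interval]
  // Simplified; the real trait's propagate signature is different.
  def propagate(events: Seq[A]): Unit
}

// `beams` is kept sorted by interval start, newest first. Since most
// events are near real-time, the head of the list usually matches, so a
// linear scan stays cheap compared with a timestamp-truncating map lookup.
def findBeam[A](beams: List[Beam[A]], timestamp: DateTime): Option[Beam[A]] = {
  beams.find(_.getInterval().exists(_.contains(timestamp)))
}
```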

Question - I am not sure why the beams object was a ConcurrentHashMap instead of just a HashMap? Since all writes to beams are synchronized using beamWriteMonitor, I have used a non-thread-safe list for beams in this code.

Note - This is the first time I am writing Scala code, so please feel free to point out mistakes

@pjain1
Member Author

pjain1 commented Nov 20, 2015

Will reopen after investigating test failures

@pjain1 pjain1 closed this Nov 20, 2015
@pjain1 pjain1 reopened this Nov 30, 2015
@pjain1
Member Author

pjain1 commented Nov 30, 2015

IMPORTANT NOTE - With this PR, if the segment granularity is increased between tranquility restarts, tranquility would require Druid to create a segment whose interval does not match the segment-granular interval (why? - see the first comment above). However, this is not supported by RealtimePlumber, and after discussing with @cheddar we thought it would be a good thing to have when Druid moves to support a kappa-like architecture. So the PR might need to wait until this is supported.

@gianm What do you think about the PR in general? Does the Appenderator work you are doing support creation of segments with such intervals?

@gianm
Member

gianm commented Dec 1, 2015

@pjain1 the appenderators do support creation of variable-sized segments.

@pjain1
Member Author

pjain1 commented Dec 1, 2015

👍 that's cool. Will wait till the Appenderator stuff is available in Druid
