buckyd and bucky don't agree on the carbon hashring #31
For the inconsistent count, I've patched the code to make it match the original one (https://github.com/jjneely/buckytools/blob/master/cmd/bucky/inconsistent.go#L64). Here are my results.
Am I on the right track here?
Hi @zerosoul13, it looks like something is really fishy with your setup, but I have no idea what exactly. Maybe @bom-d-van has some ideas?
There was indeed a new change in this commit that makes the bucky tool use both the peer address and the port number returned from the seed buckyd node, rather than just the hostname/fqdn/ip.
I'm not doing so. I'm passing the hashring as set in carbon-relay-ng and expect to be able to use the buckyd/bucky tooling on port 4242. Right now, I have to go with one or the other. The proposed configuration will make things work for us, but it won't match the hashring on the carbon-relay-ng side.
@bom-d-van, how would these hashring members match my carbon-relay-ng configuration if my original hashring from carbon-relay-ng uses other ports, or will it not matter at all and still produce the same carbon hashring? The code I've edited so far allows me to declare the hashring the same as the original one (in my head it does make a difference, but I could be totally wrong) and still maintain connectivity to the buckyd cluster. We can take this as an example: #31 (comment). The number of inconsistent metrics went down drastically versus the output of the unpatched code.
I started chasing this issue because the count of inconsistent metrics was quite high. I saw that the original code from
Hi @zerosoul13, the most important thing is the order, rather than the hostname-port tuple. We just need to make sure that the order of the buckyd daemons is the same as the go-carbon daemon list configured in carbon-relay-ng, and we are good. For the hashing logic, you can check this out: https://github.com/go-graphite/buckytools/blob/master/hashing/hashing.go#L337-L353 The list of peers given to the buckyd command is later used by buckytools to communicate with the other buckyd peers for rebalancing; buckytools doesn't actually communicate with go-carbon for rebalancing. Therefore, if you are running buckyd on port 4242 on all instances, you need to reflect that in the buckyd command by using this:
Thank you for the guidance. I was mistaken on how the hashring was being created and the elements it took into account to build it. |
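To make the point above concrete, here is a minimal sketch (not the actual buckytools implementation, whose key format differs in detail): with carbon_ch-style hashing, only the node name enters the md5-based key, so the ring is fully determined by the hostnames and their order, and the buckyd port never affects a metric's position.

```go
package main

import (
	"crypto/md5"
	"encoding/binary"
	"fmt"
)

// position is a simplified carbon_ch-style ring slot: the first 16 bits
// of the md5 digest of the node key. Illustration only; the real code
// builds the key and replicas differently.
func position(key string) uint16 {
	sum := md5.Sum([]byte(key))
	return binary.BigEndian.Uint16(sum[:2])
}

func main() {
	// The key is the hostname (plus optional instance), never host:port,
	// so moving buckyd from port 2004 to 4242 cannot change any position.
	for _, host := range []string{"graphite1", "graphite2", "graphite3"} {
		fmt.Printf("%s -> %d\n", host, position(host))
	}
}
```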
I think this is a real issue, just not in your case, because you are using the carbon_ch algorithm, which only uses the hostname/ip to calculate the hashring. In my case I use fnv1a_ch, which uses both ip and port. In that case, if I set the instances accordingly, the cluster will be healthy, but it will say that metrics are not in the right position.
Hi @Thorsieger, I think fnv1a_ch also counts on host order and is similar to carbon_ch (which uses md5 as its hashing algorithm). https://github.com/go-graphite/buckytools/blob/master/hashing/fnv1a.go#L92-L100 Or do you think I'm missing something important?
Yes, the order is important. They use different hashing algorithms, but the data they hash also differs:
- carbon_ch uses hostname (+instance): https://github.com/go-graphite/buckytools/blob/master/hashing/hashing.go#L270-L275
- fnv1a_ch uses (hostname + port) OR instance: https://github.com/go-graphite/buckytools/blob/master/hashing/fnv1a.go#L34-L39
I first found more info on this in the carbon-c-relay git repository.
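For illustration, here is the standard 32-bit FNV-1a hash (the buckytools code linked above has its own variant, so treat this as a sketch): when the port is part of the key, changing only the port changes the hash, which is why fnv1a_ch moves metrics while carbon_ch does not.

```go
package main

import "fmt"

// fnv1a32 is the standard 32-bit FNV-1a hash.
func fnv1a32(s string) uint32 {
	var h uint32 = 2166136261 // FNV offset basis
	for i := 0; i < len(s); i++ {
		h ^= uint32(s[i]) // xor the byte in first...
		h *= 16777619     // ...then multiply by the FNV prime
	}
	return h
}

func main() {
	// With fnv1a_ch the key includes the port (unless an instance name
	// is set), so these two keys hash to different ring positions.
	fmt.Println(fnv1a32("graphite1:2004"))
	fmt.Println(fnv1a32("graphite1:4242"))
}
```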
@Thorsieger yeah, I think you are right. Thanks for flagging it. Only jump hash is unaffected by the changes, but carbon_ch and fnv1a_ch are affected: https://github.com/go-graphite/buckytools/blob/master/hashing/hashing.go#L308-L309 cc @zerosoul13. I will reopen the ticket.
Hi @zerosoul13 @Thorsieger, I have pushed the revert in #32. If possible, can you check whether the changes work for your setup?
#32 fixes the issue for me:
`bucky servers`
`bucky inconsistent -f`
In my case, fix #32 allows me to declare the hashring the same way as I do with carbon-relay-ng, similar to my previous post, which I like.
I'm still having trouble getting the cluster to a consistent status after rebalancing the metrics, and I'm trying to figure out why that is.
Hi @zerosoul13, does the #32 fix return similar or the same stats as buckytools before MR #26 for rebalancing in your clusters?
@bom-d-van, below are my results before:
and after:
These values lead me to believe that there might be other factors affecting my cluster; I'm just not sure what exactly.
Hi @zerosoul13, thanks for checking it out. Since they were run on different dates, could it just be that your production data set is changing? If possible, you could dry-run the rebalance command with an old release of buckytools and with the changes in #32 to double-check.
Hello,
I raised this issue on the incorrect repo and would like to bring it to the right one. Below is my original post on jjneely/buckytools; the content is added so everyone has context on my initial issue, jjneely/buckytools#38.
I've found 2 issues which I would love to discuss:
BuckyD and bucky configuration
`buckyd` accepts the members of the hashring via non-option CLI arguments, as `buckyd <graphite1:port> <graphite2:port> ...`. `bucky` calls for the cluster configuration and gets back `graphite1:hashringport` instead of `graphite1:4242`; because of this mismatch, bucky won't be able to reach the buckyd members.
I've tracked the issue to https://github.com/go-graphite/buckytools/blob/master/cmd/bucky/cluster.go#L88: the port value for the cluster member is set to the same port as the hashring one instead of `4242` (or whichever port is specified by the user).
To test this theory, I forked and patched the code to set it to the default 4242, and the cluster is reported as healthy with the correct hashring values as below.
Is this a real issue or just a misconfiguration on my side?
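The workaround described above can be sketched as follows. This is a hypothetical helper, not the actual patch: it rewrites each hashring member so the client dials the buckyd port rather than the port stored in the ring.

```go
package main

import (
	"fmt"
	"net"
)

// buckydAddr rewrites a hashring member's address so the bucky client
// talks to the buckyd daemon port instead of the carbon ingest port.
// Hypothetical helper illustrating the patch described above.
func buckydAddr(member, buckydPort string) string {
	host, _, err := net.SplitHostPort(member)
	if err != nil {
		// Member had no port at all; use the host as-is.
		host = member
	}
	return net.JoinHostPort(host, buckydPort)
}

func main() {
	// The ring says graphite1:2004, but buckyd listens on 4242.
	fmt.Println(buckydAddr("graphite1:2004", "4242")) // graphite1:4242
}
```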
Inconsistent metric count will almost match active metric count
`bucky` is reporting metrics as inconsistent on our cluster, and the number is nearly the same as the active metric count, which is very odd. Taking a closer look, this line https://github.com/go-graphite/buckytools/blob/master/cmd/bucky/inconsistent.go#L69 does check the port values, and these don't match because one is 2004 and the other is 4242. The original code does not take the ports into account, just the hostnames:
https://github.com/jjneely/buckytools/blob/master/cmd/bucky/inconsistent.go#L64
Is my assumption that these rings won't match because of this correct?
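The difference between the two comparisons can be sketched like this (the addresses are assumed values for illustration): a full host:port comparison flags a mismatch whenever the buckyd port differs from the ring port, while a hostname-only comparison, like the original jjneely code, does not.

```go
package main

import (
	"fmt"
	"strings"
)

// hostOnly strips any port, mirroring a comparison that matches on
// hostname alone as the original code did.
func hostOnly(addr string) string {
	if i := strings.IndexByte(addr, ':'); i >= 0 {
		return addr[:i]
	}
	return addr
}

func main() {
	ringMember := "graphite1:2004" // as declared in carbon-relay-ng
	buckydPeer := "graphite1:4242" // as reported by buckyd

	fmt.Println(ringMember == buckydPeer)                     // false: host:port comparison flags a mismatch
	fmt.Println(hostOnly(ringMember) == hostOnly(buckydPeer)) // true: hostname-only comparison matches
}
```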