Add Rack awareness support #967
Conversation
Thank you for this. I've taken a quick look, and I want to make sure I understand this correctly.
Is my understanding correct?
1. Yes, exactly. It should be used responsibly, like the other options. I tested it across different ring-size/node-count/location combinations and it works quite well, but for example it won't optimize transfers between the old and new ring. The aim was not to modify the original claiming algorithm until one or more locations have been set.
The challenge is how much value there is without any guarantee. Perhaps it would be nice if there were a CLI command that would allow you to report on how location-safe the cluster plan is. Previously, some consideration had been given to resurrecting claim v3 to implement things like location awareness. The claim v3 algorithm was a radical change, treating claim as an optimisation problem. With claim v3, if you weren't happy with the optimised plan, you could reject it and roll the dice again, giving some control to the operator. Location awareness is on the to-do list for riak-core-lite, so perhaps that team may have some thoughts on this. I will add a link to the PR into the slack channel as well, to try and canvas some broader feedback.
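The kind of location-safety report suggested above could be sketched along these lines (a minimal Python illustration, not actual Riak tooling; the ring layout, `n_val`, and function name are all assumptions for the example):

```python
def location_diversity_report(owner_locations, n_val=3):
    """For each partition, count the distinct locations covered by the
    window of n_val consecutive partitions starting there (roughly the
    preference list), wrapping around the end of the ring."""
    ring_size = len(owner_locations)
    return [(i, len({owner_locations[(i + j) % ring_size] for j in range(n_val)}))
            for i in range(ring_size)]

# A ring of 8 partitions owned by nodes spread over three locations:
locs = ["loc1", "loc2", "loc3", "loc1", "loc2", "loc3", "loc1", "loc2"]
unsafe = [i for i, distinct in location_diversity_report(locs) if distinct < 3]
print(unsafe)  # → [6, 7]: the windows that wrap are not location-safe
```

This also illustrates why there is no guarantee: with 8 partitions spread over 3 locations, the windows crossing the wrap-around point cannot all span 3 distinct locations, whatever the claim algorithm does.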
Cool, thanks. There is something I'd like to clarify, with an example:
The first and last partitions are on the same node, so that could be a problem; but either way it is a problem, because the ring wraps around and makes them adjacent.
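The wrap-around point is easy to miss when eyeballing a plan: the ring is circular, so the last partition is adjacent to the first. A small sketch of that check in Python (the data and function name are hypothetical):

```python
def adjacent_same_location(owner_locations):
    """Return pairs of adjacent partition indexes that share a location,
    including the wrap-around pair of last and first partitions."""
    n = len(owner_locations)
    return [(i, (i + 1) % n) for i in range(n)
            if owner_locations[i] == owner_locations[(i + 1) % n]]

# Looks fine when read left to right, but partitions 3 and 0 are
# adjacent on the ring and both sit in loc1:
print(adjacent_same_location(["loc1", "loc2", "loc3", "loc1"]))  # → [(3, 0)]
```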
If you have a proper node-count/location setup, it will work.
I don't know if you're seeing the CI failures, but there are dialyzer issues.
The next stage is riak_test testing. There is a test group defined for core tests. If you've not used riak_test before, there are setup instructions. There hasn't been any positive feedback via slack etc. with regards to pushing for this change. There is interest in rack awareness with stricter guarantees; however, I think any complete solution is going to be such a radical change as to be unrealistic, compared to this relatively simple enhancement. There is no firm process for getting something like this approved at the moment. I can see positive aspects of the change, but would want a broader consensus behind it. @martincox do you have an opinion?
Sorry, dialyzer issues fixed.
I put riak_core in _checkouts and ran the tests.
Before I write more tests, it would be nice to know whether to continue with this solution or not.
I think it sounds like a good starting point, even with the caveat over a lack of strict guarantees. Potentially it could be developed further in the future, if anyone can invest more time into it. Makes sense to me to adopt it as is; it looks to provide some improvement. I'd be happy to see it merged. 👍
@systream just a note on timescales. I'm going to proceed with release 3.0.4 this week without this change, as I need something fixed ASAP, but once we have a riak_test we're satisfied with, I will set off the process for producing a Riak 3.0.5 which includes this rack awareness PR. I have a few other things on for the next few days, but I might have some time towards the end of next week to help with a riak_test test if required. |
Okay, cool, thanks! I have already started writing new tests in riak_test, but unfortunately I had urgent tasks to finish first.
@martinsumner I wrote tests in riak_test. I'm not totally sure whether they're appropriate; could you check them? Thanks.
I will have a look later in the week. Thanks. |
I think I'm fine with this from a test and code perspective now. Sorry for the delay, I've been distracted with other things. @systream - for others to use it, a write-up would be helpful (just some markdown in docs/). Is this something you could do? @martincox - anything to add? |
Sure, I will do the docs.
What's the next step? Is there anything I need to do?
Sorry, it is on me now. I will start work on finalising a release next week. Apologies for the delay. |
Sorry, I don't want to rush you; I just got the feeling that I might need to do something.
@systream - sorry, but some late suggestions for the docs. I think the instructions need to be extended to make it clear that locations need to be allocated to nodes sequentially in order for the claim to have the desired effect. That is to say, an allocation that starts node1 -> loc1 and continues in sequence should provide a diverse spread across locations, but a mapping that starts node1 -> loc1 and then assigns locations out of sequence will not. Also, it would be useful to have a script to check a planned, but uncommitted, cluster change.
It doesn't (or shouldn't) matter which node is where, as long as the numbers of distinct locations are good. |
Perhaps I did something wrong, but I've just checked it, and not allocating nodes in sequence caused an issue.
Whereas a sequential allocation had no such violations. I'm going to have a play around to see if I can see how this might not be working for me. |
Hang on ... I think I might have prematurely checked before transfers were complete! Sorry, my mistake. That was the case. |
This PR introduces the concept of location (you could also call it a site or availability zone). When claiming a new ring, the list of nodes is ordered taking into consideration the location of the individual nodes, so that adjacent nodes are preferably from different locations.
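That ordering can be pictured as a round-robin interleave over the location groups. The sketch below is a simplified Python illustration, not the Erlang claim code in riak_core; the function name and node data are invented for the example:

```python
def interleave_by_location(node_locations):
    """Order nodes so that consecutive entries come from different
    locations where possible, cycling round-robin over the locations."""
    groups = {}  # insertion-ordered in Python 3.7+
    for node, loc in node_locations:
        groups.setdefault(loc, []).append(node)
    queues = list(groups.values())
    ordered = []
    while any(queues):
        for q in queues:
            if q:
                ordered.append(q.pop(0))
    return ordered

nodes = [("node1", "loc1"), ("node2", "loc1"),
         ("node3", "loc2"), ("node4", "loc2")]
print(interleave_by_location(nodes))  # → ['node1', 'node3', 'node2', 'node4']
```

With this ordering, consecutive claims alternate between loc1 and loc2, so adjacent partitions land in different locations whenever the node counts per location allow it.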