All slaves down, but no fallback to the master #186

Open
darksoul42 opened this issue Dec 30, 2017 · 4 comments

@darksoul42

I have a Redmine cluster with one master and two slaves running Postgres 9.5 on FreeBSD servers, and everything works as expected when all nodes are up. (Updates go to the master, selects go to the slaves.)

However, if one slave node goes down, I get a set_client_encoding error (though I suspect this is due to how the "pg" gem handles its own errors); this does not happen when "encoding" is not defined. I was able to get that "invalid encoding" error handled gracefully, but doing so only revealed an underlying issue.

If both slave nodes are down or blacklisted, 0.3.9 never seems to fall back to the master. It retries forever on the slaves only, which leads to an application timeout, so things do not work as advertised in the README.

Shouldn't there be a tunable to say whether one wants to fall back to the master or not? I looked in the source code but could not find one.

(As a side note: if the master is down, having only a slave alive is not enough, since Redmine needs to update things like authentication tokens.)

Here is my database.yml:

production:
  adapter: postgresql_makara
  database: redmine
  username: redmine
  encoding: utf-8
  pool: 10
  makara:
    master_ttl: 10
    sticky: true
    connections:
      - role: master
        host: master.host
      - role: slave
        host: slave1.host
      - role: slave
        host: slave2.host
    connection_error_matchers:
      - '/invalid encoding name/'
@darksoul42
Author

I could narrow it down to non-select queries (i.e. queries that absolutely require a master) being executed at this point in proxy.rb:

    def any_connection
      @master_pool.provide do |con|
        yield con
      end
    rescue ::Makara::Errors::AllConnectionsBlacklisted, ::Makara::Errors::NoConnectionsAvailable
      begin
        @master_pool.disabled = true
        @slave_pool.provide do |con|
          yield con
        end
      ensure
        @master_pool.disabled = false
      end
    end

Either I end up with an error that leads to blacklisting of the master node, and since there are no live slaves it completely falls flat; or it stalls endlessly, probably because it can't find a live master that it "can" use. (Note that restarting one slave node instantly restores functionality.)

I wonder whether this is a case of refusing to reuse the same context because of the current strategy/stickiness logic. I haven't dived deep into the internals yet, so I can't confirm this, but it really feels like it avoids using the master for "update" queries (or anything not matching the appropriate regexp) once the master has already been used for "select" as a fallback, until a slave comes back. I can also confirm that it insistently tries to connect to the slaves before getting a connection refused. (This might be even more troublesome if the host were down and each attempt had to time out...)
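To make the first failure mode above concrete, here is a minimal toy model of the control flow in the quoted any_connection method. ToyPool, ToyProxy and AllBlacklisted are hypothetical stand-ins written for this illustration; they are not Makara's classes, and the sketch ignores stickiness, strategies and blacklist timers entirely. It only shows that, with this shape of code, a master pool with no usable node plus a slave pool with no usable node leaves nothing to fall back to, so the error propagates to the caller.

```ruby
# Toy stand-ins for illustration only; NOT Makara's real classes.
class AllBlacklisted < StandardError; end

class ToyPool
  attr_accessor :disabled

  # nodes: hash of node name => true (up) / false (down or blacklisted)
  def initialize(nodes)
    @nodes = nodes
    @disabled = false
  end

  # Yield the first live node, or raise when the pool has nothing usable.
  def provide
    live = @disabled ? nil : @nodes.find { |_name, up| up }
    raise AllBlacklisted, "no live node in pool" unless live
    yield live.first
  end
end

class ToyProxy
  def initialize(master_pool, slave_pool)
    @master_pool = master_pool
    @slave_pool = slave_pool
  end

  # Same shape as the quoted method: try the master pool first,
  # then the slave pool with the master pool temporarily disabled.
  def any_connection(&block)
    @master_pool.provide(&block)
  rescue AllBlacklisted
    begin
      @master_pool.disabled = true
      @slave_pool.provide(&block)
    ensure
      @master_pool.disabled = false
    end
  end
end

slaves_down = ToyPool.new("slave1" => false, "slave2" => false)

# Master up, slaves down: the master serves the query.
healthy = ToyProxy.new(ToyPool.new("master" => true), slaves_down)
puts healthy.any_connection { |node| "ran on #{node}" }

# Master blacklisted, slaves down: the error reaches the application.
broken = ToyProxy.new(ToyPool.new("master" => false), slaves_down)
begin
  broken.any_connection { |node| "ran on #{node}" }
rescue AllBlacklisted => e
  puts "failed: #{e.message}"
end
```

In this model the only way out of the second case is for a node in either pool to become usable again, which is consistent with a restarted slave restoring functionality. It does not explain the endless-stall case, which would need the real blacklist and retry logic.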

@fmundaca

Hello, did you solve this? Apparently I'm experiencing the same problem.

Thanks!

@NKeerthi

@darksoul42 One way I solved this problem is by setting slave_strategy: failover, which falls back to the master when the slave connection is lost.
Setup: I tried this on my local machine by using a read-only user on the master and killing the MySQL connection for the slave user, to check whether it falls back to the master.
Another way to solve this is to use connection_error_matchers as described in the README. You can list known errors, which helps in blacklisting the node.
For example:

connection_error_matchers:
      - '/Query execution was interrupted/'
      - '/Access denied/'
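
As a rough illustration of what such matchers do, here is a minimal sketch, assuming that a configured string wrapped in slashes is treated as a regular expression and tested against the error message. compile_matcher and connection_error? are hypothetical helper names invented for this example, not Makara's API, and the substring handling of plain strings is an assumption of the sketch. As far as the README describes it, a match only marks the error as a connection error so that the node can be blacklisted; whether a query then goes to the master is decided by the pool and strategy logic, not by the matcher itself.

```ruby
# Hypothetical helper: turn "/.../" strings into Regexp objects,
# leave anything else untouched.
def compile_matcher(matcher)
  return matcher unless matcher.is_a?(String)

  if (m = matcher.match(%r{\A/(.+)/\z}))
    Regexp.new(m[1])
  else
    matcher
  end
end

# Hypothetical helper: true when any matcher recognizes the message.
# Plain strings are treated as substrings in this sketch (an assumption).
def connection_error?(message, matchers)
  matchers.any? do |raw|
    compiled = compile_matcher(raw)
    compiled.is_a?(Regexp) ? compiled.match?(message) : message.include?(compiled)
  end
end

MATCHERS = [
  '/Query execution was interrupted/',
  '/Access denied/',
  '/invalid encoding name/'
].freeze

# Illustrative error messages, not captured from a real server.
puts connection_error?("Mysql2::Error: Access denied for user 'app'", MATCHERS)
puts connection_error?("PG::UniqueViolation: duplicate key value", MATCHERS)
```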

@psahni

psahni commented Jan 18, 2024

@NKeerthi
Will these matchers make it fall back to the master?

  • '/Query execution was interrupted/'
  • '/Access denied/'

Are these errors specifically meant to handle blacklisting?
