
Docker 1.12 services - After Galera scale-down, MaxScale's auto-discovery runs into trouble #1

Open
Franselbaer opened this issue Oct 29, 2016 · 8 comments

@Franselbaer

If you scale the Galera cluster up and then back down, MaxScale's auto-discovery runs into trouble.

The command used in the entrypoint script:

getent hosts tasks.dbcluster

This correctly returns N cluster IPs for dbcluster, BUT:

If you do something like:

docker service scale dbcluster=10

and then:

docker service scale dbcluster=5

The instance list:

docker service ps dbcluster

shows something like:

ID                         NAME             IMAGE                    NODE     DESIRED STATE  CURRENT STATE          ERROR
0s4hgq9tm28xmp3padelhq258  dbcluster.1      toughiq/mariadb-cluster  doswa-5  Running        Running 17 hours ago
3f2b2q0rs4i2yzy92ohue7dlq  dbcluster.2      toughiq/mariadb-cluster  doswa-4  Running        Running 17 hours ago
2ks1kl7einrlnbzkh8aayz9oq   \_ dbcluster.2  toughiq/mariadb-cluster  doswa-4  Shutdown       Shutdown 17 hours ago
0xgbr3q3wavzkk5bvagby8xyu  dbcluster.3      toughiq/mariadb-cluster  doswa-4  Running        Running 17 hours ago
bdsbd10u203pjj2kyvawohw23   \_ dbcluster.3  toughiq/mariadb-cluster  doswa-3  Shutdown       Shutdown 17 hours ago
6m92mbed7hrc2w0cnwfn7c66d  dbcluster.4      toughiq/mariadb-cluster  doswa-5  Running        Running 17 hours ago
9ky7bh2wewsqgx0pptzjkpaqm   \_ dbcluster.4  toughiq/mariadb-cluster  doswa-5  Shutdown       Shutdown 17 hours ago
as90l1abljf8seojivtyu265y   \_ dbcluster.4  toughiq/mariadb-cluster  doswa-5  Shutdown       Shutdown 17 hours ago
2ms4ilr6hbh9fovjixc1a0npi  dbcluster.5      toughiq/mariadb-cluster  doswa-5  Shutdown       Shutdown 17 hours ago
aavba7zhv7y9z77vsgyaab03n   \_ dbcluster.5  toughiq/mariadb-cluster  doswa-4  Shutdown       Shutdown 17 hours ago
d1in2lunlab6qfj3p0kbks288  dbcluster.6      toughiq/mariadb-cluster  doswa-4  Shutdown       Shutdown 17 hours ago
btm75qwpa8oi1fg07qkvnpf9t   \_ dbcluster.6  toughiq/mariadb-cluster  doswa-4  Shutdown       Shutdown 17 hours ago
4ymbc2lwzf4dt1o7ooswilyrt  dbcluster.7      toughiq/mariadb-cluster  doswa-3  Running        Running 17 hours ago
c60ahb1mmtbjjzut0z31v2o3v  dbcluster.8      toughiq/mariadb-cluster  doswa-3  Shutdown       Shutdown 17 hours ago
1bk8o6eajfbwz668pkzv629g4   \_ dbcluster.8  toughiq/mariadb-cluster  doswa-5  Shutdown       Shutdown 17 hours ago
dc9j3annf9dn1aueo2n46i9lu  dbcluster.9      toughiq/mariadb-cluster  doswa-5  Shutdown       Shutdown 17 hours ago
5ke252yv31v9rajzsr3x8n9uc  dbcluster.10     toughiq/mariadb-cluster  doswa-4  Shutdown       Shutdown 17 hours ago

And in this case getent delivers 5 cluster IPs, including addresses of instances that are in the Shutdown state.
Unfortunately Docker Swarm does not seem to clean up shut-down instances.
I'm currently not sure what a good way around this would be.
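
A possible workaround sketch (not from the original entrypoint): instead of trusting every address that tasks.dbcluster returns, probe each IP and keep only those that actually answer on the MariaDB port. This assumes nc (netcat) is available in the container image.

# Filter the DNS result down to members that are actually reachable on port 3306
CLUSTER_IPS=""
for ip in $(getent hosts tasks.dbcluster | awk '{print $1}'); do
  if nc -z -w 2 "$ip" 3306; then
    CLUSTER_IPS="${CLUSTER_IPS},${ip}"
  fi
done
CLUSTER_IPS="${CLUSTER_IPS#,}"   # strip the leading comma
echo "Reachable cluster members: ${CLUSTER_IPS}"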

@toughIQ
Owner

toughIQ commented Oct 29, 2016

Hi @Franselbaer,
I saw similar problems with the cluster discovery itself. Sometimes, if you scale out and scale in repeatedly, the new nodes won't find the existing ones. Or the cluster might break apart, since not every node can reach all the other members. I am not sure whether the problem is the Swarm DNS or the networking itself. Sometimes I had the overlay network attached to all nodes, but no communication over this network was possible.
In my opinion this problem is caused by Swarm and its DNS itself. The only way to prevent it would be to establish some kind of alternative service discovery. But that would defeat the whole idea, since DNS and service discovery should be an environmental feature, provided by the cluster management and simply consumed by the clients/containers.
Which Docker version did you use when you got your results? I haven't tried the current 1.12.3 release yet to see if this behavior still exists.
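
One way to narrow down whether the problem is Swarm DNS or the scheduler view (a diagnostic sketch; the --filter and --format flags assume a reasonably recent Docker CLI):

# On a manager node: what the scheduler considers running
docker service ps dbcluster --filter "desired-state=running" --format "{{.Name}} {{.Node}}"

# Inside any container attached to the same overlay network: what Swarm DNS advertises
getent hosts tasks.dbcluster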

@joneschan

I am facing the same problem on 1.12.3

@Franselbaer
Author

I've only tested this with 1.12.3, because that is the version I started using Docker with.

@danfromtitan

@Franselbaer see moby/swarmkit#1372

@yunghoy

yunghoy commented Nov 6, 2017

It's an old bug, but it causes some critical failures.
It's one of the bugs that keep you from using Docker in live services.

  1. If you delete a network or change its properties or name across the Swarm cluster, you can find that your new network doesn't work properly. Old containers in the Created or Dead state hold on to the network, so the old network is preserved (see the sketch below).

  2. The leftovers also deplete the resources of your machine.
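
For point 1, a quick check (a sketch; "my_overlay" is a placeholder for the affected network, and docker container prune needs Docker 1.13+) is to ask Docker which containers still hold the network and then clear out the dead ones:

# List containers still attached to the overlay network (run on a node where it exists)
docker network inspect my_overlay --format '{{range .Containers}}{{.Name}} {{end}}'

# Remove stopped and dead containers so their stale network endpoints can be released
docker container prune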

@till

till commented Jan 23, 2020

Auto-discovery inside Swarm doesn't seem to work at all when the stack starts: there is a race condition between the cluster starting and MaxScale. It took me a while to figure this out.
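
A minimal sketch of the kind of startup wait that avoids the race (an assumed addition to the MaxScale entrypoint, not something the image ships):

# Block until Swarm DNS resolves at least one dbcluster task before starting MaxScale
until getent hosts tasks.dbcluster > /dev/null 2>&1; do
  echo "waiting for tasks.dbcluster to become resolvable..."
  sleep 2
done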

@4n70w4

4n70w4 commented Oct 8, 2020

Hi! Similar issue here. toughiq/maxscale gives the error ERROR 1045 (28000): failed to create new session if one of the toughiq/mariadb-cluster Swarm nodes is recreated.

Docker version 19.03.12, build 48a66213fe

@gonzalloe

gonzalloe commented Jul 26, 2022

I had a similar issue in Swarm mode when I scaled the db containers up and down.

Even after I scaled the containers back up, with all of them showing an up-and-running status, I kept getting the error ERROR 1045 (28000): failed to create new session.

The output of maxadmin -pmariadb list servers shows all the nodes as down as well, even though they are running in Docker. Checking galera.cnf, the wsrep_cluster_address is not updated with the latest nodes' IP addresses, which means the newly created nodes won't find the existing ones.

I also found that the Galera service and the splitter listeners are all down, and I can't find a way to manually restart the service and listeners.
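
For anyone debugging the same state, a couple of checks worth running (a sketch; MYSQL_ROOT_PASSWORD stands for whatever root password the cluster was deployed with):

# Inside one of the running Galera containers: what the node itself thinks of the cluster
mysql -uroot -p"$MYSQL_ROOT_PASSWORD" -e "SHOW STATUS LIKE 'wsrep_cluster_size'"
mysql -uroot -p"$MYSQL_ROOT_PASSWORD" -e "SHOW STATUS LIKE 'wsrep_incoming_addresses'"

# Inside the MaxScale container: which backends the monitor currently sees
maxadmin -pmariadb list servers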


Is there any solution so far?
