ServerSelectionTimeoutError when services connect to mongo. #240
Comments
I got this problem too. I haven't solved it |
Facing the same issue here! Unable to find a solution! |
Going back to the previous version (st2-docker-3.6) solves it. |
Same issue while trying to start st2 3.7.0 docker images, running on:
However, in my case going back to 3.6.0 did not help either, running these containers produces same issue:
Attached logs from containers. |
I would check whether a mongo client can connect to the DB. If the mongo client can connect while st2 cannot, it's a StackStorm problem. |
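One way to run the check suggested above without depending on any MongoDB driver is a plain TCP probe from inside the failing container. The sketch below is only an illustration: `probe` is a hypothetical helper, not part of StackStorm or pymongo. It separates a name-resolution failure from an unreachable port, but a successful TCP connect does not prove that MongoDB itself is healthy.

```python
import socket


def probe(host: str, port: int, timeout: float = 3.0) -> str:
    """Return "ok", "dns-failure", or "unreachable" for host:port.

    Hypothetical helper for illustration; it only tests TCP reachability.
    """
    try:
        # Resolve first so that a DNS problem is reported separately.
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        return "dns-failure"
    for family, socktype, proto, _canonname, sockaddr in infos:
        try:
            with socket.socket(family, socktype, proto) as sock:
                sock.settimeout(timeout)
                sock.connect(sockaddr)
                return "ok"
        except OSError:
            continue  # try the next resolved address, if any
    return "unreachable"
```

Inside the st2api container this could be called as `probe("mongo", 27017)`, using the service name and port that appear in the logs in this thread.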
I've actually spent a lot of time back in September trying to solve this issue without any definitive answers. On Ubuntu VMs I encounter the issue, but in WSL2 running Rancher Desktop there are no issues. Between me and my team, we were able to replicate it on Ubuntu 20.04 and 22.04. From my notes, the issue presents sometime between dockerd 20.10.13 and 20.10.17. Generally, I have to reboot after downgrading to resolve it, and if I upgrade after it works once, I'm good. But on a fresh install, problems.

Ultimately, the issue presents as a network race condition. It's not a DNS timeout; that has a different error. I also tried adding a script to sleep before starting things like st2actionrunner, to give time for the network to establish between the st2actionrunner container and mongo.

Here's what I know: when the issue occurs, the error message is relatively vague:

If you manage to shell into the container before it exits, you'll find that the mongo instance is reachable. I had a mongocheck.py script that I would mount in the container and run before the container was restarted. Here is the script. In it, I have included the errors you see if you have bad DNS or the port is unreachable.
import pymongo
# >>> client = pymongo.MongoClient("afakehost", 27017) # invalid host
# >>> db = client.st2
# >>> db.my_collection.insert_one({"x": 10}).inserted_id
# pymongo.errors.ServerSelectionTimeoutError: afakehost:27017: [Errno -3] Temporary failure in name resolution, Timeout: 30s, Topology Description: <TopologyDescription id: 6308e86ed939509aa52e4eeb, topology_type: Single, servers: [<ServerDescription ('afakehost', 27017) server_type: Unknown, rtt: None, error=AutoReconnect('afakehost:27017: [Errno -3] Temporary failure in name resolution')>]>
# >>> client = pymongo.MongoClient("mongo", 27019) # invalid port
# >>> db = client.st2
# >>> db.my_collection.insert_one({"x": 10}).inserted_id
mongohost = "mongo"
mongoport = 27017
print(f"Connecting to {mongohost}:{mongoport}")
client = pymongo.MongoClient(mongohost, mongoport)
db = client.st2
inserted = db.my_collection.insert_one({"x": 10}).inserted_id
print(f"inserted id of {inserted}")

Here, 20.10.13 of the docker-ce package had it working for me. I had downgraded from 20.10.17, and my docker client is still on that version.
hogenj@devg:~/projects/st2-docker$ apt-cache madison docker-ce
docker-ce | 5:20.10.17~3-0~ubuntu-jammy | https://download.docker.com/linux/ubuntu jammy/stable amd64 Packages
docker-ce | 5:20.10.16~3-0~ubuntu-jammy | https://download.docker.com/linux/ubuntu jammy/stable amd64 Packages
docker-ce | 5:20.10.15~3-0~ubuntu-jammy | https://download.docker.com/linux/ubuntu jammy/stable amd64 Packages
docker-ce | 5:20.10.14~3-0~ubuntu-jammy | https://download.docker.com/linux/ubuntu jammy/stable amd64 Packages
docker-ce | 5:20.10.13~3-0~ubuntu-jammy | https://download.docker.com/linux/ubuntu jammy/stable amd64 Packages
sudo apt install docker-ce=5:20.10.13~3-0~ubuntu-jammy
hogenj@devg:~/projects/st2-docker$ docker --version
Docker version 20.10.17, build 100c701
hogenj@devg:~/projects/st2-docker$ docker-compose --version
docker-compose version 1.29.2, build unknown
hogenj@devg:~/projects/st2-docker$ dockerd --version
Docker version 20.10.13, build 906f57f

Since it's been a while and there are newer versions of docker-ce out, I'm going to try upgrading my dev VM and see where things stand. |
After updates:
Docker version 20.10.21, build baeda1f

I made some changes to st2api to try and work around any race conditions:
st2api:
  image: ${ST2_IMAGE_REPO:-stackstorm/}st2api:${ST2_VERSION:-latest}
  ...
  command: /startst2api.sh
  # command: /mongocheck2.py
  volumes:
    - ./scripts/st2client-startup.sh:/st2client-startup.sh:ro
    - ./files/mongocheck.py:/mongocheck.py:rw
    - ./files/mongocheck2.py:/mongocheck2.py:rw
    - ./files/startst2api.sh:/startst2api.sh:rw
  ...

startst2api.sh is:
#!/bin/bash
echo "sleeping 30 seconds"
sleep 30
echo "starting st2api"
/opt/stackstorm/st2/bin/st2api --config-file=/etc/st2/st2.conf --config-file=/etc/st2/st2.docker.conf --config-file=/etc/st2/st2.user.conf

If I start that, I get the vague server selection timeout:
hogenj@devg:~/projects/st2-docker$ docker-compose up st2api
st2-docker_mongo_1 is up-to-date
st2-docker_redis_1 is up-to-date
st2-docker_rabbitmq_1 is up-to-date
Starting st2-docker_st2makesecrets_1 ... done
Recreating st2-docker_st2api_1 ... done
Attaching to st2-docker_st2api_1
st2api_1 | sleeping 30 seconds
st2api_1 | starting st2api
st2api_1 | 2022-11-21 15:08:05,436 INFO [-] Using Python: 3.8.10 (/opt/stackstorm/st2/bin/python)
st2api_1 | 2022-11-21 15:08:05,436 INFO [-] Using fs encoding: utf-8, default encoding: utf-8, locale: en_US.UTF-8, LANG env variable: en_US.UTF-8, PYTHONIOENCODING env variable: notset
st2api_1 | 2022-11-21 15:08:05,436 INFO [-] Using config files: /etc/st2/st2.conf,/etc/st2/st2.docker.conf,/etc/st2/st2.user.conf
st2api_1 | 2022-11-21 15:08:05,437 INFO [-] Using logging config: /etc/st2/logging.api.gunicorn.conf
st2api_1 | 2022-11-21 15:08:05,437 INFO [-] Using coordination driver: redis
st2api_1 | 2022-11-21 15:08:05,437 INFO [-] Using metrics driver: noop
st2api_1 | 2022-11-21 15:08:05,461 INFO [-] Connecting to database "st2" @ "mongo:27017" as user "None".
st2api_1 | 2022-11-21 15:08:08,484 ERROR [-] Failed to connect to database "st2" @ "mongo:27017" as user "None": No servers found yet, Timeout: 3.0s, Topology Description: <TopologyDescription id: 637b945500772fa92caefd9a, topology_type: Single, servers: [<ServerDescription ('mongo', 27017) server_type: Unknown, rtt: None>]>
(snip)

However, if I more or less recreate the code, even without a sleep, that works.
files/mongocheck2.py:
#!/opt/stackstorm/st2/bin/python
import pymongo
import mongoengine
from pymongo.errors import OperationFailure
from pymongo.errors import ConnectionFailure
from pymongo.errors import ServerSelectionTimeoutError
import logging
logger = logging.getLogger()
logger.setLevel(logging.DEBUG)
from oslo_config import cfg
# import st2common.config as common_config
from st2api import config
config.register_opts(ignore_errors=True)
import st2common.models.db as db
db.LOG.setLevel(logging.DEBUG)
db_host = "mongo"
db_port = 27017
db_name = "st2"
username = None
password = None
connection_timeout = 3000 # ms
ssl_kwargs = {}
compressor_kwargs = {}
print('running _db_connect')
connection = db._db_connect(db_name, db_host, db_port)
# NOTE: We intentionally set "serverSelectionTimeoutMS" to 3 seconds. By default it's set to
# 30 seconds, which means it will block up to 30 seconds and fail if there are any SSL related
# or other errors
connection_timeout = cfg.CONF.database.connection_timeout
connection = mongoengine.connection.connect(
    db_name,
    host=db_host,
    port=db_port,
    tz_aware=True,
    # alias='foo',
    username=username,
    password=password,
    connectTimeoutMS=connection_timeout,
    serverSelectionTimeoutMS=connection_timeout,
    **ssl_kwargs,
    **compressor_kwargs,
)
# NOTE: Since pymongo 3.0, connect() method is lazy and not blocking (always returns success)
# so we need to issue a command / query to check if connection has been
# successfully established.
# See http://api.mongodb.com/python/current/api/pymongo/mongo_client.html for details
try:
    # The ping command is cheap and does not require auth
    # https://www.mongodb.com/community/forums/t/how-to-use-the-new-hello-interface-for-availability/116748/
    connection.admin.command("ping")
except (ConnectionFailure, ServerSelectionTimeoutError) as e:
    # NOTE: ServerSelectionTimeoutError can also be thrown if SSLHandShake fails in the server
    # Sadly the client doesn't include more information about the error so in such scenarios
    # user needs to check MongoDB server log
    print(f'Failed to connect to database "{db_name}" @ "{db_host}:{db_port}" as user "{username}".')
    raise e
print(f'Successfully connected to database "{db_name}" @ "{db_host}:{db_port}" as user "{username}".')
# connection.close()

And I set that as the command in st2api:
hogenj@devg:~/projects/st2-docker$ docker-compose up st2api
Starting st2-docker_st2makesecrets_1 ...
st2-docker_mongo_1 is up-to-date
st2-docker_rabbitmq_1 is up-to-date
Starting st2-docker_st2makesecrets_1 ... done
Starting st2-docker_st2api_1 ... done
Attaching to st2-docker_st2api_1
st2api_1 | running _db_connect
st2api_1 | Successfully connected to database "st2" @ "mongo:27017" as user "None".
st2api_1 | Inserting one using pymongo
st2api_1 | inserted id of 637ba0ecbdf365d52a98264e
st2-docker_st2api_1 exited with code 0

And because this is already a wall of text, I'm providing mongocheck2.py as a gist. It's basically a copy of the st2common db connection.

One interesting note, which might not be related: after running mongocheck2.py, I started getting an error about not running disconnect first. That's why, in mongocheck2.py, I added a line where
|
First, it's OK when StackStorm tries to connect to a MongoDB that is not yet available and fails. It will always retry until it establishes a successful connection. The StackStorm services can work in two different ways when it comes to the DB backend connection:

In Docker/K8s we override these settings, because restarting on failure and letting the orchestrators handle the container restarts is the better way for these systems to control recovery, liveness, and availability:

If you'd like to retry the connection until it's successful in an existing running process, you can revert the default st2 settings here:

What are the benefits of a middle-man script that waits for MongoDB to be up and running if st2 can handle reconnection itself? |
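The two recovery styles discussed above (retrying inside the running process versus failing fast and letting the orchestrator restart the container) can be sketched generically. This is not StackStorm's implementation: `connect_with_retry` and all of its parameters are hypothetical names for illustration, and `connect` stands in for whatever call opens the DB connection.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def connect_with_retry(
    connect: Callable[[], T],
    max_attempts: int = 10,
    delay_seconds: float = 1.0,
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call connect() until it succeeds or max_attempts is exhausted.

    With max_attempts=1 the first failure propagates immediately, which
    mirrors the "exit and let the orchestrator restart us" style.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return connect()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            sleep(delay_seconds)
    raise AssertionError("unreachable")  # the loop always returns or raises
```

With pymongo, `retry_on` would presumably be `(pymongo.errors.ConnectionFailure,)`, since `ServerSelectionTimeoutError` derives from it. The `sleep` parameter exists only so the delay can be replaced in tests.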
I do apologize. I do need to remember when uploading a lot of troubleshooting that I should summarize it some.
Per your suggestion, I went ahead and updated my retry configurations.
Running this has it retry multiple times, and the container does exit after a minute. tail end
|
I ran into the same issue. Somewhere, buried ONLY in the first attempt to start st2 (and incidentally mongodb) was the following error
Clearing my volumes to ensure I didn't have compatibility issues was a way to resolve this for me. Alternatively, migrate your existing data from your current version to mongo:4.4. You could try to

I noticed mongo was dying with exit code 14; each time it respawned, it printed some of the log (but not all of it).

UPDATE: In my case, other services had a problem connecting to mongo and failed with |
Problem
I tried to install StackStorm on Docker using this repository's guide without any changes done to the files.
After executing docker-compose, every service that depends on the mongo service and tries to connect to its database throws the exception ServerSelectionTimeoutError:

After doing some digging, I found that the exception ServerSelectionTimeoutError is caught in line 205 of this file: https://github.com/StackStorm/st2/blob/master/st2common/st2common/models/db/__init__.py.

Some services keep restarting indefinitely due to the restart: on-failure option, and only st2-docker_st2web_1 and st2-docker_st2client_1 stay up, but they are unusable.

Anyone else encountered this error?
Versions
How to reproduce
Follow the README of this repository.