This repository has been archived by the owner on Aug 7, 2020. It is now read-only.

Using volume to map SSL certificates can cause certificate mismatch #122

Open
1 task done
VincentSit opened this issue Jul 31, 2019 · 8 comments

@VincentSit

VincentSit commented Jul 31, 2019

Make sure you read and understand http://emqtt.io/docs/v2/index.html.
Use one of the two templates below and delete the rest.

  • BUG REPORT

BUG REPORT

For detailed steps, see the Test code section below. My question is whether this problem is caused by a misconfiguration on my side or by a bug in emqx. I tested every 3.x version, and the issue exists in all of them.

BTW, what does the following alarm mean?

[Alarm Handler] New Alarm: cpu_high_watermark, Alarm Info: 100.0

Expected behavior

The client connects successfully via SSL.

Actual behavior

emqx reported that the certificate does not match.

Test code

First attempt (Failed)

I have an emqx instance running in a Docker container, configured as follows:

emqx:
    container_name: my_emqx
    image: emqx/emqx:v3.2.1
    restart: always
    ports:
      - "1883:1883"
      - "8883:8883"
      - "18083:18083"
      - "18084:18084"
    volumes:
      - $PWD/config/emqx/certs:/opt/emqx/etc/certs:rw
      - $PWD/config/emqx/add-user.sh:/opt/emqx/add-user.sh:ro
    networks:
      - frontend
      - backend
    environment:
      - TZ=Asia/Shanghai
      - EMQX_LOADED_PLUGINS="emqx_management emqx_recon emqx_retainer emqx_dashboard emqx_auth_username"
      - EMQX_LOG__LEVEL=debug
      - EMQX_ALLOW_ANONYMOUS=false
      - EMQX_ACL_NOMATCH=deny
      - EMQX_MQTT__MAX_PACKET_SIZE=10MB
      - EMQX_LISTENER__SSL__EXTERNAL__TLS_VERSIONS=tlsv1.2,tlsv1.1,tlsv1
      - EMQX_LISTENER__SSL__EXTERNAL__DHFILE=etc/certs/dh-params.pem
      - EMQX_LISTENER__SSL__EXTERNAL__VERIFY=verify_peer
      - EMQX_LISTENER__SSL__EXTERNAL__FAIL_IF_NO_PEER_CERT=true
      - EMQX_LISTENER__SSL__EXTERNAL__REUSE_SESSIONS=on
      - EMQX_LISTENER__SSL__EXTERNAL__KEYFILE=etc/certs/server.key
      - EMQX_LISTENER__SSL__EXTERNAL__CERTFILE=etc/certs/server.pem
      - EMQX_LISTENER__SSL__EXTERNAL__CACERTFILE=etc/certs/ca.pem

Here is the log:

[vagrant@localhost docker]$ docker-compose up --build emqx
Creating network "docker_backend" with the default driver
Creating network "docker_frontend" with the default driver
Creating my_emqx ... done
Attaching to my_emqx
my_emqx  | listener.ssl.external.dhfile=etc/certs/dh-params.pem
my_emqx  | node.max_ports=1048576
my_emqx  | listener.ssl.external.verify=verify_peer
my_emqx  | listener.tcp.external.acceptors=64
my_emqx  | log.level=debug
my_emqx  | mqtt.max_packet_size=10MB
my_emqx  | listener.ssl.external.certfile=etc/certs/server.pem
my_emqx  | listener.ssl.external.fail_if_no_peer_cert=true
my_emqx  | allow_anonymous=false
my_emqx  | listener.ssl.external.acceptors=32
my_emqx  | listener.ssl.external.keyfile=etc/certs/server.key
my_emqx  | node.process_limit=2097152
my_emqx  | node.max_ets_tables=2097152
my_emqx  | listener.ws.external.acceptors=16
my_emqx  | listener.ssl.external.cacertfile=etc/certs/ca.pem
my_emqx  | listener.ssl.external.reuse_sessions=on
my_emqx  | [email protected]
my_emqx  | listener.ssl.external.tls_versions=tlsv1.2,tlsv1.1,tlsv1
my_emqx  | acl_nomatch=deny
my_emqx  | EMQX_LOADED_PLUGINS="emqx_management emqx_recon emqx_retainer emqx_dashboard emqx_auth_username"
my_emqx  | emqx v3.2.1 is started successfully!
my_emqx  |               {mfargs,{prometheus_histogram,start_link,[]}},
my_emqx  |               {restart_type,permanent},
my_emqx  |               {shutdown,5000},
my_emqx  |               {child_type,worker}]
my_emqx  | 2019-07-31 15:48:34.675 [info] application: prometheus
my_emqx  |     started_at: '[email protected]'
my_emqx  | 2019-07-31 15:48:34.682 [info] application: luerl
my_emqx  |     started_at: '[email protected]'
my_emqx  | 2019-07-31 15:48:34.682 [info] application: xmerl
my_emqx  |     started_at: '[email protected]'
my_emqx  | ['2019-07-31T15:48:40Z']:emqx start

As you can see, it started successfully. At this point I enter the container and add a few users. Because you changed this mechanism, doing so is a little troublesome.

[vagrant@localhost ~]$ docker exec -it my_emqx sh
/opt/emqx $ sh add-user.sh
init terminating in do_boot ({terminated,[{io,format,[<0.63.0>,[_],[_]],[]},{escript,start,1,[{_},{_}]},{init,start_em,1,[]},{init,do_boot,3,[]}]})

Crash dump is being written to: erl_crash.dump...done
ok
ok
ok
ok
/opt/emqx $

The contents of the add-user.sh script are as follows (sensitive information has been modified):

#!/bin/sh

set -e

ctl_cmd=/opt/emqx/bin/emqx_ctl

addUser() {
	$ctl_cmd users add u1 p1
	$ctl_cmd users add u2 p2
	$ctl_cmd users add u3 p3
	$ctl_cmd users add u4 p4
}

if $ctl_cmd users | grep -q "users list" 
then
	addUser
else
	$ctl_cmd plugins load emqx_auth_username
	addUser
fi
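
One possible reading of the crash reported when this script runs, not confirmed anywhere in this thread: `grep -q` may exit at its first match and close the pipe while `emqx_ctl` is still printing its usage text, which would be consistent with an `io:format` call being terminated. A minimal sketch that avoids piping the command's output at all, assuming that explanation holds; `output_has` is a hypothetical helper name, not part of emqx:

```shell
#!/bin/sh
# Hypothetical helper (assumption, not from emqx): run a command, keep all of
# its output in memory, then test for a fixed substring without using a pipe.
output_has() {
    needle=$1
    shift
    # Capture stdout and stderr; ignore the command's exit status so that
    # callers using `set -e` are not aborted by a failing command.
    haystack=$("$@" 2>&1) || true
    case $haystack in
        *"$needle"*) return 0 ;;
        *) return 1 ;;
    esac
}
```

In add-user.sh the condition could then read `if output_has "users list" "$ctl_cmd" users`. The match is a plain substring test rather than a regular expression, which is enough for the fixed string used here.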

At this point the emqx log output is as follows:

my_emqx  | 2019-07-31 15:49:12.650 [error] [Ctl] CMD Error:terminated, Stacktrace:[{io,format,
my_emqx  |                                          [<41476.63.0>,"~-48s# ~s~n",
my_emqx  |                                           ["users update <Username> <NewPassword>",
my_emqx  |                                            "Update User"]],
my_emqx  |                                          []},
my_emqx  |                                         {emqx_cli,'-usage/1-fun-0-',1,
my_emqx  |                                          [{file,
my_emqx  |                                            "/emqx_rel/_build/emqx/lib/emqx/src/emqx_cli.erl"},
my_emqx  |                                           {line,32}]},
my_emqx  |                                         {lists,map,2,
my_emqx  |                                          [{file,"lists.erl"},{line,1239}]},
my_emqx  |                                         {lists,map,2,
my_emqx  |                                          [{file,"lists.erl"},{line,1239}]},
my_emqx  |                                         {emqx_ctl,run_command,2,
my_emqx  |                                          [{file,
my_emqx  |                                            "/emqx_rel/_build/emqx/lib/emqx/src/emqx_ctl.erl"},
my_emqx  |                                           {line,80}]},
my_emqx  |                                         {rpc,'-handle_call_call/6-fun-0-',5,
my_emqx  |                                          [{file,"rpc.erl"},{line,197}]}]

I don't know what this means, but I chose to ignore it because everything still works.

At this point I connect to the instance using MQTT.fx as the test client. The connection succeeds, and the emqx log is as follows:

my_emqx  | 2019-07-31 15:52:15.187 [debug] 192.168.50.3:60639 [Channel] RECV <<16,67,0,4,77,81,84,84,4,194,0,60,0,14,77,81,84,84,95,70,88,95,
my_emqx  |                  67,108,105,101,110,116,0,21,114,111,121,98,105,45,109,113,116,
my_emqx  |                  116,64,110,111,100,101,115,101,114,118,101,114,0,16,120,56,77,
my_emqx  |                  80,69,107,81,86,97,53,70,55,51,115,114,81>>
my_emqx  | 2019-07-31 15:52:15.188 [debug] 192.168.50.3:60639 [Protocol] RECV CONNECT(Q0, R0, D0, ClientId=MQTT_FX_Client, ProtoName=MQTT, ProtoVsn=4, CleanStart=true, KeepAlive=60, Username=u1, Password=******)
my_emqx  | 2019-07-31 15:52:15.188 [debug] [email protected]:60639 [Protocol] SEND CONNACK(Q0, R0, D0, AckFlags=0, ReasonCode=0)

Then I disconnected. The emqx log is as follows:

my_emqx  | 2019-07-31 15:54:08.568 [debug] [email protected]:60639 [Channel] RECV <<224,0>>
my_emqx  | 2019-07-31 15:54:08.568 [debug] [email protected]:60639 [Protocol] RECV DISCONNECT(Q0, R0, D0, ReasonCode=0)
my_emqx  | 2019-07-31 15:54:08.568 [debug] [email protected]:60639 [Channel] Terminated for normal
my_emqx  | 2019-07-31 15:54:08.569 [info] [email protected]:60639 [Protocol] Shutdown for normal

At this point I tried to connect via SSL, and the connection failed. I am certain the client configuration is correct. The emqx log is as follows:

my_emqx  | 2019-07-31 15:54:49.943 [error] crasher:
my_emqx  |     initial call: emqx_channel:init/1
my_emqx  |     pid: <0.2148.0>
my_emqx  |     registered_name: []
my_emqx  |     exception error: no match of right hand side value
my_emqx  |                      {error,
my_emqx  |                          {ssl_error,
my_emqx  |                              {options,
my_emqx  |                                  {cacertfile,"etc/certs/ca.pem",
my_emqx  |                                      {error,efault}}}}}
my_emqx  |       in function  emqx_channel:init/1 (/emqx_rel/_build/emqx/lib/emqx/src/emqx_channel.erl, line 152)
my_emqx  |     ancestors: [<0.1719.0>,<0.1718.0>,esockd_sup,<0.1354.0>]
my_emqx  |     message_queue_len: 0
my_emqx  |     messages: []
my_emqx  |     links: [<0.1719.0>]
my_emqx  |     dictionary: []
my_emqx  |     trap_exit: true
my_emqx  |     status: running
my_emqx  |     heap_size: 4185
my_emqx  |     stack_size: 27
my_emqx  |     reductions: 5426
my_emqx  |   neighbours:
my_emqx  | 2019-07-31 15:54:49.943 [error] supervisor: 'esockd_connection_sup - <0.1719.0>'
my_emqx  |     errorContext: connection_crashed
my_emqx  |     reason: {{badmatch,
my_emqx  |                  {error,
my_emqx  |                      {ssl_error,
my_emqx  |                          {options,
my_emqx  |                              {cacertfile,"etc/certs/ca.pem",
my_emqx  |                                  {error,efault}}}}}},
my_emqx  |              [{emqx_channel,init,1,
my_emqx  |                   [{file,
my_emqx  |                        "/emqx_rel/_build/emqx/lib/emqx/src/emqx_channel.erl"},
my_emqx  |                    {line,152}]},
my_emqx  |               {proc_lib,init_p_do_apply,3,
my_emqx  |                   [{file,"proc_lib.erl"},{line,249}]}]}
my_emqx  |     offender: [{pid,<0.2148.0>},
my_emqx  |                {name,connection},
my_emqx  |                {mfargs,{emqx_channel,start_link,
my_emqx  |                                      [[{deflate_options,[]},
my_emqx  |                                        {max_conn_rate,500},
my_emqx  |                                        {active_n,100},
my_emqx  |                                        {zone,external}]]}}]
my_emqx  | 2019-07-31 15:54:49.946 [error] crasher:
my_emqx  |     initial call: emqx_channel:init/1
my_emqx  |     pid: <0.2152.0>
my_emqx  |     registered_name: []
my_emqx  |     exception error: no match of right hand side value
my_emqx  |                      {error,
my_emqx  |                          {ssl_error,
my_emqx  |                              {options,
my_emqx  |                                  {cacertfile,"etc/certs/ca.pem",
my_emqx  |                                      {error,efault}}}}}
my_emqx  |       in function  emqx_channel:init/1 (/emqx_rel/_build/emqx/lib/emqx/src/emqx_channel.erl, line 152)
my_emqx  |     ancestors: [<0.1719.0>,<0.1718.0>,esockd_sup,<0.1354.0>]
my_emqx  |     message_queue_len: 0
my_emqx  |     messages: []
my_emqx  |     links: [<0.1719.0>]
my_emqx  |     dictionary: []
my_emqx  |     trap_exit: true
my_emqx  |     status: running
my_emqx  |     heap_size: 4185
my_emqx  |     stack_size: 27
my_emqx  |     reductions: 5413
my_emqx  |   neighbours:
my_emqx  | 2019-07-31 15:54:49.946 [error] supervisor: 'esockd_connection_sup - <0.1719.0>'
my_emqx  |     errorContext: connection_crashed
my_emqx  |     reason: {{badmatch,
my_emqx  |                  {error,
my_emqx  |                      {ssl_error,
my_emqx  |                          {options,
my_emqx  |                              {cacertfile,"etc/certs/ca.pem",
my_emqx  |                                  {error,efault}}}}}},
my_emqx  |              [{emqx_channel,init,1,
my_emqx  |                   [{file,
my_emqx  |                        "/emqx_rel/_build/emqx/lib/emqx/src/emqx_channel.erl"},
my_emqx  |                    {line,152}]},
my_emqx  |               {proc_lib,init_p_do_apply,3,
my_emqx  |                   [{file,"proc_lib.erl"},{line,249}]}]}
my_emqx  |     offender: [{pid,<0.2152.0>},
my_emqx  |                {name,connection},
my_emqx  |                {mfargs,{emqx_channel,start_link,
my_emqx  |                                      [[{deflate_options,[]},
my_emqx  |                                        {max_conn_rate,500},
my_emqx  |                                        {active_n,100},
my_emqx  |                                        {zone,external}]]}}]
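
The crash above reports `{error,efault}` for `etc/certs/ca.pem`. As a first diagnostic, it may help to confirm from inside the container that each certificate file under the mounted directory exists and can actually be read by the user running emqx. A minimal sketch, assuming the file names from the compose configuration in this report; `check_certs` is a hypothetical helper, not part of emqx:

```shell
#!/bin/sh
# Hypothetical helper (assumption, not from emqx): report whether each named
# file in a directory exists and can be read in full by the current user.
check_certs() {
    dir=$1
    shift
    status=0
    for name in "$@"; do
        path="$dir/$name"
        if [ ! -f "$path" ]; then
            echo "missing: $path"
            status=1
        elif ! cat "$path" >/dev/null 2>&1; then
            # Read the whole file rather than only testing permission bits,
            # since the reported error occurs when the file is read.
            echo "unreadable: $path"
            status=1
        else
            echo "ok: $path"
        fi
    done
    return "$status"
}
```

Inside the container this could be run as `check_certs /opt/emqx/etc/certs ca.pem server.pem server.key dh-params.pem`. A clean result would not rule out other causes; it only narrows down whether the bind mount exposes readable files.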

Second attempt (Successful)

Starting from the configuration of the first attempt, I removed the following entry from volumes:

- $PWD/config/emqx/certs:/opt/emqx/etc/certs:rw

Then I ran the following commands:

[vagrant@localhost docker]$ docker-compose down -v
Removing my_emqx ... done
Removing network docker_backend
Removing network docker_frontend
[vagrant@localhost docker]$ docker-compose up --build emqx
Creating network "docker_backend" with the default driver
Creating network "docker_frontend" with the default driver
Creating my_emqx ... done
Attaching to my_emqx
my_emqx  | listener.ssl.external.dhfile=etc/certs/dh-params.pem
my_emqx  | node.max_ports=1048576
my_emqx  | listener.ssl.external.verify=verify_peer
my_emqx  | listener.tcp.external.acceptors=64
my_emqx  | log.level=debug
my_emqx  | mqtt.max_packet_size=10MB
my_emqx  | listener.ssl.external.certfile=etc/certs/server.pem
my_emqx  | listener.ssl.external.fail_if_no_peer_cert=true
my_emqx  | allow_anonymous=false
my_emqx  | listener.ssl.external.acceptors=32
my_emqx  | listener.ssl.external.keyfile=etc/certs/server.key
my_emqx  | node.process_limit=2097152
my_emqx  | node.max_ets_tables=2097152
my_emqx  | listener.ws.external.acceptors=16
my_emqx  | listener.ssl.external.cacertfile=etc/certs/ca.pem
my_emqx  | listener.ssl.external.reuse_sessions=on
my_emqx  | [email protected]
my_emqx  | listener.ssl.external.tls_versions=tlsv1.2,tlsv1.1,tlsv1
my_emqx  | acl_nomatch=deny
my_emqx  | EMQX_LOADED_PLUGINS="emqx_management emqx_recon emqx_retainer emqx_dashboard emqx_auth_username"
my_emqx  | emqx v3.2.1 is started successfully!
my_emqx  |               {mfargs,{prometheus_histogram,start_link,[]}},
my_emqx  |               {restart_type,permanent},
my_emqx  |               {shutdown,5000},
my_emqx  |               {child_type,worker}]
my_emqx  | 2019-07-31 16:22:34.837 [info] application: prometheus
my_emqx  |     started_at: '[email protected]'
my_emqx  | 2019-07-31 16:22:34.843 [info] application: luerl
my_emqx  |     started_at: '[email protected]'
my_emqx  | 2019-07-31 16:22:34.843 [info] application: xmerl
my_emqx  |     started_at: '[email protected]'
my_emqx  | ['2019-07-31T16:22:40Z']:emqx start

I repeated the steps of the first attempt up to the point of connecting via SSL. So far, everything is the same as in the first attempt.

Now I run the following command in another session window to manually copy the SSL certificates into the container.

docker cp  /usr/local/src/my-project/docker/config/emqx/certs my_emqx:/opt/emqx/etc/

Now I connect to the emqx instance via SSL. The connection succeeds. Here is the log:

my_emqx  | 2019-07-31 16:25:42.195 [debug] 192.168.50.3:61869 [Channel] RECV <<16,67,0,4,77,81,84,84,4,194,0,60,0,14,77,81,84,84,95,70,88,95,
my_emqx  |                  67,108,105,101,110,116,0,21,114,111,121,98,105,45,109,113,116,
my_emqx  |                  116,64,110,111,100,101,115,101,114,118,101,114,0,16,120,56,77,
my_emqx  |                  80,69,107,81,86,97,53,70,55,51,115,114,81>>
my_emqx  | 2019-07-31 16:25:42.195 [debug] 192.168.50.3:61869 [Protocol] RECV CONNECT(Q0, R0, D0, ClientId=MQTT_FX_Client, ProtoName=MQTT, ProtoVsn=4, CleanStart=true, KeepAlive=60, Username=u1, Password=******)
my_emqx  | 2019-07-31 16:25:42.196 [debug] [email protected]:61869 [Protocol] SEND CONNACK(Q0, R0, D0, AckFlags=0, ReasonCode=0)

EMQ version

v3.2.1

Docker version

Which docker-engine version?

Docker version 19.03.1, build 74b1e89

How docker info?

Client:
 Debug Mode: false

Server:
 Containers: 1
  Running: 1
  Paused: 0
  Stopped: 0
 Images: 49
 Server Version: 19.03.1
 Storage Driver: overlay2
  Backing Filesystem: xfs
  Supports d_type: true
  Native Overlay Diff: true
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 894b81a4b802e4eb2a91d1ce216b8817763c29fb
 runc version: 425e105d5a03fabd737a126ad93d62a9eeede87f
 init version: fec3683
 Security Options:
  seccomp
   Profile: default
 Kernel Version: 3.10.0-957.5.1.el7.x86_64
 Operating System: CentOS Linux 7 (Core)
 OSType: linux
 Architecture: x86_64
 CPUs: 2
 Total Memory: 1.795GiB
 Name: localhost.localdomain
 ID: JGSM:QO5H:TDQV:QSLA:UZCT:DZH4:GCBC:QWR5:TB5S:S5KU:NPJE:66ZR
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 HTTP Proxy: http://192.168.50.3:8888
 HTTPS Proxy: http://192.168.50.3:8888
 No Proxy: localhost,127.0.0.1
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false

WARNING: bridge-nf-call-iptables is disabled
WARNING: bridge-nf-call-ip6tables is disabled

System

What system do you use?

[vagrant@localhost ~]$ cat /etc/redhat-release
CentOS Linux release 7.6.1810 (Core)
[vagrant@localhost ~]$ cat /proc/version
Linux version 3.10.0-957.5.1.el7.x86_64 ([email protected]) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-36) (GCC) ) #1 SMP Fri Feb 1 14:54:57 UTC 2019

Hardware

How is the host machine?

The host is a Vagrant virtual machine created specifically to test this problem, running on VirtualBox with a dual-core CPU and 2 GB of memory. No other programs are running on it. The problem also exists on my production server on AWS. I can't run the test on the production server, so I deployed this test environment. I think the problem has nothing to do with the host environment, so I didn't attach the Vagrantfile, but I can attach it if you need it.

@VincentSit
Author

Thank you for your attention. I should point out that all files in the etc directory are owned by emqx:emqx, and that the host directory containing the mapped files is not owned by root.

@Rory-Z
Contributor

Rory-Z commented Aug 8, 2019

@VincentSit Sorry for the late reply. First, the add-user error is a problem on our side, and we will fix it later. Second, regarding the SSL connection error, can you show us your client configuration?

@VincentSit
Author

@zhanghongtong Thank you for your response, here is the configuration.

06D676E8877E92C04A2A794442C3BA3E
970EF3394B824D1A63558EB802E5553E

@Rory-Z
Contributor

Rory-Z commented Aug 8, 2019

@VincentSit We are trying to reproduce your issue. I have a question about docker:

I repeated the steps of the first attempt up to the point of connecting via SSL. So far, everything is the same as in the first attempt.
Now I run the following command in another session window to manually copy the SSL certificates into the container.
docker cp /usr/local/src/my-project/docker/config/emqx/certs my_emqx:/opt/emqx/etc/

Did you restart EMQX after you executed docker cp?

@VincentSit
Author

@zhanghongtong Sorry for the late reply. I didn't restart EMQX. All operations, including the logs, are exactly as described above.

@Rory-Z
Contributor

Rory-Z commented Aug 8, 2019

@VincentSit I'm sorry, we could not reproduce the problem. We created the container from docker-compose.yaml and everything works. Is anything missing?

@VincentSit
Author

I'm not sure either. I'll package the whole test environment for you, but it may be a while before I have time to do it. Thank you!

@VincentSit
Author

The complete steps are as described above, and I ran them again before opening the issue to make sure they reproduce the problem. I don't know why you can't reproduce it. Please wait while I package a complete test environment for you.
