
ceph-iscsi / tcmu-runner bad performance with VMware ESXi #246

Open
lightmans2 opened this issue Sep 23, 2021 · 1 comment


lightmans2 commented Sep 23, 2021

Hello everyone,

I need some help with our Ceph 16.2.5 cluster, which we use as an iSCSI target for ESXi nodes.

Background info:

  • we have built 3 OSD nodes with 60 BlueStore OSDs: 60x 6 TB spinning disks, 12 SSDs and 3 NVMe drives
  • the OSD nodes have 32 cores and 256 GB RAM each
  • the OSD disks are connected to a SCSI RAID controller; each disk is configured as a single-disk RAID 0 with write-back enabled to use the RAID controller cache
  • we have 3x MONs and 2x iSCSI gateways
  • all servers are connected to a 10 Gbit network (switches)
  • all servers have two 10 Gbit network adapters configured as a round-robin bond (bond-rr)
  • we created one RBD pool with autoscaling and 128 PGs (at the moment)
  • the pool currently holds 5 RBD images: 2x 10 TB and 3x 500 GB, with the exclusive-lock feature and striping v2 (4 MB object / 1 MB stripe unit / stripe count 4); see the sketch after this list
  • all the images are attached to the two iSCSI gateways running tcmu-runner 1.5.4 and exposed as iSCSI targets
  • we have 6 ESXi 6.7u3 servers as compute nodes connected to the Ceph iSCSI target
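
For reference, an image with that striping layout could have been created like this (a sketch only; the pool and image names are hypothetical, the flags are standard rbd CLI options, and passing a stripe unit/count implicitly enables striping v2):

rbd create rbd/esx-datastore-01 --size 10T --image-feature exclusive-lock --object-size 4M --stripe-unit 1M --stripe-count 4
rbd info rbd/esx-datastore-01   # verify object size, stripe unit and stripe count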

ESXi iSCSI config:
esxcli system settings advanced set -o /ISCSI/MaxIoSizeKB -i 512
esxcli system module parameters set -m iscsi_vmk -p iscsivmk_LunQDepth=64
esxcli system module parameters set -m iscsi_vmk -p iscsivmk_HostQDepth=64
esxcli system settings advanced set --int-value 1 --option /DataMover/HardwareAcceleratedMove
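
The applied values can be double-checked with the matching esxcli list commands (standard esxcli queries, added here for completeness):

esxcli system settings advanced list -o /ISCSI/MaxIoSizeKB
esxcli system module parameters list -m iscsi_vmk | grep -E 'LunQDepth|HostQDepth'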

The OSD nodes, MONs, RGW/iSCSI gateways and ESXi nodes are all connected to the 10 Gbit network with bond-rr.
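
The bond mode can be confirmed on the Linux side as follows (assuming the bond interface is named bond0; for bond-rr the mode should read "load balancing (round-robin)"):

cat /proc/net/bonding/bond0 | head -n 5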

rados bench test:

root@cd133-ceph-osdh-01:~# rados bench -p rbd 10 write
hints = 1
Maintaining 16 concurrent writes of 4194304 bytes to objects of size 4194304 for up to 10 seconds or 0 objects
Object prefix: benchmark_data_cd133-ceph-osdh-01_87894
  sec Cur ops   started  finished  avg MB/s  cur MB/s last lat(s)  avg lat(s)
    0       0         0         0         0         0           -           0
    1      16        69        53   211.987       212    0.250578    0.249261
    2      16       129       113   225.976       240    0.296519    0.266439
    3      16       183       167   222.641       216    0.219422    0.273838
    4      16       237       221   220.974       216    0.469045     0.28091
    5      16       292       276   220.773       220    0.249321     0.27565
    6      16       339       323   215.307       188    0.205553     0.28624
    7      16       390       374   213.688       204    0.188404    0.290426
    8      16       457       441   220.472       268    0.181254    0.286525
    9      16       509       493   219.083       208    0.250538    0.286832
   10      16       568       552   220.772       236    0.307829    0.286076
Total time run:         10.2833
Total writes made:      568
Write size:             4194304
Object size:            4194304
Bandwidth (MB/sec):     220.941
Stddev Bandwidth:       22.295
Max bandwidth (MB/sec): 268
Min bandwidth (MB/sec): 188
Average IOPS:           55
Stddev IOPS:            5.57375
Max IOPS:               67
Min IOPS:               47
Average Latency(s):     0.285903
Stddev Latency(s):      0.115162
Max latency(s):         0.88187
Min latency(s):         0.119276
Cleaning up (deleting benchmark objects)
Removed 568 objects
Clean up completed and total clean up time :3.18627

The rados benchmark shows that roughly 220 MB/s sustained (peaks of ~270 MB/s) is possible... and in other runs I have really seen much more, up to 550 MB/s.
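
(The default rados bench run above writes 4 MB objects at 16 concurrent ops; a run with a small block size and a deeper queue, sketched below with the standard -b and -t flags, is usually closer to the small, concurrent I/O pattern that ESXi datastores generate.)

rados bench -p rbd 10 write -b 4096 -t 64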

If I start iftop on one OSD node, I can see the Ceph iSCSI gateways (shown under their rgw hostnames) and the traffic is nearly 80 MB/s.
[screenshot: iftop traffic on an OSD node]

The Ceph dashboard shows an iSCSI write performance of only 40 MB/s; the maximum I have seen was between 40 and 60 MB/s, which is very poor.
[screenshot: Ceph dashboard iSCSI performance]

If I look at the datastore performance in vCenter and on the ESXi hosts, I see very high storage device latencies between 50 and 100 ms, which is very bad.
[screenshot: vCenter datastore latency graph]
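
The same device latencies can also be watched live on a host with esxtop (a standard ESXi tool; press u for the disk-device view and check the DAVG/KAVG columns, or dump a few batch samples as below):

esxtop -b -n 3 > /tmp/esxtop-samples.csv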

root@cd133-ceph-mon-01:/home/cephadm# ceph config dump
WHO                                               MASK       LEVEL     OPTION                                       VALUE                                                                                        RO
global                                                       basic     container_image                              docker.io/ceph/ceph@sha256:829ebf54704f2d827de00913b171e5da741aad9b53c1f35ad59251524790eceb  *
global                                                       advanced  journal_max_write_bytes                      1073714824
global                                                       advanced  journal_max_write_entries                    10000
global                                                       advanced  mon_osd_cache_size                           1024
global                                                       dev       osd_client_watch_timeout                     15
global                                                       dev       osd_heartbeat_interval                       5
global                                                       advanced  osd_map_cache_size                           128
global                                                       advanced  osd_max_write_size                           512
global                                                       advanced  rados_osd_op_timeout                         5
global                                                       advanced  rbd_cache_max_dirty                          134217728
global                                                       advanced  rbd_cache_max_dirty_age                      5.000000
global                                                       advanced  rbd_cache_size                               268435456
global                                                       advanced  rbd_op_threads                               2
  mon                                                        advanced  auth_allow_insecure_global_id_reclaim        false
  mon                                                        advanced  cluster_network                              10.50.50.0/24                                                                                *
  mon                                                        advanced  public_network                               10.50.50.0/24                                                                                *
  mgr                                                        advanced  mgr/cephadm/container_init                   True                                                                                         *
  mgr                                                        advanced  mgr/cephadm/device_enhanced_scan             true                                                                                         *
  mgr                                                        advanced  mgr/cephadm/migration_current                2                                                                                            *
  mgr                                                        advanced  mgr/cephadm/warn_on_stray_daemons            false                                                                                        *
  mgr                                                        advanced  mgr/cephadm/warn_on_stray_hosts              false                                                                                        *
  mgr                                                        advanced  mgr/dashboard/10.50.50.21/server_addr                                                                                                     *
  mgr                                                        advanced  mgr/dashboard/ALERTMANAGER_API_HOST          http://10.221.133.161:9093                                                                   *
  mgr                                                        advanced  mgr/dashboard/GRAFANA_API_SSL_VERIFY         false                                                                                        *
  mgr                                                        advanced  mgr/dashboard/GRAFANA_API_URL                https://10.221.133.161:3000                                                                  *
  mgr                                                        advanced  mgr/dashboard/ISCSI_API_SSL_VERIFICATION     true                                                                                         *
  mgr                                                        advanced  mgr/dashboard/NAME/server_port               80                                                                                           *
  mgr                                                        advanced  mgr/dashboard/PROMETHEUS_API_HOST            http://10.221.133.161:9095                                                                   *
  mgr                                                        advanced  mgr/dashboard/PROMETHEUS_API_SSL_VERIFY      false                                                                                        *
  mgr                                                        advanced  mgr/dashboard/RGW_API_ACCESS_KEY             W8VEKVFDK1RH5IH2Q3GN                                                                         *
  mgr                                                        advanced  mgr/dashboard/RGW_API_SECRET_KEY             IkIjmjfh3bMLrPOlAFbMfpigSIALAQoKGEHzZgxv                                                     *
  mgr                                                        advanced  mgr/dashboard/camdatadash/server_addr        10.251.133.161                                                                               *
  mgr                                                        advanced  mgr/dashboard/camdatadash/ssl_server_port    8443                                                                                         *
  mgr                                                        advanced  mgr/dashboard/cd133-ceph-mon-01/server_addr                                                                                               *
  mgr                                                        advanced  mgr/dashboard/dasboard/server_port           80                                                                                           *
  mgr                                                        advanced  mgr/dashboard/dashboard/server_addr          10.251.133.161                                                                               *
  mgr                                                        advanced  mgr/dashboard/dashboard/ssl_server_port      8443                                                                                         *
  mgr                                                        advanced  mgr/dashboard/server_addr                    0.0.0.0                                                                                      *
  mgr                                                        advanced  mgr/dashboard/server_port                    8080                                                                                         *
  mgr                                                        advanced  mgr/dashboard/ssl                            false                                                                                        *
  mgr                                                        advanced  mgr/dashboard/ssl_server_port                8443                                                                                         *
  mgr                                                        advanced  mgr/orchestrator/orchestrator                cephadm
  mgr                                                        advanced  mgr/prometheus/server_addr                   0.0.0.0                                                                                      *
  mgr                                                        advanced  mgr/telemetry/channel_ident                  true                                                                                         *
  mgr                                                        advanced  mgr/telemetry/contact                        [email protected]                                                                                *
  mgr                                                        advanced  mgr/telemetry/description                    ceph cluster                                                                         *
  mgr                                                        advanced  mgr/telemetry/enabled                        true                                                                                         *
  mgr                                                        advanced  mgr/telemetry/last_opt_revision              3                                                                                            *
  osd                                                        dev       bluestore_cache_autotune                     false
  osd                                             class:ssd  dev       bluestore_cache_autotune                     false
  osd                                                        dev       bluestore_cache_size                         4000000000
  osd                                             class:ssd  dev       bluestore_cache_size                         4000000000
  osd                                                        dev       bluestore_cache_size_hdd                     4000000000
  osd                                                        dev       bluestore_cache_size_ssd                     4000000000
  osd                                             class:ssd  dev       bluestore_cache_size_ssd                     4000000000
  osd                                                        advanced  bluestore_default_buffered_write             true
  osd                                             class:ssd  advanced  bluestore_default_buffered_write             true
  osd                                                        advanced  osd_max_backfills                            1
  osd                                             class:ssd  dev       osd_memory_cache_min                         4000000000
  osd                                             class:hdd  basic     osd_memory_target                            6000000000
  osd                                             class:ssd  basic     osd_memory_target                            6000000000
  osd                                                        advanced  osd_recovery_max_active                      3
  osd                                                        advanced  osd_recovery_max_single_start                1
  osd                                                        advanced  osd_recovery_sleep                           0.000000
    client.rgw.ceph-rgw.cd133-ceph-rgw-01.klvrwk             basic     rgw_frontends                                beast port=8000                                                                              *
    client.rgw.ceph-rgw.cd133-ceph-rgw-01.ptmqcm             basic     rgw_frontends                                beast port=8001                                                                              *
    client.rgw.ceph-rgw.cd88-ceph-rgw-01.czajah              basic     rgw_frontends                                beast port=8000                                                                              *
    client.rgw.ceph-rgw.cd88-ceph-rgw-01.pdknfg              basic     rgw_frontends                                beast port=8000                                                                              *
    client.rgw.ceph-rgw.cd88-ceph-rgw-01.qkdlfl              basic     rgw_frontends                                beast port=8001                                                                              *
    client.rgw.ceph-rgw.cd88-ceph-rgw-01.tdsxpb              basic     rgw_frontends                                beast port=8001                                                                              *
    client.rgw.ceph-rgw.cd88-ceph-rgw-01.xnadfr              basic     rgw_frontends                                beast port=8001                                                                              *

Can somebody explain what I am doing wrong, or what I can do to get better performance with ceph-iscsi?
No matter what I tweak, the write performance does not get better.

I have already experimented with gwcli, the iSCSI queue depth and other settings.
Currently I have set:
hw_max_sectors 8192
max_data_area_mb 32
cmdsn_depth 64 (the ESXi nodes are already fixed at a maximum of 64 outstanding iSCSI commands)
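
(For anyone reproducing this: per the Ceph iSCSI gateway docs, the per-image tcmu-runner settings are changed with gwcli's reconfigure command, roughly as below; the image name is hypothetical and the exact syntax may differ between ceph-iscsi versions.)

gwcli
/> cd /disks
/disks> reconfigure rbd/esx-datastore-01 max_data_area_mb 32
/disks> reconfigure rbd/esx-datastore-01 hw_max_sectors 8192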

Everything else is fine: multipathing is working and recovery is fast... but iSCSI is very slow and I don't know why.
Can somebody help me?

breeze-cool commented

Try turning off multipath, or turning off the exclusive-lock feature; a sketch of the latter is below.
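
Disabling the feature on a live image would look like this (standard rbd command; the image name is hypothetical, and object-map/fast-diff would have to be disabled first if they are enabled, since they depend on exclusive-lock):

rbd feature disable rbd/esx-datastore-01 exclusive-lock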
