Poor socket epoll performance on the lo loopback interface #4393
jifengzhou asked this question in Q&A (unanswered)
Reply: There was a past attempt to communicate over unix sockets. I think it yielded about a 5% improvement, and it was not merged.

Original question:
I have a cluster consisting of two servers, A and B. Each server has an NVMe disk of the same model. Two volumes, a and b, are created using these disks, each with a single replica. The details of the volumes are as follows:
Volume Name: a
Type: Distribute
Volume ID: df6375c8-87ee-402d-8ff8-9a6db5af08ca
Status: Started
Snapshot Count: 0
Number of Bricks: 1
Transport-type: tcp
Bricks:
Brick1: node2:/mnt/test2/brick
Options Reconfigured:
client.event-threads: 32
diagnostics.client-log-level: INFO
nfs.disable: on
transport.address-family: inet
storage.fips-mode-rchecksum: on
performance.write-behind: off
performance.iot-watchdog-debug: on
server.event-threads: 10
Volume Name: b
Type: Distribute
Volume ID: 655fe125-a265-40f3-996b-a408feaecef7
Status: Started
Snapshot Count: 0
Number of Bricks: 1
Transport-type: tcp
Bricks:
Brick1: node3:/mnt/test3/brick
Options Reconfigured:
client.event-threads: 32
server.event-threads: 10
diagnostics.client-log-level: INFO
performance.write-behind: off
storage.fips-mode-rchecksum: on
transport.address-family: inet
nfs.disable: on
[root@B ~]# mount | grep test3
/dev/nvme0n1 on /mnt/test3 type xfs (rw,relatime,attr2,inode64,logbufs=8,logbsize=32k,noquota)
[root@B ~]# ssh A mount | grep test2
/dev/nvme0n1 on /mnt/test2 type xfs (rw,relatime,attr2,inode64,logbufs=8,logbsize=32k,noquota)
Both volumes a and b are mounted on server B. Performance testing using fio shows IOPS of 46k for volume a and 36k for volume b. Initially, I suspected that the performance of volume b was worse because its replica service was on server B, leading to a higher load on B. However, further testing and analysis showed this was not the case.
I then mounted both volumes a and b on host A and ran fio against their mount points there, while simultaneously running fio against the mount points on host B, so that the CPU load on servers A and B was identical. The results showed that volume b performed better than volume a on host A, and volume a performed better than volume b on host B. This was inconsistent with my expectation: with no bottleneck in the replica service, fio on host A should have performed better on volume a than on volume b, since volume a's data does not travel over the external network but over the lo loopback interface, which should have lower latency.

However, examining the glusterdump data, I found that the WRITE latency reported by test2-client-0.latency.WRITE for volume a's replica on host A was higher than the WRITE latency reported by test3-client-0.latency.WRITE for volume b.
Therefore, the basic conclusion that can be drawn from these tests is that communication over the local lo loopback interface performs worse than communication across hosts.
I then analyzed the source code to track where the extra latency was introduced. By comparing the two cases, I found that the processing time for the local replica was significantly longer in the following code path: epoll_wait -> event_dispatch_epoll_handler -> socket_event_handler -> socket_event_poll_in -> socket_proto_state_machine. Because the GlusterFS socket transport uses the EPOLLONESHOT feature, this code path cannot be executed in parallel for a single connection. Since the execution time of this path is too long for the local replica, increasing client.event-threads does not provide any performance improvement (a minimal illustration of this serialization follows below).
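For readers unfamiliar with EPOLLONESHOT, here is a minimal standalone sketch (not GlusterFS code; handle_conn is a hypothetical stand-in for socket_event_handler) showing why a one-shot fd serializes all work on one connection: once an event is delivered the fd is disarmed, and no other event thread can be woken for that socket until the handler re-arms it with EPOLL_CTL_MOD.

```c
#include <fcntl.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical stand-in for socket_event_handler(): drain the fd, then
 * re-arm it.  Everything before the EPOLL_CTL_MOD call runs on a single
 * event thread for this fd. */
static void handle_conn(int epfd, int fd)
{
    char buf[4096];

    /* drain whatever is readable; this is the stretch that cannot run
     * in parallel for one connection */
    while (read(fd, buf, sizeof(buf)) > 0)
        ;

    /* re-arm: only after this can another thread get events for this fd */
    struct epoll_event ev = { .events = EPOLLIN | EPOLLONESHOT,
                              .data.fd = fd };
    epoll_ctl(epfd, EPOLL_CTL_MOD, fd, &ev);
}

int main(void)
{
    int epfd = epoll_create1(0);
    int fd = STDIN_FILENO;                      /* any readable fd will do */

    fcntl(fd, F_SETFL, fcntl(fd, F_GETFL, 0) | O_NONBLOCK);

    struct epoll_event ev = { .events = EPOLLIN | EPOLLONESHOT,
                              .data.fd = fd };
    epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);    /* armed exactly once */

    struct epoll_event out;
    if (epoll_wait(epfd, &out, 1, 1000) == 1)   /* wait up to 1 second */
        handle_conn(epfd, out.data.fd);

    close(epfd);
    return 0;
}
```

If the work between event delivery and re-arm is long (as observed in socket_proto_state_machine for the local replica), that connection is effectively single-threaded no matter how many event threads are configured.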
Questions and Requests for Help
1. How can I optimize the performance of the lo loopback interface so that Gluster clients interact with a local replica service at the same performance level as cross-host replica interactions?
2. Are there alternative approaches?
3. If there is no better solution, I plan to modify Gluster so that the client and a local replica service communicate over a local socket file. Is this approach feasible? (A minimal sketch of the idea follows this list.)
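To make question 3 concrete, below is a minimal standalone sketch of the idea, connecting over an AF_UNIX socket instead of TCP over lo. The socket path is made up for illustration, and any integration with GlusterFS's rpc-transport/socket layer is an assumption, not existing behavior:

```c
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

int main(void)
{
    /* hypothetical per-brick socket path; GlusterFS does not create this today */
    const char *path = "/var/run/gluster/brick-mnt-test2-brick.socket";

    int fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd < 0) {
        perror("socket");
        return 1;
    }

    struct sockaddr_un addr;
    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    strncpy(addr.sun_path, path, sizeof(addr.sun_path) - 1);

    /* connect() to a filesystem socket bypasses the TCP/IP stack entirely,
     * which is the same property the earlier unix-socket experiment
     * mentioned in the reply was trying to exploit */
    if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        perror("connect");
        close(fd);
        return 1;
    }

    /* ... RPC traffic would flow over fd here instead of a TCP socket ... */
    close(fd);
    return 0;
}
```

Whether this actually helps depends on where the extra loopback latency comes from; the roughly 5% figure mentioned in the reply above suggests the TCP/IP stack itself may not be the dominant cost.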