Poor socket epoll performance on the lo loopback interface #4393
jifengzhou asked this question in Q&A (unanswered)
Reply: There was a past attempt to communicate over unix sockets. I think it yielded about a 5% improvement, and it was not merged.

Original question:
I have a cluster consisting of two servers, A and B. Each server has an NVMe disk of the same model. Two volumes, a and b, are created using these disks, each with a single replica. The details of the volumes are as follows:
Volume Name: a
Type: Distribute
Volume ID: df6375c8-87ee-402d-8ff8-9a6db5af08ca
Status: Started
Snapshot Count: 0
Number of Bricks: 1
Transport-type: tcp
Bricks:
Brick1: node2:/mnt/test2/brick
Options Reconfigured:
client.event-threads: 32
diagnostics.client-log-level: INFO
nfs.disable: on
transport.address-family: inet
storage.fips-mode-rchecksum: on
performance.write-behind: off
performance.iot-watchdog-debug: on
server.event-threads: 10
Volume Name: b
Type: Distribute
Volume ID: 655fe125-a265-40f3-996b-a408feaecef7
Status: Started
Snapshot Count: 0
Number of Bricks: 1
Transport-type: tcp
Bricks:
Brick1: node3:/mnt/test3/brick
Options Reconfigured:
client.event-threads: 32
server.event-threads: 10
diagnostics.client-log-level: INFO
performance.write-behind: off
storage.fips-mode-rchecksum: on
transport.address-family: inet
nfs.disable: on
[root@B ~]# mount | grep test3
/dev/nvme0n1 on /mnt/test3 type xfs (rw,relatime,attr2,inode64,logbufs=8,logbsize=32k,noquota)
[root@B ~]# ssh A mount | grep test2
/dev/nvme0n1 on /mnt/test2 type xfs (rw,relatime,attr2,inode64,logbufs=8,logbsize=32k,noquota)
Both volumes a and b are mounted on server B. Performance testing using fio shows IOPS of 46k for volume a and 36k for volume b. Initially, I suspected that the performance of volume b was worse because its replica service was on server B, leading to a higher load on B. However, further testing and analysis showed this was not the case.
I then mounted both volumes a and b on host A and ran fio against their mount points there, while simultaneously running fio against the mount points on host B, so that the CPU load on servers A and B was identical. The results showed that volume b performed better than volume a on host A, and volume a performed better than volume b on host B. This was inconsistent with my expectation: with no bottleneck in the replica service, fio on host A should have performed better on volume a than on volume b, since volume a's data does not travel over the external network but over the lo loopback interface, which should have lower latency.

However, examining the glusterdump data, I found that the WRITE latency reported by test2-client-0.latency.WRITE for volume a's replica on host A was higher than the WRITE latency reported by test3-client-0.latency.WRITE for volume b.
Therefore, the basic conclusion that can be drawn from these tests is that communication over the local lo loopback interface performs worse than communication across hosts.
I then analyzed the source code to track where the extra latency was introduced. By comparing the two cases, I found that the processing time for the local replica was significantly longer in the following code path: epoll_wait -> event_dispatch_epoll_handler -> socket_event_handler -> socket_event_poll_in -> socket_proto_state_machine. Because the GlusterFS socket transport uses the EPOLLONESHOT feature, this code path cannot be executed in parallel for a single connection. Since the execution time of this path is too long for the local replica, increasing client.event-threads does not provide any performance improvement (a minimal illustration of this serialization follows below).
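For readers unfamiliar with EPOLLONESHOT, here is a minimal standalone sketch (not GlusterFS code; handle_conn is a hypothetical stand-in for socket_event_handler) showing why a one-shot fd serializes all work on one connection: once an event is delivered the fd is disarmed, and no other event thread can be woken for that socket until the handler re-arms it with EPOLL_CTL_MOD.

```c
#include <fcntl.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical stand-in for socket_event_handler(): drain the fd, then
 * re-arm it.  Everything before the EPOLL_CTL_MOD call runs on a single
 * event thread for this fd. */
static void handle_conn(int epfd, int fd)
{
    char buf[4096];

    /* drain whatever is readable; this is the stretch that cannot run
     * in parallel for one connection */
    while (read(fd, buf, sizeof(buf)) > 0)
        ;

    /* re-arm: only after this can another thread get events for this fd */
    struct epoll_event ev = { .events = EPOLLIN | EPOLLONESHOT,
                              .data.fd = fd };
    epoll_ctl(epfd, EPOLL_CTL_MOD, fd, &ev);
}

int main(void)
{
    int epfd = epoll_create1(0);
    int fd = STDIN_FILENO;                      /* any readable fd will do */

    fcntl(fd, F_SETFL, fcntl(fd, F_GETFL, 0) | O_NONBLOCK);

    struct epoll_event ev = { .events = EPOLLIN | EPOLLONESHOT,
                              .data.fd = fd };
    epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);    /* armed exactly once */

    struct epoll_event out;
    if (epoll_wait(epfd, &out, 1, 1000) == 1)   /* wait up to 1 second */
        handle_conn(epfd, out.data.fd);

    close(epfd);
    return 0;
}
```

If the work between event delivery and re-arm is long (as observed in socket_proto_state_machine for the local replica), that connection is effectively single-threaded no matter how many event threads are configured.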
Questions and Requests for Help
1. How can I optimize the performance of the lo loopback interface so that Gluster clients interact with a local replica service at the same performance level as cross-host replica interactions?
2. Are there alternative approaches?
3. If there is no better solution, I plan to modify Gluster so that the client and a local replica service communicate over a local socket file. Is this approach feasible? (A minimal sketch of the idea follows this list.)
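To make question 3 concrete, below is a minimal standalone sketch of the idea, connecting over an AF_UNIX socket instead of TCP over lo. The socket path is made up for illustration, and any integration with GlusterFS's rpc-transport/socket layer is an assumption, not existing behavior:

```c
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

int main(void)
{
    /* hypothetical per-brick socket path; GlusterFS does not create this today */
    const char *path = "/var/run/gluster/brick-mnt-test2-brick.socket";

    int fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd < 0) {
        perror("socket");
        return 1;
    }

    struct sockaddr_un addr;
    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    strncpy(addr.sun_path, path, sizeof(addr.sun_path) - 1);

    /* connect() to a filesystem socket bypasses the TCP/IP stack entirely,
     * which is the same property the earlier unix-socket experiment
     * mentioned in the reply was trying to exploit */
    if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        perror("connect");
        close(fd);
        return 1;
    }

    /* ... RPC traffic would flow over fd here instead of a TCP socket ... */
    close(fd);
    return 0;
}
```

Whether this actually helps depends on where the extra loopback latency comes from; the roughly 5% figure mentioned in the reply above suggests the TCP/IP stack itself may not be the dominant cost.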