Merge pull request #522 from ywc689/performance
Performance
ywc689 authored Feb 11, 2020
2 parents 837ab75 + 4069794 commit a5a3e8b
Showing 159 changed files with 5,869 additions and 162,236 deletions.
16 changes: 0 additions & 16 deletions .gitignore
Original file line number Diff line number Diff line change
@@ -11,22 +11,6 @@ cscope*
filenametags
build/
bin/
tools/ipvsadm/ipvsadm
tools/keepalived/Makefile
tools/keepalived/bin/genhash
tools/keepalived/bin/keepalived
tools/keepalived/config.status
tools/keepalived/genhash/Makefile
tools/keepalived/keepalived.spec
tools/keepalived/keepalived/Makefile
tools/keepalived/keepalived/check/Makefile
tools/keepalived/keepalived/core/Makefile
tools/keepalived/keepalived/libipvs-2.6/Makefile
tools/keepalived/keepalived/libipvs-2.6/libipvs.a
tools/keepalived/keepalived/vrrp/Makefile
tools/keepalived/lib/Makefile
tools/keepalived/lib/config.h
tools/keepalived/install-sh
src/dpvs
.cache.mk
.tmp_versions/
4 changes: 3 additions & 1 deletion Makefile
@@ -36,7 +36,9 @@ clean:
for i in $(SUBDIRS); do $(MAKE) -C $$i clean || exit 1; done

distclean:
$(MAKE) -C tools/keepalived distclean || exit 1
$(MAKE) -C tools/keepalived distclean || true
-rm -f tools/keepalived/configure
-rm -f tools/keepalived/Makefile

install:all
-mkdir -p $(INSDIR)
2 changes: 1 addition & 1 deletion conf/dpvs.conf.items
@@ -138,7 +138,7 @@ timer_defs {
! dpvs neighbor config
neigh_defs {
<init> unres_queue_length 128 <128, 16-8192>
<init> timeout 60 <60, 1-3600>
timeout 60 <60, 1-3600>
}

! dpvs ipv4 config
2 changes: 1 addition & 1 deletion conf/dpvs.conf.sample
@@ -223,7 +223,7 @@ timer_defs {
! dpvs neighbor config
neigh_defs {
<init> unres_queue_length 128
<init> timeout 60
timeout 60
}

! dpvs ipv4 config
2 changes: 1 addition & 1 deletion conf/dpvs.conf.single-bond.sample
@@ -175,7 +175,7 @@ timer_defs {
! dpvs neighbor config
neigh_defs {
<init> unres_queue_length 128
<init> timeout 60
timeout 60
}

! dpvs ipv4 config
2 changes: 1 addition & 1 deletion conf/dpvs.conf.single-nic.sample
@@ -148,7 +148,7 @@ timer_defs {
! dpvs neighbor config
neigh_defs {
<init> unres_queue_length 128
<init> timeout 60
timeout 60
}

! dpvs ipv4 config
284 changes: 284 additions & 0 deletions doc/Worker-Performance-Tuning.md
@@ -0,0 +1,284 @@
# DPVS Worker Performance Tuning

This doc explains how to achieve the best performance by tuning the CPU cores on which the DPVS process runs.

### About DPVS Workers

DPVS is a multi-threaded DPDK application. It is built on a polling framework: each thread runs an infinite loop that processes the jobs registered for it during the initialization stage. We call each such thread a DPVS Worker. There are currently three DPVS Worker types.

* **Master Worker**: in charge of all jobs from the control plane;
* **Forwarding Worker**: the data-plane worker in charge of packet receiving, processing, forwarding, and all other data-plane jobs;
* **Isolate Receiving Worker**: an optional worker that takes over packet reception from a Forwarding Worker to reduce the NIC's imissed (missed-packet) count.

As in other DPDK applications, each DPVS Worker is bound to a distinct CPU core so that workers do not interfere with one another. By default, DPVS Workers are bound to the first N CPUs of the system. Performance may suffer when the system schedules other workloads onto these CPUs. For example, CPU0, the first CPU core in the system, is generally much busier than the other cores because many processes, interrupts, and kernel threads run on it by default. The rest of this doc explains how to offload unrelated workloads from the CPUs that run DPVS Workers.

### When do you need to consider this performance tuning?

Consider this performance tuning in any of the following situations.

* The NIC's imissed packet counter keeps increasing. You can get imissed statistics with the following commands.

```
dpip link show -s
dpip link show [dpvs-nic-name] -s -i 2 -C
```

* There are frequent msg timeouts in the DPVS log.

> Another cause of msg timeouts is a too-small "ctrl_defs/lcore_msg/sync_msg_timeout_us" setting. Its default value is 2000 us.
* Big worker loops are observed.

> To observe worker loop times, uncomment the macro "CONFIG_RECORD_BIG_LOOP" in src/config.mk, recompile DPVS, and run it.
>
> Besides, the macros "BIG_LOOP_THRESH_SLAVE" and "BIG_LOOP_THRESH_MASTER" define the worker-loop time thresholds. Modify them if needed.
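
The macro toggle above can be scripted. Below is a small shell sketch, assuming the macro appears in src/config.mk as a commented-out `CFLAGS` line (check your copy first; the exact line format is an assumption, so the script prints the result rather than editing in place):

```shell
# Print src/config.mk with the CONFIG_RECORD_BIG_LOOP macro enabled.
# Assumes a commented-out line like "#CFLAGS += -D CONFIG_RECORD_BIG_LOOP";
# review the output before overwriting the real file.
enable_big_loop() {
    sed 's/^[# ]*\(CFLAGS += -D CONFIG_RECORD_BIG_LOOP\)/\1/' "$1"
}
```

After verifying the output, redirect it back to src/config.mk (or use `sed -i`) and recompile.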
### Optimize workload on the CPUs used by DPVS Workers

**Optimize Kernel Command Line Parameters (Boot Options)**

```
default_hugepagesz=1G hugepagesz=1G hugepages=[nr-hugepages] isolcpus=[cpu-list] nohz_full=[cpu-list] nowatchdog nmi_watchdog=0 rcu_nocbs=[cpu-list] rcu_nocb_poll
```

Refer to [Linux kernel parameters document](https://www.kernel.org/doc/Documentation/admin-guide/kernel-parameters.txt) for detailed explanations of the above boot options.

Refer to [DPDK performance report](https://fast.dpdk.org/doc/perf/) for more boot and BIOS settings for high-performance DPDK applications.

Set the boot options and reboot your system. Then check the hardware and software interrupts on each CPU via the proc files `/proc/interrupts` and `/proc/softirqs`. The interrupts on the CPUs specified by "cpu-list" should drop dramatically.
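
To compare interrupt load before and after the reboot, a small awk sketch (a hypothetical helper, not part of DPVS) can total the per-CPU hardware interrupt counts in a `/proc/interrupts`-style file:

```shell
# Sum hardware interrupts per CPU from a /proc/interrupts-style file.
# Prints one "CPUn total" line per CPU column.
irq_totals() {
    awk 'NR == 1 { for (i = 1; i <= NF; i++) name[i] = $i; ncpu = NF; next }
         { for (i = 2; i <= ncpu + 1; i++) sum[i-1] += $i }
         END { for (i = 1; i <= ncpu; i++) print name[i], sum[i] }' "$1"
}
```

Run it as `irq_totals /proc/interrupts`; CPUs isolated by the boot options should show far smaller totals than the housekeeping CPUs.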

**How to choose CPU cores for DPVS Workers?**

The [cpu-list] in the above kernel command line parameters is the list of CPU cores for DPVS Workers.

Generally, follow these practical rules when choosing the CPU core list.

* Avoid the first CPU core, including both HT (Hyper-Threading) siblings of the first physical core.
* Turn off HT (Hyper-Threading) in the system if possible.
* Do not use both HT siblings of one physical core, unless they host a paired Forwarding Worker and Isolate Receiving Worker.

You can get your system's CPU layout with the `cpu_layout.py` script provided by DPDK, as shown below.

```
[root@~ dpdk]# python dpdk-stable-18.11.2/usertools/cpu_layout.py
======================================================================
Core and Socket Information (as reported by '/sys/devices/system/cpu')
======================================================================
cores = [0, 1, 2, 3, 4, 8, 9, 10, 11, 12]
sockets = [0, 1]
Socket 0 Socket 1
-------- --------
Core 0 [0] [10]
Core 1 [1] [11]
Core 2 [2] [12]
Core 3 [3] [13]
Core 4 [4] [14]
Core 8 [5] [15]
Core 9 [6] [16]
Core 10 [7] [17]
Core 11 [8] [18]
Core 12 [9] [19]
```

This system has 20 HT CPU cores, derived from 10 physical cores. The following examples are based on this environment.
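When pairing workers, it helps to know which two logical CPUs share a physical core. A small shell sketch (a hypothetical helper, not part of DPVS) that picks the HT sibling out of a `thread_siblings_list` value:

```shell
# Given a thread_siblings_list value (e.g. "3,13") and a CPU id,
# print that CPU's HT sibling. The comma-separated "N,M" form is
# assumed; some kernels report ranges like "3-13" instead.
ht_sibling() {
    siblings=$1
    cpu=$2
    echo "$siblings" | tr ',' '\n' | grep -v "^${cpu}\$"
}
```

On a live Linux system the value comes from `/sys/devices/system/cpu/cpu<N>/topology/thread_siblings_list`; in the layout above, Cpu3's sibling is Cpu13.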

**Example #1: Run DPVS with 1 master worker and 4 forwarding workers**

* Master Worker: Cpu1

* Forwarding Worker: Cpu2 - Cpu5

* Boot Option Cpu List: Cpu1 - Cpu5, Cpu11 - Cpu15

Use the DPDK EAL common option `--lcores` to map system CPU IDs to DPVS Worker IDs. Boot DPVS with the command:

```
./bin/dpvs -- --lcores 0@1,1@2,2@3,3@4,4@5
```

> The above command maps the system's Cpu1 - Cpu5 to DPVS Workers 0 - 4, respectively.

Note:

1. DPVS Worker IDs are specified in the DPVS configuration file under "worker_defs/worker [name]/cpu_id".
2. The EAL common options `-c COREMASK` and `-l CORELIST` are not supported by DPVS, because they conflict with DPVS Worker IDs.
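
The `worker@cpu` pair syntax of `--lcores` can be sanity-checked with a tiny shell sketch (a hypothetical helper, not part of DPDK or DPVS) that expands the mapping string into one line per pair:

```shell
# Expand a DPDK "--lcores" mapping string of "worker@cpu" pairs
# into readable "worker N -> cpu M" lines.
expand_lcores() {
    echo "$1" | tr ',' '\n' | awk -F'@' '{ print "worker " $1 " -> cpu " $2 }'
}
```

For instance, `expand_lcores "0@1,1@2,2@3,3@4,4@5"` lists the five mappings used in this example.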

```
### dpvs.conf worker config
worker_defs {
    <init> worker cpu0 {
        type    master
        cpu_id  0    # DPVS Master Worker ID
    }
    <init> worker cpu1 {
        type    slave
        cpu_id  1    # DPVS Forwarding Worker ID
        port    dpdk0 {
            rx_queue_ids 0
            tx_queue_ids 0
        }
    }
    <init> worker cpu2 {
        type    slave
        cpu_id  2    # DPVS Forwarding Worker ID
        port    dpdk0 {
            rx_queue_ids 1
            tx_queue_ids 1
        }
    }
    <init> worker cpu3 {
        type    slave
        cpu_id  3    # DPVS Forwarding Worker ID
        port    dpdk0 {
            rx_queue_ids 2
            tx_queue_ids 2
        }
    }
    <init> worker cpu4 {
        type    slave
        cpu_id  4    # DPVS Forwarding Worker ID
        port    dpdk0 {
            rx_queue_ids 3
            tx_queue_ids 3
        }
    }
}
```

Check the CPU workload with the `top` command. The CPUs showing zero idle are the ones occupied by DPVS Workers.

```
top - 17:46:28 up 2 days, 21:29, 1 user, load average: 8.35, 14.57, 16.19
Tasks: 244 total, 2 running, 241 sleeping, 0 stopped, 1 zombie
%Cpu0 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu1 : 40.9 us, 59.1 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu2 :100.0 us, 0.0 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu3 :100.0 us, 0.0 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu4 :100.0 us, 0.0 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu5 :100.0 us, 0.0 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu6 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu7 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu8 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu9 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu10 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu11 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu12 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu13 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu14 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu15 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu16 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu17 : 0.3 us, 0.3 sy, 0.0 ni, 99.3 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu18 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu19 : 1.3 us, 0.3 sy, 0.0 ni, 98.3 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem : 65685352 total, 15709580 free, 48904136 used, 1071636 buff/cache
KiB Swap: 4194300 total, 4194300 free, 0 used. 16183592 avail Mem
```
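
Reading per-CPU idle figures by eye is error-prone; a small shell sketch (a hypothetical helper, not part of DPVS) can pull the zero-idle CPUs out of saved `top` output:

```shell
# From saved "top" per-CPU output, list the CPUs whose idle ("id")
# percentage is 0.0, i.e. the cores fully occupied by DPVS Workers.
busy_cpus() {
    grep '^%Cpu' "$1" |
        awk -F'[ ,]+' '{ for (i = 2; i <= NF; i++)
                             if ($i == "id" && $(i-1) == "0.0") print $1 }'
}
```

Save the output of `top -b -n 1` to a file and pass it to `busy_cpus`; in Example #1 it should report Cpu1 through Cpu5.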

**Example #2: Run DPVS with 1 master worker, 4 forwarding workers, and 4 isolate receiving workers**

- Master Worker: Cpu3
- Forwarding Worker: Cpu1, Cpu2, Cpu4, Cpu5
- Isolate Receiving Worker: Cpu11, Cpu12, Cpu14, Cpu15

- Boot Option Cpu List: Cpu1 - Cpu5, Cpu11 - Cpu15

In this case, we make the 3rd Worker (Worker ID 2) the Master Worker, just to show that any DPVS Worker can be designated as the Master. Use the DPDK EAL common option `--master-lcore` to specify the Master Worker ID, and `--lcores` to map system CPU IDs to DPVS Worker IDs. Boot DPVS with the command:

```
./bin/dpvs -- --lcores 0@1,1@2,2@3,3@4,4@5,5@11,6@12,7@14,8@15 --master-lcore 2
```

> The above command maps the system's Cpu1 - Cpu5, Cpu11, Cpu12, Cpu14, and Cpu15 to DPVS Workers 0 - 8, and designates the worker with ID 2 as the DPVS Master Worker. We skip Cpu13 because it resides on the same physical core as Cpu3, on which the Master Worker is to run.

The DPVS worker configuration for this case is shown below. Note the values of the "cpu_id" fields.

```
### dpvs.conf worker config
worker_defs {
    <init> worker cpu0 {
        type    master
        cpu_id  2    # DPVS Master Worker ID
    }
    <init> worker cpu1 {
        type    slave
        cpu_id  0    # DPVS Forwarding Worker ID
        port    dpdk0 {
            rx_queue_ids     0
            tx_queue_ids     0
            isol_rx_cpu_ids  5    # DPVS Isolate Receiving Worker ID
            isol_rxq_ring_sz 1048576
        }
    }
    <init> worker cpu2 {
        type    slave
        cpu_id  1    # DPVS Forwarding Worker ID
        port    dpdk0 {
            rx_queue_ids     1
            tx_queue_ids     1
            isol_rx_cpu_ids  6    # DPVS Isolate Receiving Worker ID
            isol_rxq_ring_sz 1048576
        }
    }
    <init> worker cpu3 {
        type    slave
        cpu_id  3    # DPVS Forwarding Worker ID
        port    dpdk0 {
            rx_queue_ids     2
            tx_queue_ids     2
            isol_rx_cpu_ids  7    # DPVS Isolate Receiving Worker ID
            isol_rxq_ring_sz 1048576
        }
    }
    <init> worker cpu4 {
        type    slave
        cpu_id  4    # DPVS Forwarding Worker ID
        port    dpdk0 {
            rx_queue_ids     3
            tx_queue_ids     3
            isol_rx_cpu_ids  8    # DPVS Isolate Receiving Worker ID
            isol_rxq_ring_sz 1048576
        }
    }
}
```

Check the CPU workload with the `top` command. The CPUs showing zero idle are the ones occupied by DPVS Workers.

```
top - 19:38:15 up 2 days, 23:20, 1 user, load average: 6.19, 1.89, 0.75
Tasks: 249 total, 2 running, 246 sleeping, 0 stopped, 1 zombie
%Cpu0 : 0.0 us, 0.4 sy, 0.0 ni, 99.6 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu1 :100.0 us, 0.0 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu2 :100.0 us, 0.0 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu3 : 63.3 us, 36.7 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu4 :100.0 us, 0.0 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu5 :100.0 us, 0.0 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu6 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu7 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu8 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu9 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu10 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu11 :100.0 us, 0.0 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu12 :100.0 us, 0.0 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu13 :100.0 us, 0.0 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu14 :100.0 us, 0.0 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu15 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu16 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu17 : 0.0 us, 0.3 sy, 0.0 ni, 99.7 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu18 : 0.0 us, 0.3 sy, 0.0 ni, 99.7 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu19 : 1.0 us, 0.3 sy, 0.0 ni, 98.7 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem : 65685352 total, 15695352 free, 48908988 used, 1081012 buff/cache
KiB Swap: 4194300 total, 4194300 free, 0 used. 16171432 avail Mem
```

1 change: 0 additions & 1 deletion include/cfgfile.h
@@ -18,7 +18,6 @@
#ifndef __CONFIG_H__
#define __CONFIG_H__

void try_reload(void);
int cfgfile_init(void);
int cfgfile_term(void);

4 changes: 4 additions & 0 deletions include/conf/README
@@ -0,0 +1,4 @@
Header files in "./tools/keepalived/keepalived/include/conf/"
are a mirror of "./include/conf/".

Always keep the header files of the two paths the same!
3 changes: 0 additions & 3 deletions include/common.h → include/conf/common.h
@@ -142,9 +142,6 @@ ssize_t writen(int fd, const void *vptr, size_t n);
/* send "n" bytes to a descriptor */
ssize_t sendn(int fd, const void *vptr, size_t n, int flags);

/* get backtrace for the calling program */
int dpvs_backtrace(char *buf, int len);

static inline char *strupr(char *str) {
char *s;
for (s = str; *s != '\0'; s++)