From 13ae3bc917ca1a4d3755da6c5854136d95fc8406 Mon Sep 17 00:00:00 2001 From: Souvik Sarkar Date: Thu, 13 Jul 2023 12:06:51 +0530 Subject: [PATCH] Vale checks for tuning guide Vale checks for tuning guide Vale checks for tuning guide Vale style checks for tuning guide Vale style checks for tuning guide Vale style checks for tuning guide Vale style checks for tuning guide Vale style checks for tuning guide Vale style checks for tuning guide Vale style checks for tuning guide Vale style checks for tuning guide Vale style checks for tuning guide Vale style checks for tuning guide Vale style checks for tuning guide Vale style checks for tuning guide Vale checks for tuning guide Vale checks for tuning guide Vale style checks for tuning guide Vale style checks for tuning guide Vale style checks for tuning guide Revisions based on Daria's review MInor fix after rebasing with main Revisedcontent based on Daria's second round of review Revisions based on Daria's comment --- xml/tuning_cgroups.xml | 42 ++++++----- xml/tuning_how.xml | 20 +++--- xml/tuning_kexec.xml | 75 ++++++++++---------- xml/tuning_kprobes.xml | 15 ++-- xml/tuning_logfiles.xml | 20 +++--- xml/tuning_memory.xml | 105 ++++++++++++++-------------- xml/tuning_network.xml | 22 +++--- xml/tuning_numactl.xml | 16 ++--- xml/tuning_oprofile.xml | 25 ++++--- xml/tuning_perf.xml | 26 ++++--- xml/tuning_power.xml | 36 +++++----- xml/tuning_ptp.xml | 16 ++--- xml/tuning_sapconf.xml | 34 +++++---- xml/tuning_storagescheduler.xml | 113 +++++++++++++++--------------- xml/tuning_systemd_coredump.xml | 13 ++-- xml/tuning_systemresources.xml | 28 ++++---- xml/tuning_systemtap.xml | 32 ++++----- xml/tuning_taskscheduler.xml | 68 +++++++++--------- xml/tuning_tracing.xml | 30 ++++---- xml/utilities.xml | 119 +++++++++++++++----------------- 20 files changed, 407 insertions(+), 448 deletions(-) diff --git a/xml/tuning_cgroups.xml b/xml/tuning_cgroups.xml index f876fe316d..6a458e1f5e 100644 --- a/xml/tuning_cgroups.xml +++ b/xml/tuning_cgroups.xml @@ -37,7 +37,7 @@ Every process is assigned exactly one administrative cgroup. cgroups are ordered in a hierarchical tree structure. You can set resource - limitations, such as CPU, memory, disk I/O, or network bandwidth usage, + limitations such as CPU, memory, disk I/O, or network bandwidth usage, for single processes or for whole branches of the hierarchy tree. @@ -52,9 +52,8 @@ - The kernel cgroup API comes in two variants, v1 and v2. Additionally, - there can be multiple cgroup hierarchies exposing different APIs. From - the numerous possible combinations, there are two practical choices: + The kernel cgroup API comes in two variants—v1 and v2. Additionally, + there can be multiple cgroup hierarchies exposing different APIs. From many possible combinations, there are two practical choices: @@ -98,7 +97,7 @@ - simpler handling of the single hierararchy + simpler handling of the single hierarchy @@ -108,8 +107,8 @@ To enable the unified control group hierarchy, append as a kernel command - line parameter to the &grub; boot loader. (Refer to - for more details about configuring &grub;.) + line parameter to the &grub; boot loader. For more details about configuring &grub;, refer to + . @@ -122,9 +121,9 @@ - The accounting has relatively small but non-zero overhead, whose impact - depends on the workload. 
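Once the system has been rebooted with the adjusted kernel command line, you can verify which cgroup hierarchy is actually in use. A quick check, assuming the standard /sys/fs/cgroup mount point, is:

&prompt.root;stat -fc %T /sys/fs/cgroup
cgroup2fs

An output of cgroup2fs indicates the unified (v2) hierarchy; tmpfs indicates the hybrid/v1 layout.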
Activating accounting for one unit will also - implicitly activate it for all units in the same slice, and for all its + The accounting has comparatively small but non-zero overhead, whose impact + depends on the workload. Activating accounting for one unit also + implicitly activates it for all units in the same slice, and for all its parent slices, and the units contained in them. @@ -215,7 +214,7 @@ TasksMax=infinity DefaultTasksMax=infinity infinity means having no limit. It is not a requirement - to change the default, but setting some limits may help to prevent system + to change the default, but setting certain limits may help to prevent system crashes from runaway processes. @@ -293,7 +292,7 @@ DefaultTasksMax=256 Default <literal>TasksMax</literal> limit on users - The default limit on users should be fairly high, because user sessions + The default limit on users should be high, because user sessions need more resources. Set your own default for any user by creating a new file, for example /etc/systemd/system/user-.slice.d/40-user-taskmask.conf. @@ -320,7 +319,7 @@ TasksMax=16284 How do you know what values to use? This varies according to your workloads, system resources, and other resource configurations. When your - TasksMax value is too low, you will see error messages + TasksMax value is too low, you may see error messages such as Failed to fork (Resources temporarily unavailable), Can't create thread to handle new connection, and Error: Function call 'fork' failed @@ -402,7 +401,7 @@ The throttling policy is implemented higher in the stack, therefore it does not require any additional adjustments. The proportional I/O control policies have two different implementations: the BFQ controller, and the cost-based model. -We describe the BFQ controller here. In order to exert its proportional +We describe the BFQ controller here. To exert its proportional implementation for a particular device, we must make sure that BFQ is the chosen scheduler. Check the current scheduler: @@ -418,9 +417,8 @@ Switch the scheduler to BFQ: You must specify the disk device (not a partition). The -optimal way to set this attribute is a udev rule specific to the device -(note that &productname; ships udev rules that already enable BFQ for rotational -disk drives). +optimal way to set this attribute is a udev rule specific to the device. &productname; ships udev +rules that already enable BFQ for rotational disk drives. @@ -444,7 +442,7 @@ r I/O is originating only from cgroups c and b. Even though c has a higher -weight, it will be treated with lower priority because it is level-competing +weight, it is treated with lower priority because it is level-competing with b. @@ -476,7 +474,7 @@ example: I/O control behavior and setting expectations The following list items describe I/O control behavior, and what you - should expect under various conditions. + should expect under different conditions. @@ -485,7 +483,7 @@ I/O control works best for direct I/O operations (bypassing page cache), the situations where the actual I/O is decoupled from the caller (typically writeback via page cache) may manifest variously. For example, delayed I/O control or even no observed I/O control (consider little bursts or competing -workloads that happen to never "meet", submitting I/O at the same time, and +workloads that happen to never meet, submitting I/O at the same time, and saturating the bandwidth). For these reasons, the resulting ratio of I/O throughputs does not strictly follow the ratio of configured weights. 
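As mentioned above, a udev rule is the preferred way to pin BFQ to a particular device. A minimal sketch of such a rule follows; the file name and device name are placeholders, and the rules shipped with the product already cover rotational disks:

# /etc/udev/rules.d/62-io-scheduler.rules (example file name)
ACTION=="add|change", KERNEL=="sdb", ATTR{queue/scheduler}="bfq"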
@@ -530,7 +528,7 @@ each other (but responsible resource design perhaps avoids that). The I/O device bandwidth is not the only shared resource on the I/O path. Global file system structures are involved, which is relevant -when I/O control is meant to guarantee certain bandwidth; it will not, and +when I/O control is meant to guarantee certain bandwidth; it does not, and it may even lead to priority inversion (prioritized cgroup waiting for a transaction of slower cgroup). @@ -539,7 +537,7 @@ transaction of slower cgroup). So far, we have been discussing only explicit I/O of file system data, but swap-in and swap-out can also be controlled. Although if such a need -arises, it usually points out to improperly provisioned memory (or memory limits). +arises, it points out to improperly provisioned memory (or memory limits). diff --git a/xml/tuning_how.xml b/xml/tuning_how.xml index a7f23690c6..bfdaf11768 100644 --- a/xml/tuning_how.xml +++ b/xml/tuning_how.xml @@ -19,7 +19,7 @@ and provides means to solve these problems. Before you start tuning your system, you should make sure you have ruled out common problems and have found the cause for the problem. You should also have a detailed plan on - how to tune the system, because applying random tuning tips often will + how to tune the system, because applying random tuning tips does not help and could make things worse. @@ -62,7 +62,7 @@ - Identify the subsystem(s) where the application is spending the most + Identify the subsystems where the application is spending the most time. @@ -106,7 +106,7 @@ Before starting to tuning a system, try to describe the problem as exactly as possible. A statement like The system is slow! is not a helpful problem description. For example, it could make a - difference whether the system speed needs to be improved in general or + difference whether the system speed needs to be generally improved, or only at peak times. @@ -115,8 +115,8 @@ otherwise you cannot verify if the tuning was a success or not. You should always be able to compare before and after. Which metrics to use depends on the scenario or - application you are looking into. Relevant Web server metrics, for - example, could be expressed in terms of: + application you are looking into. For example, relevant Web server metrics could be expressed in + terms of the following: @@ -140,7 +140,7 @@ Active users - The maximum number of users that can be downloading pages while still + The maximum number of users that can download pages while still receiving pages within an acceptable latency @@ -152,7 +152,7 @@ A performance problem often is caused by network or hardware problems, - bugs, or configuration issues. Make sure to rule out problems such as the + bugs or configuration issues. Make sure to rule out problems such as the ones listed below before attempting to tune your system: @@ -207,7 +207,7 @@ Finding the bottleneck - Finding the bottleneck very often is the hardest part when tuning a + Finding the bottleneck is the hardest part when tuning a system. &productname; offers many tools to help you with this task. See for detailed information on general system monitoring applications and log file analysis. If the @@ -220,7 +220,7 @@ Once you have collected the data, it needs to be analyzed. First, inspect if the server's hardware (memory, CPU, bus) and its I/O capacities (disk, network) are sufficient. If these basic conditions are met, the system - might benefit from tuning. + can benefit from tuning. @@ -233,7 +233,7 @@ impact. 
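To keep before-and-after measurements comparable, record the same monitoring output around every change. A simple sketch using standard tools (interval, count and file names are arbitrary, and sar requires the sysstat package):

&prompt.user;vmstat 5 120 > vmstat-baseline-$(date +%F).log
&prompt.user;sar -n DEV 5 120 > network-baseline-$(date +%F).log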
Each tuning activity should be measured over a sufficient time period to ensure you can do an analysis based on significant data. If you cannot measure a positive effect, do not make the change - permanent. Chances are that it might have a negative effect in the + permanent. Chances are that it can have a negative effect in the future. diff --git a/xml/tuning_kexec.xml b/xml/tuning_kexec.xml index c6361bbf0f..ece5a4fc23 100644 --- a/xml/tuning_kexec.xml +++ b/xml/tuning_kexec.xml @@ -142,13 +142,13 @@ &prompt.root;kexec KERNEL_IMAGE - This kernel will be booted automatically when the system crashes. + This kernel is booted automatically when the system crashes. - If you want to boot another kernel and preserve the data of the production + To boot another kernel and preserve the data of the production kernel when the system crashes, you need to reserve a dedicated area of the system memory. The production kernel never loads to this area because it must be always available. It is used for the capture kernel so that the @@ -163,7 +163,7 @@ - Note that this is not a parameter of the capture kernel. The capture kernel + This is not a parameter of the capture kernel. The capture kernel does not use &kexec;. @@ -176,7 +176,7 @@ To load the capture kernel, you need to include the kernel boot parameters. - Usually, the initial RAM file system is used for booting. You can specify it + In most cases, the initial RAM file system is used for booting. You can specify it with =FILENAME. With @@ -257,7 +257,7 @@ MaxHigh: 45824 the memory up to the 4 GB mark. Petr Tesarik says in bsc#948565 that High means "all memory". - This is not completely true in SLE 12 SP1 RC1, but fixed for + This is not true in SLE 12 SP1 RC1, but fixed for SLE 12 SP1 RC2. - sknorr, 2015-10-05 @@ -267,7 +267,7 @@ MaxHigh: 45824 - sknorr, 2015-10-05 SIZE_LOW is the amount of memory required by 32-bit-only devices. The - kernel will allocate 64M for DMA32 bounce buffers. If your server does + kernel allocates 64M for DMA32 bounce buffers. If your server does not have any 32-bit-only devices, everything should work with the default allocation of 72M for SIZE_LOW. A possible exception to this is on NUMA machines, which may make it appear that more @@ -487,8 +487,8 @@ MaxHigh: 45824 Depending on the number of available devices the calculated amount of memory specified by the kernel parameter may not be sufficient. Instead of increasing the value, you may alternatively - limit the amount of devices visible to the kernel. This will lower the - required amount of memory for the "crashkernel" setting. + limit the amount of devices visible to the kernel. This lowers the + required amount of memory for the crashkernel setting. @@ -500,11 +500,10 @@ MaxHigh: 45824 &prompt.sudo;cio_ignore -u -k cio_ignore=all,!da5d,!f500-f502 - When you run cio_ignore -u -k, the blacklist will - become active and replace any existing blacklist immediately. Unused + When you run cio_ignore -u -k, the blocklist becomes active and replaces any existing blocklist immediately. Unused devices are not being purged, so they still appear in the channel subsystem. But adding new channel devices (via CP ATTACH under z/VM or - dynamic I/O configuration change in LPAR) will treat them as blacklisted. + dynamic I/O configuration change in LPAR) treats them as blocked. To prevent this, preserve the original setting by running sudo cio_ignore -l first and reverting to that state after running cio_ignore -u -k. 
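For reference, the reserved area discussed above is configured with the crashkernel parameter on the production kernel's command line. An illustrative &grub; entry, with sizes that are placeholders rather than recommendations, could look like this:

# in /etc/default/grub:
GRUB_CMDLINE_LINUX_DEFAULT="... crashkernel=72M,low crashkernel=256M,high"
&prompt.root;grub2-mkconfig -o /boot/grub2/grub.cfg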
As an alternative, add the generated @@ -591,9 +590,8 @@ cio_ignore=all,!da5d,!f500-f502 - Note that firmware and the boot loader are not used when the system reboots - with &kexec;. Any changes you make to the boot loader configuration will be - ignored until the computer performs a hard reboot. + The firmware and the boot loader are not used when the system reboots + with &kexec;. Any changes you make to the boot loader configuration are ignored until the computer performs a hard reboot. @@ -632,9 +630,9 @@ cio_ignore=all,!da5d,!f500-f502 Target file system for &kdump; must be mounted during configuration When configuring &kdump;, you can specify a location to which the dumped - images will be saved (default: /var/crash). This + images are saved (default: /var/crash). This location must be mounted when configuring &kdump;, otherwise the - configuration will fail. + configuration fails. @@ -673,7 +671,7 @@ cio_ignore=all,!da5d,!f500-f502 You can edit the options in /etc/sysconfig/kdump. - Reading the comments will help you understand the meaning of individual + Reading the comments helps you understand the meaning of individual options. @@ -727,7 +725,7 @@ cio_ignore=all,!da5d,!f500-f502 The KDUMP_KEEP_OLD_DUMPS option controls the number of preserved kernel dumps (default is 5). Without compression, the size of - the dump can take up to the size of the physical RAM memory. Make sure you + the dump can take up to the size of the physical memory or RAM. Make sure you have sufficient space on the /var partition. @@ -780,7 +778,7 @@ cio_ignore=all,!da5d,!f500-f502 After hardware changes, set <guimenu>&kdump; memory</guimenu> values again If you have set up &kdump; on a computer and later decide to change the - amount of RAM or hard disks available to it, &yast; will continue to + amount of RAM or hard disks available to it, &yast; continues to display and use outdated memory values. @@ -827,7 +825,7 @@ cio_ignore=all,!da5d,!f500-f502 It is possible to specify a path for saving &kdump; dumps where other applications also save their dumps. When cleaning its old dump files, - &kdump; will safely ignore other applications' dump files. + &kdump; safely ignores other applications' dump files. @@ -862,7 +860,7 @@ cio_ignore=all,!da5d,!f500-f502 &kdump; must be able to authenticate to the target machine. Only public - key authentication is currently available. By default, &kdump; will use + key authentication is currently available. By default, &kdump; uses &rootuser;'s private key, but it is advisable to make a separate key for &kdump;. This can be done with ssh-keygen: @@ -911,7 +909,7 @@ cio_ignore=all,!da5d,!f500-f502 Secure Shell protocol (SSH) - Some other distributions use SSH to run some commands on the target + Some other distributions use SSH to run certain commands on the target host. &productname; can also use this method. The &kdump; user on the target host must have a login shell that can execute these commands: mkdir, dd and @@ -970,8 +968,8 @@ cio_ignore=all,!da5d,!f500-f502 - If you want to debug the Linux kernel, you need to install its debugging - information package in addition. Check if the package is installed on your + To debug the Linux kernel, install its debugging + information package, too. Check if the package is installed on your system with: @@ -1019,7 +1017,7 @@ cio_ignore=all,!da5d,!f500-f502 Kernel binary formats The Linux kernel comes in Executable and Linkable Format (ELF). 
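Returning to the SSH dump target described earlier, a dedicated key pair can be created for &kdump; and the dump location pointed at the remote host. The user name, host name and directory below are placeholders, and the exact option names should be checked against the comments in /etc/sysconfig/kdump:

&prompt.root;ssh-keygen -f /root/.ssh/kdump_key -N ''
&prompt.root;ssh-copy-id -i /root/.ssh/kdump_key.pub kdump@dump-host.example.com
# in /etc/sysconfig/kdump:
KDUMP_SAVEDIR="ssh://kdump@dump-host.example.com/var/crash"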
This file - is usually called vmlinux and is directly generated in + is called vmlinux and is directly generated in the compilation process. Not all boot loaders support ELF binaries, especially on the &x86-64; architecture. The following solutions exist on different architectures supported by &productnamereg;. @@ -1114,7 +1112,7 @@ cio_ignore=all,!da5d,!f500-f502 Regardless of the computer on which you analyze the dump, the crash - utility will produce output similar to this: + utility produces output similar to this: &prompt.user;crash /boot/vmlinux-5.3.18-8-default.gz \ /var/crash/2020-04-23-11\:17/vmcore @@ -1213,14 +1211,14 @@ PID: 9446 TASK: ffff88003a57c3c0 CPU: 1 COMMAND: "bash" kmem, you can display details about the kernel memory usage. With vm, you can inspect the virtual memory of a process, even at the level of individual page mappings. The list of useful - commands is very long and many of these accept a wide range of options. + commands is long, and many of these accept a wide range of options. The commands that we mentioned reflect the functionality of the common Linux commands, such as ps and lsof. To find out the exact sequence of events with the debugger, you need to know how to use GDB and to have strong debugging skills. Both of these are - out of the scope of this document. In addition, you need to understand the + out of the scope of this document. Additionally, you need to understand the Linux kernel. Several useful reference information sources are given at the end of this document. @@ -1242,10 +1240,9 @@ PID: 9446 TASK: ffff88003a57c3c0 CPU: 1 COMMAND: "bash" You can change the directory for the kernel dumps with the option. Keep in mind that the size of kernel - dumps can be very large. &kdump; will refuse to save the dump if the free + dumps can be large. &kdump; refuses to save the dump if the free disk space, subtracted by the estimated dump size, drops below the value - specified by the option. Note that - understands the URL format + specified by the option. understands the URL format PROTOCOL://SPECIFICATION, where PROTOCOL is one of , , , or @@ -1256,19 +1253,19 @@ PID: 9446 TASK: ffff88003a57c3c0 CPU: 1 COMMAND: "bash" - Kernel dumps are usually huge and contain many pages that are not necessary + Kernel dumps are large and contain many pages that are not necessary for analysis. With option, you can omit such pages. The option understands numeric value between 0 and 31. If you - specify 0, the dump size will be largest. If you - specify 31, it will produce the smallest dump. + specify 0, the dump size is the largest. If you + specify 31, it produces the smallest dump. For a complete table of possible values, see the manual page of kdump (man 7 kdump). - Sometimes it is very useful to make the size of the kernel dump smaller. For - example, if you want to transfer the dump over the network, or if you need - to save some disk space in the dump directory. This can be done with + Sometimes it is useful to make the size of the kernel dump smaller. For + example, you can do so to transfer the dump over the network or to + save disk space in the dump directory. This can be done with set to compressed. The crash utility supports dynamic decompression of the compressed dumps. @@ -1279,7 +1276,7 @@ PID: 9446 TASK: ffff88003a57c3c0 CPU: 1 COMMAND: "bash" After making changes to the /etc/sysconfig/kdump file, you need to run systemctl restart kdump.service. 
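As a sketch of the options discussed above, the following /etc/sysconfig/kdump fragment requests the smallest possible dump with compression; the values are illustrative, and the option names should be verified against the comments in the installed file:

KDUMP_DUMPLEVEL="31"
KDUMP_DUMPFORMAT="compressed"
&prompt.root;systemctl restart kdump.service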
- Otherwise, the changes will only take effect next time you reboot the + Otherwise, the changes only take effect next time you reboot the system. @@ -1363,7 +1360,7 @@ PID: 9446 TASK: ffff88003a57c3c0 CPU: 1 COMMAND: "bash" - A very comprehensive overview of the Linux kernel internals is given in + A comprehensive overview of the Linux kernel internals is given in Understanding the Linux Kernel by Daniel P. Bovet and Marco Cesati (ISBN 978-0-596-00565-8). diff --git a/xml/tuning_kprobes.xml b/xml/tuning_kprobes.xml index f2aea343f9..64c5982924 100644 --- a/xml/tuning_kprobes.xml +++ b/xml/tuning_kprobes.xml @@ -20,7 +20,7 @@ Kernel probes are a set of tools to collect Linux kernel debugging and - performance information. Developers and system administrators usually use + performance information. Developers and system administrators use them either to debug the kernel, or to find system performance bottlenecks. The reported data can then be used to tune the system for better performance. @@ -37,8 +37,8 @@ the exit function. The init function (such as register_kprobe()) registers one or more probes, while the exit function unregisters them. The registration function - defines where the probe will be inserted and - which handler will be called after the probe is hit. + defines where the probe is inserted and + which handler is called after the probe is hit. To register or unregister a group of probes at one time, you can use relevant register_<PROBE_TYPE>probes() @@ -157,7 +157,7 @@ control back to the control function. - In general, you can insert multiple probes on one function. Jprobe is, + Generally, you can insert multiple probes on one function. Jprobe is, however, limited to only one instance per function. @@ -176,7 +176,7 @@ Before you call register_kretprobe(), you need to set a maxactive argument, which specifies how many instances of the function can be probed at the same time. If - set too low, you will miss a certain number of probes. + set too low, a certain number of probes is missed. @@ -319,9 +319,8 @@ c03dedc5 r tcp_v4_rcv+0x0 &prompt.root;echo "1" > /sys/kernel/debug/kprobes/enabled - Note that this way you do not change the status of the probes. If a - probe is temporarily disabled, it will not be enabled automatically but - will remain in the [DISABLED] state after entering + With such operations, you do not change the status of the probes. If a + probe is temporarily disabled, it is not enabled automatically but remains in the [DISABLED] state after entering the latter command. diff --git a/xml/tuning_logfiles.xml b/xml/tuning_logfiles.xml index e3421810e0..acf827f409 100644 --- a/xml/tuning_logfiles.xml +++ b/xml/tuning_logfiles.xml @@ -184,7 +184,7 @@ https://tldp.org/LDP/LGNET/148/darin.html - Log messages of some boot scripts, for example the log of the DHCP + Log messages of certain boot scripts, for example, the log of the DHCP client. @@ -352,7 +352,7 @@ Is this a systemd-related change? Should this then be removed entirely? Log files under /var/log grow on a daily basis and - quickly become very large. logrotate is a tool that + quickly become large. logrotate is a tool that helps you manage log files and their growth. It allows automatic rotation, removal, compression, and mailing of log files. Log files can be handled periodically (daily, weekly, or monthly) or when exceeding a @@ -360,11 +360,11 @@ Is this a systemd-related change? Should this then be removed entirely? 
- logrotate is usually run daily by &systemd;, - and thus usually modifies log files only once a day. However, exceptions + logrotate is run daily by &systemd;, + and thus modifies log files only once a day. However, exceptions occur when a log file is modified because of its size, if logrotate is run multiple times a day, or if - is enabled. Use + is enabled. Use /var/lib/misc/logrotate.status to find out when a particular file was last rotated. @@ -821,8 +821,7 @@ FILES logwatch is a customizable, pluggable log-monitoring script. It parses system logs, extracts the important information and presents them in a human readable manner. To use - logwatch, install the - logwatch package. + logwatch, install the logwatch package. @@ -868,7 +867,7 @@ logwatch --service smartd --range 'between 5/5/2005 and 5/7/2005' \ logwatch can be customized to great detail. However, - the default configuration should usually be sufficient. The default + the default configuration should be sufficient. The default configuration files are located under /usr/share/logwatch/default.conf/. Never change them because they would get overwritten again with the next update. Rather @@ -931,8 +930,7 @@ logwatch --service smartd --range 'between 5/5/2005 and 5/7/2005' \ mailbox and will be notified about new mail messages upon login. - These messages can contain security relevant reports and incidents that might - require a quick response by the system administrator. To get notified about + These messages can contain security relevant reports and incidents that require a quick response by the system administrator. To get notified about these messages in a timely fashion, it is strongly recommended to forward these mails to a dedicated remote e-mail account that is regularly checked. @@ -1211,7 +1209,7 @@ $UDPServerRun PORT Memory usage - Memory allocations in general can be characterized as + Memory allocations can be characterized as pinned (also known as unreclaimable), reclaimable or swappable. @@ -244,13 +243,13 @@ Reducing kernel memory overheads - Kernel memory that is reclaimable (caches, described above) will be + Kernel memory that is reclaimable (caches, described above) is trimmed automatically during memory shortages. Most other kernel memory cannot be easily reduced but is a property of the workload given to the kernel. - Reducing the requirements of the user space workload will reduce the + Reducing the requirements of the user space workload reduces the kernel memory usage (fewer processes, fewer open files and sockets, etc.) @@ -271,10 +270,9 @@ Virtual memory manager (VM) tunable parameters - When tuning the VM it should be understood that some changes will - take time to affect the workload and take full effect. If the workload - changes throughout the day, it may behave very differently at different - times. A change that increases throughput under some conditions may + When tuning the VM, it should be understood that certain changes take time to affect the workload and take full effect. If the workload + changes throughout the day, it may behave differently at different + times. A change that increases throughput under certain conditions may decrease it under other conditions. @@ -293,19 +291,19 @@ Swap I/O tends to be much less efficient than other I/O. However, - some pagecache pages will be accessed much more frequently than less + certain pagecache pages are accessed much more frequently than less used anonymous memory. The right balance should be found here. 
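The parameter described above (vm.swappiness) can be inspected and changed at run time to experiment with that balance; the new value here is only an example, not a recommendation:

&prompt.root;cat /proc/sys/vm/swappiness
60
&prompt.root;sysctl vm.swappiness=30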
If swap activity is observed during slowdowns, it may be worth reducing this parameter. If there is a lot of I/O activity and the amount of pagecache in the system is rather small, or if there are - large dormant applications running, increasing this value might + large dormant applications running, increasing this value can improve performance. - Note that the more data is swapped out, the longer the system will - take to swap data back in when it is needed. + The more data is swapped out, the longer the system + takes to swap data back in when it is needed. @@ -339,12 +337,11 @@ This controls the amount of memory that is kept free for use by special reserves including atomic allocations (those which cannot wait for reclaim). This should not normally be lowered - unless the system is being very carefully tuned for memory usage + unless the system is being carefully tuned for memory usage (normally useful for embedded rather than server applications). If page allocation failure messages and stack traces are frequently seen in logs, min_free_kbytes could be increased until the - errors disappear. There is no need for concern, if these messages are - very infrequent. The default value depends on the amount of RAM. + errors disappear. There is no need for concern if these messages are infrequent. The default value depends on the amount of RAM. @@ -412,7 +409,7 @@ This contains the amount of dirty memory at which - the background kernel flusher threads will start writeback. + the background kernel flusher threads start writeback. dirty_background_bytes is the counterpart of dirty_background_ratio. If one of them is set, the other one will automatically be read as 0. @@ -441,10 +438,10 @@ however the amount of dirty memory is in bytes as opposed to a percentage of reclaimable memory. Since both dirty_ratio and dirty_bytes - control the same tunable, if one of them is set, the other one will - automatically be read as 0. The minimum value allowed + control the same tunable, if one of them is set, the other one is + automatically read as 0. The minimum value allowed for dirty_bytes is two pages (in bytes); any value - lower than this limit will be ignored and the old configuration will be + lower than this limit is ignored and the old configuration will be retained. @@ -454,11 +451,10 @@ - Data which has been dirty in-memory for longer than this interval - will be written out next time a flusher thread wakes up. Expiration + The data which has been dirty in-memory for longer than this interval + is written out next time a flusher thread wakes up. Expiration is measured based on the modification time of a file's inode. - Therefore, multiple dirtied pages from the same file will all be - written when the interval is exceeded. + Therefore, multiple dirtied pages from the same file are all written when the interval is exceeded. @@ -492,17 +488,17 @@ SLE-12: vm.dirty_ratio = 20 The primary advantage of using the lower ratio in &sle; 12 is that page reclamation and allocation in low memory situations completes - faster as there is a higher probability that old clean pages will be + faster as there is a higher probability that old clean pages are quickly found and discarded. The secondary advantage is that if all data on the system must be synchronized, then the time to complete the - operation on &sle; 12 will be lower than &sle; 11 SP3 by default. + operation on &sle; 12 is lower than &sle; 11 SP3 by default. 
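Both ratios can be inspected and adjusted at run time. The first command shows the &sle; 12 default quoted above, and the second illustrates restoring the older value of 40:

&prompt.root;sysctl vm.dirty_ratio
vm.dirty_ratio = 20
&prompt.root;sysctl vm.dirty_ratio=40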
Most workloads will not notice this change as data is synchronized with fsync() by the application or data is not dirtied quickly enough to hit the limits. - There are exceptions and if your application is affected by this, it - will manifest as an unexpected stall during writes. To prove it is + There are exceptions, and if your application is affected by this, it + can manifest as an unexpected stall during writes. To prove it is affected by dirty data rate limiting then monitor /proc/PID_OF_APPLICATION/stack and it will be observed that the application spends significant time in @@ -511,11 +507,12 @@ SLE-12: vm.dirty_ratio = 20 vm.dirty_ratio to 40 to restore the &sle; 11 SP3 behavior. - - It is important to note that the overall I/O throughput is the same - regardless of the setting. The only difference is the timing of when - the I/O is queued. - + + + The overall I/O throughput is the same regardless of the setting. The only difference is the + timing of when the I/O is queued. + + This is an example of using dd to asynchronously write 30% of memory to disk which would happen to be affected by the @@ -529,11 +526,10 @@ SLE-12: vm.dirty_ratio = 20 dd if=/dev/zero of=zerofile ibs=1048576 count=$((MEMTOTAL_MBYTES*30/100)) 2507145216 bytes (2.5 GB) copied, 10.1593 s, 247 MB/s - Note that the parameter affects the time it takes for the command to + The parameter affects the time it takes for the command to complete and the apparent write speed of the device. With dirty_ratio=40, more of the data is cached and - written to disk in the background by the kernel. It is very important - to note that the speed of I/O is identical in both cases. To + written to disk in the background by the kernel. The speed of I/O is identical in both cases. To demonstrate, this is the result when dd synchronizes the data before exiting: @@ -544,7 +540,7 @@ dd if=/dev/zero of=zerofile ibs=1048576 count=$((MEMTOTAL_MBYTES*30/100)) &prompt.root;dd if=/dev/zero of=zerofile ibs=1048576 count=$((MEMTOTAL_MBYTES*30/100)) conv=fdatasync 2507145216 bytes (2.5 GB) copied, 21.7286 s, 115 MB/s - Note that dirty_ratio had almost no impact here and + As observed, dirty_ratio had almost no impact here and is within the natural variability of a command. Hence, dirty_ratio does not directly impact I/O performance but it may affect the apparent performance of a workload that writes @@ -561,10 +557,10 @@ dd if=/dev/zero of=zerofile ibs=1048576 count=$((MEMTOTAL_MBYTES*30/100)) If one or more processes are sequentially reading a file, the kernel - reads some data in advance (ahead) to reduce the amount of + reads certain data in advance (ahead) to reduce the amount of time that processes need to wait for data to be available. The actual amount of data being read in advance is computed dynamically, based - on how much "sequential" the I/O seems to be. This parameter sets the + on the extent of sequentiality of the I/O. This parameter sets the maximum amount of data that the kernel reads ahead for a single file. If you observe that large sequential reads from a file are not fast enough, you can try increasing this value. Increasing it too far may @@ -614,7 +610,7 @@ always madvise [never] If disabled, the value never is shown in square brackets like in the example above. A value of - always will always try and use THP at fault + always mandatorily tries and uses THP at fault time but defer to khugepaged if the allocation fails. A value of madvise will only allocate THP for address spaces explicitly specified by an application. 
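The active value is the one shown in brackets. It can be changed at run time for testing; the value chosen here is only an example:

&prompt.root;cat /sys/kernel/mm/transparent_hugepage/enabled
always madvise [never]
&prompt.root;echo madvise > /sys/kernel/mm/transparent_hugepage/enabled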
@@ -629,8 +625,7 @@ always madvise [never] allocating a THP. A value of always is the default for &sle; 12 SP1 and earlier&opensuse; 42.1 and earlier releases - that supported THP. If a THP is not available, the application will - try to defragment memory. It potentially incurs large stalls in an + that supported THP. If a THP is not available, the application tries to defragment memory. It potentially incurs large stalls in an application if the memory is fragmented and a THP is not available. @@ -644,14 +639,14 @@ always madvise [never] defer is only available on &sle; 12 SP2&opensuse; 42.2 and later releases. If a THP is not available, the - application will fall back to using small pages if a THP is not - available. It will wake the kswapd and + application falls back to using small pages if a THP is not + available. It wakes the kswapd and kcompactd kernel threads to defragment memory in the background and a THP will be allocated later by khugepaged. - The final option never will use small pages if + The final option never uses small pages if a THP is unavailable but no other action will take place. @@ -661,7 +656,7 @@ always madvise [never] khugepaged parameters - khugepaged will be automatically started when + khugepaged is automatically started when transparent_hugepage is set to always or madvise, and it will be automatically shut down if it is set to never. Normally @@ -700,7 +695,7 @@ always madvise [never] khugepaged sleeps for a short interval specified by this parameter after each pass to limit how much CPU usage is - used. Reducing this value will allocate THP in the background faster + used. Reducing this value allocates THP in the background faster at the cost of CPU usage. A value of 0 will force continual scanning. @@ -852,7 +847,7 @@ always madvise [never] These counters correspond to how many THPs were allocated directly by an application and how many times a THP was not available and small pages were used. Generally a high fallback rate is harmless - unless the application is very sensitive to TLB pressure. + unless the application is sensitive to TLB pressure. @@ -875,7 +870,7 @@ always madvise [never] These counters may increase when THP is enabled and the system is fragmented. compact_stall is incremented when - an application stalls allocating THP. The remaining counters + an application stalls allocating THP. The remaining counters account for pages scanned, the number of defragmentation events that succeeded or failed. diff --git a/xml/tuning_network.xml b/xml/tuning_network.xml index c1ea7c5546..ffa41f9e62 100644 --- a/xml/tuning_network.xml +++ b/xml/tuning_network.xml @@ -42,7 +42,7 @@ Configurable kernel socket buffers - Networking is largely based on the TCP/IP protocol and a socket interface + Most of modern networking is based on the TCP/IP protocol and a socket interface for communication; for more information about TCP/IP, see . The Linux kernel handles data it receives or sends via the socket interface in socket buffers. These kernel socket @@ -54,7 +54,7 @@ TCP autotuning Since kernel version 2.6.17 full autotuning with 4 MB maximum buffer - size exists. This means that manual tuning usually will not + size exists. This means that manual tuning does not improve networking performance considerably. It is often the best not to touch the following variables, or, at least, to check the outcome of tuning efforts carefully. @@ -217,7 +217,7 @@ Before starting with network tuning, it is important to isolate network - bottlenecks and network traffic patterns. 
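Before changing any of the buffer-related variables, record their current autotuned values so the effect of a change can be judged later. For example (the exact defaults differ per kernel version and memory size):

&prompt.root;sysctl net.ipv4.tcp_rmem net.ipv4.tcp_wmem net.core.rmem_max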
There are some tools that can + bottlenecks and network traffic patterns. There are certain tools that can help you with detecting those bottlenecks. @@ -247,7 +247,7 @@ - There are quite a lot of properties Netfilter can take into account. + There are many properties Netfilter can take into account. Thus, the more rules are defined, the longer packet processing may last. Also advanced connection tracking could be rather expensive and, thus, slowing down overall networking. @@ -284,7 +284,7 @@ multiple receive queues in hardware. However, others are only equipped with a single queue and the driver must deal with all incoming packets in a single, serialized stream. To work around this issue, the operating - system must "parallelize" the stream to distribute the work across + system must parallelize the stream to distribute the work across multiple CPUs. On &productname; this is done via Receive Packet Steering (RPS). RPS can also be used in virtual environments. @@ -311,16 +311,15 @@ If the network interface hardware only supports a single receive queue, - only rx-0 will exist. If it supports multiple receive - queues, there will be an rx-N directory for + only rx-0 exists. If it supports multiple receive + queues, there is an rx-N directory for each receive queue. These configuration files contain a comma-delimited list of CPU bitmaps. - By default, all bits are set to 0. With this setting - RPS is disabled and therefore the CPU that handles the interrupt will - also process the packet queue. + By default, all bits are set to 0. With this setting, + RPS is disabled and therefore the CPU that handles the interrupt also processes the packet queue. @@ -356,8 +355,7 @@ F F 0 0 (hex) - On non-NUMA machines, all CPUs can be used. If the interrupt rate is very - high, excluding the CPU handling the network interface can boost + On non-NUMA machines, all CPUs can be used. If the interrupt rate is high, excluding the CPU handling the network interface can boost performance. The CPU being used for the network interface can be determined from /proc/interrupts. For example: diff --git a/xml/tuning_numactl.xml b/xml/tuning_numactl.xml index 0d0173879d..de87d373df 100644 --- a/xml/tuning_numactl.xml +++ b/xml/tuning_numactl.xml @@ -51,7 +51,7 @@ - The next access to the data will result in a NUMA Hinting Fault. Based + The next access to the data results in a NUMA Hinting Fault. Based on this fault, the data can be migrated to a memory node associated with the task accessing the memory. @@ -66,7 +66,7 @@ The unmapping of data and page fault handling incurs overhead. However, - commonly the overhead will be offset by threads accessing data associated + commonly the overhead is offset by threads accessing data associated with the CPU. @@ -75,7 +75,7 @@ Static configuration has been the recommended way of tuning workloads on - NUMA hardware for some time. To do this, memory policies can be set with + NUMA hardware. To do this, memory policies can be set with numactl, taskset or cpusets. NUMA-aware applications can use special APIs. In cases where the static policies have already been created, automatic @@ -84,7 +84,7 @@ - numactl will show the + numactl shows the memory configuration of the machine and whether it supports NUMA or not. This is example output from a 4-node machine. @@ -117,8 +117,7 @@ node 0 1 2 3 Automatic NUMA balancing can be enabled or disabled for the current session by writing 1 or 0 - to /proc/sys/kernel/numa_balancing which will - enable or disable the feature respectively. 
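For example, to switch the feature on for the current session and confirm the result:

&prompt.root;echo 1 > /proc/sys/kernel/numa_balancing
&prompt.root;cat /proc/sys/kernel/numa_balancing
1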
To permanently enable or + to /proc/sys/kernel/numa_balancing which enables or disables the feature respectively. To permanently enable or disable it, use the kernel command line option numa_balancing=[enable|disable]. @@ -151,7 +150,7 @@ node 0 1 2 3 Controls how frequently a task's data is scanned. Depending on the - locality of the faults the scan rate will increase or decrease. These + locality of the faults, the scan rate increases or decreases. These settings control the min and max scan rates. @@ -244,8 +243,7 @@ node 0 1 2 3 running the SpecJBB 2005 sknorr, 2014-08-21: "benchmark"(?) using a single instance of the JVM with no static tuning around memory - policies. Note, however, that the impact for each workload will vary and - that this example is based on a pre-release version of &productname; + policies. However, the impact for each workload varies and this example is based on a pre-release version of &productname; 12. diff --git a/xml/tuning_oprofile.xml b/xml/tuning_oprofile.xml index a59bb033e3..dfadbb8a53 100644 --- a/xml/tuning_oprofile.xml +++ b/xml/tuning_oprofile.xml @@ -27,8 +27,8 @@ It is not necessary to recompile or use wrapper libraries to - use &oprof;. Not even a kernel patch is needed. Usually, when - profiling an application, a small overhead is expected, depending on the + use &oprof;. Not even a kernel patch is needed. When + profiling an application, you can expect a small overhead, depending on the workload and sampling frequency. @@ -81,7 +81,7 @@ It is useful to install the *-debuginfo package for - the respective application you want to profile. If you want to profile + the respective application you want to profile. To profile the kernel, you need the debuginfo package as well. @@ -145,8 +145,7 @@ - Converts sample database files from a foreign binary format to the - native format. + Converts sample database files from a foreign binary format to the format specific to the platform. @@ -168,14 +167,14 @@ With &oprof;, you can profile both the kernel and applications. When profiling the kernel, tell &oprof; where to find the vmlinuz* file. Use the - option and point it to vmlinuz* (usually in + option and point it to vmlinuz* (generally available in /boot). If you need to profile kernel modules, &oprof; does this by default. However, make sure you read . - Applications usually do not need to profile the kernel, therefore you + Most applications do not need to profile the kernel, therefore you should use the option to reduce the amount of information. @@ -194,7 +193,7 @@ - Decide if you want to profile with or without the Linux kernel: + Decide whether to profile with or without the Linux kernel: @@ -406,7 +405,7 @@ BR_MISS_PRED_RETIRED: (counter: all)) , operf has written its data to CUR_DIR/oprofile_data/samples/current, and the reporting tools opreport and - opannotate will look there by default. + opannotate look there by default. @@ -428,10 +427,10 @@ BR_MISS_PRED_RETIRED: (counter: all)) /lib/libfoo.so - The option contains a comma separated list of - paths which is stripped from debug source files. These paths were - searched prior to looking in . The - option is also a comma separated list of + The option contains a comma-separated list of + paths which is stripped from debug source files. These paths are + searched before looking in . The + option is also a comma-separated list of directories to search for source files. 
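Putting the &oprof; tools together, a typical user-space profiling run could look like the following; the binary name and source directory are placeholders:

&prompt.user;operf ./my_test_app
&prompt.user;opreport --symbols ./my_test_app
&prompt.user;opannotate --source --search-dirs=/home/tux/src/my_test_app ./my_test_app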
diff --git a/xml/tuning_perf.xml b/xml/tuning_perf.xml index 69a58b6f45..ea1b26cfc8 100644 --- a/xml/tuning_perf.xml +++ b/xml/tuning_perf.xml @@ -39,7 +39,7 @@ Tony Jones, Mel Gorman. - Code integrated into the Linux kernel that is responsible for instructing + Code integrated into the Linux kernel that instructs the hardware. @@ -69,8 +69,7 @@ Tony Jones, Mel Gorman. Many modern processors contain a performance monitoring unit (PMU). The design and functionality of a PMU is CPU-specific. - For example, the number of registers, counters and features supported will - vary by CPU implementation. + For example, the number of registers, counters and features supported varies by CPU implementation. Each PMU model consists of a set of registers: the performance monitor @@ -194,8 +193,7 @@ Tony Jones, Mel Gorman. Display a report file and an annotated version of the executed code. - If debug symbols are installed, you will also see the source code - displayed. + If debug symbols are installed, the source code is also displayed. @@ -205,7 +203,7 @@ Tony Jones, Mel Gorman. List event types that Perf can report with the current kernel and with your CPU. - You can filter event types by category—for example, to see hardware + You can filter event types by category. For example, to see hardware events only, use perf list hw. @@ -218,8 +216,8 @@ Tony Jones, Mel Gorman. &prompt.user;man perf_event_open | grep -A5 BRANCH_MISSES Sometimes, events may be ambiguous. - Note that the lowercase hardware event names are not the name of raw - hardware events but instead the name of aliases created by Perf. + The lowercase hardware event names are not the names of raw + hardware events but instead the names of aliases created by Perf. These aliases map to differently named but similarly defined hardware events on each supported processor. @@ -294,7 +292,7 @@ Tony Jones, Mel Gorman. Recording events specific to particular commands - There are various ways to sample events specific to a particular command: + There are several ways to sample events specific to a particular command: @@ -304,7 +302,7 @@ Tony Jones, Mel Gorman. &prompt.root;perf record COMMAND Then, use the started process normally. - When you quit the process, the Perf session will also stop. + When you quit the process, the Perf session also stops. @@ -315,7 +313,7 @@ Tony Jones, Mel Gorman. &prompt.root;perf record -a COMMAND Then, use the started process normally. - When you quit the process, the Perf session will also stop. + When you quit the process, the Perf session also stops. @@ -336,7 +334,7 @@ Tony Jones, Mel Gorman. &prompt.user;perf report - This will open a pseudo-graphical interface. + This opens a pseudo-graphical interface.
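As a closing example that combines the commands above, the following records system-wide samples for a fixed duration and then opens the report viewer; the event name and duration are arbitrary:

&prompt.root;perf record -e branch-misses -a -- sleep 10
&prompt.root;perf report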