EMStress Thread-1 hangs indefinitely #573

Open
kyle-ibm opened this issue Jan 8, 2020 · 6 comments

Comments

@kyle-ibm

kyle-ibm commented Jan 8, 2020

The EMStress test hangs indefinitely.
Initial debugging points to the Thread-1 test (the CPU governor change test) as the one that hangs: its log stops updating after a few minutes, and the last command never reports a return value.

SUT OS: RHEL7.6alt 4.14.0-115.16.1.el7a.ppc64le
op-test LCB OS: Ubuntu 16.04.6 4.14.0-115.16.1.el7a.ppc64le

command: ./op-test -c SUT1 --run testcases.EMStress.RuntimeEMStress

~/op-test/test-reports/test-run-20200108014848$ ls -alh
total 1.5M
drwxrwxr-x  2 kloh kloh 4.0K Jan  8 01:49 .
drwxrwxr-x 19 kloh kloh 4.0K Jan  8 01:48 ..
-rw-rw-r--  1 kloh kloh  938 Jan  8 03:42 20200107174848837130.main.log
-rw-rw-r--  1 kloh kloh 620K Jan  8 05:20 20200107174848837593.debug.log
-rw-rw-r--  1 kloh kloh 254K Jan  8 04:48 20200108014848.log
-rw-rw-r--  1 kloh kloh 192K Jan  8 01:51 20200108014923-Thread-1.log    <==thread 1 hung after just 3 min
-rw-rw-r--  1 kloh kloh 3.8K Jan  8 05:20 20200108014927-Thread-2.log
-rw-rw-r--  1 kloh kloh  25K Jan  8 04:49 20200108014927-Thread-3.log
-rw-rw-r--  1 kloh kloh 163K Jan  8 05:09 20200108014928-Thread-4.log
-rw-rw-r--  1 kloh kloh 144K Jan  8 04:49 20200108014928-Thread-5.log

The tail of the Thread-1 log shows no return value after the last command:

$tail ~/op-test/test-reports/test-run-20200108014848/*Thread-1*
[console-expect]#for j in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do echo ondemand > $j; done
for j in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do echo ondemand > $j; done
bash: echo: write error: Invalid argument
bash: echo: write error: Invalid argument
..
..
bash: echo: write error: Invalid argument
bash: echo: write error: Invalid argument
[console-expect]#echo $?
echo $?
0
[console-expect]#for j in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do echo ondemand > $j; done
for j in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do echo ondemand > $j; done
bash: echo: write error: Invalid argument
bash: echo: write error: Invalid argument
..
..
bash: echo: write error: Invalid argument
bash: echo: write error: Invalid argument
kloh@openpowerlcb:~$
@kyle-ibm
Author

kyle-ibm commented Jan 8, 2020

If I end the op-test script with Ctrl-C, the errors point to class OpSSHThreadLinearVar1 in OpTestThread.py:


^CException in thread Thread-1:
Traceback (most recent call last):
  File "/usr/lib/python3.6/threading.py", line 916, in _bootstrap_inner
    self.run()
  File "/home/kloh/op-test/common/OpTestThread.py", line 66, in run
    self.name, self.cmd_list, self.sleep_time, self.execution_time, self.ignore_fail)
  File "/home/kloh/op-test/common/OpTestThread.py", line 76, in inband_child_thread
    self.c.run_command(cmd)
  File "/home/kloh/op-test/common/OpTestSSH.py", line 225, in run_command
    return self.util.run_command(self, command, timeout, retry)
  File "/home/kloh/op-test/common/OpTestUtil.py", line 1611, in run_command
    output = self.try_command(term_obj, command, timeout)
  File "/home/kloh/op-test/common/OpTestUtil.py", line 1632, in try_command
    pty.sendline(command)
  File "/home/kloh/.local/lib/python3.6/site-packages/pexpect/pty_spawn.py", line 577, in sendline
    return self.send(s + self.linesep)
  File "/home/kloh/.local/lib/python3.6/site-packages/pexpect/pty_spawn.py", line 565, in send
    self._log(s, 'send')
  File "/home/kloh/.local/lib/python3.6/site-packages/pexpect/spawnbase.py", line 127, in _log
    self.logfile.flush()
BrokenPipeError: [Errno 32] Broken pipe
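
One way to make a hang like this show up as a failure instead of an indefinite wait is to join the worker threads against a deadline and report any that are still alive. The following is only a minimal sketch using the standard `threading` module; `join_with_deadline` is a hypothetical helper and is not part of op-test or OpTestThread.py.

```python
import threading
import time


def join_with_deadline(threads, timeout_s):
    """Join all threads against one shared deadline.

    Returns the names of threads that are still alive once the
    deadline has passed (hypothetical helper, not an op-test API).
    """
    deadline = time.monotonic() + timeout_s
    for t in threads:
        # Give each thread only the time left until the shared deadline.
        remaining = max(0.0, deadline - time.monotonic())
        t.join(remaining)
    return [t.name for t in threads if t.is_alive()]


if __name__ == "__main__":
    blocker = threading.Event()
    # Simulates a worker that never returns until the event is set.
    stuck = threading.Thread(target=blocker.wait, name="Thread-1", daemon=True)
    done = threading.Thread(target=lambda: None, name="Thread-2")
    stuck.start()
    done.start()
    print("still running:", join_with_deadline([stuck, done], 0.5))
    blocker.set()
```

A caller could then fail the test and name the stuck thread, rather than waiting until someone interrupts the run by hand.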

@kyle-ibm
Author

kyle-ibm commented Jan 8, 2020

Also, the test passes if I comment out the Thread-1 test in the script.

@gautshen

On the system, what is the output of

cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_available_governors

and

cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
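
The same two sysfs files can be checked for every CPU rather than only cpu0, which may help narrow down which CPUs reject a governor write. This is a rough sketch, not op-test code: the helper names `governor_status` and `cpus_missing_governor` are made up for illustration, and it assumes the usual `cpuN/cpufreq/` layout under `/sys/devices/system/cpu`.

```python
from pathlib import Path


def governor_status(cpu_dir):
    """Return (current_governor, available_governors) for one cpuN directory."""
    cpufreq = Path(cpu_dir) / "cpufreq"
    available = (cpufreq / "scaling_available_governors").read_text().split()
    current = (cpufreq / "scaling_governor").read_text().strip()
    return current, available


def cpus_missing_governor(governor, sysfs_root="/sys/devices/system/cpu"):
    """List cpuN entries whose available governors do not include `governor`."""
    missing = []
    for cpu_dir in sorted(Path(sysfs_root).glob("cpu[0-9]*")):
        if not (cpu_dir / "cpufreq").is_dir():
            continue  # skip entries that have no cpufreq directory
        _, available = governor_status(cpu_dir)
        if governor not in available:
            missing.append(cpu_dir.name)
    return missing


if __name__ == "__main__":
    # Prints the CPUs (if any) that do not list "ondemand" as available.
    print(cpus_missing_governor("ondemand"))
```

An empty list would suggest the governor is offered everywhere, so the `write error: Invalid argument` would need another explanation.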

@Oracle-Chen

Hi gautshen,
After running the EMStress test, the commands give this output:
[2020-04-07 16:18:24] [console-expect]#cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_available_goverrnors
[2020-04-07 16:18:51] conservative ondemand userspace powersave performance schedutil
[2020-04-07 16:18:51] [console-expect]#cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
[2020-04-07 16:19:12] userspace

EMStress.RuntimeEMStress_SUT3.log

@Peiyu-Jhong

We reran this test with the latest op-test version, and it still failed.

The log message is below:
[console-expect]#ERROR (12709.448s)
Log file: /home/ooo/Pnor_test/0313_op-test/test-reports/test-run-20200416151712/20200416151745-Thread-1.log
logcmd: tee /home/ooo/Pnor_test/0313_op-test/test-reports/test-run-20200416151712/20200416151745-Thread-1.log| sed -u -e 's/\r$//g'|cat -v
<subprocess.Popen object at 0x7fafdc6c7a90>
Log file: <_io.TextIOWrapper name=10 encoding='utf-8'>
Log file: /home/ooo/Pnor_test/0313_op-test/test-reports/test-run-20200416151712/20200416151749-Thread-2.log
logcmd: tee /home/ooo/Pnor_test/0313_op-test/test-reports/test-run-20200416151712/20200416151749-Thread-2.log| sed -u -e 's/\r$//g'|cat -v
<subprocess.Popen object at 0x7fafd8505f60>
Log file: <_io.TextIOWrapper name=16 encoding='utf-8'>
Log file: /home/ooo/Pnor_test/0313_op-test/test-reports/test-run-20200416151712/20200416151749-Thread-3.log
logcmd: tee /home/ooo/Pnor_test/0313_op-test/test-reports/test-run-20200416151712/20200416151749-Thread-3.log| sed -u -e 's/\r$//g'|cat -v
<subprocess.Popen object at 0x7fafd8518588>
Log file: <_io.TextIOWrapper name=21 encoding='utf-8'>

======================================================================
ERROR [12709.448s]: runTest (testcases.EMStress.RuntimeEMStress)

Traceback (most recent call last):
File "/home/ooo/Pnor_test/0313_op-test/testcases/EMStress.py", line 137, in runTest
for core in range(1, num_avail_cores + 1):
TypeError: 'float' object cannot be interpreted as an integer


Ran 1 test in 12709.750s

FAILED (errors=1)
20200416_EMStress_fail.zip
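
The `TypeError` in this traceback is what Python 3 raises when `range()` is given a float. One common cause in code ported from Python 2 is that `/` now performs true division. How `num_avail_cores` is actually computed in EMStress.py is not shown in this thread, so the sketch below uses hypothetical inputs (`total_threads`, `threads_per_core`) purely to illustrate the failure and a possible fix with floor division.

```python
def core_indices(total_threads, threads_per_core):
    """Return 1-based core indices; floor division keeps the count an int."""
    # With `/` this would be a float in Python 3 and range() would raise.
    num_avail_cores = total_threads // threads_per_core
    return list(range(1, num_avail_cores + 1))


def range_with_true_division(total_threads, threads_per_core):
    """Reproduce the failure: true division passes a float to range()."""
    num_avail_cores = total_threads / threads_per_core
    return list(range(1, num_avail_cores + 1))


if __name__ == "__main__":
    print(core_indices(8, 4))
    try:
        range_with_true_division(8, 4)
    except TypeError as exc:
        print("TypeError:", exc)
```

If the value comes from parsed command output instead, wrapping it in `int()` at the point where it is computed would have the same effect.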

@Gene-Lo

Gene-Lo commented Dec 22, 2021

We ran this test again on RHEL 8.4, and it still failed.

《OP-Test Log》
test-run-20211211212625.zip

《SUT's Config》
[Kernel]
4.18.0-305.25.1.el8_4.ppc64le

[FW Config]
BMC: op940.22.mih-1-0-g41157d8d2e
Pnor: OP9_v2.4.1-4.31-prod

[HW Config]
CPU DD2.3 20 core *2
Micron Technology(MTA18ASF2G72PZ-2G9E1)16GiB x32
SAMSUNG PM985 (MZ1LB960HAJQ-00007) 960GB M.2 x1
PSU ACBEL 2000w *2
Slot1: 2-PORT 100Gb ROCE EN CONNECTX-5 GEN4 PCIe x16 LP CAPABLE ADAPTER
