Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

swoole5.0.1 php8.1/8.0.23 使用table incr/set 出现段错误。附有coredump分析 #4954

Open
wuyiwai opened this issue Jan 11, 2023 · 23 comments
Assignees

Comments

@wuyiwai
Copy link

wuyiwai commented Jan 11, 2023

Please answer these questions before submitting your issue.

What did you do? If possible, provide a simple script for reproducing the error.

我写了一个swoole的Server,部署在k8s上,在server里使用了table,总的大概预先分配了20g内存给table,程序的主要逻辑是在table中做set和incr操作,在这个过程中发现会使得程序异常,产生coredump。关键代码如下:

foreach ($params as $item)
{
	$countRet = $this->userMailFolderMap->incr('48142122:5', 'count', $count);
}

以上简单的代码暂时无法稳定复现,但启动服务后,24h内必定产生coredump。只能从coredump日志推出是由于incr导致。由于我incr的入口只有这一处,且上下文没有很特殊的逻辑,所以暂时无法提供稳定复现的php代码。

What did you expect to see?

如果该问题很明确且有解决方案,希望获取解决方案。
如果该问题不清晰或目前没有解决方案,由于目前没有方向进行继续定位,希望可以提供一些方向使得我可以继续进行debug,如果信息不够可以沟通继续补充。

What did you see instead?

目前在程序的stdout日志中,会打印出以下log,属于issue中的段错误类型

WARNING	Server::check_worker_exit_status(): worker(pid=31972, id=3) abnormal exit, status=255, signal=0

下面是php 8.0.23版本下的调试分析
对coredump文件进入gdb调试后的堆栈信息如下:

(gdb) bt
#0  0x00007fb4f69f3aff in raise () from /usr/lib64/libc.so.6
#1  0x00007fb4f69c6ea5 in abort () from /usr/lib64/libc.so.6
#2  0x00007fb4f69c6d79 in __assert_fail_base.cold.0 () from /usr/lib64/libc.so.6
#3  0x00007fb4f69ec456 in __assert_fail () from /usr/lib64/libc.so.6
#4  0x00007fb4f2fc20c2 in swoole::FixedPool::alloc (this=0x36bf650, size=<optimized out>)
    at /usr/local/src/swoole/src/memory/fixed_pool.cc:149
#5  swoole::FixedPool::alloc (this=0x36bf650, size=<optimized out>)
    at /usr/local/src/swoole/src/memory/fixed_pool.cc:149
#6  0x00007fb4f2fc3c60 in swoole::Table::alloc_row (this=0x7fb4d2c55010)
    at /usr/local/src/swoole/include/swoole_table.h:278
#7  swoole::Table::set (this=this@entry=0x7fb4d2c55010, key=0x7faf725f36f8 "48142122:5", keylen=10,
    rowlock=rowlock@entry=0x39d02f8, out_flags=out_flags@entry=0x39d02cc)
    at /usr/local/src/swoole/src/memory/table.cc:322
#8  0x00007fb4f2f9c711 in zim_swoole_table_incr (execute_data=0x7faf72250600, return_value=0x7faf72250560)
    at /usr/local/src/swoole/ext-src/swoole_table.cc:369
#9  0x00000000007f82cd in ZEND_DO_FCALL_SPEC_RETVAL_USED_HANDLER ()
    at /usr/local/src/php/Zend/zend_vm_execute.h:1863
#10 execute_ex (ex=0x2) at /usr/local/src/php/Zend/zend_vm_execute.h:55176
#11 0x00007fb4f2f1d2dd in swoole::PHPCoroutine::main_func (arg=0x7fff63934bb0)
    at /usr/local/src/swoole/ext-src/swoole_coroutine.cc:700
#12 0x00007fb4f2fac680 in std::function<void (void*)>::operator()(void*) const (__args#0=<optimized out>,
    this=0x3b60910) at /usr/include/c++/8/bits/std_function.h:682
#13 swoole::coroutine::Context::context_func (arg=0x3b60910)
    at /usr/local/src/swoole/src/coroutine/context.cc:142
#14 0x00007fb4f302fd31 in swoole_make_fcontext ()
    at /usr/local/src/swoole/thirdparty/boost/asm/make_x86_64_sysv_elf_gas.S:70
#15 0x0000000000200010 in ?? ()
#16 0x0000000000000021 in ?? ()
#17 0x0000000003a492e0 in ?? ()
#18 0x00007fb4f6d65c30 in main_arena () from /usr/lib64/libc.so.6
#19 0x0000000000000020 in ?? ()
#20 0x00000000000000a0 in ?? ()
#21 0x0000000000000040 in ?? ()
#22 0x0000000000000000 in ?? ()

还有一些其他的core dump,是在php 8.1版本上测的,内容差不多,一起贴上来

#0  swoole::FixedPool::alloc (this=0x2c332b0, size=0) at /usr/local/src/swoole/src/memory/fixed_pool.cc:151
#1  0x00007f85b9e07fb0 in swoole::Table::alloc_row (this=0x7f8599a91010) at /usr/local/src/swoole/include/swoole_table.h:278
#2  swoole::Table::set (this=this@entry=0x7f8599a91010, key=key@entry=0x7f85999ae018 "48380494:1", keylen=keylen@entry=10, rowlock=rowlock@entry=0x5736248, out_flags=out_flags@entry=0x5736244)
    at /usr/local/src/swoole/src/memory/table.cc:322
#3  0x00007f85b9de112f in zim_swoole_table_set (execute_data=0x7f859968ba10, return_value=0x7f859968b990) at /usr/local/src/swoole/ext-src/swoole_table.cc:295
#4  0x00000000007f52c7 in ZEND_DO_FCALL_SPEC_RETVAL_USED_HANDLER () at /usr/local/src/php/Zend/zend_vm_execute.h:1870
#5  execute_ex (ex=0x2c332b0) at /usr/local/src/php/Zend/zend_vm_execute.h:55807
#6  0x00007f85b9d5b8ed in swoole::PHPCoroutine::main_func (arg=0x7ffde52b5910) at /usr/local/src/swoole/ext-src/swoole_coroutine.cc:700
#7  0x00007f85b9df09d0 in std::function<void (void*)>::operator()(void*) const (__args#0=<optimized out>, this=0x4f372b0) at /usr/include/c++/8/bits/std_function.h:682
#8  swoole::coroutine::Context::context_func (arg=0x4f372b0) at /usr/local/src/swoole/src/coroutine/context.cc:142
#9  0x00007f85b9e73bd1 in swoole_make_fcontext () at /usr/local/src/swoole/thirdparty/boost/asm/make_x86_64_sysv_elf_gas.S:70
#10 0x0000000000000000 in ?? ()
(gdb) bt
#0  swoole::FixedPool::alloc (this=0x2a6a650, size=0)
    at /usr/local/src/swoole/src/memory/fixed_pool.cc:151
#1  0x00007f0a4efc3c60 in swoole::Table::alloc_row (this=0x7f0a2ec55010)
    at /usr/local/src/swoole/include/swoole_table.h:278
#2  swoole::Table::set (this=this@entry=0x7f0a2ec55010, key=0x7f0a2e893798 "39927417:9", keylen=10,
    rowlock=rowlock@entry=0x2d7b528, out_flags=out_flags@entry=0x2d7b4fc)
    at /usr/local/src/swoole/src/memory/table.cc:322
#3  0x00007f0a4ef9c711 in zim_swoole_table_incr (execute_data=0x7f0a2ebe8600,
    return_value=0x7f0a2ebe8560) at /usr/local/src/swoole/ext-src/swoole_table.cc:369
#4  0x00000000007f82cd in ZEND_DO_FCALL_SPEC_RETVAL_USED_HANDLER ()
    at /usr/local/src/php/Zend/zend_vm_execute.h:1863
#5  execute_ex (ex=0x2a6a650) at /usr/local/src/php/Zend/zend_vm_execute.h:55176
#6  0x00007f0a4ef1d2dd in swoole::PHPCoroutine::main_func (arg=0x7ffeb60556b0)
    at /usr/local/src/swoole/ext-src/swoole_coroutine.cc:700
#7  0x00007f0a4efac680 in std::function<void (void*)>::operator()(void*) const (
    __args#0=<optimized out>, this=0x2b794a0) at /usr/include/c++/8/bits/std_function.h:682
#8  swoole::coroutine::Context::context_func (arg=0x2b794a0)
    at /usr/local/src/swoole/src/coroutine/context.cc:142
#9  0x00007f0a4f02fd31 in swoole_make_fcontext ()
    at /usr/local/src/swoole/thirdparty/boost/asm/make_x86_64_sysv_elf_gas.S:70
#10 0x0000000000000000 in ?? ()
(gdb)
(gdb) bt
#0  swoole::FixedPool::alloc (this=0x2c332b0, size=0) at /usr/local/src/swoole/src/memory/fixed_pool.cc:151
#1  0x00007f85b9e07fb0 in swoole::Table::alloc_row (this=0x7f8599a91010) at /usr/local/src/swoole/include/swoole_table.h:278
#2  swoole::Table::set (this=this@entry=0x7f8599a91010, key=key@entry=0x7f85999af3d8 "48307068:4234896397235", keylen=keylen@entry=22, rowlock=rowlock@entry=0x5531ab8, out_flags=out_flags@entry=0x5531ab4)
    at /usr/local/src/swoole/src/memory/table.cc:322
#3  0x00007f85b9de112f in zim_swoole_table_set (execute_data=0x7f85996b4a10, return_value=0x7f85996b4990) at /usr/local/src/swoole/ext-src/swoole_table.cc:295
#4  0x00000000007f52c7 in ZEND_DO_FCALL_SPEC_RETVAL_USED_HANDLER () at /usr/local/src/php/Zend/zend_vm_execute.h:1870
#5  execute_ex (ex=0x2c332b0) at /usr/local/src/php/Zend/zend_vm_execute.h:55807
#6  0x00007f85b9d5b8ed in swoole::PHPCoroutine::main_func (arg=0x7ffde52b5910) at /usr/local/src/swoole/ext-src/swoole_coroutine.cc:700
#7  0x00007f85b9df09d0 in std::function<void (void*)>::operator()(void*) const (__args#0=<optimized out>, this=0x3b353e0) at /usr/include/c++/8/bits/std_function.h:682
#8  swoole::coroutine::Context::context_func (arg=0x3b353e0) at /usr/local/src/swoole/src/coroutine/context.cc:142
#9  0x00007f85b9e73bd1 in swoole_make_fcontext () at /usr/local/src/swoole/thirdparty/boost/asm/make_x86_64_sysv_elf_gas.S:70
#10 0x0000000000000000 in ?? ()

What version of Swoole are you using (show your php --ri swoole)?

Swoole => enabled
Author => Swoole Team <[email protected]>
Version => 5.0.1
Built => Jan 10 2023 09:13:22
coroutine => enabled with boost asm context
epoll => enabled
eventfd => enabled
signalfd => enabled
cpu_affinity => enabled
spinlock => enabled
rwlock => enabled
openssl => OpenSSL 1.1.1k  FIPS 25 Mar 2021
dtls => enabled
http2 => enabled
json => enabled
curl-native => enabled
zlib => 1.2.11
mutex_timedlock => enabled
pthread_barrier => enabled
futex => enabled
async_redis => enabled

Directive => Local Value => Master Value
swoole.enable_coroutine => On => On
swoole.enable_library => On => On
swoole.enable_preemptive_scheduler => Off => Off
swoole.display_errors => On => On
swoole.use_shortname => On => On
swoole.unixsock_buffer_size => 8388608 => 8388608

What is your machine environment used (show your uname -a & php -v & gcc -v) ?

linux
Linux crm-mail-unread-server-0 4.19.91-26.1.al7.x86_64 #1 SMP Tue Jul 26 17:52:28 CST 2022 x86_64 x86_64 x86_64GNU/Linux

php
PHP 8.0.23 (cli) (built: Jan 10 2023 09:10:32) ( NTS )Copyright (c) The PHP Group
Zend Engine v4.0.23, Copyright (c) Zend Technologies
    with Zend OPcache v8.0.23, Copyright (c), by Zend Technologies

gcc
Using built-in specs.COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/libexec/gcc/x86_64-redhat-linux/8/lto-wrapper
OFFLOAD_TARGET_NAMES=nvptx-none
OFFLOAD_TARGET_DEFAULT=1
Target: x86_64-redhat-linux
Configured with: ../configure --enable-bootstrap --enable-languages=c,c++,fortran,lto --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=https://bugs.rockylinux.org/ --enable-shared --enable-threads=posix --enable-checking=release --enable-multilib --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-gnu-unique-object --enable-linker-build-id --with-gcc-major-version-only --with-linker-hash-style=gnu --enable-plugin --enable-initfini-array --with-isl --disable-libmpx --enable-offload-targets=nvptx-none --without-cuda-driver --enable-gnu-indirect-function --enable-cet --with-tune=generic --with-arch_32=x86-64 --build=x86_64-redhat-linux
Thread model: posix
gcc version 8.5.0 20210514 (Red Hat 8.5.0-15) (GCC)
@NathanFreeman
Copy link
Member

NathanFreeman commented Jan 12, 2023

这个看起来像是地址问题,gdb的时候输入f 0进入alloc函数,打印一下slice看看。

@wuyiwai
Copy link
Author

wuyiwai commented Jan 12, 2023

以下是完整的调试信息
#0  swoole::FixedPool::alloc (this=0x2a6a650, size=0)
    at /usr/local/src/swoole/src/memory/fixed_pool.cc:151
#1  0x00007f0a4efc3c60 in swoole::Table::alloc_row (this=0x7f0a2ec55010)
    at /usr/local/src/swoole/include/swoole_table.h:278
#2  swoole::Table::set (this=this@entry=0x7f0a2ec55010,
    key=0x7f0a2e893798 "39927417:9", keylen=10,
    rowlock=rowlock@entry=0x2d7b528, out_flags=out_flags@entry=0x2d7b4fc)
    at /usr/local/src/swoole/src/memory/table.cc:322
#3  0x00007f0a4ef9c711 in zim_swoole_table_incr (execute_data=0x7f0a2ebe8600,
    return_value=0x7f0a2ebe8560)
    at /usr/local/src/swoole/ext-src/swoole_table.cc:369
#4  0x00000000007f82cd in ZEND_DO_FCALL_SPEC_RETVAL_USED_HANDLER ()
    at /usr/local/src/php/Zend/zend_vm_execute.h:1863
#5  execute_ex (ex=0x2a6a650)
    at /usr/local/src/php/Zend/zend_vm_execute.h:55176
#6  0x00007f0a4ef1d2dd in swoole::PHPCoroutine::main_func (arg=0x7ffeb60556b0)
    at /usr/local/src/swoole/ext-src/swoole_coroutine.cc:700
#7  0x00007f0a4efac680 in std::function<void (void*)>::operator()(void*) const    (__args#0=<optimized out>, this=0x2b794a0)
    at /usr/include/c++/8/bits/std_function.h:682
#8  swoole::coroutine::Context::context_func (arg=0x2b794a0)
    at /usr/local/src/swoole/src/coroutine/context.cc:142
#9  0x00007f0a4f02fd31 in swoole_make_fcontext ()


接下来一步步打印其中的变量
(gdb) f 0#0  swoole::FixedPool::alloc (this=0x2a6a650, size=0)    at /usr/local/src/swoole/src/memory/fixed_pool.cc:151151     in /usr/local/src/swoole/src/memory/fixed_pool.cc(gdb) p slice
$1 = (swoole::FixedPoolSlice *) 0x0
(gdb) p impl
$2 = (swoole::FixedPoolImpl *) 0x7f0a11b33008


(gdb) f 1
#1  0x00007f0a4efc3c60 in swoole::Table::alloc_row (this=0x7f0a2ec55010)
    at /usr/local/src/swoole/include/swoole_table.h:278
278     in /usr/local/src/swoole/include/swoole_table.h
(gdb) p new_row
$4 = <optimized out>(gdb) p pool
$5 = (swoole::FixedPool *) 0x2a6a650


(gdb) f 2
#2  swoole::Table::set (this=this@entry=0x7f0a2ec55010,    key=0x7f0a2e893798 "39927417:9", keylen=10,
    rowlock=rowlock@entry=0x2d7b528, out_flags=out_flags@entry=0x2d7b4fc)
    at /usr/local/src/swoole/src/memory/table.cc:322
322     /usr/local/src/swoole/src/memory/table.cc: No such file or directory.
(gdb) p key
$1 = 0x7f0a2e893798 "39927417:9"
(gdb) p keylen
$2 = 10
(gdb) p rowlock
$3 = (swoole::TableRow **) 0x2d7b528
(gdb) p out_flags
$4 = (int *) 0x2d7b4fc
(gdb) p keylen
$1 = 10
(gdb) p row
$2 = (swoole::TableRow *) 0x7f09a63ac9f8
(gdb) p _conflict_level
$3 = <optimized out>
(gdb)p row->active
$5 = 1 '\001'
(gdb) p row->key
$6 = "48379855:1797613", '\000' <repeats 47 times>
(gdb) p key
$7 = 0x7f0a2e893798 "39927417:9"
(gdb) p keylen
$8 = 10
(gdb) p row->key_len
$9 = 16 '\020'
(gdb) p row->next
$10 = (swoole::TableRow *) 0x0
(gdb) p conflict_count
$11 = 112868
(gdb) p _conflict_level
$12 = <optimized out>
(gdb) p conflict_max_level
$13 = 4
(gdb) p new_row
$14 = <optimized out>
(gdb) p insert_count
$15 = 1977079
(gdb) p update_count
$16 = 2697062


(gdb)f 3
#3  0x00007f0a4ef9c711 in zim_swoole_table_incr (execute_data=0x7f0a2ebe8600,
    return_value=0x7f0a2ebe8560)
    at /usr/local/src/swoole/ext-src/swoole_table.cc:369
369     /usr/local/src/swoole/ext-src/swoole_table.cc: No such file or directory.
(gdb) p table
$18 = (swoole::Table *) 0x7f0a2ec55010
(gdb) p key_len
$19 = 10
(gdb) p col_len
$20 = 5
(gdb) p incrby
$21 = (zval *) 0x7f0a2ebe8670
(gdb) p key
$22 = 0x7f0a2e893798 "39927417:9"
(gdb) p col
$23 = 0x7f0a2e913c08 "count"
(gdb) p incrby
$24 = (zval *) 0x7f0a2ebe8670
(gdb) p out_flags
$25 = 0
(gdb) p _rowlock
$26 = (swoole::TableRow *) 0x7f09a63ac9f8
(gdb) p row
$27 = <optimized out>


(gdb) f 6
#6  0x00007f0a4ef1d2dd in swoole::PHPCoroutine::main_func (arg=0x7ffeb60556b0)
    at /usr/local/src/swoole/ext-src/swoole_coroutine.cc:700
700     /usr/local/src/swoole/ext-src/swoole_coroutine.cc: No such file or directory.
(gdb) p current_execute_data
No symbol "current_execute_data" in current context.
(gdbp call
$1 = (zend_execute_data *) 0x7f0a2ebe80e0
(gdb) p func
$2 = (zend_function *) 0x7f0a2e92eed0
(gdb) p func->op_array
$3 = {type = 13 '\r', arg_flags = "/\177", fn_flags = 0, function_name = 0x0,
  scope = 0x1b0, prototype = 0x400003c0000050b, num_args = 8327985,
  required_num_args = 0, arg_info = 0x1b000000160, attributes = 0x34,
  cache_size = 1291, last_var = 264214, T = 8324372, last = 0,
  opcodes = 0xfffff0c800779f70, run_time_cache__ptr = 0xd0000001b0,
  static_variables_ptr__ptr = 0x20100520000050c,
  static_variables = 0x7f1b9d <execute_ex+6333>, vars = 0xfffff0b8000001b0,
  refcount = 0x3000000f8, last_live_range = 1292, last_try_catch = 66160,
  live_range = 0x7f3237 <execute_ex+12119>, try_catch_array = 0x100000150,
  filename = 0x50, line_start = 1292, line_end = 2114,
  doc_comment = 0x7f468a <execute_ex+17322>, last_literal = -4072,
  literals = 0x60, reserved = {0x1740000050c, 0x7f3237 <execute_ex+12119>,
    0x300000120, 0x70, 0x8420000050c, 0x7f2f0d <execute_ex+11309>}}
(gdb) p retval
$4 = (zval *) 0x2d7b650
(gdb) p func->type
$5 = 13 '\r'


(gdb) f 8
#8  swoole::coroutine::Context::context_func (arg=0x2b794a0)
    at /usr/local/src/swoole/src/coroutine/context.cc:142
142     /usr/local/src/swoole/src/coroutine/context.cc: No such file or directory.
(gdb) p arg
$6 = (void *) 0x2b794a0
(gdb) p _this
$7 = (swoole::coroutine::Context *) 0x2b794a0
(gdb) p _this->private_data_
$8 = (void *) 0x7ffeb60556b0
(gdb) p _this->fn_
$9 = {<std::_Maybe_unary_or_binary_function<void, void*>> = {<std::unary_function<void*, void>> = {<No d
ata fields>}, <No data fields>}, <std::_Function_base> = {static _M_max_size = 16, static _M_max_align =
 8, _M_functor = {_M_unused = {
        _M_object = 0x7f0a4ef1cee0 <swoole::PHPCoroutine::main_func(void*)>,
        _M_const_object = 0x7f0a4ef1cee0 <swoole::PHPCoroutine::main_func(void*)>,
        _M_function_pointer = 0x7f0a4ef1cee0 <swoole::PHPCoroutine::main_func(void*)>,
        _M_member_pointer = (void (std::_Undefined_class::*)(std::_Undefined_class * const)) 0x7f0a4ef1c
ee0 <swoole::PHPCoroutine::main_func(void*)>, this adjustment 139682314584409},
      _M_pod_data = "\340\316\361N\n\177\000\000Y\001\276R\n\177\000"},
    _M_manager = 0x7f0a4ef1fcd0 <std::_Function_base::_Base_manager<void (*)(void*)>::_M_manager(std::_A
ny_data&, std::_Any_data const&, std::_Manager_operation)>},
  _M_invoker = 0x7f0a4ef1fcc0 <std::_Function_handler<void (void*), void (*)(void*)>::_M_invoke(std::_An
y_data const&, void*&&)>}
(gdb) p _this->end_
$10 = false

还需要更详细的信息吗,我会及时补充上来
@NathanFreeman

@NathanFreeman
Copy link
Member

NathanFreeman commented Jan 12, 2023

slice在这里是个空指针,所以段错误了。我看看是为啥。

@wuyiwai
Copy link
Author

wuyiwai commented Jan 12, 2023

slice在这里是个空指针,所以段错误了。我看看是为啥。
感谢,如果还需要更多关于程序逻辑或者相关信息可以直接@我,我会及时补充相关

@NathanFreeman NathanFreeman self-assigned this Jan 12, 2023
@NathanFreeman
Copy link
Member

可以gdb之后, f 0 进去alloc看看impl->head是不是空指针吗

@wuyiwai
Copy link
Author

wuyiwai commented Jan 13, 2023

可以gdb之后, f 0 进去alloc看看impl->head是不是空指针吗

(gdb) p impl
$1 = (swoole::FixedPoolImpl *) 0x7fb4b5b33008
(gdb) p impl->head
$2 = (swoole::FixedPoolSlice *) 0x7fb4d0b68348

@NathanFreeman
Copy link
Member

谢谢

@wuyiwai
Copy link
Author

wuyiwai commented Jan 13, 2023

应该的。目前的现状是。我曾经部署过两个版本的swoole服务在k8s上。swoole的版本都是5.0.1,但php的版本曾经打过8.1和8.0.23的镜像。

8.1的gdb分析是这样的:

(gdb) bt
#0  swoole::FixedPool::alloc (this=0x2a6a650, size=0)
    at /usr/local/src/swoole/src/memory/fixed_pool.cc:151
#1  0x00007f0a4efc3c60 in swoole::Table::alloc_row (this=0x7f0a2ec55010)
    at /usr/local/src/swoole/include/swoole_table.h:278
#2  swoole::Table::set (this=this@entry=0x7f0a2ec55010, key=0x7f0a2e893798 "39927417:9", keylen=10,
    rowlock=rowlock@entry=0x2d7b528, out_flags=out_flags@entry=0x2d7b4fc)
    at /usr/local/src/swoole/src/memory/table.cc:322
#3  0x00007f0a4ef9c711 in zim_swoole_table_incr (execute_data=0x7f0a2ebe8600,
    return_value=0x7f0a2ebe8560) at /usr/local/src/swoole/ext-src/swoole_table.cc:369
#4  0x00000000007f82cd in ZEND_DO_FCALL_SPEC_RETVAL_USED_HANDLER ()
    at /usr/local/src/php/Zend/zend_vm_execute.h:1863
#5  execute_ex (ex=0x2a6a650) at /usr/local/src/php/Zend/zend_vm_execute.h:55176
#6  0x00007f0a4ef1d2dd in swoole::PHPCoroutine::main_func (arg=0x7ffeb60556b0)
    at /usr/local/src/swoole/ext-src/swoole_coroutine.cc:700
#7  0x00007f0a4efac680 in std::function<void (void*)>::operator()(void*) const (
    __args#0=<optimized out>, this=0x2b794a0) at /usr/include/c++/8/bits/std_function.h:682
#8  swoole::coroutine::Context::context_func (arg=0x2b794a0)
    at /usr/local/src/swoole/src/coroutine/context.cc:142
#9  0x00007f0a4f02fd31 in swoole_make_fcontext ()
    at /usr/local/src/swoole/thirdparty/boost/asm/make_x86_64_sysv_elf_gas.S:70
#10 0x0000000000000000 in ?? ()
(gdb)

8.0.23 (当前部署的版本)

(gdb) bt
#0  0x00007fb4f69f3aff in raise () from /usr/lib64/libc.so.6
#1  0x00007fb4f69c6ea5 in abort () from /usr/lib64/libc.so.6
#2  0x00007fb4f69c6d79 in __assert_fail_base.cold.0 () from /usr/lib64/libc.so.6
#3  0x00007fb4f69ec456 in __assert_fail () from /usr/lib64/libc.so.6
#4  0x00007fb4f2fc20c2 in swoole::FixedPool::alloc (this=0x36bf650, size=<optimized out>)
    at /usr/local/src/swoole/src/memory/fixed_pool.cc:149
#5  swoole::FixedPool::alloc (this=0x36bf650, size=<optimized out>)
    at /usr/local/src/swoole/src/memory/fixed_pool.cc:149
#6  0x00007fb4f2fc3c60 in swoole::Table::alloc_row (this=0x7fb4d2c55010)
    at /usr/local/src/swoole/include/swoole_table.h:278
#7  swoole::Table::set (this=this@entry=0x7fb4d2c55010, key=0x7faf725f36f8 "48142122:5", keylen=10,
    rowlock=rowlock@entry=0x39d02f8, out_flags=out_flags@entry=0x39d02cc)
    at /usr/local/src/swoole/src/memory/table.cc:322
#8  0x00007fb4f2f9c711 in zim_swoole_table_incr (execute_data=0x7faf72250600, return_value=0x7faf72250560)
    at /usr/local/src/swoole/ext-src/swoole_table.cc:369
#9  0x00000000007f82cd in ZEND_DO_FCALL_SPEC_RETVAL_USED_HANDLER ()
    at /usr/local/src/php/Zend/zend_vm_execute.h:1863
#10 execute_ex (ex=0x2) at /usr/local/src/php/Zend/zend_vm_execute.h:55176
#11 0x00007fb4f2f1d2dd in swoole::PHPCoroutine::main_func (arg=0x7fff63934bb0)
    at /usr/local/src/swoole/ext-src/swoole_coroutine.cc:700
#12 0x00007fb4f2fac680 in std::function<void (void*)>::operator()(void*) const (__args#0=<optimized out>,
    this=0x3b60910) at /usr/include/c++/8/bits/std_function.h:682
#13 swoole::coroutine::Context::context_func (arg=0x3b60910)
    at /usr/local/src/swoole/src/coroutine/context.cc:142
#14 0x00007fb4f302fd31 in swoole_make_fcontext ()
    at /usr/local/src/swoole/thirdparty/boost/asm/make_x86_64_sysv_elf_gas.S:70
#15 0x0000000000200010 in ?? ()
#16 0x0000000000000021 in ?? ()
#17 0x0000000003a492e0 in ?? ()
#18 0x00007fb4f6d65c30 in main_arena () from /usr/lib64/libc.so.6
#19 0x0000000000000020 in ?? ()
#20 0x00000000000000a0 in ?? ()
#21 0x0000000000000040 in ?? ()
#22 0x0000000000000000 in ?? ()

两者的报错还不太一样。 @ @NathanFreeman 请问这两者我需要部署哪个版本更利于你们分析?

@NathanFreeman
Copy link
Member

还要看看gdb之后, f 0 进入alloc函数,执行完FixedPoolSlice *slice = impl->head;这行代码之后的slice和impl->head的具体值。
p *slice
p *impl->head

@wuyiwai
Copy link
Author

wuyiwai commented Jan 14, 2023

p *impl->head

(gdb) p *slice
Cannot access memory at address 0x0
(gdb) p *impl->head
Cannot access memory at address 0x0

@NathanFreeman
Copy link
Member

谢谢

@matyhtf
Copy link
Member

matyhtf commented Jan 16, 2023

@wuyiwai 是否有其他进程执行了 Table::destroy() ?是否使用了一些多线程方面的扩展,比如 pthreads 、grpc、parallels

另外日志中有 status=255 可能是存在 PHP 致命错误,需要检查一下 php 的错误日志。

@wuyiwai
Copy link
Author

wuyiwai commented Jan 16, 2023

  1. 是否有其他进程执行了 Table::destroy() ?> 目前代码没有显示调用的逻辑
  2. grpc的扩展有装,但代码里对这三个拓展都没有使用到比较原生的用法?用的都是swoole比较常见的一些东西:协程、table、Timer这些。如果我这样描述不够清晰的话,方便提供一些应用相关的思路吗?
  3. php_errors.log 的确有一些日志。不过感觉可能相关性不是特别大,我贴上来看看:
    1. Waring错误:
    [14-Dec-2022 20:51:26 Asia/Shanghai] PHP Warning:  Invalid "opcache.jit" setting. Should be "disable", "on", "off", "tracing", "function" or 4-digit number in Unknown on line 0
    
    1. Waring错误:该错误是由于之前在k8s上面部署服务时,由于之前踩过grpc扩展跟swoole冲突的坑,所以在k8s的yaml配置上grpc相关的扩展配置成''导致只要运行swoole就会报这个warnnig的错误,但不是fatal error
    [16-Jan-2023 10:09:26 UTC] PHP Warning:  PHP Startup: Unable to load dynamic library '' (tried: /usr/local/php/lib/php/extensions/no-debug-non-zts-20210902/ (/usr/local/php/lib/php/extensions/no-debug-non-zts-20210902/: cannot read file data: Is a directory), /usr/local/php/lib/php/extensions/no-debug-non-zts-20210902/.so (/usr/local/php/lib/php/extensions/no-debug-non-zts-20210902/.so: cannot open shared object file: No such file or directory)) in Unknown on line 0
    
    目前只记录到这两种错误。如果后续还需要什么信息我会继续跟进提供 @matyhtf

@wuyiwai
Copy link
Author

wuyiwai commented Jan 17, 2023

还发现了一些别的coredump分析

warning: Can't open file (null) during file-backed mapping note processing
[New LWP 81667]
[New LWP 81668]
[New LWP 81669]
[New LWP 81670]
[New LWP 81671]
[New LWP 81672]
[New LWP 81673]
[New LWP 81674]
[New LWP 81675]
[New LWP 81676]
[New LWP 81677]
[New LWP 81678]
[New LWP 81679]
[New LWP 81680]
[New LWP 81681]
[New LWP 81682]
[New LWP 81683]
[New LWP 81684]
[New LWP 81685]
[New LWP 81686]
[New LWP 81687]
[New LWP 81688]
[New LWP 81689]
[New LWP 81690]
[New LWP 81691]
[New LWP 81692]
[New LWP 81693]
[New LWP 81694]
[New LWP 81695]
[New LWP 81696]
[New LWP 81697]
[New LWP 81698]
[New LWP 81699]

warning: Error reading shared library list entry at 0x294e0

warning: Error reading shared library list entry at 0x2528323340382d00
Failed to read a valid object file image from memory.
Core was generated by `mail unread server 3 event worker                                             '.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007f9b35406378 in ?? ()
[Current thread is 1 (LWP 81667)]
(gdb) bt
#0  0x00007f9b35406378 in ?? ()
#1  0x00007f9b35407fb0 in ?? ()
#2  0x00000000027eae58 in ?? ()
#3  0x0000000a00000000 in ?? ()
#4  0x00007f9b14d456d8 in ?? ()
#5  0x00007f9b14d94ac0 in ?? ()
#6  0x00007f9b14ff0010 in ?? ()
#7  0x00007f9b14d5f678 in ?? ()
#8  0x0000001a00000002 in ?? ()
#9  0x00007f9b14d94420 in ?? ()
#10 0x00007f9b14d94900 in ?? ()
#11 0x00007f9b353e0a21 in ?? ()
#12 0x0000000000000001 in ?? ()
#13 0x00007f9b14d44298 in ?? ()
#14 0x0000000000000000 in ?? ()

@matyhtf
Copy link
Member

matyhtf commented Jan 17, 2023

@wuyiwai 上面的 core 信息无效了,堆栈已破坏,无法分析问题。可以关闭 grpc 试试,另外在 gdb core 中可以使用 info threads 看看是否有其他的线程正在执行什么操作。

status 255 比较可以,包括 php error_log 、swoole log_file 都需要看一下,是否存在 fatal error

另外对 table 的操作只有 set、incr、del 这些吗?key 就是类似于 '48142122:5' 这样的格式,请提供一下 table create 时的参数

@wuyiwai
Copy link
Author

wuyiwai commented Jan 17, 2023

@matyhtf

  1. 创建table时的参数
$table = new Table((int)(256 * 65536));
$table->column('count', 1 4);
$table->column('company_count',1, 4);
$table->column('colleague_count', 1, 4);
$table->column('contact_count', 1, 4);
$table->create();
$this->userMailFolderMap = $table;
  1. key是类似"48142122:5",用":"做连接符
  2. 目前只有incr和se和del操作,set一般是初始化的时候会用,incr用的比较多,有些场景是可能不存在该key就incr了,但我查了文档和自测,跟文档描述的一样,incr key不存在时默认0,这个没报错。
  3. 目前php error_log 只有
[14-Dec-2022 20:51:26 Asia/Shanghai] PHP Warning:  Invalid "opcache.jit" setting. Should be "disable", "on", "off", "tracing", "function" or 4-digit number in Unknown on line 0

[16-Jan-2023 10:09:26 UTC] PHP Warning:  PHP Startup: Unable to load dynamic library '' (tried: /usr/local/php/lib/php/extensions/no-debug-non-zts-20210902/ (/usr/local/php/lib/php/extensions/no-debug-non-zts-20210902/: cannot read file data: Is a directory), /usr/local/php/lib/php/extensions/no-debug-non-zts-20210902/.so (/usr/local/php/lib/php/extensions/no-debug-non-zts-20210902/.so: cannot open shared object file: No such file or directory)) in Unknown on line 0

swoole log_file 目前在k8s上设置为false所以没有收集到文件,是配置了日志收集stdout输出的日志,不过去搜了发现有一些新的结果,可以留意下,大概有一下几种:

[2023-01-17 02:58:11 *37353.3]	WARNING	TableRow::lock(): lock process[36787] not exists, force unlock

[2023-01-17 02:51:13 *36961.0]	WARNING	TableRow::lock(): timeout, force unlock

[2023-01-17 02:59:32 $13.0]	WARNING	Server::check_worker_exit_status(): worker(pid=37127, id=2) abnormal exit, status=0, signal=11

WARNING	Server::check_worker_exit_status(): worker(pid=31972, id=3) abnormal exit, status=255, signal=0
  1. info threads
(gdb)  info threads
  Id   Target Id                         Frame
* 1    Thread 0x7fe456ef6e40 (LWP 44104) swoole::FixedPool::alloc (this=0x1beb2b0, size=0) at /usr/local/src/swoole/src/memory/fixed_pool.cc:151
  2    Thread 0x7fdf5e5d6700 (LWP 44105) 0x00007fe45313545c in pthread_cond_wait@@GLIBC_2.3.2 () from /usr/lib64/libpthread.so.0
  3    Thread 0x7fdf5ddd5700 (LWP 44106) 0x00007fe45313545c in pthread_cond_wait@@GLIBC_2.3.2 () from /usr/lib64/libpthread.so.0
  4    Thread 0x7fdf5d5d4700 (LWP 44107) 0x00007fe45313545c in pthread_cond_wait@@GLIBC_2.3.2 () from /usr/lib64/libpthread.so.0
  5    Thread 0x7fdf5cdd3700 (LWP 44108) 0x00007fe45313545c in pthread_cond_wait@@GLIBC_2.3.2 () from /usr/lib64/libpthread.so.0
  6    Thread 0x7fdf5c5d2700 (LWP 44109) 0x00007fe45313545c in pthread_cond_wait@@GLIBC_2.3.2 () from /usr/lib64/libpthread.so.0
  7    Thread 0x7fdf5bdd1700 (LWP 44110) 0x00007fe45313545c in pthread_cond_wait@@GLIBC_2.3.2 () from /usr/lib64/libpthread.so.0
  8    Thread 0x7fdf5b5d0700 (LWP 44111) 0x00007fe45313545c in pthread_cond_wait@@GLIBC_2.3.2 () from /usr/lib64/libpthread.so.0
  9    Thread 0x7fdf5adcf700 (LWP 44112) 0x00007fe45313545c in pthread_cond_wait@@GLIBC_2.3.2 () from /usr/lib64/libpthread.so.0
  10   Thread 0x7fdf5a5ce700 (LWP 44113) 0x00007fe45313545c in pthread_cond_wait@@GLIBC_2.3.2 () from /usr/lib64/libpthread.so.0
  11   Thread 0x7fdf59dcd700 (LWP 44114) 0x00007fe45313545c in pthread_cond_wait@@GLIBC_2.3.2 () from /usr/lib64/libpthread.so.0
  12   Thread 0x7fdf595cc700 (LWP 44115) 0x00007fe45313545c in pthread_cond_wait@@GLIBC_2.3.2 () from /usr/lib64/libpthread.so.0
  13   Thread 0x7fdf58dcb700 (LWP 44116) 0x00007fe45313545c in pthread_cond_wait@@GLIBC_2.3.2 () from /usr/lib64/libpthread.so.0
  14   Thread 0x7fdf585ca700 (LWP 44117) 0x00007fe45313545c in pthread_cond_wait@@GLIBC_2.3.2 () from /usr/lib64/libpthread.so.0
  15   Thread 0x7fdf57dc9700 (LWP 44118) 0x00007fe45313545c in pthread_cond_wait@@GLIBC_2.3.2 () from /usr/lib64/libpthread.so.0
  16   Thread 0x7fdf575c8700 (LWP 44119) 0x00007fe45313545c in pthread_cond_wait@@GLIBC_2.3.2 () from /usr/lib64/libpthread.so.0
  17   Thread 0x7fdf56dc7700 (LWP 44120) 0x00007fe45313545c in pthread_cond_wait@@GLIBC_2.3.2 () from /usr/lib64/libpthread.so.0
  18   Thread 0x7fdf565c6700 (LWP 44121) 0x00007fe45313545c in pthread_cond_wait@@GLIBC_2.3.2 () from /usr/lib64/libpthread.so.0
  19   Thread 0x7fdf55dc5700 (LWP 44122) 0x00007fe45313545c in pthread_cond_wait@@GLIBC_2.3.2 () from /usr/lib64/libpthread.so.0
  20   Thread 0x7fdf555c4700 (LWP 44123) 0x00007fe45313545c in pthread_cond_wait@@GLIBC_2.3.2 () from /usr/lib64/libpthread.so.0
  21   Thread 0x7fdf54dc3700 (LWP 44124) 0x00007fe45313545c in pthread_cond_wait@@GLIBC_2.3.2 () from /usr/lib64/libpthread.so.0
  22   Thread 0x7fdf545c2700 (LWP 44125) 0x00007fe4539190e1 in poll () from /usr/lib64/libc.so.6
  23   Thread 0x7fdf53dc1700 (LWP 44126) 0x00007fe45313545c in pthread_cond_wait@@GLIBC_2.3.2 () from /usr/lib64/libpthread.so.0
  24   Thread 0x7fdf535c0700 (LWP 44127) 0x00007fe45313545c in pthread_cond_wait@@GLIBC_2.3.2 () from /usr/lib64/libpthread.so.0
  25   Thread 0x7fdf52dbf700 (LWP 44128) 0x00007fe45313545c in pthread_cond_wait@@GLIBC_2.3.2 () from /usr/lib64/libpthread.so.0
  26   Thread 0x7fdf525be700 (LWP 44129) 0x00007fe45313545c in pthread_cond_wait@@GLIBC_2.3.2 () from /usr/lib64/libpthread.so.0
  27   Thread 0x7fdf51dbd700 (LWP 44130) 0x00007fe45313545c in pthread_cond_wait@@GLIBC_2.3.2 () from /usr/lib64/libpthread.so.0
  28   Thread 0x7fdf515bc700 (LWP 44131) 0x00007fe45313545c in pthread_cond_wait@@GLIBC_2.3.2 () from /usr/lib64/libpthread.so.0
  29   Thread 0x7fdf50dbb700 (LWP 44132) 0x00007fe45313545c in pthread_cond_wait@@GLIBC_2.3.2 () from /usr/lib64/libpthread.so.0
  30   Thread 0x7fdf505ba700 (LWP 44133) 0x00007fe45313545c in pthread_cond_wait@@GLIBC_2.3.2 () from /usr/lib64/libpthread.so.0
  31   Thread 0x7fdf4fdb9700 (LWP 44134) 0x00007fe45313545c in pthread_cond_wait@@GLIBC_2.3.2 () from /usr/lib64/libpthread.so.0
  32   Thread 0x7fdf4f5b8700 (LWP 44135) 0x00007fe45313545c in pthread_cond_wait@@GLIBC_2.3.2 () from /usr/lib64/libpthread.so.0
  33   Thread 0x7fdf4edb7700 (LWP 44136) 0x00007fe45313545c in pthread_cond_wait@@GLIBC_2.3.2 () from /usr/lib64/libpthread.so.0
  34   Thread 0x7fdf4e5b6700 (LWP 44137) 0x00007fe4531357aa in pthread_cond_timedwait@@GLIBC_2.3.2 () from /usr/lib64/libpthread.so.0
  35   Thread 0x7fdf4ddb5700 (LWP 44138) 0x00007fe4531357aa in pthread_cond_timedwait@@GLIBC_2.3.2 () from /usr/lib64/libpthread.so.0
  36   Thread 0x7fdf4d5b4700 (LWP 44139) 0x00007fe4531357aa in pthread_cond_timedwait@@GLIBC_2.3.2 () from /usr/lib64/libpthread.so.0

@wuyiwai
Copy link
Author

wuyiwai commented Jan 17, 2023

我去看了下 源码里面的 table.cc的lock方法的描述:

lock process[%d] not exists, force unlock 对应
/**
 * The process occupied by the resource no longer exists,
 * indicating that OOM occurred during the locking process,
 * forced to unlock
 */


timeout, force unlock 对应:
/**
 * The deadlock time exceeds 2 seconds (SW_TABLE_FORCE_UNLOCK_TIME),
 * indicating that the lock process has OOM,
 * and the PID has been reused, forcing the unlock
 */

不过我没能想到我的代码里有哪些应用场景有可能会触发这些。

  1. 目前swoole应用层没有能调用lock和unlock的地方,之前移除了。我看了下fixed_pool.cc,感觉可能是在set或者incr时会先获取内存,但pool满了,拿不到就报错了。我基于这个猜测展开下面的定位:
  2. 我目前的业务代码有两处调用set和incr。
    1. 初始化业务数据时
    $key = $this->buildUserMailFolderKey($userMailId, $folderId);
    $countData = $countMap[$userMailId][$folderId] ?? $defaultCountData;
    $setRet = $this->userMailFolderMap->set($key, $countData);
    
    1. 对业务数据进行更新时
    foreach ($params as $item)
    {
      $key = $this->buildUserMailFolderKey($userMailId, $folderId);
      $countRet = $this->userMailFolderMap->incr($key, 'count', $count);
      $companyCountRet = $this->userMailFolderMap->incr($key, 'company_count', $companyCount);
      $colleagueCountRet = $this->userMailFolderMap->incr($key, 'colleague_count', $colleagueCount);
      $contactCountRet = $this->userMailFolderMap->incr($key, 'contact_count', $contactCount);
    }
    
  3. 业务代码只有上面两处是有做set和incr的操作的。我预先分配了 256 * 65536 的size,按理说也不会不够用。目前的思路是以上这些
    @matyhtf @NathanFreeman

@wuyiwai
Copy link
Author

wuyiwai commented Jan 30, 2023

@matyhtf @NathanFreeman 我还需要提供什么信息以推进该问题的定位吗

@wuyiwai
Copy link
Author

wuyiwai commented Jan 31, 2023

@matyhtf @NathanFreeman 刚刚我去对比了swoole的版本更新日志,涉及table的有多个迭代,其中一些bug修复提到了修复相关的内存错误,我想到之前应该是有issue或者有什么方式是可以正确的反馈能帮助定位table的问题的。我额外想到一些不知道有没有关系的信息:我之前的服务在4.4.16版本上运行了2年多没有报错,在5.0.1、4.8版本上运行后的core dump里有关于table的错误。以下是关于table的更新的一些列举:

@matyhtf
Copy link
Member

matyhtf commented Feb 3, 2023

@wuyiwai 可能是在hash冲突分配动态内存时发生了 OOM,查看 dmesg -T 日志,是否有 OOM 信息,错误日志中 lock pid 检测说明有一个 worker 进程在 lock 阶段异常退出了。不清楚 crash 的顺序:

  1. 是先发生 crash ,然后出现 lock pid not exists 错误
  2. 还是先有 OOM ,然后出现 lock pid not exists 错误,最后第二个 worker 进程发生 crash

@wuyiwai
Copy link
Author

wuyiwai commented Feb 3, 2023

$ dmesg -T | grep --color -E 'kill|oom|out of memory'
$ 无日志

$ dmesg -T | grep --color -E 'error|warn|fatal|fail|oom'
没有发现oom的信息,我将其他日志贴上来,信息有些多,我大概省略一些用...表示相同部分

[Tue Jan 17 05:05:18 2023] php[811442]: segfault at 0 ip 00007fe44fe06378 sp 00000000026e86f8 error 4 in swoole.so[7fe44fc92000+291000]
...
[Tue Jan 17 20:06:49 2023] php[923999]: segfault at 10 ip 00007f3faf606390 sp 0000000002dc6cb8 error 6 in swoole.so[7f3faf492000+291000]
[Tue Jan 17 20:46:06 2023] php[2141929]: segfault at 0 ip 00007f3faf606378 sp 00000000023c19a8 error 4 in swoole.so[7f3faf492000+291000]
...

其中error 6比较少,error 4比较多

 * Page fault error code bits:
 *
 *   bit 0 == 0: no page found1: protection fault
 *   bit 1 == 0: read access1: write access
 *   bit 2 == 0: kernel-mode access1: user-mode access
 *   bit 3 == 1: use of reserved bit detected
 *   bit 4 == 1: fault was an instruction fetch

先针对error 4进行分析
error 4(100 = user-mode access & read access): IP 指令指针寄存器 地址为 00007f3faf606378 偏移量 7f3faf492000 => 0x174378
error 6(110 = user-mode access & write access): IP 指令指针寄存器 地址为 00007f3faf606390 偏移量 7f3faf492000 => 0x174390

$ addr2line -e swoole.so -fCi 0x174378
swoole::FixedPool::alloc(unsigned int)
/usr/local/src/swoole/src/memory/fixed_pool.cc:151

$ addr2line -e swoole.so -fCi 0x174390
swoole::FixedPool::alloc(unsigned int)
/usr/local/src/swoole/src/memory/fixed_pool.cc:162

相关`fixed_pool.cc`的5.0.1版本代码
void *FixedPool::alloc(uint32_t size) {
    FixedPoolSlice *slice = impl->head;
    if (slice->lock) { // 第151行
        swoole_set_last_error(SW_ERROR_MALLOC_FAIL);
        assert(get_number_of_spare_slice() == 0);
        return nullptr;
    }

    slice->lock = 1;
    impl->slice_use++;

    // move next slice to head (idle list)
    impl->head = slice->next;
    impl->head->prev = nullptr; // 第162行

    // move this slice to tail (busy list)
    impl->tail->next = slice;
    slice->next = nullptr;
    slice->prev = impl->tail;
    impl->tail = slice;

    return slice->data;
}

@matyhtf

@matyhtf
Copy link
Member

matyhtf commented Feb 14, 2023

@wuyiwai 加一下我的微信,19921030512 方便沟通,需要本地压测试试看能不能重现

@wuyiwai
Copy link
Author

wuyiwai commented Jun 21, 2023

重新用php 8.1.20 和swoole 5.0.3 部署了还是会有这个情况

php --ri swoole

swoole

Swoole => enabled
Author => Swoole Team <[email protected]>
Version => 5.1.0-dev
Built => Jun 20 2023 04:02:10
coroutine => enabled with boost asm context
epoll => enabled
eventfd => enabled
signalfd => enabled
cpu_affinity => enabled
spinlock => enabled
rwlock => enabled
openssl => OpenSSL 1.1.1k  FIPS 25 Mar 2021
dtls => enabled
http2 => enabled
json => enabled
curl-native => enabled
zlib => 1.2.11
mutex_timedlock => enabled
pthread_barrier => enabled
futex => enabled
async_redis => enabled
coroutine_pgsql => enabled

Directive => Local Value => Master Value
swoole.enable_coroutine => On => On
swoole.enable_library => On => On
swoole.enable_fiber_mock => Off => Off
swoole.enable_preemptive_scheduler => Off => Off
swoole.display_errors => On => On
swoole.use_shortname => On => On
swoole.unixsock_buffer_size => 8388608 => 8388608

sh-4.4# php -v
PHP 8.1.20 (cli) (built: Jun 20 2023 03:59:52) (NTS)
Copyright (c) The PHP Group
Zend Engine v4.1.20, Copyright (c) Zend Technologies
    with Zend OPcache v8.1.20, Copyright (c), by Zend Technologies

最新捕捉的上下文:
coredump相关

(gdb) bt
#0  0x00007fdaf4ad3acf in raise () from /usr/lib64/libc.so.6
#1  0x00007fdaf4aa6ea5 in abort () from /usr/lib64/libc.so.6
#2  0x00007fdaf4aa6d79 in __assert_fail_base.cold.0 () from /usr/lib64/libc.so.6
#3  0x00007fdaf4acc426 in __assert_fail () from /usr/lib64/libc.so.6
#4  0x00007fdaf0fef262 in swoole::FixedPool::alloc (this=0x32d0630, size=<optimized out>)
    at /usr/local/src/swoole-src-master/src/memory/fixed_pool.cc:149
#5  swoole::FixedPool::alloc (this=0x32d0630, size=<optimized out>) at /usr/local/src/swoole-src-master/src/memory/fixed_pool.cc:149
#6  0x00007fdaf0ff0dc0 in swoole::Table::alloc_row (this=0x7fdad0c6c010) at /usr/local/src/swoole-src-master/include/swoole_table.h:278
#7  swoole::Table::set (this=this@entry=0x7fdad0c6c010, key=key@entry=0x7fd56fe97d08 "48221197:2520849", keylen=keylen@entry=16,
    rowlock=rowlock@entry=0x35c55f8, out_flags=out_flags@entry=0x35c55f4) at /usr/local/src/swoole-src-master/src/memory/table.cc:318
#8  0x00007fdaf0fc784f in zim_swoole_table_set (execute_data=0x7fd56fc40a20, return_value=0x7fd56fc409a0)
    at /usr/local/src/swoole-src-master/ext-src/swoole_table.cc:295
#9  0x00000000007f7744 in ZEND_DO_FCALL_SPEC_RETVAL_USED_HANDLER () at /usr/local/src/php/Zend/zend_vm_execute.h:1870
#10 execute_ex (ex=0x2) at /usr/local/src/php/Zend/zend_vm_execute.h:55811
#11 0x0000000000783927 in zend_call_function (fci=fci@entry=0x7fd5ed0e3a68, fci_cache=fci_cache@entry=0x7fd5ed0e3aa8)
    at /usr/local/src/php/Zend/zend_execute_API.c:929
#12 0x00007fdaf0f3c8b1 in swoole::PHPCoroutine::main_func (_args=<optimized out>)
    at /usr/local/src/swoole-src-master/ext-src/swoole_coroutine.cc:694
#13 0x00007fdaf0fd7480 in std::function<void (void*)>::operator()(void*) const (__args#0=<optimized out>, this=0x36530d0)
    at /usr/include/c++/8/bits/std_function.h:682
#14 swoole::coroutine::Context::context_func (arg=0x36530d0) at /usr/local/src/swoole-src-master/src/coroutine/context.cc:142
#15 0x00007fdaf1061991 in swoole_make_fcontext ()
    at /usr/local/src/swoole-src-master/thirdparty/boost/asm/make_x86_64_sysv_elf_gas.S:70
#16 0x0000000000200010 in ?? ()
#17 0x0000000000000111 in ?? ()
#18 0x000000060000011a in ?? ()
#19 0x0000000000000001 in ?? ()
#20 0x0000000002000d00 in ?? ()
#21 0x0000000000000000 in ?? ()

9点54分:同时间的业务日志:

event worker: /usr/local/src/swoole-src-master/src/memory/fixed_pool.cc:153: virtual void* swoole::FixedPool::alloc(uint32_t): Assertion `get_number_of_spare_slice() == 0' failed.

相关代码如下:

foreach ($map as $key => $item)
{
    $table->set($key, $value);
}
也没复杂调用,可以确认没有并发 @matyhtf @NathanFreeman 

下面是我的一些额外的猜想:
程序初始化table的时候代码如下:

$table = new Table((int)(256 * 65536));  也就是size = 16777216 table默认的$conflict_proportion是0.2
$table->column('count', Table::TYPE_INT, 4);
$table->column('company_count',Table::TYPE_INT, 4);
$table->column('colleague_count', Table::TYPE_INT, 4);
$table->column('contact_count', Table::TYPE_INT, 4);
$table->create();
$this->userMailFolderMap = $table;

为了尽可能捕捉报错的信息,我额外打印了一些日志:

              [tablename] [size]                          [memory]                              [percent]
9.52分:[    map    ] [         3512268         ] [   0.45000015199184 G    ] [          20.93          ]

更早一些的日志:

9:42 [    userMailFolderMap    ] [         3500278         ] [   0.45000015199184 G    ] [          20.86          ]
9:32 [    userMailFolderMap    ] [         3487470         ] [   0.45000015199184 G    ] [          20.79          ]
9:22 [    userMailFolderMap    ] [         3472632         ] [   0.45000015199184 G    ] [          20.7           ]

这个记录是十分钟记录一次memory信息 .不清楚这个conflict_proportion = 0.2 和table的设置有没有关系

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants