2.6.2: Kernel warning refcount_warn_saturate #29
Hi @bernhardschmidt, thanks a lot! I will dig into it!
The master branch contains what we believe to be a fix for this issue. Would you be able to give it a try?
v0.2.20230426 + 7b7a28f still prints the warning.
Thanks for reporting. This confirms that we are dealing with a different issue here.
The way to reproduce it is similar to #18, but it does not reproduce 100% reliably...
Not sure if the following findings are related:
It's not a double free, but basically we are using an object whose refcount is already 0. Now, in … So something is wrong with the assumption that we already hold a ref in …
I think I should at least add a WARN_ON(!ovpn_peer_hold()) in …
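A sketch of that WARN_ON() idea, assuming a single-threaded toy model: the kernel's WARN_ON(cond) prints a warning with a backtrace when cond is true and evaluates to cond, so it can sit directly inside an if (). DEMO_WARN_ON, demo_peer_hold() and demo_process() below are hypothetical names that imitate this in user space with a message on stderr; they are not kernel or ovpn-dco APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Number of warnings emitted so far, so the behaviour can be checked. */
static int demo_warn_count;

/* User-space imitation of the kernel's WARN_ON(): report the condition and
 * where it fired, then return it so the caller can branch on it. */
static bool demo_warn_on(bool cond, const char *expr, const char *file, int line)
{
	if (cond) {
		demo_warn_count++;
		fprintf(stderr, "WARNING: %s at %s:%d\n", expr, file, line);
	}
	return cond;
}
#define DEMO_WARN_ON(cond) demo_warn_on(!!(cond), #cond, __FILE__, __LINE__)

/* Hypothetical single-threaded peer with a plain counter (no atomics). */
struct demo_peer {
	int refcount;
};

/* Take a reference unless the peer is already dead (refcount 0). */
static bool demo_peer_hold(struct demo_peer *peer)
{
	assert(peer != NULL);
	if (peer->refcount == 0)
		return false;
	peer->refcount++;
	return true;
}

/* A path that assumes its caller already keeps the peer alive: a failed
 * hold means that assumption is broken, so warn loudly and bail out
 * instead of touching the peer. */
static bool demo_process(struct demo_peer *peer)
{
	if (DEMO_WARN_ON(!demo_peer_hold(peer)))
		return false;
	/* ... use the peer here ... */
	peer->refcount--; /* drop the reference taken above */
	return true;
}
```

The warning does not fix anything by itself; it turns a silent use of a dead object into a visible report of where the assumption failed.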
Haha... maybe I should say "double put".
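For reference, a "double put" is a put with no matching hold: the legitimate final put already released the object, so the extra one decrements a dead counter (in the kernel, possibly in memory that has already been freed). A minimal sketch, assuming a plain int counter and hypothetical demo_* names, that refuses to go below zero and reports the extra put instead:

```c
#include <assert.h>
#include <stdio.h>

enum demo_put_result {
	DEMO_PUT_DROPPED,   /* reference dropped, object still alive */
	DEMO_PUT_RELEASED,  /* last reference dropped: caller must free */
	DEMO_PUT_UNDERFLOW, /* count was already 0: a "double put" */
};

/* Hypothetical refcounted object; not an ovpn-dco type. */
struct demo_ref {
	int refs;
};

static enum demo_put_result demo_put(struct demo_ref *r)
{
	assert(r != NULL);
	if (r->refs <= 0) {
		/* Refuse to go negative; the object may already be freed. */
		fprintf(stderr, "double put detected (refs=%d)\n", r->refs);
		return DEMO_PUT_UNDERFLOW;
	}
	return --r->refs == 0 ? DEMO_PUT_RELEASED : DEMO_PUT_DROPPED;
}
```

Real kernel code should keep using refcount_t, which already warns on this condition; the plain int here only keeps the example small.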
I automated the test with some scripts, and now it loops by itself. I tried to set … If you have new ideas, I could help try them.
Kernel panic log:
[ 629.462856] OpenVPN data channel offload (ovpn-dco) 2.0.0 -- (C) 2020-2023 OpenVPN, Inc.
[ 629.566333] === OVPN UP ===
[ 630.045953] === OVPN CONNECTED ===
[ 634.508026] === OVPN DISCONNECTED ===
[ 641.872456] ovs (unregistered): deleting peer with id 0, reason 1, refcount 0
[ 642.095452] OpenVPN data channel offload (ovpn-dco) 2.0.0 -- (C) 2020-2023 OpenVPN, Inc.
[ 642.208358] === OVPN UP ===
[ 643.058373] === OVPN CONNECTED ===
[ 647.145359] === OVPN DISCONNECTED ===
[ 653.924496] ovs (unregistered): deleting peer with id 0, reason 1, refcount 0
[ 654.299363] OpenVPN data channel offload (ovpn-dco) 2.0.0 -- (C) 2020-2023 OpenVPN, Inc.
[ 654.401859] === OVPN UP ===
[ 655.113708] === OVPN CONNECTED ===
[ 659.342124] === OVPN DISCONNECTED ===
[ 659.392530] ovs (unregistering): deleting peer with id 0, reason 1, refcount 0
[ 659.795397] OpenVPN data channel offload (ovpn-dco) 2.0.0 -- (C) 2020-2023 OpenVPN, Inc.
[ 659.902599] === OVPN UP ===
[ 660.580699] === OVPN CONNECTED ===
[ 664.839997] === OVPN DISCONNECTED ===
[ 672.529352] ovs (unregistered): deleting peer with id 0, reason 1, refcount 0
[ 672.692321] OpenVPN data channel offload (ovpn-dco) 2.0.0 -- (C) 2020-2023 OpenVPN, Inc.
[ 672.796617] === OVPN UP ===
[ 673.719704] === OVPN CONNECTED ===
[ 677.740144] === OVPN DISCONNECTED ===
[ 684.506142] ovs (unregistered): deleting peer with id 0, reason 1, refcount 0
[ 684.656086] OpenVPN data channel offload (ovpn-dco) 2.0.0 -- (C) 2020-2023 OpenVPN, Inc.
[ 684.754853] === OVPN UP ===
[ 685.685571] === OVPN CONNECTED ===
[ 689.706047] === OVPN DISCONNECTED ===
[ 696.517813] ovs (unregistered): deleting peer with id 0, reason 1, refcount 0
[ 696.660450] OpenVPN data channel offload (ovpn-dco) 2.0.0 -- (C) 2020-2023 OpenVPN, Inc.
[ 696.761643] === OVPN UP ===
[ 697.705879] === OVPN CONNECTED ===
[ 701.706752] === OVPN DISCONNECTED ===
[ 701.727836] failed to hold peer d4a83ca5 refcount 0
[ 701.733010] failed to hold peer d4a83ca5 refcount 0
[ 701.760522] ovs (unregistering): deleting peer with id 0, reason 1, refcount -1073741824
[ 702.196648] OpenVPN data channel offload (ovpn-dco) 2.0.0 -- (C) 2020-2023 OpenVPN, Inc.
[ 702.295662] === OVPN UP ===
[ 702.977901] === OVPN CONNECTED ===
[ 707.243996] === OVPN DISCONNECTED ===
[ 707.263184] failed to hold peer 27d98334 refcount 0
[ 707.268291] failed to hold peer 27d98334 refcount 0
[ 707.295777] ovs (unregistering): deleting peer with id 0, reason 1, refcount -1073741824
[ 707.304636] CPU 1 Unable to handle kernel paging request at virtual address 00000179, epc == 8ca1162c, ra == 8ca1161c
[ 707.315324] Oops[#1]:
[ 707.317667] CPU: 1 PID: 77 Comm: kworker/1:1 Tainted: G W 5.4.238 #0
[ 707.325360] Workqueue: ovpn-crypto-wq-ovs ovpn_decrypt_work [ovpn_dco_v2]
[ 707.332157] $ 0 : 00000000 00000001 00000011 00000000
[ 707.337383] $ 4 : 8e4fdb3c 00019961 00000001 000ff8b5
[ 707.342607] $ 8 : 8cc08000 8cc08000 0000007c 00000032
[ 707.347827] $12 : 33323130 37363534 8da3895f 5234e015
[ 707.353050] $16 : 8e4fd84c 8e49d000 8e1d1750 00000000
[ 707.358274] $20 : 8e4fd8e8 00000000 8e4fd800 8ca19ae0
[ 707.363495] $24 : 00000000 3957c031
[ 707.368709] $28 : 8fe12000 8fe13e20 8ca19ab8 8ca1161c
[ 707.373925] Hi : 0d04b0dd
[ 707.376792] Lo : bae21fe7
[ 707.379682] epc : 8ca1162c ovpn_decrypt_work+0x2b0/0x77c [ovpn_dco_v2]
[ 707.386365] ra : 8ca1161c ovpn_decrypt_work+0x2a0/0x77c [ovpn_dco_v2]
[ 707.393034] Status: 11007c03 KERNEL EXL IE
[ 707.397209] Cause : 40800008 (ExcCode 02)
[ 707.401201] BadVA : 00000179
[ 707.404068] PrId : 0001992f (MIPS 1004Kc)
[ 707.408145] Modules linked in: ovpn_dco_v2 pppoe ppp_async l2tp_ppp wireguard pppox ppp_generic mt76x2e mt76x2_common mt76x02_lib mt7603e mt76 mac80211 libchacha20poly1305 ipt_REJECT ebtable_nat ebtable_filter ebtable_broute cfg80211 cdc_ncm xt_u32 xt_time xt_tcpudp xt_tcpmss xt_string xt_statistic xt_state xt_socket xt_recent xt_quota xt_pkttype xt_physdev xt_owner xt_nat xt_multiport xt_mark xt_mac xt_limit xt_length xt_iprange xt_hl xt_helper xt_ecn xt_dscp xt_conntrack xt_connmark xt_connlimit xt_connbytes xt_comment xt_bpf xt_addrtype xt_WGOBFS xt_TPROXY xt_TEE xt_TCPMSS xt_TARPIT xt_REDIRECT xt_NFQUEUE xt_NETMAP xt_MASQUERADE xt_LOG xt_HL xt_FLOWOFFLOAD xt_EOIP xt_DSCP xt_CT xt_CLASSIFY usbnet ts_kmp ts_fsm ts_bm tcprst tcp_bbr slhc poly1305_mips nfnetlink_queue nf_tproxy_ipv6 nf_tproxy_ipv4 nf_socket_ipv6 nf_socket_ipv4 nf_reject_ipv4 nf_log_ipv4 nf_flow_table_hw nf_flow_table nf_dup_ipv6 nf_dup_ipv4 nf_conntrack_netlink nf_conncount macvlan libcurve25519_generic iptable_raw
[ 707.408428] iptable_nat iptable_mangle iptable_filter ipt_ECN ip_tables ebtables ebt_vlan ebt_stp ebt_snat ebt_redirect ebt_pkttype ebt_mark_m ebt_mark ebt_limit ebt_ip ebt_dnat ebt_arpreply ebt_arp ebt_among ebt_802_3 crc_ccitt compat_xtables compat chacha_mips br_netfilter act_nat sciu2s usbserial sch_tbf sch_ingress sch_htb sch_hfsc em_u32 cls_u32 cls_route cls_matchall cls_fw cls_flow cls_basic act_skbedit act_mirred act_gact ledtrig_usbport xt_set ip_set_list_set ip_set_hash_netportnet ip_set_hash_netport ip_set_hash_netnet ip_set_hash_netiface ip_set_hash_net ip_set_hash_mac ip_set_hash_ipportnet ip_set_hash_ipportip ip_set_hash_ipport ip_set_hash_ipmark ip_set_hash_ipmac ip_set_hash_ip ip_set_bitmap_port ip_set_bitmap_ipmac ip_set_bitmap_ip ip_set nfnetlink ip6table_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip6t_NPT ip6t_rt ip6t_mh ip6t_ipv6header ip6t_hbh ip6t_frag ip6t_eui64 ip6t_ah nf_log_ipv6 nf_log_common ip6table_mangle ip6table_filter ip6_tables ip6t_REJECT
[ 707.495194] x_tables nf_reject_ipv6 ip6_gre ip_gre gre l2tp_netlink l2tp_core ip6_tunnel tunnel6 ip_tunnel veth tun vxlan udp_tunnel ip6_udp_tunnel leds_gpio xhci_plat_hcd xhci_pci xhci_mtk xhci_hcd gpio_button_hotplug usbcore nls_base usb_common mii [last unloaded: ovpn_dco_v2]
[ 707.606640] Process kworker/1:1 (pid: 77, threadinfo=852d4f30, task=06ff9492, tls=00000000)
[ 707.614951] Stack : 00000cc0 00000001 8fde5580 81013b40 ff7d6e00 00000000 00000040 8ca1b580
[ 707.623286] 8ca20000 8ca20000 00000001 00000003 8e4fd84c 8fde5580 81013b40 ff7d6d00
[ 707.631622] 00000020 00000040 00000000 80750000 81013b40 80047664 81013b40 81013b40
[ 707.639956] 00000008 81013b58 80750000 80750000 8fde5580 8fde5594 81013b40 00000008
[ 707.648290] 81013b58 80750000 80750000 80047a20 80680000 808b0000 00000001 805e2fc0
[ 707.656626] ...
[ 707.659068] Call Trace:
[ 707.661528] [<8ca1162c>] ovpn_decrypt_work+0x2b0/0x77c [ovpn_dco_v2]
[ 707.667894] [<80047664>] process_one_work+0x244/0x498
[ 707.672933] [<80047a20>] worker_thread+0x168/0x5ec
[ 707.677713] [<8004d654>] kthread+0x140/0x148
[ 707.681980] [<80006878>] ret_from_kernel_thread+0x14/0x1c
[ 707.687362] Code: 24020011 8c630080 8c630010 <90630179> 106200b7 00000000 8e240054 2606030c 0c0c0da1
[ 707.697092]
[ 707.699228] ---[ end trace f2d670d55287fe67 ]---
[ 707.703942] Kernel panic - not syncing: Fatal exception
[ 707.709241] Rebooting in 3 seconds..
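One hypothetical sequence that reproduces the numbers in this log, sketched in user space: the last reference is dropped at disconnect, two "failed to hold peer" attempts see refcount 0, and the delete path then performs one put too many. -1073741824 is INT_MIN / 2, which is the value REFCOUNT_SATURATED has in recent kernels, where refcount_t pins the counter at that value on underflow instead of letting it reach -1. This only shows that the printed values are consistent with an extra put; it is not a claim about which ovpn-dco code path performs it, and all demo_* names are made up.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdio.h>

/* INT_MIN / 2: the value recent kernels use for REFCOUNT_SATURATED. */
#define DEMO_REFCOUNT_SATURATED (INT_MIN / 2)
static_assert(DEMO_REFCOUNT_SATURATED == -1073741824, "assumes a 32-bit int");

/* Hypothetical peer with a plain counter; not the ovpn-dco struct. */
struct demo_peer {
	int refs;
};

static bool demo_hold(struct demo_peer *p)
{
	if (p->refs <= 0) {
		printf("failed to hold peer refcount %d\n", p->refs);
		return false;
	}
	p->refs++;
	return true;
}

/* A put on a count that is already 0 saturates, loosely like refcount_t. */
static void demo_put(struct demo_peer *p)
{
	if (p->refs <= 0)
		p->refs = DEMO_REFCOUNT_SATURATED;
	else
		p->refs--;
}

/* Replays one made-up ordering of events and returns the refcount that a
 * "deleting peer" message would print afterwards. */
static int demo_replay(void)
{
	struct demo_peer peer = { .refs = 1 };

	demo_put(&peer);          /* last legitimate put: 1 -> 0 */
	(void)demo_hold(&peer);   /* "failed to hold peer ... refcount 0" */
	(void)demo_hold(&peer);   /* second failure, as in the log */
	demo_put(&peer);          /* extra put: 0 -> saturated */
	printf("deleting peer, refcount %d\n", peer.refs);
	return peer.refs;
}
```

In the normal iterations of the log the delete message shows refcount 0; the saturated value only appears in the runs that also print "failed to hold peer", which fits the "double put" theory discussed above.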
Yeah, you are basically "evading" the problematic situation and thus avoiding the crash. However, what's left to understand is why we get to the point where hold() fails.
We have upgraded our production eduVPN server to 2.6.2 + 1c2c84e. It looks a lot better than before; it has been running for close to 6 hours now. This is the only kernel WARNING we have gotten so far.