excess load leads to reboots #1317

tmeinlschmidt · 2024-07-20T12:27:48Z

hw: ccgx

Some time after upgrading to 3.40 I'm experiencing random reboots, data loss or incorrect data displayed/sent to vrm. Tried to downgrade to 3.33, issue persisted. Tried latest beta as well, no change. CCGX worked for two and half years with no issue.

After ssh to ccgx I found in log

Jul 20 11:30:14 ccgx daemon.err watchdog[375]: loadavg 9 7 6 is higher than the given threshold 0 6 6!
Jul 20 11:30:31 ccgx daemon.err watchdog[375]: repair binary /usr/sbin/store_watchdog_error.sh returned 253 = 'load average too high'
Jul 20 11:30:32 ccgx daemon.alert watchdog[375]: shutting down the system because of error 253 = 'load average too high’

top says

  734   724 root     R    26520  11%  14% {dbus_systemcalc} /usr/bin/python3 -u /opt/victronenergy/dbus-systemcalc-py/dbus_systemca
  715   706 root     S<   81120  33%  11% /opt/victronenergy/gui/gui -nomouse -display Multi: LinuxFb: VNC:size=480x272:depth=32:pa
  674   658 root     R    41552  17%  10% {vrmlogger.py} /usr/bin/python3 -u /opt/victronenergy/vrmlogger/vrmlogger.py
  741   732 root     R    21372   9%   6% {dbus_generator.} /usr/bin/python3 -u /opt/victronenergy/dbus-generator-starter/dbus_gene
 2013  2004 root     S    53552  22%   6% /usr/bin/flashmq
  438   436 messageb R     3644   1%   5% dbus-daemon --system --nofork
  753   730 root     S    45580  19%   4% {dbus-modbus-cli} /usr/bin/python3 -u /opt/victronenergy/dbus-modbus-client/dbus-modbus-c

anything I can check/modify? Thanks

The text was updated successfully, but these errors were encountered:

mpvader · 2024-07-20T19:16:35Z

Hi, is it ok if we login remotely to analyse?

Then I need the vrm url + you need to enable remote support.

tmeinlschmidt · 2024-07-20T19:43:01Z

interesting thing is, it's working for six hours with no restart, with no change from my side

Load average: 4.42 4.53 4.56

where I can send the url and other info privately?

tmeinlschmidt · 2024-07-20T19:48:52Z

you can see these gaps - reboots for today and yesterday

yesterday:

tmeinlschmidt · 2024-07-20T20:43:05Z

hm, crashed again in 7min intervals.. seeing interesting logs

2024-07-20 13:14:54.843322500 020.309 ERR.task: error connecting to settings service
2024-07-20 13:14:55.011840500 *** starting dbus-adc ***
2024-07-20 13:14:55.625427500 dbus-adc 1.44

2024-07-20 10:34:58.002227500 INFO     Scan completed in 4 seconds
2024-07-20 11:31:08.033966500 *** CCGX booted (30253) ***
2024-07-20 11:34:05.666595500 /opt/victronenergy/serial-starter/run-service.sh: line 5: shift: 2: shift count out of range

2024-07-20 11:32:03.767700500 *** starting dbus-fzsonick-48tl ***
2024-07-20 11:33:04.054473500 INFO:root:starting /opt/victronenergy/dbus-fzsonick-48tl/dbus-fzsonick-48tl.py
2024-07-20 11:33:04.131042500 INFO:root:identifying battery...
2024-07-20 11:33:04.430816500 INFO:root:failed to identify battery at 1 : failed to get slave id from 1 : Modbus Error: [Input/Outp
ut] Modbus Error: [Invalid Message] Incomplete message received, expected at least 2 bytes (0 received)

2024-07-20 08:33:33.339233500 *** starting dbus-imt-si-rs485tc ***
2024-07-20 08:34:09.987365500 INFO:main:starting dbus-imt-si-rs485tc.py
2024-07-20 08:34:10.079742500 INFO:imt_si_rs485_sensor:searching for IMT SI-RS485 sensor on /dev/ttyUSB0
2024-07-20 08:34:11.166534500 ERROR:main:no IMT SI-RS485 sensor found
2024-07-20 08:34:11.168395500 INFO:main:dbus-imt-si-rs485tc.py has shut down

2024-07-20 10:35:07.808502500 *** starting gps-dbus ***
2024-07-20 10:35:07.911132500 gps_dbus 1.07
2024-07-20 10:35:09.966491500 002.054 ERR.VALUES: timeout connecting device

2024-07-20 12:42:22.485900500 *** starting mk2-dbus ***
2024-07-20 12:42:24.182952500 Error org.freedesktop.DBus.Error.ServiceUnknown: The name com.victronenergy.settings was not provided by any .service files

2024-07-20 12:44:23.738883500 ERROR "Error processing function code 3, unit id 100, start address 808, quantity 19, src 192.168.88.129: Error finding service with device type system at device instance 0"
2024-07-20 12:44:44.958457500 INFO  "[DBusService] Service online: com.victronenergy.system (0)"

2024-07-20 13:15:10.638939500 Error org.freedesktop.DBus.Error.ServiceUnknown: The name com.victronenergy.settings was not provided by any .service files
2024-07-20 13:15:20.244530500 Context available IP/Host 127.0.0.1

mpvader · 2024-07-21T12:48:06Z

conversation is in the email, but here are some screenshots.

top on v3.33, gui not on moving ants page:

top on v3.33, after disabling dbus-generator (not in use); and gui not on moving ants page:

And variation is mostly vrmlogger (normal, it has its cycle), and dbus-modbus-cli. It
and when vrmlogger is in its cycle or evcharger is, then cpu usage is higher. For EV Charger, you see dbus-modbus-cli spiking up to 15% on this CCGX.

top on v3.22 (mosquitto instead of flashmq), dbus_generator also running since not needed; so running it always seems not to be a regression (but still an opportunity to not run it always)

dbus-round trip time of the last 6 months:

A normal common max number is 200ms, and what the graph shows is that per May 22nd it first started being higher, here is a close up:

But for a long time its too high already. For example here at July 13th its already too high:

root@ccgx:~# cat /data/log/swupdate/* | tai64nlocal | grep "Starting swupdate"
2023-06-19 22:06:33.247904500 Starting swupdate to install version 20230529211714 v3.00 ...
2023-07-17 22:55:46.041958500 Starting swupdate to install version 20230717162050 v3.01 ...
2023-09-25 10:52:26.934692500 Starting swupdate to install version 20230911151152 v3.10 ...
2023-10-16 23:12:33.099795500 Starting swupdate to install version 20231009045315 v3.11 ...
2023-11-14 21:01:45.581329500 Starting swupdate to install version 20231030212506 v3.12 ...
2023-12-12 09:54:25.943729500 Starting swupdate to install version 20231205211402 v3.13 ...
2024-02-16 09:14:20.269516500 Starting swupdate to install version 20240213074236 v3.20 ...
2024-02-25 17:22:23.683956500 Starting swupdate to install version 20240216163909 v3.21 ...
2024-03-04 12:59:34.594388500 Starting swupdate to install version 20240226113741 v3.22 ...
2024-03-11 21:57:09.927505500 Starting swupdate to install version 20240306173555 v3.30~13 ...
2024-04-01 10:54:34.380889500 Starting swupdate to install version 20240319200807 v3.40~1 ...
2024-04-21 21:13:50.712280500 Starting swupdate to install version 20240418190500 v3.31~4 ...
2024-05-02 15:05:59.760462500 Starting swupdate to install version 20240426135145 v3.40~10 ...
2024-05-07 21:21:01.115356500 Starting swupdate to install version 20240502131129 v3.40~13 ...
2024-05-20 09:48:57.651678500 Starting swupdate to install version 20240517184948 v3.40~16 ...
2024-05-26 22:25:09.216091500 Starting swupdate to install version 20240422090512 v3.31 ...
2024-06-02 20:10:36.665802500 Starting swupdate to install version 20240523125018 v3.32 ...
2024-06-13 16:28:20.817932500 Starting swupdate to install version 20240611052142 v3.33 ...
2024-07-18 11:36:08.174682500 Starting swupdate to install version 20240716134534 v3.40 ...
2024-07-18 11:48:18.312164500 Starting swupdate to install version 20240716134534 v3.40 ...
2024-07-19 10:13:13.006469500 Starting swupdate to install version 20240712153300 v3.40~39 ...
2024-07-20 08:53:02.277984500 Starting swupdate to install version 20240611052142 v3.33 ...
2024-07-20 09:12:55.593589500 Starting swupdate to install version 20240611052142 v3.33 ...
2024-07-21 12:09:15.272370500 Starting swupdate to install version https://updates.victronenergy.com/feeds/venus/
release/images/ccgx/venus-swu-ccgx-20240226113741-v3.22.swu ...
root@ccgx:~#

dbus roundtrip time while running the older version, v3.22:

mpvader · 2024-07-21T12:50:31Z

for comparison: d-bus roundtrip time in May is , aside from a spike, perhaps a firmware update, a lot lower - much clearer in the healthy zone.

tmeinlschmidt · 2024-07-21T12:57:09Z

in May I installed grid meter (EM540), then in June installed EV charger

mpvader · 2024-07-21T14:09:18Z

both add a bit of load.

Here is the same system again, running v3.40, gui not on the moving ants page, and dbus-generator (the genset start/stop subsystem) disabled.

this is at a quiet moment, ie both vrmlogger not at its interval; and dbus-modbus-cli also at low percentage.

mpvader · 2024-07-21T14:12:18Z

CPU Load by energy meter is mostly the caused by sending multiple measurement updates per second, which cost a lot of CPU per process receiving those.

CPU load by the EV Charging Station will, I think, mostly be in the dbus-modbus-cli spike every x seconds.

Here is the number of dbus msges per second, made with dbus_signal_cntr.py.

tmeinlschmidt · 2024-07-21T14:14:20Z

grid.cgwacs_ttyUSB0_mb1 - looks like grid meter hits a lot.

mpvader · 2024-07-21T14:23:18Z

that is by design. The new Victron meter hits even more.

The problem with that is mostly the CPU load used by all the python process in receiving aka processing the updates.

mpvader mentioned this issue Jul 21, 2024

Run dbus-generator (genset start/stop) and dbus-parallelbms only when needed #1318

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

excess load leads to reboots #1317

excess load leads to reboots #1317

tmeinlschmidt commented Jul 20, 2024

mpvader commented Jul 20, 2024

tmeinlschmidt commented Jul 20, 2024

tmeinlschmidt commented Jul 20, 2024 •

edited

Loading

tmeinlschmidt commented Jul 20, 2024

mpvader commented Jul 21, 2024

mpvader commented Jul 21, 2024

tmeinlschmidt commented Jul 21, 2024

mpvader commented Jul 21, 2024

mpvader commented Jul 21, 2024

tmeinlschmidt commented Jul 21, 2024

mpvader commented Jul 21, 2024 •

edited

Loading

excess load leads to reboots #1317

excess load leads to reboots #1317

Comments

tmeinlschmidt commented Jul 20, 2024

mpvader commented Jul 20, 2024

tmeinlschmidt commented Jul 20, 2024

tmeinlschmidt commented Jul 20, 2024 • edited Loading

tmeinlschmidt commented Jul 20, 2024

mpvader commented Jul 21, 2024

mpvader commented Jul 21, 2024

tmeinlschmidt commented Jul 21, 2024

mpvader commented Jul 21, 2024

mpvader commented Jul 21, 2024

tmeinlschmidt commented Jul 21, 2024

mpvader commented Jul 21, 2024 • edited Loading

tmeinlschmidt commented Jul 20, 2024 •

edited

Loading

mpvader commented Jul 21, 2024 •

edited

Loading