Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

excess load leads to reboots #1317

Open
tmeinlschmidt opened this issue Jul 20, 2024 · 11 comments
Open

excess load leads to reboots #1317

tmeinlschmidt opened this issue Jul 20, 2024 · 11 comments

Comments

@tmeinlschmidt
Copy link

hw: ccgx

Some time after upgrading to 3.40 I'm experiencing random reboots, data loss or incorrect data displayed/sent to vrm. Tried to downgrade to 3.33, issue persisted. Tried latest beta as well, no change. CCGX worked for two and half years with no issue.

After ssh to ccgx I found in log

Jul 20 11:30:14 ccgx daemon.err watchdog[375]: loadavg 9 7 6 is higher than the given threshold 0 6 6!
Jul 20 11:30:31 ccgx daemon.err watchdog[375]: repair binary /usr/sbin/store_watchdog_error.sh returned 253 = 'load average too high'
Jul 20 11:30:32 ccgx daemon.alert watchdog[375]: shutting down the system because of error 253 = 'load average too high’

top says

  734   724 root     R    26520  11%  14% {dbus_systemcalc} /usr/bin/python3 -u /opt/victronenergy/dbus-systemcalc-py/dbus_systemca
  715   706 root     S<   81120  33%  11% /opt/victronenergy/gui/gui -nomouse -display Multi: LinuxFb: VNC:size=480x272:depth=32:pa
  674   658 root     R    41552  17%  10% {vrmlogger.py} /usr/bin/python3 -u /opt/victronenergy/vrmlogger/vrmlogger.py
  741   732 root     R    21372   9%   6% {dbus_generator.} /usr/bin/python3 -u /opt/victronenergy/dbus-generator-starter/dbus_gene
 2013  2004 root     S    53552  22%   6% /usr/bin/flashmq
  438   436 messageb R     3644   1%   5% dbus-daemon --system --nofork
  753   730 root     S    45580  19%   4% {dbus-modbus-cli} /usr/bin/python3 -u /opt/victronenergy/dbus-modbus-client/dbus-modbus-c

anything I can check/modify? Thanks

@mpvader
Copy link
Contributor

mpvader commented Jul 20, 2024

Hi, is it ok if we login remotely to analyse?

Then I need the vrm url + you need to enable remote support.

@tmeinlschmidt
Copy link
Author

interesting thing is, it's working for six hours with no restart, with no change from my side

Load average: 4.42 4.53 4.56

where I can send the url and other info privately?

@tmeinlschmidt
Copy link
Author

tmeinlschmidt commented Jul 20, 2024

you can see these gaps - reboots for today and yesterday

image

yesterday:
image

@tmeinlschmidt
Copy link
Author

hm, crashed again in 7min intervals.. seeing interesting logs

2024-07-20 13:14:54.843322500 020.309 ERR.task: error connecting to settings service
2024-07-20 13:14:55.011840500 *** starting dbus-adc ***
2024-07-20 13:14:55.625427500 dbus-adc 1.44

2024-07-20 10:34:58.002227500 INFO     Scan completed in 4 seconds
2024-07-20 11:31:08.033966500 *** CCGX booted (30253) ***
2024-07-20 11:34:05.666595500 /opt/victronenergy/serial-starter/run-service.sh: line 5: shift: 2: shift count out of range

2024-07-20 11:32:03.767700500 *** starting dbus-fzsonick-48tl ***
2024-07-20 11:33:04.054473500 INFO:root:starting /opt/victronenergy/dbus-fzsonick-48tl/dbus-fzsonick-48tl.py
2024-07-20 11:33:04.131042500 INFO:root:identifying battery...
2024-07-20 11:33:04.430816500 INFO:root:failed to identify battery at 1 : failed to get slave id from 1 : Modbus Error: [Input/Outp
ut] Modbus Error: [Invalid Message] Incomplete message received, expected at least 2 bytes (0 received)

2024-07-20 08:33:33.339233500 *** starting dbus-imt-si-rs485tc ***
2024-07-20 08:34:09.987365500 INFO:main:starting dbus-imt-si-rs485tc.py
2024-07-20 08:34:10.079742500 INFO:imt_si_rs485_sensor:searching for IMT SI-RS485 sensor on /dev/ttyUSB0
2024-07-20 08:34:11.166534500 ERROR:main:no IMT SI-RS485 sensor found
2024-07-20 08:34:11.168395500 INFO:main:dbus-imt-si-rs485tc.py has shut down

2024-07-20 10:35:07.808502500 *** starting gps-dbus ***
2024-07-20 10:35:07.911132500 gps_dbus 1.07
2024-07-20 10:35:09.966491500 002.054 ERR.VALUES: timeout connecting device

2024-07-20 12:42:22.485900500 *** starting mk2-dbus ***
2024-07-20 12:42:24.182952500 Error org.freedesktop.DBus.Error.ServiceUnknown: The name com.victronenergy.settings was not provided by any .service files

2024-07-20 12:44:23.738883500 ERROR "Error processing function code 3, unit id 100, start address 808, quantity 19, src 192.168.88.129: Error finding service with device type system at device instance 0"
2024-07-20 12:44:44.958457500 INFO  "[DBusService] Service online: com.victronenergy.system (0)"

2024-07-20 13:15:10.638939500 Error org.freedesktop.DBus.Error.ServiceUnknown: The name com.victronenergy.settings was not provided by any .service files
2024-07-20 13:15:20.244530500 Context available IP/Host 127.0.0.1

@mpvader
Copy link
Contributor

mpvader commented Jul 21, 2024

conversation is in the email, but here are some screenshots.

top on v3.33, gui not on moving ants page:

image

top on v3.33, after disabling dbus-generator (not in use); and gui not on moving ants page:

image

And variation is mostly vrmlogger (normal, it has its cycle), and dbus-modbus-cli. It
and when vrmlogger is in its cycle or evcharger is, then cpu usage is higher. For EV Charger, you see dbus-modbus-cli spiking up to 15% on this CCGX.

top on v3.22 (mosquitto instead of flashmq), dbus_generator also running since not needed; so running it always seems not to be a regression (but still an opportunity to not run it always)

image

dbus-round trip time of the last 6 months:

image

A normal common max number is 200ms, and what the graph shows is that per May 22nd it first started being higher, here is a close up:

image

But for a long time its too high already. For example here at July 13th its already too high:

image

root@ccgx:~# cat /data/log/swupdate/* | tai64nlocal | grep "Starting swupdate"
2023-06-19 22:06:33.247904500 Starting swupdate to install version 20230529211714 v3.00 ...
2023-07-17 22:55:46.041958500 Starting swupdate to install version 20230717162050 v3.01 ...
2023-09-25 10:52:26.934692500 Starting swupdate to install version 20230911151152 v3.10 ...
2023-10-16 23:12:33.099795500 Starting swupdate to install version 20231009045315 v3.11 ...
2023-11-14 21:01:45.581329500 Starting swupdate to install version 20231030212506 v3.12 ...
2023-12-12 09:54:25.943729500 Starting swupdate to install version 20231205211402 v3.13 ...
2024-02-16 09:14:20.269516500 Starting swupdate to install version 20240213074236 v3.20 ...
2024-02-25 17:22:23.683956500 Starting swupdate to install version 20240216163909 v3.21 ...
2024-03-04 12:59:34.594388500 Starting swupdate to install version 20240226113741 v3.22 ...
2024-03-11 21:57:09.927505500 Starting swupdate to install version 20240306173555 v3.30~13 ...
2024-04-01 10:54:34.380889500 Starting swupdate to install version 20240319200807 v3.40~1 ...
2024-04-21 21:13:50.712280500 Starting swupdate to install version 20240418190500 v3.31~4 ...
2024-05-02 15:05:59.760462500 Starting swupdate to install version 20240426135145 v3.40~10 ...
2024-05-07 21:21:01.115356500 Starting swupdate to install version 20240502131129 v3.40~13 ...
2024-05-20 09:48:57.651678500 Starting swupdate to install version 20240517184948 v3.40~16 ...
2024-05-26 22:25:09.216091500 Starting swupdate to install version 20240422090512 v3.31 ...
2024-06-02 20:10:36.665802500 Starting swupdate to install version 20240523125018 v3.32 ...
2024-06-13 16:28:20.817932500 Starting swupdate to install version 20240611052142 v3.33 ...
2024-07-18 11:36:08.174682500 Starting swupdate to install version 20240716134534 v3.40 ...
2024-07-18 11:48:18.312164500 Starting swupdate to install version 20240716134534 v3.40 ...
2024-07-19 10:13:13.006469500 Starting swupdate to install version 20240712153300 v3.40~39 ...
2024-07-20 08:53:02.277984500 Starting swupdate to install version 20240611052142 v3.33 ...
2024-07-20 09:12:55.593589500 Starting swupdate to install version 20240611052142 v3.33 ...
2024-07-21 12:09:15.272370500 Starting swupdate to install version https://updates.victronenergy.com/feeds/venus/
release/images/ccgx/venus-swu-ccgx-20240226113741-v3.22.swu ...
root@ccgx:~# 

dbus roundtrip time while running the older version, v3.22:

image

@mpvader
Copy link
Contributor

mpvader commented Jul 21, 2024

for comparison: d-bus roundtrip time in May is , aside from a spike, perhaps a firmware update, a lot lower - much clearer in the healthy zone.

image

image

@tmeinlschmidt
Copy link
Author

in May I installed grid meter (EM540), then in June installed EV charger

@mpvader
Copy link
Contributor

mpvader commented Jul 21, 2024

both add a bit of load.

Here is the same system again, running v3.40, gui not on the moving ants page, and dbus-generator (the genset start/stop subsystem) disabled.

this is at a quiet moment, ie both vrmlogger not at its interval; and dbus-modbus-cli also at low percentage.

image

@mpvader
Copy link
Contributor

mpvader commented Jul 21, 2024

CPU Load by energy meter is mostly the caused by sending multiple measurement updates per second, which cost a lot of CPU per process receiving those.

CPU load by the EV Charging Station will, I think, mostly be in the dbus-modbus-cli spike every x seconds.

Here is the number of dbus msges per second, made with dbus_signal_cntr.py.

image

@tmeinlschmidt
Copy link
Author

grid.cgwacs_ttyUSB0_mb1 - looks like grid meter hits a lot.

@mpvader
Copy link
Contributor

mpvader commented Jul 21, 2024

that is by design. The new Victron meter hits even more.

The problem with that is mostly the CPU load used by all the python process in receiving aka processing the updates.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants