Users reporting 4GB Pi 4 still reboots #47

chrisys · 2020-05-06T15:29:57Z

Users are still reporting that 4GB Pi 4s reset when working on 4 tasks. Reports say 3 tasks are OK. 2GB devices are now running stably with 1 task.

Reduce the number of Pi 4 allocated tasks to 3 by setting CPU usage percent to 75.
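For reference, the arithmetic behind that percentage can be sketched as below. It assumes the client rounds the allowed CPU count down and never drops below one task; `tasks_for_pct` is a hypothetical helper name used only for illustration, not something from BOINC or this project.

```shell
#!/bin/sh
# Hypothetical helper: number of simultaneous tasks for a given core count
# and CPU usage percentage (assumption: round down, minimum of 1).
tasks_for_pct() {
  cores=$1
  pct=$2
  n=$(( cores * pct / 100 ))
  if [ "$n" -lt 1 ]; then
    n=1
  fi
  echo "$n"
}

tasks_for_pct 4 75   # 4-core Pi 4 limited to 75%
tasks_for_pct 4 25   # same board limited to 25%
```

Under these assumptions, 75% on a 4-core board gives 3 tasks and 25% gives 1.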

ptrm · 2020-05-06T19:11:11Z

Just an observation about reliability versus simultaneous task count: for several days my 4GB Raspberry Pi had been running stably with 4 cores occupied, but when I started fiddling with overclocking yesterday, it started rebooting by itself. That might just mean that on the previous days I got tasks that were less demanding of the CPU, and one device is surely too little to judge by. Still, it made me wonder whether underclocking, instead of limiting the core count, would improve stability enough to make the overall task count higher than with the current solution. Another option would be setting a kernel cgroup limit to use a certain percentage of the CPU (available as a compose config field).
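As a rough sketch of the cgroup idea: with the kernel's CFS bandwidth control, a CPU cap is expressed as a quota of CPU time in microseconds per scheduling period. The helper below only computes that number. `cfs_quota_us` is a hypothetical name, and the 100000 µs period is assumed as the usual default; whether and how the quota can be applied through a compose field on a given balenaOS release is not verified here.

```shell
#!/bin/sh
# Hypothetical helper: CFS quota (microseconds per period) that caps a
# container at a percentage of the total CPU time of all cores.
# Assumption: the period defaults to 100000 us.
cfs_quota_us() {
  cores=$1
  pct=$2
  period_us=${3:-100000}
  echo $(( cores * period_us * pct / 100 ))
}

cfs_quota_us 4 75   # 75% of a 4-core Pi 4
```

With these inputs the quota comes out at 300000 µs per 100000 µs period, which is the equivalent of three full cores.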

chrisys · 2020-05-08T09:44:54Z

@ptrm that's definitely something we should investigate. If you think about it from a fleet level we have the ability to deploy to thousands of devices and gather metrics on how they perform in order to figure out what the settings are that result in most work units being completed.

chrisys · 2020-05-08T12:24:02Z

@ptrm on a fleet level we're still seeing a lot of Pi 4 reboots, which do seem to be on 4GB boards, even when limited to 3 tasks.

ptrm · 2020-05-08T13:15:42Z

There are certainly many more indicators than the ones I write about below, but here is what I did to get my two rpi4s stable under the current load settings (1 core for the 2GB rpi, 3 cores for the 4GB one).

One thing that turned out to be rebooting my devices was undervoltage and underpowering. It's a common problem, especially for the rpi4. Basically, Raspberry Pis, and the rpi4 most of all, require the voltage to be stable and as close to 5V as possible. Most general-use and even high-current chargers provide ~4.9V or less under zero load, and even less as the current rises (which is OK for charging 3.7V li-ion/li-po batteries).

I came up with this snippet as a helpful tool to paste into the balenaOS shell (the rpi3 balenaOS image doesn't seem to have the vc tools installed):

while true; do \
  sleep 1; \
  clear; \
  date --iso-8601=s; \
  echo -ne 'vcgencmd get_throttled:\t\t'; \
  echo "ibase=16;obase=2;$(vcgencmd get_throttled | sed -E 's/^[^=]+=0x//' | tr 'a-f' 'A-F')" | bc; \
  echo -ne 'vcgencmd measure_clock arm:\t'; vcgencmd measure_clock arm; \
done

If get_throttled outputs anything other than zero, one of the flags has been set; for me it was undervoltage, and it usually correlated with reboots of my device. See the docs under get_throttled. There are separate flags for frequency capping, undervoltage, and temperature excess, both for the current moment and for the past.

Here is my properly powered rpi4 for example (overclocked to 1.7GHz). If it had ever been underpowered since the last reboot, the get_throttled value would look something like 01010000000000000000, and the clock value might indicate around 600MHz. In edge cases, my overclocked 4GB rpi4 rebooted without any visible change in the get_throttled output. At 1800MHz, for example, everything looked good, but it would reboot every ~30min. So either something else related to overclocking caused the reboots, or the events above happen too suddenly to show up.
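To make the flag layout easier to read than a raw binary string, here is a minimal sketch that names each set bit. The bit positions follow the Raspberry Pi documentation for get_throttled; `decode_throttled` is a hypothetical helper name, and the function only decodes a value passed to it, so it runs without vcgencmd installed.

```shell
#!/bin/sh
# Hypothetical helper: print the meaning of each bit set in a
# get_throttled value (hex, with or without the 0x prefix).
decode_throttled() {
  value=$(( 0x${1#0x} ))
  if [ "$value" -eq 0 ]; then
    echo "no flags set"
    return 0
  fi
  # Each line of the table below is "<bit number> <description>".
  while read -r bit label; do
    if [ $(( (value >> bit) & 1 )) -eq 1 ]; then
      echo "bit $bit: $label"
    fi
  done <<'EOF'
0 under-voltage detected now
1 ARM frequency capped now
2 currently throttled
3 soft temperature limit active now
16 under-voltage has occurred
17 ARM frequency capping has occurred
18 throttling has occurred
19 soft temperature limit has occurred
EOF
}

# On a device with vcgencmd available, something like:
#   decode_throttled "$(vcgencmd get_throttled | cut -d= -f2)"
decode_throttled 0x50000
```

The binary example 01010000000000000000 is 0x50000 in hex, which this decodes as bits 16 and 18: under-voltage and throttling have both occurred since boot.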

ptrm · 2020-05-08T13:35:38Z

And fleetwise, it might be good to write something on the project's webpage about good (or official) power supply.

Plus, I now remember that after my first deployments to balena, the device-level variable RESIN_HOST_CONFIG_avoid_warnings was set to 1 by default, which hides the warning icons overlaid on top of the screen contents. Those icons might be a helpful indicator, but I guess few users attach displays to their Pis in this use case.

ptrm · 2020-05-09T09:20:14Z

The fun fact is, I can get my 2GB rpi4 to run at 2.1GHz with one task, but it failed to run 2 tasks at standard clock settings with the same decent power source :/

ptrm · 2020-05-09T14:16:35Z

> on a fleet level we're still seeing a lot of Pi 4 reboots which does seem to be 4GB boards, even when limited to 3 tasks.

By the way, how do you distinguish reboots from the "last online" status? Does the HTTP API provide more options? I have a machine that's said to have been online for 2 hours, but its uptime in balenaOS is 23:19, which indicates no reboots at all :o
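One way to tell the two cases apart is to compare the device's own uptime with the time since it last came online: if the uptime is clearly longer, the device only reconnected. This is a minimal sketch of that comparison; `classify_online_event` and the 120-second slack for boot time are assumptions made for illustration, and no balena API is involved.

```shell
#!/bin/sh
# Hypothetical helper: decide whether a "came online" event was a reboot
# or just a reconnect, given both durations in seconds.
# Assumption: a reboot leaves the uptime within slack_s of the online time.
classify_online_event() {
  uptime_s=$1
  online_s=$2
  slack_s=${3:-120}
  if [ "$uptime_s" -gt $(( online_s + slack_s )) ]; then
    echo "reconnect"
  else
    echo "reboot"
  fi
}

# On the device itself the uptime could be read with:
#   uptime_s=$(cut -d. -f1 /proc/uptime)
# Case from above: uptime 23:19 (83940 s), online for 2 hours (7200 s).
classify_online_event 83940 7200
```

For the machine described above this reports a reconnect, since the uptime is far longer than the two hours it has been online.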

chrisys · 2020-05-12T11:50:04Z

@ptrm that's a good point you make and something I hadn't considered. Initially when we were looking at this issue, reboots were definitely occurring and resetting the device uptime as expected. However now I'm looking at a sample of devices from the fleet that have been online for a few minutes, and their uptimes are all measured in days. Perhaps the limitation to 3 tasks had a more substantial effect than I first thought.

We did see a marked jump in output after the fleet was updated on Friday morning: https://www.boincstats.com/stats/14/team/detail/18832/charts

The balenaCloud dashboard does have a per-device diagnostics facility which checks for undercurrent/underpower events (see here), but there's no way to run this on an entire fleet and correlate results at the moment.

chrisys · 2020-05-12T12:02:39Z

Added issue regarding missing vcgencmd here: balena-os/balena-raspberrypi#485

ptrm · 2020-05-12T12:04:49Z

Yeah, I noticed it can be checked here as well: https://dashboard.balena-cloud.com/devices/<device id>/diagnostics – it's marked as experimental, and indeed running the whole diagnostics even on idle rpi4 is lengthy.

Glad it's opensourced, though, the scripts look very useful.

EDIT: it would be good to be able to run them separately. Also, maybe there's a way to tag a machine from the supervisor level, so the fleet view shows a flag for having ever been underpowered? (Seeing that tags can have values, I assume even underpower counts might come into play.)

And yeah, the chart looks impressive

chrisys self-assigned this May 6, 2020

chrisys linked a pull request May 6, 2020 that will close this issue

Reduce pi44gb tasks #48

Merged

chrisys closed this as completed in #48 May 7, 2020

chrisys reopened this May 8, 2020

chrisys mentioned this issue May 12, 2020

Utilize RPi throttled state to provide more information balena-io-modules/device-diagnostics#209

Closed

saintaardvark mentioned this issue May 28, 2020

vcgencmd missing balena-os/balena-raspberrypi#485

Open

ptrm mentioned this issue Jun 4, 2020

monitoring: add support for builtin netdata #51

Closed
