BUG: Sensor error/offline requires restart of octoprint #93

Open
puterboy opened this issue Jan 8, 2023 · 15 comments
Labels
help wanted Extra attention is needed

Comments

@puterboy

puterboy commented Jan 8, 2023

I had a sensor offline for a while, so the custom command returned nothing (an empty string).
When the sensor came back online, the display was still frozen at the last good value, and the graph showed no data, with the x-axis labeled as all zeros and avg/min/max showing Null.

Refreshing the web browser didn't help.
Going into the settings for TopTemp and running 'test' showed that the sensor command was working again... just somehow the lack of good data had frozen the display and borked the graph history.

The only way I could fix it was by restarting octoprint (which is obviously non-ideal for many reasons)

The lack of sensor data (or even the presentation of 'bad' sensor data) should be handled gracefully... :)

@puterboy
Author

puterboy commented Jan 8, 2023

Actually, it seems like even one bad sample can trigger this...

@puterboy
Author

puterboy commented Jan 8, 2023

Note my ideal behavior in case of invalid or absent sensor data would be the following:

  1. Sensor displays NA in the ribbon
  2. Histogram skips the data point
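
A minimal Python sketch of the behavior listed above, for illustration only. It is not TopTemp's actual code: `parse_reading` and `ribbon_label` are hypothetical names, and the one-decimal formatting is an assumption. The idea is that an empty or junk reading becomes `None`, which the ribbon shows as NA and the graph can skip.

```python
import math
from typing import Optional


def parse_reading(raw: Optional[str]) -> Optional[float]:
    """Convert raw command output to a float, or None if it is unusable."""
    if raw is None:
        return None
    try:
        value = float(raw.strip())
    except ValueError:
        # Covers the empty string (command returned nothing) and junk text.
        return None
    # Treat nan/inf as invalid too, so they never reach the graph.
    return value if math.isfinite(value) else None


def ribbon_label(value: Optional[float]) -> str:
    """Text for the ribbon: 'NA' when there is no valid reading."""
    return "NA" if value is None else f"{value:.1f}"
```

With this shape, a bad sample never overwrites or freezes the display; it only produces an NA for that one sample.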

@LazeMSS
Owner

LazeMSS commented Jan 8, 2023

I cannot recreate this. I tried using a file for input and deleting it; it basically just skips the data.
I also tried creating a file returning junk, with no problems.
I also tried creating a command that would cause a timeout.

Could you explain in detail how to recreate it? What data does the sensor return when it fails?

@puterboy
Author

puterboy commented Jan 8, 2023

It seems to break if the command returns nothing.

Until that is fixed, try this to replicate:

bash -c "if [ -e /tmp/crap ]; then /bin/true; else date +%M; fi"

It should give a graph cycling from 0 to 59 (corresponding to the minute of the current hour).

Then after a few samples, type "touch /tmp/crap" in a shell (which simulates the sensor returning nothing).
The displayed data is frozen at the last valid minute (as expected perhaps, though I would argue the display should be NA).
And then the histogram goes to all-zero labels on the X-axis, with avg/min/max all equal to Null.

Then after a few minutes type "rm /tmp/crap" in a shell (which will simulate sensor returning to normal).
However, the above error state still persists!
Even worse, it seems to require a restart of Octoprint to fix.

I believe this is related to my latest comment re-opening #88
i.e., having a command that returns nothing (e.g., /bin/true) causes the sampling to hang forever.

@puterboy
Author

Were you able to replicate with the above explanation?

@LazeMSS
Owner

LazeMSS commented Jan 15, 2023

Have not tried yet.

LazeMSS added a commit that referenced this issue Jan 15, 2023
(re)fixed problems with custom cmd returning blank/true - #93 #88
@LazeMSS LazeMSS mentioned this issue Jan 15, 2023
@LazeMSS
Owner

LazeMSS commented Jan 15, 2023

@LazeMSS LazeMSS closed this as completed Jan 15, 2023
@puterboy
Author

I am still getting the problem: if a sensor is offline (and thereby returns an error), the graph fails and stays broken even after the sensor resumes, and the problem can only be reset by restarting Octoprint.

i.e., I still get a blank graph with NA for avg/min/max.

The correct behavior would be just to "skip" the time series points without valid data -- i.e. don't display the value (just display NA) or the graph (blank or gap in the graph) for those points and don't include them in the min/max/average.
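
As a rough Python illustration of "skipping" invalid points in the statistics (a sketch, not the plugin's code; `summarize` is a hypothetical helper and missing samples are assumed to be stored as `None`):

```python
from typing import Dict, Optional, Sequence


def summarize(samples: Sequence[Optional[float]]) -> Dict[str, Optional[float]]:
    """Compute min/max/avg over valid samples only; None entries are gaps."""
    valid = [s for s in samples if s is not None]
    if not valid:
        # Nothing valid yet: report no statistics rather than failing.
        return {"min": None, "max": None, "avg": None}
    return {
        "min": min(valid),
        "max": max(valid),
        "avg": sum(valid) / len(valid),
    }
```

Because gaps are filtered out before the calculation, the statistics recover on their own as soon as valid samples arrive again.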

@puterboy
Author

@LazeMSS - any ideas why this is still occurring?

@LazeMSS
Owner

LazeMSS commented Jan 30, 2023

I'm on holiday; will look later.

@puterboy
Author

puterboy commented Jan 30, 2023 via email

@LazeMSS
Owner

LazeMSS commented Feb 3, 2023

I still can't recreate this.

@puterboy
Author

puterboy commented May 15, 2023

OK here is a command you can use to recreate the problem:

bash -c '(DATE=$(date +%s); if [ $DATE -lt TSTART -o $DATE -gt TEND ]; then expr \( $DATE / 10 \) % 100; else sleep 100000 && echo 10; fi)'

where TSTART is replaced with a value about 100 seconds after the current epoch time (date +%s) and TEND with a value about 200 seconds after the current epoch time. Set the sampling frequency to about 10 seconds.
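
To avoid substituting the two timestamps by hand, a small Python helper along these lines could generate the command. This is only a convenience sketch: `build_repro_command` and its parameters are hypothetical names, and the 100 and 200 second offsets are the ones suggested above.

```python
import time
from typing import Optional

# Reproduction command with placeholders for the two epoch timestamps.
# Note the spaces around "-o" and before "]", which the shell test needs.
TEMPLATE = (
    "bash -c '(DATE=$(date +%s); "
    "if [ $DATE -lt {tstart} -o $DATE -gt {tend} ]; "
    "then expr \\( $DATE / 10 \\) % 100; "
    "else sleep 100000 && echo 10; fi)'"
)


def build_repro_command(now: Optional[int] = None,
                        start_offset: int = 100,
                        end_offset: int = 200) -> str:
    """Return the command with TSTART/TEND filled in relative to `now`."""
    if now is None:
        now = int(time.time())
    return TEMPLATE.format(tstart=now + start_offset, tend=now + end_offset)


if __name__ == "__main__":
    # Paste the printed line into the custom command field.
    print(build_repro_command())
```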

@puterboy
Author

puterboy commented May 15, 2023

Note that fixing the problem in #88 doesn't fix this problem.
Specifically, after fixing #88, the above command works under 'test' (whether before TSTART, between TSTART and TEND, or after TEND) -- i.e. it returns the appropriate response (or times out) when run manually as a 'test'.

However, when running as a sensor, it displays updated sample data until TSTART, then presumably times out between TSTART and TEND with no new data, so it is left displaying the last valid data from before TSTART.
However, after TEND it doesn't resume displaying data, i.e. it is "stuck" with the last value from before TSTART.

I wonder whether the problem occurs in the function 'runCustomMon' where a timeout would mean that self.handleCustomData is not called leaving potentially a 'hole' in the data time series which causes the display to fail/stop?
Or perhaps the timeout occurs after the next time slice is called causing it to get confused?
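
To make the first hypothesis concrete, here is a Python sketch of a sampler that always records an outcome, even when the command times out or prints nothing. It is not the plugin's `runCustomMon` or `handleCustomData`; `read_sensor` and `sample_once` are hypothetical names, the 5 second default timeout is an assumption, and the command is run without a shell for simplicity.

```python
import shlex
import subprocess
import time
from typing import List, Optional, Tuple


def read_sensor(command: str, timeout_s: float = 5.0) -> Optional[float]:
    """Run the command; return its numeric output, or None on any failure."""
    try:
        result = subprocess.run(
            shlex.split(command),
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except (subprocess.TimeoutExpired, OSError):
        # A timeout or a missing executable counts as "no reading".
        return None
    if result.returncode != 0:
        return None
    try:
        return float(result.stdout.strip())
    except ValueError:
        # Empty output (e.g. /bin/true) or non-numeric junk.
        return None


def sample_once(command: str,
                history: List[Tuple[float, Optional[float]]],
                timeout_s: float = 5.0) -> Optional[float]:
    """Take one sample and always append it, using None to mark a gap."""
    value = read_sensor(command, timeout_s)
    history.append((time.time(), value))
    return value
```

The point of the design is that the handler runs on every tick: a timeout leaves an explicit gap in the series instead of a missing call, so the next valid sample is processed normally.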

Either way, this should allow you to reproduce the problem... and it is important to fix it since it's a very real case where the sensor may temporarily time out (or give invalid data).

Let me know if the above makes sense or you need any other info to reproduce.

Thanks

@puterboy
Author

Anything I can help do to fix the problem?
