Data corruption #29

Open
matiasandina opened this issue Nov 8, 2021 · 6 comments
Labels
bug Something isn't working

Comments

@matiasandina
Owner

matiasandina commented Nov 8, 2021

This issue collects examples of data corruption that affect how FEDWatcher functions.

1. Data corruption in FED number

FED 17 sent this string in the place of Session_type.

Pav\x83SPlusMinusl17

Problem

There's the more common data corruption, where one character is replaced by an escaped byte such as \x83 (pattern \x[0-9a-f]{2}).
There's the specific problem of not splitting the 17 (Device_Number) from the program title.
The parser then populated Device_Number with the value that belongs in Battery_Voltage.

This happened in the middle of the session, which means this row had to be manually inserted in the proper place in the proper table.

This error was not a one-time event; several FEDWatcher setups had it. Not sure if it's related, but the RPis were being pushed to 100% CPU usage. I would think the data is corrupted on the emitting side, not on the receiving side.

@matiasandina
Owner Author

We could potentially implement a data cleanup routine.
We could send each message 3 times and take the "average" (the most common value) of each character. This will likely reduce gibberish events.
It will probably increase latency.
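
A minimal sketch of the "send it 3 times" idea, assuming the receiver has already collected the repeated copies as strings. The function name `majority_vote` is hypothetical and not part of FEDWatcher. Copies whose length differs from the most common length are dropped before voting, because an inserted or dropped character shifts every later position.

```python
from collections import Counter

def majority_vote(copies):
    """Rebuild a message from repeated transmissions, one character position at a time."""
    if not copies:
        raise ValueError("need at least one copy")
    # Only compare copies of the most common length; an extra or missing
    # character shifts every later position and would poison the vote.
    common_len = Counter(len(c) for c in copies).most_common(1)[0][0]
    aligned = [c for c in copies if len(c) == common_len]
    # For each position, keep the character that most copies agree on.
    return "".join(Counter(chars).most_common(1)[0][0] for chars in zip(*aligned))

copies = ["FreeFeed,34,3.97", "FreeFeed,34,3.97", "FreeFeedl34,3.97"]
print(majority_vote(copies))  # FreeFeed,34,3.97
```

The vote only helps when the corruption is independent between copies; if the same character is damaged in two of the three copies, the damaged value wins.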

@matiasandina
Owner Author

matiasandina commented May 18, 2023

Here's an example. Instead of sending FreeFeed, 34, 3.97, ..., it's sending FreeFeed\x06l34. The fused field shifts the following columns, so Device_Number receives 3.97, which breaks an int() call downstream and stops the data from being saved properly.

05/18/2023 06:02:32,1.9.2,FreeFeed\x06l34,3.97,1,1,Pellet,Left,31,76,59,0,20.49,31,nan
Process Process-2:
Traceback (most recent call last):
  File "/usr/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
  File "/usr/lib/python3.7/multiprocessing/process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "/home/pi/FEDWatcher/fedwatcher/src/fedwatcher.py", line 234, in runHelper
    self._save_all_df()
  File "/home/pi/FEDWatcher/fedwatcher/src/fedwatcher.py", line 452, in _save_all_df
    self._save_to_csv(df_data)
  File "/home/pi/FEDWatcher/fedwatcher/src/fedwatcher.py", line 417, in _save_to_csv
    filename = f"FED{int(df_data[0]['Device_Number']):03d}_{timestr}_{self.session_num:02d}.csv"
ValueError: invalid literal for int() with base 10: '3.97'
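
One way to catch a line like this before it reaches the int() call is to flag fields that contain unexpected characters. This is only a sketch: `suspect_fields` and the `ALLOWED` character set are assumptions made for illustration, not something taken from fedwatcher.py.

```python
import string

# Characters a clean field is assumed to contain (an assumption, not taken from FEDWatcher).
ALLOWED = set(string.ascii_letters + string.digits + " .:/_-")

def suspect_fields(line, sep=","):
    """Return the indices of fields that contain characters outside ALLOWED."""
    return [i for i, field in enumerate(line.split(sep))
            if any(ch not in ALLOWED for ch in field)]

line = "05/18/2023 06:02:32,1.9.2,FreeFeed\x06l34,3.97,1,1,Pellet,Left,31,76,59,0,20.49,31,nan"
print(suspect_fields(line))  # [2]
```

This catches the control character in the third field, but it would not catch a corruption that swaps one letter or digit for another, so it complements rather than replaces a check on Device_Number itself.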

@matiasandina
Owner Author

matiasandina commented May 19, 2023

The issue might have been that there wasn't enough space for the termination character, possibly a classic off-by-one error. Will test before closing.

Update

The errors now continue only when the Event is LeftWithPellet or RightWithPellet, probably because of the length of the assigned character array. So far, I haven't had any more breaking lines, but this issue is still not fully fixed.

@matiasandina
Owner Author

This is not fully fixed. For example, here 12 is received as q2. The error handling catches it as intended, but the raised error still stops FEDWatcher.

Error: Unable to convert 'Device_Number' to an integer.
Process Process-2:
Traceback (most recent call last):
  File "/home/pi/FEDWatcher/fedwatcher/src/fedwatcher.py", line 431, in _save_to_csv
    device_number = int(float(df_data[0]['Device_Number']))
ValueError: could not convert string to float: 'q2'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
  File "/usr/lib/python3.7/multiprocessing/process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "/home/pi/FEDWatcher/fedwatcher/src/fedwatcher.py", line 249, in runHelper
    self._save_all_df()
  File "/home/pi/FEDWatcher/fedwatcher/src/fedwatcher.py", line 479, in _save_all_df
    self._save_to_csv(df_data)
  File "/home/pi/FEDWatcher/fedwatcher/src/fedwatcher.py", line 435, in _save_to_csv
    raise ValueError(error_msg)
ValueError: Unable to convert 'Device_Number' to an integer.
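
A possible alternative to raising is to set corrupted rows aside and keep the process running. The sketch below assumes rows are dictionaries keyed by column name, as `df_data[0]['Device_Number']` in the traceback suggests. The helpers `parse_device_number` and `split_rows` are hypothetical.

```python
def parse_device_number(value):
    """Return the device number as an int, or None if the field is not a whole number."""
    try:
        number = float(value)
    except (TypeError, ValueError):
        return None
    # Reject values such as 3.97, which are more likely a shifted Battery_Voltage.
    return int(number) if number.is_integer() else None

def split_rows(rows):
    """Separate rows with a usable Device_Number from rows to quarantine."""
    good, bad = [], []
    for row in rows:
        target = good if parse_device_number(row.get("Device_Number")) is not None else bad
        target.append(row)
    return good, bad

rows = [{"Device_Number": "12"}, {"Device_Number": "q2"}, {"Device_Number": "3.97"}]
good, bad = split_rows(rows)
print(len(good), len(bad))  # 1 2
```

The quarantined rows could be written to a separate file for manual review, so a single bad line no longer stops saving for every other device.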

@matiasandina matiasandina reopened this Nov 29, 2023
@matiasandina
Owner Author

Another example: 1w:44:54 (presumably 17:44:54, judging by the neighboring rows and Pi_Time) gets parsed to 2023-12-07 01:44:54. This might affect how the pipeline works, because other functions rely on datetime and might arrange rows by it (e.g., read_fed, recalculate_pellets).

# A tibble: 6 × 4
  `MM:DD:YYYY hh:mm:ss` Pi_Time             datetime            Pellet_Count
  <chr>                 <dttm>              <dttm>                     <dbl>
1 12/07/2023 17:43:43   2023-12-07 17:40:02 2023-12-07 17:43:43         1070
2 12/07/2023 17:44:14   2023-12-07 17:40:33 2023-12-07 17:44:14         1071
3 12/07/2023 1w:44:54   2023-12-07 17:41:14 2023-12-07 01:44:54         1072
4 12/07/2023 18:44:21   2023-12-07 18:40:40 2023-12-07 18:44:21         1073
5 12/07/2023 18:44:46   2023-12-07 18:41:06 2023-12-07 18:44:46         1074
6 12/07/2023 18:45:03   2023-12-07 18:41:23 2023-12-07 18:45:03         1075
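
A strict parse would reject the corrupted timestamp instead of silently turning it into 01:44:54. The sketch below uses Python's `datetime.strptime` and assumes the check happens on the FEDWatcher side before the row is saved. `parse_fed_time` and `FED_FORMAT` are hypothetical names.

```python
from datetime import datetime

FED_FORMAT = "%m/%d/%Y %H:%M:%S"  # matches values such as 12/07/2023 17:43:43

def parse_fed_time(text):
    """Return a datetime, or None if the text does not match FED_FORMAT exactly."""
    try:
        return datetime.strptime(text, FED_FORMAT)
    except ValueError:
        return None

for stamp in ["12/07/2023 17:44:14", "12/07/2023 1w:44:54"]:
    print(stamp, "->", parse_fed_time(stamp))
```

When the result is None, the row could be flagged or ordered by Pi_Time instead, which is recorded by the Pi and is unaffected by corruption of the transmitted timestamp.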

@matiasandina
Owner Author

There are predictable sources of "corruption" and somewhat unpredictable sources of corruption.

Predictable

durationStr does not seem to handle RightWithPellet and LeftWithPellet:

char durationStr[20];
if (Event == "Pellet"){
    strcpy(durationStr, "nan");
}
else if (Left) {
    sprintf(durationStr, "%.2f", leftInterval/1000.0);
}
else if (Right) {
    sprintf(durationStr, "%.2f", rightInterval/1000.0);
}

This could be changed to

char durationStr[20];
if (Event == "Pellet") {
    strcpy(durationStr, "nan");
} else if (Event == "LeftWithPellet" || Event == "Left") {
    sprintf(durationStr, "%.2f", leftInterval / 1000.0);
} else if (Event == "RightWithPellet" || Event == "Right") {
    sprintf(durationStr, "%.2f", rightInterval / 1000.0);
} else {
    strcpy(durationStr, "nan");
}

Unpredictable
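
One pattern that may be worth checking in the examples from this thread: the substituted characters each differ from the expected character by a single bit. The sketch below compares the pairs `,` and `l` (FreeFeed\x06l34), `1` and `q` (12 received as q2), and `7` and `w` (1w:44:54, where the 7 is inferred from the neighboring rows). The helper `flipped_bits` is hypothetical.

```python
def flipped_bits(expected, observed):
    """Return the bit positions (0 = least significant) that differ between two characters."""
    diff = ord(expected) ^ ord(observed)
    return [bit for bit in range(8) if diff >> bit & 1]

# (expected, observed) pairs read off the examples in this thread.
# The "7" in the third pair is inferred from the neighboring rows.
pairs = [(",", "l"), ("1", "q"), ("7", "w")]
for expected, observed in pairs:
    print(repr(expected), "->", repr(observed), flipped_bits(expected, observed))
```

All three pairs differ only in bit 6 (value 0x40). This does not account for the extra \x83 and \x06 bytes, and three samples are too few to conclude anything about the cause.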
