[Bug] Integration causing constant flash writes on queried device #389
Comments
There is no way the integration can do any writes on the Mikrotik device. It could be that you are logging something extensively.
I understand, and this is how I would assume the device should work in most cases. Nevertheless, there have been similar issues before (apparently confirmed by Mikrotik). The following post on Reddit first got me thinking that the writes were caused by SNMP (link to conversation). This turned out not to be the case: I disabled the SNMP service completely and still saw the issue. I'm not suggesting the integration itself is causing the writes, but rather something related to the queries over the API. I've had a total of about 60 writes in the 6 hours since I disabled the addon/integration, which seems reasonable. As soon as I enable it, I start getting 2 writes on every query, like clockwork (depending on the update interval currently set in the integration; at 30 seconds this would be 240 per hour or 40000+ per week, non-stop). As I mentioned, I've disabled all logging, graphing etc. in RouterOS, so those are not causing the writes. I don't know. Just thought I'd bring the issue forward (it's easy to check whether it's just me: set the update interval to 1 second and open Resources in Winbox 😄). But I suppose if there are issues, they are more in the direction of librouteros and Mikrotik.
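For reference, the write-rate arithmetic above can be reproduced with a few lines of Python. This is only a back-of-the-envelope sketch: it assumes exactly 2 sector writes per poll, as observed in this comment, and the function name is made up for illustration.

```python
def writes_per_period(update_interval_s: float, writes_per_query: int = 2) -> dict:
    """Estimate flash sector writes caused by polling at a fixed interval.

    Assumes every poll triggers `writes_per_query` sector writes
    (2 is the value observed in this thread, not a documented figure).
    """
    queries_per_hour = 3600 / update_interval_s
    per_hour = queries_per_hour * writes_per_query
    return {
        "per_hour": per_hour,
        "per_day": per_hour * 24,
        "per_week": per_hour * 24 * 7,
    }


if __name__ == "__main__":
    # 30-second update interval, as in the comment above
    print(writes_per_period(30))
```

At a 30-second interval this gives 240 writes per hour and 40,320 per week, which matches the "40000+ per week" estimate.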
These are just API queries, so it should not happen.
You can check the writes with Mikrotik's Winbox or through the device's WebFig pages. They can be found under System/Resources (or from the CLI: /system/resource/print).
Guys, it seems the API behaves on the MT side as an SNMP query would. There is a known problem with MT's implementation where, for whatever reason, they cache 'high complexity' calcs to flash. People have found that running the query more frequently (sub 30 sec-ish) gets around this; apparently a new request sort of resets the logic and the calcs are redone. Anyway, it's known, and as of a year ago MT had put it in the 'fix' queue. No ETA reported. There are people with very high write values (mine shows 9 million sectors on a HAP AC2) who show no bad sectors, so the flash may outlive the useful life of the router/switch, depending on the flash config. There is a thread about this SNMP thing on Reddit under r/mikrotik. That thread is specifically about SNMP, but another one I read implied that the API is simply acting as an SNMP emulator on reads. Other things contribute too (DHCP lease time, logging level, etc.), but yeah, when you see the sector write count increment by 2 every 30 sec (where the integration on mine is set to read every 30 sec), that is the bulk of the writes for my system.
Hi. I wanted to add some information here. Using logging, I captured the actual read commands being sent via the client API to the Mikrotik. If those commands are run locally (Winbox), none of those from my setup cause a sector write. So it isn't the physical process that causes this in RouterOS. I also checked the core MT logs, and while I see an initial login from the integration/API, I don't see subsequent entries (i.e. the integration isn't dropping and reconnecting on each query cycle). So that isn't contributing. There may be some internal logging where just the process of sending a request, any request, via the API yields a sector write (2 in my case). It may also be that there are some precursor calls in the integration for each read cycle that are not being logged in HA but are triggering the sector writes. I have asked MT support for help in finding a way to identify what is going on, whether that is some special script or a logging config that would catch it, etc. There are reports on the MT forum that even proper SNMP OID reads are triggering sector writes. So this may be, as suggested above, just some RouterOS bug where a new read 'session' on an open/established connection yields a sector write. Does the native integration also cause this? I haven't checked.
It is due to
Confirmed: commenting out the check-for-updates call that Lieta2 found results in no sector writes on my HAP AC2 for my config. Great catch!
Thanks for the tip, but could you please elaborate?
One route is to edit the coordinator.py file in the install folder for your integration. There is a section commented as get_firmware_update. You can just replace the logic in the function with a 'return', so the system will just bounce back out of it with no firmware data read from the MT. It may be that there is a different way to read the firmware version, or whether there is a pending update, that would not trigger a flash sector write. I posted this question to MT support, and it seems they are actually looking at it... Maybe it is a bug in RouterOS, or maybe there is a change to the query that would get around it...
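To illustrate the kind of change described here, a minimal sketch follows. Only the file name coordinator.py and the function name get_firmware_update come from this thread; the class, its attributes and the `_query_firmware` helper are hypothetical stand-ins, and the real function body in the integration will look different.

```python
class CoordinatorSketch:
    """Hypothetical stand-in for the integration's coordinator (not the real class)."""

    def __init__(self):
        self.firmware_data = {}    # hypothetical attribute holding firmware info
        self.firmware_queries = 0  # counts simulated API reads, for demonstration only

    def _query_firmware(self):
        # Hypothetical placeholder for the API read that was found to trigger writes.
        self.firmware_queries += 1
        return {"available": False}

    def get_firmware_update(self):
        # Workaround from this thread: return before anything is sent to the router.
        return
        # self.firmware_data = self._query_firmware()  # original read, now disabled


if __name__ == "__main__":
    coordinator = CoordinatorSketch()
    coordinator.get_firmware_update()
    print(coordinator.firmware_queries)  # no firmware query was issued
```

Note that an edit like this is made to the installed copy, so it would presumably be overwritten when the integration is updated.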
Thanks, I found the function and, if I understand correctly, it should be modified this way:
All the code is removed.
Hi, this has been a real help to me (being able to query every 30 seconds or so), since part of the benefit in my application is in identifying unusual traffic from IoT VLANs. Best I can tell, RouterOS computes the TX/RX rate as the total TX/RX divided by the number of seconds since the last read, so it is an average. With that, on a 30 min sample time, you would never see a burst of 500 kB of data that should not be there if that is a VLAN that is moving 10 MB of data on average per 30 min window etc...
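The averaging effect described here can be made concrete with a small worked example. It is a sketch under two assumptions: that the rate really is bytes transferred divided by seconds since the last read (which the comment itself only states as a best guess), and that the 10 MB of baseline traffic is spread evenly over the 30-minute window. All names are made up for illustration.

```python
def rate_ratio_with_burst(baseline_bytes_per_30min: float,
                          burst_bytes: float,
                          sample_window_s: float) -> float:
    """Return how many times higher the averaged rate looks in a sample
    window that contains the burst, compared with a window without it."""
    baseline_rate = baseline_bytes_per_30min / (30 * 60)  # bytes per second
    baseline_in_window = baseline_rate * sample_window_s
    return (baseline_in_window + burst_bytes) / baseline_in_window


if __name__ == "__main__":
    baseline = 10_000_000  # 10 MB per 30-minute window
    burst = 500_000        # 500 kB burst
    print(rate_ratio_with_burst(baseline, burst, 30 * 60))  # 30-minute samples
    print(rate_ratio_with_burst(baseline, burst, 30))       # 30-second samples
```

With 30-minute samples the burst raises the averaged rate by only about 5%, which is easy to miss; with 30-second samples the window containing the burst shows roughly four times the baseline rate.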
Thanks (again), I just had a bad time seeing my Chateau 18ax having 2 000 000+ writes for "nothing"... now "Sector Writes Since Reboot" is very stable. The code has been altered this way:
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.
Not stale.
Are you sure it's get_firmware_update? That should be executed only once every 4 hours or after a device reboot.
This is the one item I commented out, and the flash writes effectively stopped. There are always going to be some writes, but I'm seeing something like 100 per day now vs 10,000/day before I made this one change to the code. I also think it would make sense to break out that firmware check as an optional process set up in the config, so that the read request is not sent at all (rather than the response being ignored) if the user doesn't want notifications on firmware bumps.
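One way the suggestion above could look is sketched below: a gate that decides whether the firmware query should be sent at all. This is not the integration's code. The `enabled` option and all names are hypothetical, and the 4-hour interval is taken from the earlier comment in this thread about how often the check is supposed to run.

```python
from datetime import datetime, timedelta
from typing import Optional

# Interval mentioned earlier in the thread; treated here as an assumption.
FIRMWARE_CHECK_INTERVAL = timedelta(hours=4)


def should_check_firmware(enabled: bool,
                          last_check: Optional[datetime],
                          now: datetime,
                          interval: timedelta = FIRMWARE_CHECK_INTERVAL) -> bool:
    """Decide whether a firmware query should be sent to the router."""
    if not enabled:
        # User opted out: never send the request.
        return False
    if last_check is None:
        # No check yet (e.g. after startup or a device reboot).
        return True
    # Otherwise only query once the interval has elapsed.
    return now - last_check >= interval


if __name__ == "__main__":
    now = datetime(2025, 1, 1, 12, 0, 0)
    print(should_check_firmware(False, None, now))
    print(should_check_firmware(True, now - timedelta(seconds=30), now))
    print(should_check_firmware(True, now - timedelta(hours=5), now))
```

The point of gating before the request, rather than discarding the response afterwards, is that the router never sees the query that appears to trigger the writes.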
It could be that there is some bug that triggers that check more often, or on every query.
It is certainly correlated 1:1 with the update frequency. It is easy to validate just by watching the Resources window in the MT Winbox while you set the update rate to, say, 30 sec and then 60 sec. At least for my config, I see 2 writes accumulated for each read made by the plugin. It is possible it's not strictly the firmware query but the size of the query vs a buffer or who knows what, but the net is that removing that query stops the writes that are correlated in time with the API queries.
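The validation described here amounts to dividing the change in the sector write counter by the number of polls in the observation period. A minimal sketch, using made-up counter readings rather than real measurements:

```python
def writes_per_poll(counter_start: int,
                    counter_end: int,
                    elapsed_s: float,
                    update_interval_s: float) -> float:
    """Average number of sector writes per integration poll.

    counter_start / counter_end are two readings of the sector write
    counter shown under System/Resources, taken elapsed_s seconds apart.
    """
    polls = elapsed_s / update_interval_s
    return (counter_end - counter_start) / polls


if __name__ == "__main__":
    # Hypothetical readings taken 10 minutes apart at two update intervals.
    print(writes_per_poll(1000, 1040, 600, 30))  # 30 s interval
    print(writes_per_poll(2000, 2020, 600, 60))  # 60 s interval
```

If the result stays near the same value (2 in this comment) when the update interval changes, the writes scale with the polling rather than with elapsed time.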
I just received a note from MT Support that they have implemented a bug fix on their end for this flash write issue on the API call. However, they won't say when it will be rolled out. I assume it would be in the next dev release, but they are close to final for this cycle and may not want to add new fixes until the next minor rev.
I don't know if this counts as a bug or an unfortunate feature... however. When enabled, the Mikrotik integration will cause flash writes on my CRS switch on every query. If I set the update interval to 30 seconds, the router will write every 30 seconds; if I set it to 5 seconds, it will write to flash every 5 seconds.
I tried disabling every sensor, one by one, but the writes still keep happening until I disable the entire integration.
Steps to reproduce the behavior:
Since the router SNMP data can be queried with minimal flash writes, I assume this would be possible with the integration?
Nearly all the writes in the screenshot are from the integration, which has been installed for a few weeks. The last reboot and the subsequent writes are after the update to RouterOS 7.16.2 (the version upgrade had no effect, but you can see how the writes keep accumulating).
Some version info:
Noticed the issue a little while back, and initially thought it was the SNMP queries on the router. However, after disabling SNMP completely, disabling all logging, and checking every possible service running regularly, nothing seemed to affect the writes... until I remembered I had the HA integration running.