-
-
Notifications
You must be signed in to change notification settings - Fork 350
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Problem with NUTDRV_QX, subdriver armac (add EOL support) #2003
Conversation
…upstools#1978] Authored by Tomasz bla Fortuna @blaa at networkupstools#1978 (comment) Signed-off-by: Jim Klimov <[email protected]>
Signed-off-by: Jim Klimov <[email protected]>
Hey, waaaait a bit with this. ;-) As I've wrote in the issue - this might break compatibility with other Armac/powermanagerii devices though. The end of line test is certainly SAFE, but the -1 added to the copying look rather suspicious. I want to test it with my UPS (I'm on vacation and can do this in september ;s), or get a traffic dump from the @bdkacz for whom the previous version worked. So, blindly merging that can break compatibility. ;s |
@blaa: I have a question about the logic in this proposed change:
Am I right with a gut feeling that you would lose one character in the middle of a reply from each loop here under the "right" conditions? Should there be an |
Converted to "draft" as asked by @blaa (to avoid merge before regression testing) ;) |
memcpy(buf + bufpos, tmpbuf + 1, bytes_available - 1); | ||
bufpos += bytes_available - 1; | ||
|
||
if (tmpbuf[bytes_available - 1] == 0x0d) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be safe in all cases... (although it uses the bytes_available as well and it's suspicious that it worked without -1 previously).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At least, it is unsigned and non-zero (per previous checks), so array-wise the expression is safe :)
bytes_available = (unsigned char)tmpbuf[0] & 0x0f; | ||
if (bytes_available == 0) { | ||
/* End of transfer */ | ||
break; | ||
} | ||
|
||
memcpy(buf + bufpos, tmpbuf + 1, bytes_available); | ||
bufpos += bytes_available; | ||
memcpy(buf + bufpos, tmpbuf + 1, bytes_available - 1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't look like a simple mistake in dropping -1
. My UPS rather wouldn't work with it. Maybe it needs some kind of a detection on how this byte is coded. I need traces from both UPSes to make certain it will cover both cases.
Maaaybe the original issue were I did this driver has one...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Otherwise, initially copying all of bytes_available
in the original loop seemed reasonable. Say the usb method returned one new byte, here now you'd copy 1 - 1 == 0
bytes (and check if that one is end of line)...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there seem to be only a trace where UPS send 1 byte at a time during each interrupt transfer:
[D3] armac_command: Got a known command 'Q1\r'
2.121903 [D3] read: ret 6 buf c1: 28 30 30 30 30 >(0000<
2.126943 [D3] read: ret 6 buf 81: 32 30 30 30 30 >20000<
2.130978 [D3] read: ret 6 buf c1: 33 30 30 30 30 >30000<
2.133998 [D3] read: ret 6 buf 81: 38 30 30 30 30 >80000<
(...)
Here bytes available (low nibble: tmpbuf[0] & 0x0f) seems to be equal to 1 and convey a 1 byte. Interrupt transmits 6 bytes at a time, but the rest is filled with literal ASCII 0. (would be cool if they used ZERO 0x00).
Current trace:
0.083301 [D4] armac command Q1
0.164847 [D4] read: ret 6 buf a6: 28 32 34 31 2e >(241.<
0.184839 [D4] read: ret 6 buf 86: 35 20 30 30 30 >5 000<
0.205851 [D4] read: ret 6 buf a6: 2e 30 20 32 33 >.0 23<
0.226849 [D4] read: ret 6 buf 86: 30 2e 33 20 30 >0.3 0<
The first nibble oscillates between a
and 8
- previously c
and 8
.
But TBH, that seems like an easy fix. I'd assume this 6
means this UPS has available 6 bytes in a buffer. This interrupt transfer transferred 5... it can't transfer more. It's just FASTER than the previous model.
Instead of -1
one could:
bytes_available = (unsigned char)tmpbuf[0] & 0x0f;
if (bytes_available == 0) {
/* End of transfer */
break;
}
if (bytes_available > 5) {
/* More bytes in UPS buffer, but this transfer only handles up to 5. */
bytes_available = 5;
}
Or even use the 'ret' from usb interrupt in case it can transfer more later?
if (bytes_available > ret - 1) {
/* More bytes in UPS buffer, but this transfer only handles up to 5. */
bytes_available = ret - 1;
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My current take is this one:
diff --git a/drivers/nutdrv_qx.c b/drivers/nutdrv_qx.c
index 350c555a0..550eb4071 100644
--- a/drivers/nutdrv_qx.c
+++ b/drivers/nutdrv_qx.c
@@ -1825,6 +1825,9 @@ static int armac_command(const char *cmd, char *buf, size_t buflen)
* (which will happen after successfully writing a command
* to the UPS) */
if (ret != 6) {
+ /* NOTE: If end condition is invalid for particular UPS we might make one
+ * request more and get this error. If bufpos > (say) 10 this could be ignored
+ * and the reply correctly read. */
upsdebugx(1,
"interrupt read error: %s (%d)",
ret ? nut_usb_strerror(ret) : "timeout",
@@ -1838,15 +1841,32 @@ static int armac_command(const char *cmd, char *buf, size_t buflen)
tmpbuf[0], tmpbuf[1], tmpbuf[2], tmpbuf[3], tmpbuf[4], tmpbuf[5],
tmpbuf[1], tmpbuf[2], tmpbuf[3], tmpbuf[4], tmpbuf[5]);
+ /*
+ * On most tested devices (including R/2000I/PSW) this was equal to the number of
+ * bytes returned in the buffer, but on some newer UPS (R/3000I/PF1) it was 1 more
+ * (1 control + 5 bytes transferred and bytes_available equal to 6 instead of 5).
+ *
+ * Current assumption is that this is number of bytes available on the UPS side
+ * with up to 5 (ret - 1) transferred.
+ */
bytes_available = (unsigned char)tmpbuf[0] & 0x0f;
if (bytes_available == 0) {
/* End of transfer */
break;
}
+ if (bytes_available > 5) {
+ /* Single interrupt transfer has 1 control + 5 data bytes */
+ bytes_available = 5;
+ }
memcpy(buf + bufpos, tmpbuf + 1, bytes_available);
bufpos += bytes_available;
+ if (tmpbuf[bytes_available] == 0x0d) {
+ /* End of line is an end of the message as well */
+ break;
+ }
+
if (bytes_available <= 2) {
/* Slow down, let the UPS buffer more bytes */
usleep(15000);
It compiles, but could use some testing. ;-) Hopefully it will handle both cases without additional -1
magic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, looking at the data dumps in #1978 (comment) :
0.414746 [D4] read: ret 6 buf 86: 2e 30 20 30 30 >.0 00<
0.435756 [D4] read: ret 6 buf a6: 30 30 30 30 30 >00000<
000<.939819 [D4] read: ret 6 buf 83: 31 0d 30 30 30 >1
1.940207 [D1] interrupt read error: Operation timed out (-7)
So for a line where we have lower nibble (first byte & 0x0f
) equal to 6
and ret==6
, we have 5 protocol content bytes. However in the last practical line, with the 0x0d, the nibble is 3
and there are 2 content bytes (and the UPS stops sending more, hence the timeout).
I believe, what the driver did originally, with your UPS where its nibble was 1
it copied one byte into the buffer. Where the nibble was 6
it probably copied 6 bytes (starting from tmpbuf+1
so overflowing the tmpbuf
defined with size 6 exactly... wondering why compilers or run-times did not complain/segfault - not a write so don't care?) and probably adding garbage into that extra byte of the resulting string for each loop :\
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I've dropped ret - 1 because 6 is hardcoded above. (Original driver does reads always by 6 bytes). But that's equivalent - if ret != 6 we bail out anyway.
In general we can introduce a constant: const int READ_SIZE = 6;
Use it in usb_interrupt_read, in following if and later instead of ret
- it will be cleaner and I can prepare a patch like that.
Architecture-wise, I believe the UPS works like this:
([controller] --uart or rs232-- [usb bridge with buffer]) --usb-- [PC]
And this USB interface is a really poor USB Serial implementation. ;)
So the bytes_available
is not a length of a whole serial buffer, just the amount that currently is stored within the usb bridge. MAYBE. As far as we can tell. It's important to differentiate a TRUE data from a fill (0, 0x30), when there's less data then the read size certainly. Would not be necessary if the fill was 0x00. I think deducting 5 is worthless, but we could store info like more_available = true
if it's larger than READ_SIZE - 1
.
Unsure what to do with this information though. If we find \x0d
but more_available == true
that's probably an error and we could print a message - not much more though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, looking at the data dumps in #1978 (comment) :
0.414746 [D4] read: ret 6 buf 86: 2e 30 20 30 30 >.0 00< 0.435756 [D4] read: ret 6 buf a6: 30 30 30 30 30 >00000< 000<.939819 [D4] read: ret 6 buf 83: 31 0d 30 30 30 >1 1.940207 [D1] interrupt read error: Operation timed out (-7)
So for a line where we have lower nibble (first byte
& 0x0f
) equal to6
andret==6
, we have 5 protocol content bytes. However in the last practical line, with the 0x0d, the nibble is3
and there are 2 content bytes (and the UPS stops sending more, hence the timeout).
Watchout for the \r that shifts the 0 in the output in those lines. But yes, nibble is 3 and there are pretty much 2 bytes sent. I forgot about it today - and it suggest I should scan for \d and not just check the last byte.
I believe, what the driver did originally, with your UPS where its nibble was
1
it copied one byte into the buffer. Where the nibble was6
it probably copied 6 bytes (starting fromtmpbuf+1
so overflowing thetmpbuf
defined with size 6 exactly... wondering why compilers or run-times did not complain/segfault - not a write so don't care?) and probably adding garbage into that extra byte of the resulting string for each loop :\
Driver when the nibble was 6 could be reading 1 byte over tmpbuf. C compilers and run-times are quirky. :S In Rust that would certainly be a runtime Panic, maybe compilation error. tmpbuf is on the stack, so there's a big chance it was reading 1 byte more from the stack just.
Certainly adding a garbage - that changed the meaning of this nibble in a way not observed in other UPSes.
I'll take a look at this final byte too, maybe it needs special handling to find \r correctly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it will be more productive for you to cherry-pick the changes I've proposed so far and start another PR from your fork to override this one, to avoid me posting your patches from discussion into git :) WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, certainly. The problem is - testing. I can probably get this changes tested on the new UPS. Trouble is, to check on mine/other ones. When my sister is watering the plants I'll ask her to power it on; maaaybe I could do it remotely. (;
Closes: #1978
Thanks @blaa for proposing the fix, and @convicte for testing it.