Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Problem with NUTDRV_QX, subdriver armac (add EOL support) #2003

Closed
wants to merge 2 commits into from

Conversation

jimklimov
Copy link
Member

Closes: #1978

Thanks @blaa for proposing the fix, and @convicte for testing it.

@jimklimov jimklimov added bug Qx protocol driver Driver based on Megatec Q<number> such as new nutdrv_qx, or obsoleted blazer and some others impacts-release-2.8.0 Issues reported against NUT release 2.8.0 (maybe vanilla or with minor packaging tweaks) labels Jul 31, 2023
@jimklimov jimklimov added this to the 2.8.1 milestone Jul 31, 2023
@blaa
Copy link

blaa commented Jul 31, 2023

Hey, waaaait a bit with this. ;-) As I've wrote in the issue - this might break compatibility with other Armac/powermanagerii devices though.

The end of line test is certainly SAFE, but the -1 added to the copying look rather suspicious. I want to test it with my UPS (I'm on vacation and can do this in september ;s), or get a traffic dump from the @bdkacz for whom the previous version worked.

So, blindly merging that can break compatibility. ;s

@jimklimov
Copy link
Member Author

@blaa: I have a question about the logic in this proposed change:

  • previously, you received the bytes_available amount of bytes via tmpbuf populated by usb_interrupt_read(), and either copied them all to buf or saw a zero-sized reply and aborted.
  • now you copy one byte less to buf and drop out if the remaining byte is 0x0d; however you do not copy it into buf if it is something else. And then I think you loop to usb_interrupt_read() again so lose the old tmpbuf contents.

Am I right with a gut feeling that you would lose one character in the middle of a reply from each loop here under the "right" conditions? Should there be an else to that comparison to 0x0d clause?

@jimklimov jimklimov marked this pull request as draft July 31, 2023 11:14
@jimklimov
Copy link
Member Author

Converted to "draft" as asked by @blaa (to avoid merge before regression testing) ;)

memcpy(buf + bufpos, tmpbuf + 1, bytes_available - 1);
bufpos += bytes_available - 1;

if (tmpbuf[bytes_available - 1] == 0x0d) {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be safe in all cases... (although it uses the bytes_available as well and it's suspicious that it worked without -1 previously).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At least, it is unsigned and non-zero (per previous checks), so array-wise the expression is safe :)

bytes_available = (unsigned char)tmpbuf[0] & 0x0f;
if (bytes_available == 0) {
/* End of transfer */
break;
}

memcpy(buf + bufpos, tmpbuf + 1, bytes_available);
bufpos += bytes_available;
memcpy(buf + bufpos, tmpbuf + 1, bytes_available - 1);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't look like a simple mistake in dropping -1. My UPS rather wouldn't work with it. Maybe it needs some kind of a detection on how this byte is coded. I need traces from both UPSes to make certain it will cover both cases.

Maaaybe the original issue were I did this driver has one...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would be #1238 and #1239 I believe.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Otherwise, initially copying all of bytes_available in the original loop seemed reasonable. Say the usb method returned one new byte, here now you'd copy 1 - 1 == 0 bytes (and check if that one is end of line)...

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there seem to be only a trace where UPS send 1 byte at a time during each interrupt transfer:

                      [D3] armac_command: Got a known command 'Q1\r'                         
   2.121903     [D3] read: ret 6 buf c1: 28 30 30 30 30  >(0000<                  
   2.126943     [D3] read: ret 6 buf 81: 32 30 30 30 30  >20000<
   2.130978     [D3] read: ret 6 buf c1: 33 30 30 30 30  >30000<
   2.133998     [D3] read: ret 6 buf 81: 38 30 30 30 30  >80000<      
(...)

Here bytes available (low nibble: tmpbuf[0] & 0x0f) seems to be equal to 1 and convey a 1 byte. Interrupt transmits 6 bytes at a time, but the rest is filled with literal ASCII 0. (would be cool if they used ZERO 0x00).

Current trace:

   0.083301     [D4] armac command Q1
   0.164847     [D4] read: ret 6 buf a6: 28 32 34 31 2e  >(241.<
   0.184839     [D4] read: ret 6 buf 86: 35 20 30 30 30  >5 000<
   0.205851     [D4] read: ret 6 buf a6: 2e 30 20 32 33  >.0 23<
   0.226849     [D4] read: ret 6 buf 86: 30 2e 33 20 30  >0.3 0<

The first nibble oscillates between a and 8 - previously c and 8.

But TBH, that seems like an easy fix. I'd assume this 6 means this UPS has available 6 bytes in a buffer. This interrupt transfer transferred 5... it can't transfer more. It's just FASTER than the previous model.

Instead of -1 one could:

bytes_available = (unsigned char)tmpbuf[0] & 0x0f;
if (bytes_available == 0) {
    /* End of transfer */
    break;
}
if (bytes_available > 5) {
    /* More bytes in UPS buffer, but this transfer only handles up to 5. */
    bytes_available = 5;
}

Or even use the 'ret' from usb interrupt in case it can transfer more later?

if (bytes_available > ret - 1) {
    /* More bytes in UPS buffer, but this transfer only handles up to 5. */
    bytes_available = ret - 1;
}

Copy link

@blaa blaa Jul 31, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My current take is this one:

diff --git a/drivers/nutdrv_qx.c b/drivers/nutdrv_qx.c
index 350c555a0..550eb4071 100644
--- a/drivers/nutdrv_qx.c
+++ b/drivers/nutdrv_qx.c
@@ -1825,6 +1825,9 @@ static int        armac_command(const char *cmd, char *buf, size_t buflen)
                 * (which will happen after successfully writing a command
                 * to the UPS) */
                if (ret != 6) {
+                       /* NOTE: If end condition is invalid for particular UPS we might make one
+                        * request more and get this error. If bufpos > (say) 10 this could be ignored
+                        * and the reply correctly read. */
                        upsdebugx(1,
                                "interrupt read error: %s (%d)",
                                ret ? nut_usb_strerror(ret) : "timeout",
@@ -1838,15 +1841,32 @@ static int      armac_command(const char *cmd, char *buf, size_t buflen)
                        tmpbuf[0], tmpbuf[1], tmpbuf[2], tmpbuf[3], tmpbuf[4], tmpbuf[5],
                        tmpbuf[1], tmpbuf[2], tmpbuf[3], tmpbuf[4], tmpbuf[5]);
 
+               /*
+                * On most tested devices (including R/2000I/PSW) this was equal to the number of
+                * bytes returned in the buffer, but on some newer UPS (R/3000I/PF1) it was 1 more
+                * (1 control + 5 bytes transferred and bytes_available equal to 6 instead of 5).
+                *
+                * Current assumption is that this is number of bytes available on the UPS side
+                * with up to 5 (ret - 1) transferred.
+                */
                bytes_available = (unsigned char)tmpbuf[0] & 0x0f;
                if (bytes_available == 0) {
                        /* End of transfer */
                        break;
                }
+               if (bytes_available > 5) {
+                       /* Single interrupt transfer has 1 control + 5 data bytes */
+                       bytes_available = 5;
+               }
 
                memcpy(buf + bufpos, tmpbuf + 1, bytes_available);
                bufpos += bytes_available;
 
+               if (tmpbuf[bytes_available] == 0x0d) {
+                       /* End of line is an end of the message as well */
+                       break;
+               }
+
                if (bytes_available <= 2) {
                        /* Slow down, let the UPS buffer more bytes */
                        usleep(15000);

It compiles, but could use some testing. ;-) Hopefully it will handle both cases without additional -1 magic.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, looking at the data dumps in #1978 (comment) :

   0.414746     [D4] read: ret 6 buf 86: 2e 30 20 30 30  >.0 00<
   0.435756     [D4] read: ret 6 buf a6: 30 30 30 30 30  >00000<
000<.939819     [D4] read: ret 6 buf 83: 31 0d 30 30 30  >1
   1.940207     [D1] interrupt read error: Operation timed out (-7)

So for a line where we have lower nibble (first byte & 0x0f) equal to 6 and ret==6, we have 5 protocol content bytes. However in the last practical line, with the 0x0d, the nibble is 3 and there are 2 content bytes (and the UPS stops sending more, hence the timeout).

I believe, what the driver did originally, with your UPS where its nibble was 1 it copied one byte into the buffer. Where the nibble was 6 it probably copied 6 bytes (starting from tmpbuf+1 so overflowing the tmpbuf defined with size 6 exactly... wondering why compilers or run-times did not complain/segfault - not a write so don't care?) and probably adding garbage into that extra byte of the resulting string for each loop :\

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I've dropped ret - 1 because 6 is hardcoded above. (Original driver does reads always by 6 bytes). But that's equivalent - if ret != 6 we bail out anyway.

In general we can introduce a constant: const int READ_SIZE = 6;
Use it in usb_interrupt_read, in following if and later instead of ret - it will be cleaner and I can prepare a patch like that.

Architecture-wise, I believe the UPS works like this:
([controller] --uart or rs232-- [usb bridge with buffer]) --usb-- [PC]
And this USB interface is a really poor USB Serial implementation. ;)

So the bytes_available is not a length of a whole serial buffer, just the amount that currently is stored within the usb bridge. MAYBE. As far as we can tell. It's important to differentiate a TRUE data from a fill (0, 0x30), when there's less data then the read size certainly. Would not be necessary if the fill was 0x00. I think deducting 5 is worthless, but we could store info like more_available = true if it's larger than READ_SIZE - 1.

Unsure what to do with this information though. If we find \x0d but more_available == true that's probably an error and we could print a message - not much more though.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, looking at the data dumps in #1978 (comment) :

   0.414746     [D4] read: ret 6 buf 86: 2e 30 20 30 30  >.0 00<
   0.435756     [D4] read: ret 6 buf a6: 30 30 30 30 30  >00000<
000<.939819     [D4] read: ret 6 buf 83: 31 0d 30 30 30  >1
   1.940207     [D1] interrupt read error: Operation timed out (-7)

So for a line where we have lower nibble (first byte & 0x0f) equal to 6 and ret==6, we have 5 protocol content bytes. However in the last practical line, with the 0x0d, the nibble is 3 and there are 2 content bytes (and the UPS stops sending more, hence the timeout).

Watchout for the \r that shifts the 0 in the output in those lines. But yes, nibble is 3 and there are pretty much 2 bytes sent. I forgot about it today - and it suggest I should scan for \d and not just check the last byte.

I believe, what the driver did originally, with your UPS where its nibble was 1 it copied one byte into the buffer. Where the nibble was 6 it probably copied 6 bytes (starting from tmpbuf+1 so overflowing the tmpbuf defined with size 6 exactly... wondering why compilers or run-times did not complain/segfault - not a write so don't care?) and probably adding garbage into that extra byte of the resulting string for each loop :\

Driver when the nibble was 6 could be reading 1 byte over tmpbuf. C compilers and run-times are quirky. :S In Rust that would certainly be a runtime Panic, maybe compilation error. tmpbuf is on the stack, so there's a big chance it was reading 1 byte more from the stack just.

Certainly adding a garbage - that changed the meaning of this nibble in a way not observed in other UPSes.

I'll take a look at this final byte too, maybe it needs special handling to find \r correctly.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it will be more productive for you to cherry-pick the changes I've proposed so far and start another PR from your fork to override this one, to avoid me posting your patches from discussion into git :) WDYT?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, certainly. The problem is - testing. I can probably get this changes tested on the new UPS. Trouble is, to check on mine/other ones. When my sister is watering the plants I'll ask her to power it on; maaaybe I could do it remotely. (;

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug impacts-release-2.8.0 Issues reported against NUT release 2.8.0 (maybe vanilla or with minor packaging tweaks) Qx protocol driver Driver based on Megatec Q<number> such as new nutdrv_qx, or obsoleted blazer and some others
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Problem with NUTDRV_QX, subdriver armac - unable start driver
2 participants