Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Problem with NUTDRV_QX, subdriver armac (add EOL support) #2003

Closed
wants to merge 2 commits into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 2 additions & 0 deletions NEWS
Original file line number Diff line number Diff line change
Expand Up @@ -99,6 +99,8 @@ as part of https://github.com/networkupstools/nut/issues/1410 solution.
- nutdrv_qx updates:
* the `voltronic_qs_protocol` should now accept both "V" (as before)
and newly "H" dialects, which otherwise seem interchangeable [#1623]
* the `armac` subdriver was enhanced to support devices with a different
response pattern than previously expected per initial contribution [#1978]

- usbhid-ups updates:
* cps-hid subdriver now applies same report descriptor fixing logic to
Expand Down
10 changes: 8 additions & 2 deletions drivers/nutdrv_qx.c
Original file line number Diff line number Diff line change
Expand Up @@ -1838,14 +1838,20 @@ static int armac_command(const char *cmd, char *buf, size_t buflen)
tmpbuf[0], tmpbuf[1], tmpbuf[2], tmpbuf[3], tmpbuf[4], tmpbuf[5],
tmpbuf[1], tmpbuf[2], tmpbuf[3], tmpbuf[4], tmpbuf[5]);

/* This includes status/length byte on R/3000I/PF1 */
bytes_available = (unsigned char)tmpbuf[0] & 0x0f;
if (bytes_available == 0) {
/* End of transfer */
break;
}

memcpy(buf + bufpos, tmpbuf + 1, bytes_available);
bufpos += bytes_available;
memcpy(buf + bufpos, tmpbuf + 1, bytes_available - 1);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't look like a simple mistake in dropping -1. My UPS rather wouldn't work with it. Maybe it needs some kind of a detection on how this byte is coded. I need traces from both UPSes to make certain it will cover both cases.

Maaaybe the original issue were I did this driver has one...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would be #1238 and #1239 I believe.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Otherwise, initially copying all of bytes_available in the original loop seemed reasonable. Say the usb method returned one new byte, here now you'd copy 1 - 1 == 0 bytes (and check if that one is end of line)...

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there seem to be only a trace where UPS send 1 byte at a time during each interrupt transfer:

                      [D3] armac_command: Got a known command 'Q1\r'                         
   2.121903     [D3] read: ret 6 buf c1: 28 30 30 30 30  >(0000<                  
   2.126943     [D3] read: ret 6 buf 81: 32 30 30 30 30  >20000<
   2.130978     [D3] read: ret 6 buf c1: 33 30 30 30 30  >30000<
   2.133998     [D3] read: ret 6 buf 81: 38 30 30 30 30  >80000<      
(...)

Here bytes available (low nibble: tmpbuf[0] & 0x0f) seems to be equal to 1 and convey a 1 byte. Interrupt transmits 6 bytes at a time, but the rest is filled with literal ASCII 0. (would be cool if they used ZERO 0x00).

Current trace:

   0.083301     [D4] armac command Q1
   0.164847     [D4] read: ret 6 buf a6: 28 32 34 31 2e  >(241.<
   0.184839     [D4] read: ret 6 buf 86: 35 20 30 30 30  >5 000<
   0.205851     [D4] read: ret 6 buf a6: 2e 30 20 32 33  >.0 23<
   0.226849     [D4] read: ret 6 buf 86: 30 2e 33 20 30  >0.3 0<

The first nibble oscillates between a and 8 - previously c and 8.

But TBH, that seems like an easy fix. I'd assume this 6 means this UPS has available 6 bytes in a buffer. This interrupt transfer transferred 5... it can't transfer more. It's just FASTER than the previous model.

Instead of -1 one could:

bytes_available = (unsigned char)tmpbuf[0] & 0x0f;
if (bytes_available == 0) {
    /* End of transfer */
    break;
}
if (bytes_available > 5) {
    /* More bytes in UPS buffer, but this transfer only handles up to 5. */
    bytes_available = 5;
}

Or even use the 'ret' from usb interrupt in case it can transfer more later?

if (bytes_available > ret - 1) {
    /* More bytes in UPS buffer, but this transfer only handles up to 5. */
    bytes_available = ret - 1;
}

Copy link

@blaa blaa Jul 31, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My current take is this one:

diff --git a/drivers/nutdrv_qx.c b/drivers/nutdrv_qx.c
index 350c555a0..550eb4071 100644
--- a/drivers/nutdrv_qx.c
+++ b/drivers/nutdrv_qx.c
@@ -1825,6 +1825,9 @@ static int        armac_command(const char *cmd, char *buf, size_t buflen)
                 * (which will happen after successfully writing a command
                 * to the UPS) */
                if (ret != 6) {
+                       /* NOTE: If end condition is invalid for particular UPS we might make one
+                        * request more and get this error. If bufpos > (say) 10 this could be ignored
+                        * and the reply correctly read. */
                        upsdebugx(1,
                                "interrupt read error: %s (%d)",
                                ret ? nut_usb_strerror(ret) : "timeout",
@@ -1838,15 +1841,32 @@ static int      armac_command(const char *cmd, char *buf, size_t buflen)
                        tmpbuf[0], tmpbuf[1], tmpbuf[2], tmpbuf[3], tmpbuf[4], tmpbuf[5],
                        tmpbuf[1], tmpbuf[2], tmpbuf[3], tmpbuf[4], tmpbuf[5]);
 
+               /*
+                * On most tested devices (including R/2000I/PSW) this was equal to the number of
+                * bytes returned in the buffer, but on some newer UPS (R/3000I/PF1) it was 1 more
+                * (1 control + 5 bytes transferred and bytes_available equal to 6 instead of 5).
+                *
+                * Current assumption is that this is number of bytes available on the UPS side
+                * with up to 5 (ret - 1) transferred.
+                */
                bytes_available = (unsigned char)tmpbuf[0] & 0x0f;
                if (bytes_available == 0) {
                        /* End of transfer */
                        break;
                }
+               if (bytes_available > 5) {
+                       /* Single interrupt transfer has 1 control + 5 data bytes */
+                       bytes_available = 5;
+               }
 
                memcpy(buf + bufpos, tmpbuf + 1, bytes_available);
                bufpos += bytes_available;
 
+               if (tmpbuf[bytes_available] == 0x0d) {
+                       /* End of line is an end of the message as well */
+                       break;
+               }
+
                if (bytes_available <= 2) {
                        /* Slow down, let the UPS buffer more bytes */
                        usleep(15000);

It compiles, but could use some testing. ;-) Hopefully it will handle both cases without additional -1 magic.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, looking at the data dumps in #1978 (comment) :

   0.414746     [D4] read: ret 6 buf 86: 2e 30 20 30 30  >.0 00<
   0.435756     [D4] read: ret 6 buf a6: 30 30 30 30 30  >00000<
000<.939819     [D4] read: ret 6 buf 83: 31 0d 30 30 30  >1
   1.940207     [D1] interrupt read error: Operation timed out (-7)

So for a line where we have lower nibble (first byte & 0x0f) equal to 6 and ret==6, we have 5 protocol content bytes. However in the last practical line, with the 0x0d, the nibble is 3 and there are 2 content bytes (and the UPS stops sending more, hence the timeout).

I believe, what the driver did originally, with your UPS where its nibble was 1 it copied one byte into the buffer. Where the nibble was 6 it probably copied 6 bytes (starting from tmpbuf+1 so overflowing the tmpbuf defined with size 6 exactly... wondering why compilers or run-times did not complain/segfault - not a write so don't care?) and probably adding garbage into that extra byte of the resulting string for each loop :\

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I've dropped ret - 1 because 6 is hardcoded above. (Original driver does reads always by 6 bytes). But that's equivalent - if ret != 6 we bail out anyway.

In general we can introduce a constant: const int READ_SIZE = 6;
Use it in usb_interrupt_read, in following if and later instead of ret - it will be cleaner and I can prepare a patch like that.

Architecture-wise, I believe the UPS works like this:
([controller] --uart or rs232-- [usb bridge with buffer]) --usb-- [PC]
And this USB interface is a really poor USB Serial implementation. ;)

So the bytes_available is not a length of a whole serial buffer, just the amount that currently is stored within the usb bridge. MAYBE. As far as we can tell. It's important to differentiate a TRUE data from a fill (0, 0x30), when there's less data then the read size certainly. Would not be necessary if the fill was 0x00. I think deducting 5 is worthless, but we could store info like more_available = true if it's larger than READ_SIZE - 1.

Unsure what to do with this information though. If we find \x0d but more_available == true that's probably an error and we could print a message - not much more though.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, looking at the data dumps in #1978 (comment) :

   0.414746     [D4] read: ret 6 buf 86: 2e 30 20 30 30  >.0 00<
   0.435756     [D4] read: ret 6 buf a6: 30 30 30 30 30  >00000<
000<.939819     [D4] read: ret 6 buf 83: 31 0d 30 30 30  >1
   1.940207     [D1] interrupt read error: Operation timed out (-7)

So for a line where we have lower nibble (first byte & 0x0f) equal to 6 and ret==6, we have 5 protocol content bytes. However in the last practical line, with the 0x0d, the nibble is 3 and there are 2 content bytes (and the UPS stops sending more, hence the timeout).

Watchout for the \r that shifts the 0 in the output in those lines. But yes, nibble is 3 and there are pretty much 2 bytes sent. I forgot about it today - and it suggest I should scan for \d and not just check the last byte.

I believe, what the driver did originally, with your UPS where its nibble was 1 it copied one byte into the buffer. Where the nibble was 6 it probably copied 6 bytes (starting from tmpbuf+1 so overflowing the tmpbuf defined with size 6 exactly... wondering why compilers or run-times did not complain/segfault - not a write so don't care?) and probably adding garbage into that extra byte of the resulting string for each loop :\

Driver when the nibble was 6 could be reading 1 byte over tmpbuf. C compilers and run-times are quirky. :S In Rust that would certainly be a runtime Panic, maybe compilation error. tmpbuf is on the stack, so there's a big chance it was reading 1 byte more from the stack just.

Certainly adding a garbage - that changed the meaning of this nibble in a way not observed in other UPSes.

I'll take a look at this final byte too, maybe it needs special handling to find \r correctly.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it will be more productive for you to cherry-pick the changes I've proposed so far and start another PR from your fork to override this one, to avoid me posting your patches from discussion into git :) WDYT?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, certainly. The problem is - testing. I can probably get this changes tested on the new UPS. Trouble is, to check on mine/other ones. When my sister is watering the plants I'll ask her to power it on; maaaybe I could do it remotely. (;

bufpos += bytes_available - 1;

if (tmpbuf[bytes_available - 1] == 0x0d) {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be safe in all cases... (although it uses the bytes_available as well and it's suspicious that it worked without -1 previously).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At least, it is unsigned and non-zero (per previous checks), so array-wise the expression is safe :)

/* End of line is an end of the message as well */
break;
}

if (bytes_available <= 2) {
/* Slow down, let the UPS buffer more bytes */
Expand Down