Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

select() error 87 parameter is incorrect #113

Open
swarwick opened this issue Nov 20, 2018 · 22 comments
Open

select() error 87 parameter is incorrect #113

swarwick opened this issue Nov 20, 2018 · 22 comments

Comments

@swarwick
Copy link

(fragment deploy): 2018-11-20 10:35:33.118000-0600 [22508:COM3] ERROR com.tibco.ep.dtm.stderr: log: < select(1,[0],[],[],jtermios.TimeVal@64242d75) => 1
(fragment deploy): 2018-11-20 10:35:33.118000-0600 [22508:COM3] ERROR com.tibco.ep.dtm.stderr: log: < ioctl(0,1074030207,[[1]]) => 0
(fragment deploy): 2018-11-20 10:35:33.118000-0600 [22508:COM3] ERROR com.tibco.ep.dtm.stderr: log: < select(1,[0],[],[],jtermios.TimeVal@24c9f3b7) => 1
(fragment deploy): 2018-11-20 10:35:33.119000-0600 [22508:COM3] ERROR com.tibco.ep.dtm.stderr: log: < read(0,[1024,0x30,0x30,0x0D,0x0A,0x30,0x4D,0x0A,0x30...],1024) => 1
(fragment deploy): 2018-11-20 10:35:33.119000-0600 [22508:COM3] ERROR com.tibco.ep.dtm.stderr: log: < ioctl(0,1074030207,[[0]]) => 0
(fragment deploy): 2018-11-20 10:35:33.119000-0600 [22508:COM3] ERROR com.tibco.ep.dtm.stderr: log: < select(1,[0],[],[],jtermios.TimeVal@64242d75) => 1
(fragment deploy): 2018-11-20 10:35:33.119000-0600 [22508:COM3] ERROR com.tibco.ep.dtm.stderr: log: < ioctl(0,1074030207,[[1]]) => 0
(fragment deploy): 2018-11-20 10:35:33.119000-0600 [22508:COM3] ERROR com.tibco.ep.dtm.stderr: log: < select(1,[0],[],[],jtermios.TimeVal@24c9f3b7) => 1
(fragment deploy): 2018-11-20 10:35:33.119000-0600 [22508:COM3] ERROR com.tibco.ep.dtm.stderr: log: < read(0,[1024,0x2E,0x30,0x0D,0x0A,0x30,0x4D,0x0A,0x30...],1024) => 1
(fragment deploy): 2018-11-20 10:35:33.119000-0600 [22508:COM3] ERROR com.tibco.ep.dtm.stderr: log: < ioctl(0,1074030207,[[0]]) => 0
(fragment deploy): 2018-11-20 10:35:33.120000-0600 [22508:COM3] ERROR com.tibco.ep.dtm.stderr: log: < select(1,[0],[],[],jtermios.TimeVal@64242d75) => 1
(fragment deploy): 2018-11-20 10:35:33.120000-0600 [22508:COM3] ERROR com.tibco.ep.dtm.stderr: log: < ioctl(0,1074030207,[[1]]) => 0
(fragment deploy): 2018-11-20 10:35:33.121000-0600 [22508:COM3] ERROR com.tibco.ep.dtm.stderr: log: < select(1,[0],[],[],jtermios.TimeVal@24c9f3b7) => 1
(fragment deploy): 2018-11-20 10:35:33.121000-0600 [22508:COM3] ERROR com.tibco.ep.dtm.stderr: log: < read(0,[1024,0x30,0x30,0x0D,0x0A,0x30,0x4D,0x0A,0x30...],1024) => 1
(fragment deploy): 2018-11-20 10:35:33.121000-0600 [22508:COM3] ERROR com.tibco.ep.dtm.stderr: log: < ioctl(0,1074030207,[[0]]) => 0
(fragment deploy): 2018-11-20 10:35:33.122000-0600 [22508:COM3] ERROR com.tibco.ep.dtm.stderr: log: < select(1,[0],[],[],jtermios.TimeVal@64242d75) => 1
(fragment deploy): 2018-11-20 10:35:33.122000-0600 [22508:COM3] ERROR com.tibco.ep.dtm.stderr: log: < ioctl(0,1074030207,[[2]]) => 0
(fragment deploy): 2018-11-20 10:35:33.123000-0600 [22508:COM3] ERROR com.tibco.ep.dtm.stderr: log: < select(1,[0],[],[],jtermios.TimeVal@24c9f3b7) => 1
(fragment deploy): 2018-11-20 10:35:33.123000-0600 [22508:COM3] ERROR com.tibco.ep.dtm.stderr: log: < read(0,[1024,0x30,0x30,0x0D,0x0A,0x30,0x4D,0x0A,0x30...],1024) => 2
(fragment deploy): 2018-11-20 10:35:33.123000-0600 [22508:COM3] ERROR com.tibco.ep.dtm.stderr: log: < ioctl(0,1074030207,[[0]]) => 0
(fragment deploy): 2018-11-20 10:35:33.124000-0600 [22508:COM3] ERROR com.tibco.ep.dtm.stderr: log: < select(1,[0],[],[],jtermios.TimeVal@64242d75) => 1
(fragment deploy): 2018-11-20 10:35:33.124000-0600 [22508:COM3] ERROR com.tibco.ep.dtm.stderr: log: < ioctl(0,1074030207,[[1]]) => 0
(fragment deploy): 2018-11-20 10:35:33.124000-0600 [22508:COM3] ERROR com.tibco.ep.dtm.stderr: log: < select(1,[0],[],[],jtermios.TimeVal@24c9f3b7) => 1
(fragment deploy): 2018-11-20 10:35:33.124000-0600 [22508:COM3] ERROR com.tibco.ep.dtm.stderr: log: < read(0,[1024,0x30,0x30,0x0D,0x0A,0x30,0x4D,0x0A,0x30...],1024) => 1
(fragment deploy): 2018-11-20 10:35:33.124000-0600 [22508:COM3] ERROR com.tibco.ep.dtm.stderr: log: < ioctl(0,1074030207,[[0]]) => 0
(fragment deploy): 2018-11-20 10:35:33.125000-0600 [22508:COM3] ERROR com.tibco.ep.dtm.stderr: log: fail() class jtermios.windows.JTermiosImpl line 924, Windows GetLastError()= 87, The parameter is incorrect.
(fragment deploy): 2018-11-20 10:35:33.125000-0600 [22508:COM3] ERROR com.tibco.ep.dtm.stderr:
(fragment deploy): 2018-11-20 10:35:33.171000-0600 [22508:COM3] ERROR com.tibco.ep.dtm.stderr: jtermios.windows.JTermiosImpl$Fail
(fragment deploy): 2018-11-20 10:35:33.171000-0600 [22508:COM3] ERROR com.tibco.ep.dtm.stderr: at jtermios.windows.JTermiosImpl$Port.fail(JTermiosImpl.java:100)
(fragment deploy): 2018-11-20 10:35:33.171000-0600 [22508:COM3] ERROR com.tibco.ep.dtm.stderr: at jtermios.windows.JTermiosImpl.select(JTermiosImpl.java:924)
(fragment deploy): 2018-11-20 10:35:33.171000-0600 [22508:COM3] ERROR com.tibco.ep.dtm.stderr: at jtermios.JTermios.select(JTermios.java:473)
(fragment deploy): 2018-11-20 10:35:33.171000-0600 [22508:COM3] ERROR com.tibco.ep.dtm.stderr: at purejavacomm.PureJavaSerialPort$3.run(PureJavaSerialPort.java:1225)
(fragment deploy): 2018-11-20 10:35:33.171000-0600 [22508:COM3] ERROR com.tibco.ep.dtm.stderr: at java.lang.Thread.run(Thread.java:748)
(fragment deploy): 2018-11-20 10:35:33.171000-0600 [22508:COM3] ERROR com.tibco.ep.dtm.stderr: log: < select(1,[],[],[],jtermios.TimeVal@64242d75) => -1
(fragment deploy): 2018-11-20 10:35:33.171000-0600 [22508:COM3] ERROR com.tibco.ep.dtm.stderr: log: < errno() => 24
(fragment deploy): 2018-11-20 10:35:33.171000-0600 [22508:COM3] ERROR com.tibco.ep.dtm.stderr: log: select() or poll() returned -1, errno 24
(fragment deploy): 2018-11-20 10:35:33.171000-0600 [22508:COM3] ERROR com.tibco.ep.dtm.stderr: log: < fcntl(0, 3, 0) => 131074
(fragment deploy): 2018-11-20 10:35:33.171000-0600 [22508:COM3] ERROR com.tibco.ep.dtm.stderr: log: < fcntl(0, 4, 131078) => 0
(fragment deploy): 2018-11-20 10:35:33.221000-0600 [22508:COM3] ERROR com.tibco.ep.dtm.stderr: log: < close(0) => 0

Connecting to COMM3 on a Windows 10 machine is randomly getting error 87 The parameter is incorrect error messages. Also there does not seem to be a good way to recover from this error as the line

n = select(m_FD + 1, rset, wset, null, m_HaveNudgePipe ? null : timeout);
In PureJavaSerialPort.java is throwing a Fail exception which is cause but just kills the thread and no report of that error is given downstream to the end user of the library.

@swarwick
Copy link
Author

Note this is running with

com.github.purejavacomm purejavacomm 1.0.2.RELEASE

but it seems like the code in this repo is still at 1.0.1-RELEASE and I do not see a branch of otherwise for 1.0.2-RELEASE so where is that code?

@nyholku
Copy link
Owner

nyholku commented Nov 20, 2018

Hi, thanks for reporting. As to code: There seems to be some mistake (me and Maven are not even at nodding terms, I only push stuff to Maven cause some people 'insist' on it). I will try to rectify this within two weeks.

As to the error I will investigate but these random failures on random machine are hard. I will check if I could just sweep this under the carpet by making the use of select tolerant to this problem, i.e. if select() throws on exception we could just re-try it.

@swarwick
Copy link
Author

From looking around at the error description for other projects on windows it looks like the 'GetOverlappedResult' call is supposed to be called with a zero'd out structure. Its not clear to me that is happening via java but I have not dug into enough to know. Error 87 could also could just be flaky hardware, I respect the fact it's very difficult to try and debug this when its probably almost impossible to recreate as well.

If it helps at all the USB to serial dongle being used is

Sabrent USB 2.0 to Serial (9-Pin) DB-9 RS-232 Converter Cable, Prolific Chipset, Hexnuts, [Windows 10/8.1/8/7/VISTA/XP, Mac OS X 10.6 and Above] 2.5 Feet (CB-DB9P)

On a side note, I get the maven issues, we use maven and it drives me bonkers most days. Even with that though is there an issue with GitHub and pushing the code up so the rest of us can try and help debug thing when an issue arises?

@swarwick
Copy link
Author

if select() throws on exception we could just re-try it. Maybe retry X number of times (X being a system variable you can set) and if that still fails then give some indication to the upper level application via an exception or callback/event?

Maybe inside the Runnable
} catch (InterruptedException ie) { } finally { m_ThreadRunning = false; }
add something like notifyOnReadWriteError(Exception e)

At least then the upper level application can spit something out in their own log and or try to do some issue management at runtime.

@nyholku
Copy link
Owner

nyholku commented Nov 21, 2018

The GetOverlappedResult should be zero by default. Also because the error is rare and random I don't think that is the issue. More likely I think this has to do with some sort of synchronisation issue i.e. the OVERLAPPED structure is re-used all the time and at some point OS changes that when PJC is not expecting that or vice versa and the Java side copy of the structure gets out of sync with the OS managed real memory and then it fails.

No there is no reason not to push stuff to GitHub once I have time, now very busy for two weeks.

@swarwick
Copy link
Author

Are you ok with the idea of adding a new callback as well so the resulting application that uses the library can be notified if there is a unrecoverable failure?

@nyholku
Copy link
Owner

nyholku commented Nov 27, 2018

I might, though using the user would break compatibility with the original JavaComm (RIP) and RXTX. But if the user don't mind or does not use it then it would no be a problem for him/her.

@swarwick
Copy link
Author

The only other way I see around this is to keep polling for
((PureJavaSerialPort)serialPort).isInternalThreadRunning()
But that method even as named really should not be used, so there should be same way to alert the end user that an unrecoverable failure occurred. If you can think of a better way than a callback or using the existing event system that would be great as well. I am trying to build something that can be a little robust for even things like someone tripping over a cord and it gets unplugged, I want to be able to detect the error and close the serial port and retry X number of times to open it. Currently I only get the message in the log but no real other way I can see to check if the serial port has completely failed.

@swarwick
Copy link
Author

I did find another way around this without using the internal method but it still doesnt sit well with me. I started to poll and check the getReceiveTimeout() method which calls "checkState" first and throws PureJavaIllegalStateException if the serial port is in a bad state. I feel like polling is not the best way to go with this though, but at this point I dont see any other alternative, especially if I only read and never write, since if there is no data I will not get a data ready event.

@nyholku
Copy link
Owner

nyholku commented Nov 27, 2018

We are talking about two different things here.

This Issue started from the error 87 which ideally should not happen as it is basically on indication of some sort of bug or logic error in the PJC code. And ideally we would find the error and fix it. As this is difficult given that I cannot reproduce this we discussed re-trying the select() and if the error is transient then this would solve the issue.

Now you brought into to discussion detecting and/or recovering from fatal errors via a callback or some such. This could added as I indicated. Weather or not it would solve the issue if re-trying the select does not work is another matter. And detecting the catastrophic failure is also problematic.

My first stab at this would be to add a re-try to the select() and do that internally in the Windows backend where the select() is implemented. So first find out where exactly the in jtermios.windows.JTermiosImpl.select() the error is first detected and modify the call to retry the failed call or call sequence there.

@swarwick
Copy link
Author

yes you are correct sorry I did start to blend the two issues together. I will create another issue to track the fail/recovery by user scenario.

@derikd
Copy link

derikd commented Nov 27, 2018 via email

@nyholku
Copy link
Owner

nyholku commented Nov 27, 2018 via email

@swarwick
Copy link
Author

Could this be an issue that GetOverlappedResult is returning ERROR_IO_INCOMPLETE but the code is not checking for that? It looks like select checks for that but write doesnt (although it seems to later down the code but doesnt seem like it would reacht that code path).

@nyholku
Copy link
Owner

nyholku commented Nov 28, 2018

Could be, if you can debug that, it would be great!

@swarwick
Copy link
Author

I am having a hard time debugging this myself as well (I found the 1.0.2 source jar and I am trying to use that but not 100% sure its valid). Also this is happening at a customer site and really I have to do old school debugging with logs as I do not have direct access to the hardware.

I was curious what you thought on line 838
if (!GetOverlappedResult(port.m_Comm, port.m_SelOVL, port.m_SelN, false))
if (GetLastError() != ERROR_IO_INCOMPLETE)
port.fail();
After that it current checks if error code is ERROR_IO_INCOMPLETE, should it also check for ERROR_INVALID_PARAMETER?
if (!GetOverlappedResult(port.m_Comm, port.m_SelOVL, port.m_SelN, false))
if (GetLastError() != ERROR_IO_INCOMPLETE && GetLastError() != ERROR_INVALID_PARAMETER)
port.fail();

@nyholku
Copy link
Owner

nyholku commented Nov 28, 2018

I don't think that is correct approach although it might sweep the problem under the carpet. The issue here is that (most likely) the OVERLAPPED structure is still in use by previous invocation of whatever operation my code used it for last time. So I think we should just loop there

while (GetOverlappedResult() || GetLastError()== ERROR_IO_INCOMPLETE || GetLastError() == ERROR_INVALID_PARAMETER)

@swarwick
Copy link
Author

I was given access to the box that has the error and was able to debug with live data. I got the error with existing code of:

if (!GetOverlappedResult(port.m_Comm, port.m_SelOVL, port.m_SelN, false)) {
if (GetLastError() != ERROR_IO_INCOMPLETE)
port.fail();
}

Changing the code to

if (!GetOverlappedResult(port.m_Comm, port.m_SelOVL, port.m_SelN, false)) {
if (GetLastError() != ERROR_IO_INCOMPLETE && GetLastError() != ERROR_INVALID_PARAMETER)
port.fail();
if (GetLastError() == ERROR_INVALID_PARAMETER) {
log = log && log(1, "Invalid parameter event\n");
}
}

Resolved the issue and let the software keep running and I did see log event for invalid parameter still but it skipped over and kept getting data.

@swarwick
Copy link
Author

it is a rare event, it happened 2 times over 5-6 mins

@swarwick
Copy link
Author

After awhile of running the machine blue screen of death appears so I am not sure if thats from this fix or some other issue but thought I would mention it.

@swarwick
Copy link
Author

So interesting note to anyone else that might encounter this issue. The cable being used was a prolific USB to serial cable. We switched over to a FDTI chipset cable and the issue went away.

@derikd
Copy link

derikd commented Nov 30, 2018 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants