Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reader + Processor threads #55

Open
dangelcz opened this issue Jul 13, 2018 · 11 comments
Open

Reader + Processor threads #55

dangelcz opened this issue Jul 13, 2018 · 11 comments

Comments

@dangelcz
Copy link

dangelcz commented Jul 13, 2018

Hi,
first of all I want to thank you for this great api implementation, I really appreciate it.

I'm using your api for managing devices over small city network (about 1500 devices). For that purpouse, I wrote small java program, that provides batch changes. Problem I got is about closing/terminating your Reader and Processor threads. After running one of my jobs, which checks configuration of 739 devices, there are 216 threads alive just before program ends, which is problem, because my monitoring server runs out of resources. Here you can see screen captured from VisualVM on my development PC
image
For sending commands through your api, I am using following code
image

I am suspicious of that problem occures when there is some connection problem like ip unreachable, timeout or unable to login. Thats where some inner exception occures and after that, close method does not work properly and threads stuck on reading from empty stream.

Thanks in advance for your help.

@GideonLeGrange
Copy link
Owner

I am suspicious of that problem occures when there is some connection problem like ip unreachable, timeout or unable to login. Thats where some inner exception occures and after that, close method does not work properly and threads stuck on reading from empty stream.

I'm trying to work this out but am having a hard time reproducing the error.

  • IP unreachable won't do this - the connection fails before the threads are started
  • Timeout on creating the connection won't do this - same as above
  • Login problem - I tested this (wrote a loop that creates a connection and tries to log in with bad credentials) and it is not here

I think possible causes should be a form of timeout after a successful login, or a command that fails in an unexpected way (and basically 'hangs' the connection until it times out).

Questions:

Can you please provide me with the following:

  • An example of what is in cmd in your example above?
  • An example of a stack trace for the exception caught in line 68

@dangelcz
Copy link
Author

dangelcz commented Jul 17, 2018

Hi,
sorry for long reply, I am taking vacation at the moment.
With VisualVM I tried to take threads snapshot at the end of execution. There are methods where Processor and Readers stucks (hoping that I understand it well).

Processors
processor

Readers
reader

Answer for your question

Commands, that are being used in current job are
/system/resource/print
/system/routerboard/print

Exception messages that occures at line 68

  • cannot log in
  • Remote host closed connection during handshake
  • Command timed out after 10000 ms (10 seconds is my custom timeout)

Stack traces:

me.legrange.mikrotik.MikrotikApiException: cannot log in
	at me.legrange.mikrotik.impl.ApiConnectionImpl$SyncListener.getResults(ApiConnectionImpl.java:472)
	at me.legrange.mikrotik.impl.ApiConnectionImpl$SyncListener.access$100(ApiConnectionImpl.java:426)
	at me.legrange.mikrotik.impl.ApiConnectionImpl.execute(ApiConnectionImpl.java:110)
	at me.legrange.mikrotik.impl.ApiConnectionImpl.execute(ApiConnectionImpl.java:68)
	at me.legrange.mikrotik.impl.ApiConnectionImpl.login(ApiConnectionImpl.java:63)
	at libs.mikrotik.MikroticApi.prepareConnection(MikroticApi.java:85)
	at libs.mikrotik.MikroticApi.commandBatch(MikroticApi.java:64)
	at libs.mikrotik.MikroticApi.processCmd(MikroticApi.java:27)
	at libs.mikrotik.MikroticApi.processCmd(MikroticApi.java:33)
	at main.logic.JobExecutor.loadDevice(JobExecutor.java:66)
	at execution.ScanUserJobExecutor.scanDevice(ScanUserJobExecutor.java:74)
	at execution.ScanUserJobExecutor.executeJob(ScanUserJobExecutor.java:98)
	at execution.JobTypes.executeJob(JobTypes.java:28)
	at main.logic.JobExecutor.executeJob(JobExecutor.java:55)
	at main.Main.executeJob(Main.java:144)
	at main.MainThreadPool$ThreadWorker.run(MainThreadPool.java:135)

me.legrange.mikrotik.ApiConnectionException: Remote host closed connection during handshake
	at me.legrange.mikrotik.impl.ApiConnectionImpl.execute(ApiConnectionImpl.java:122)
	at me.legrange.mikrotik.impl.ApiConnectionImpl.execute(ApiConnectionImpl.java:109)
	at me.legrange.mikrotik.impl.ApiConnectionImpl.execute(ApiConnectionImpl.java:68)
	at me.legrange.mikrotik.impl.ApiConnectionImpl.login(ApiConnectionImpl.java:58)
	at libs.mikrotik.MikroticApi.prepareConnection(MikroticApi.java:85)
	at libs.mikrotik.MikroticApi.commandBatch(MikroticApi.java:64)
	at libs.mikrotik.MikroticApi.processCmd(MikroticApi.java:27)
	at libs.mikrotik.MikroticApi.processCmd(MikroticApi.java:33)
	at main.logic.JobExecutor.loadDevice(JobExecutor.java:66)
	at execution.ScanUserJobExecutor.scanDevice(ScanUserJobExecutor.java:74)
	at execution.ScanUserJobExecutor.executeJob(ScanUserJobExecutor.java:98)
	at execution.JobTypes.executeJob(JobTypes.java:28)
	at main.logic.JobExecutor.executeJob(JobExecutor.java:55)
	at main.Main.executeJob(Main.java:144)
	at main.MainThreadPool$ThreadWorker.run(MainThreadPool.java:135)
Caused by: javax.net.ssl.SSLHandshakeException: Remote host closed connection during handshake
	at sun.security.ssl.SSLSocketImpl.readRecord(Unknown Source)
	at sun.security.ssl.SSLSocketImpl.performInitialHandshake(Unknown Source)
	at sun.security.ssl.SSLSocketImpl.writeRecord(Unknown Source)
	at sun.security.ssl.AppOutputStream.write(Unknown Source)
	at sun.security.ssl.AppOutputStream.write(Unknown Source)
	at java.io.DataOutputStream.write(Unknown Source)
	at me.legrange.mikrotik.impl.Util.encode(Util.java:140)
	at me.legrange.mikrotik.impl.Util.write(Util.java:25)
	at me.legrange.mikrotik.impl.ApiConnectionImpl.execute(ApiConnectionImpl.java:118)
	... 14 more
Caused by: java.io.EOFException: SSL peer shut down incorrectly
	at sun.security.ssl.InputRecord.read(Unknown Source)
	... 23 more

me.legrange.mikrotik.MikrotikApiException: Command timed out after 10000 ms
	at me.legrange.mikrotik.impl.ApiConnectionImpl$SyncListener.getResults(ApiConnectionImpl.java:472)
	at me.legrange.mikrotik.impl.ApiConnectionImpl$SyncListener.access$100(ApiConnectionImpl.java:426)
	at me.legrange.mikrotik.impl.ApiConnectionImpl.execute(ApiConnectionImpl.java:110)
	at me.legrange.mikrotik.impl.ApiConnectionImpl.execute(ApiConnectionImpl.java:68)
	at me.legrange.mikrotik.impl.ApiConnectionImpl.login(ApiConnectionImpl.java:63)
	at libs.mikrotik.MikroticApi.prepareConnection(MikroticApi.java:85)
	at libs.mikrotik.MikroticApi.commandBatch(MikroticApi.java:64)
	at libs.mikrotik.MikroticApi.processCmd(MikroticApi.java:27)
	at libs.mikrotik.MikroticApi.processCmd(MikroticApi.java:33)
	at main.logic.JobExecutor.loadDevice(JobExecutor.java:66)
	at execution.ScanUserJobExecutor.scanDevice(ScanUserJobExecutor.java:74)
	at execution.ScanUserJobExecutor.executeJob(ScanUserJobExecutor.java:98)
	at execution.JobTypes.executeJob(JobTypes.java:28)
	at main.logic.JobExecutor.executeJob(JobExecutor.java:55)
	at main.Main.executeJob(Main.java:144)
	at main.MainThreadPool$ThreadWorker.run(MainThreadPool.java:135)
Caused by: me.legrange.mikrotik.ApiConnectionException: Command timed out after 10000 ms
	at me.legrange.mikrotik.impl.ApiConnectionImpl$SyncListener.getResults(ApiConnectionImpl.java:464)
	... 15 more

@dangelcz
Copy link
Author

I am using 8 threads that takes list of ips and for each one they call this method (with some additional logic of loading and saving Device object from database).

public static void loadDevice(Device device, String ip, String login, String pwd)
{
	String rscCmd = MikrotikCommands.resources();
	Map<String, String> response = MikroticApi.processCmd(rscCmd, ip, login, pwd);

	device.ros_version = response.get("version");
	device.architecture = response.get("architecture-name");

	String rtbCmd = MikrotikCommands.routerboard();
	response = MikroticApi.processCmd(rtbCmd, ip, login, pwd);

	device.current_firmware = response.get("current-firmware");
	device.serial_number = response.get("serial-number");
	device.upgrade_firmware = response.get("upgrade-firmware");
}

@GideonLeGrange
Copy link
Owner

GideonLeGrange commented Jul 27, 2018 via email

@GideonLeGrange
Copy link
Owner

I know what is wrong and will fix it. Not an easy fix though, may require some internal re-write.

@dangelcz
Copy link
Author

dangelcz commented Aug 1, 2018

Hi, I managed to repair problems myself at the weekend. Its working well, but it looks more like hack than nice architecture repair. I’m still busy and have no time to tell you full details. I’ll try to create pull request tommorow so you can check my changes.

So, main problem seems to be, that Reader thread stucks on in.read() in Util.decode() method and later becomes zombie thread, which cannot be killed or stopped. I still don’t know how is it possible, because in Connection.close() I addeed reader.join() which goes through without any problems, so reader should have ended, but in some cases don't.
This behaviour causes two next problems:

  1. eternal loop in Processor.unpack() which returns empty string that throws Exception which is being caught and unpack is called again in parent while cycle.
  2. stucking Processor on take method of blocking queue (because Reader wont put there data anymore)

Main changes I made:

  1. Merged processor and reader to be executed in one thread
  2. With that, I was able to replace blocking queue by standard linked list, because there is no need to block now.
  3. In reader after decode added condition to end, if received string has zero length.
  4. Last one is a bit experimenting and maybe not all changes are nessesary. In Connection.close(), replaced thread.interrupt by join, closed streams and socket separately (to be sure, that are really closed) and after that assigned them to null, to be sure, that there is no reference so garbage collector can clear them. Same made with reader reference
    reader.join(); reader = null;

There are some more changes, but those are mentionable. Maybe, you' ll find some clue for your analysis and patch.

@GideonLeGrange
Copy link
Owner

GideonLeGrange commented Aug 2, 2018

You are correct, the problem is that in.read() blocks. What really should happen is that in.read() should fail.

I've committed a very simple patch that sets the soTimeout on the socket for the TCP connection. This should have been done, but it wasn't, so the setTimeout() call on the API was not as effective as it could have been.

Can you please try the following:

  • Pull and build branch issue-55
  • Add a ApiConnection.setTimeout() call to your code. I suggest a short but sensible timeout, something like 20000 ms

Maybe it helps you. If it works better, I will push that fix to master.

Gideon

@dangelcz
Copy link
Author

dangelcz commented Aug 2, 2018

Hi, I made some tests with this branch, set timeout to 20 seconds, but there is no difference. Threads stuck at the same places as before even if I wait more than 2 minutes after main procedure completes (I removed setDeamon(), so VM won't cancel)

I think, that reading stucks due to half closed socket, which should throw exception, but won't.
Consider, whether using Reader, Processor and SyncListener as inner classes cannot cause problems with theirs specific tight references. Meaning, they prevent to be harvested by GC somehow. When I am debugging stucked Reader, it shows me reference to ApiConnectionImpl.connected equals true even close method was called some time ago, which does not make sense.

Thread count graph
image

Stack traces (15 seconds because I took image early, but it contined)
image

I was not able to prepare pull request, because applying changes without my refactor did not work well. So I uploaded whole project as is to googledrive at least, if you want to see. This version works well for me.
https://drive.google.com/open?id=1gbfV2bkbYwXZbv1pldQRiIiT_I1dgr8j

@iyadfawwaz
Copy link

that is the best way
class Connection extends AsyncTask<String,Integer,ApiConnection>{ protected ApiConnection doInBackgeound(String... strings){ ApiConnection apiConnection; try{ apiConnection = ApiConnection.connect(strings[0]; apiConnection.login(strings[1],strings[2]); return apiConnection; }catch(Exception e){} return null; }}
on MainActivity
Connection conn = new Connection(); conn.execute("IP","ADMIN","PASSWORD");
to get ApiConnection you will do (get()) after )
guards

@iyadfawwaz
Copy link

for all you can dependecy sy.iyad.MikroSy:mikrosy:1.7.24

@18uu
Copy link

18uu commented Oct 17, 2021

  • Mikroti

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants