Unreliable connection in RTU and TCP modes #776

Open
ahpohl opened this issue Oct 25, 2024 · 10 comments
Comments

@ahpohl

ahpohl commented Oct 25, 2024

libmodbus version

libmodbus-3.1.11

OS and/or distribution

Arch Linux AUR (libmodbus compiled from source), Docker (Alpine)

Environment

x86_64, aarch64

Description

I am working on a library to read SunSpec compatible photovoltaic inverters via Modbus TCP and RTU. The connection to the inverter is unreliable (better in RTU mode than in TCP mode) and I get frequent connection timeouts when reading multiple registers in a row. I am not sure if this is an issue with my inverter or with libmodbus. I've played around with response and byte timeouts, but changing these doesn't seem to have much effect.

Actual behavior if applicable

I've written a simple test program which repeatedly reads two registers starting at address 40000 from the inverter; together they hold the string "SunS". The program times out after a varying number of successful reads and prints the debug error message "ERROR Connection timed out: select".

Expected behavior or suggestion

Reliable communication in RTU and TCP mode.

Steps to reproduce the behavior (commands or source code)

#include <errno.h>
#include <modbus/modbus.h>
#include <stdio.h>
#include <unistd.h>

int main(void) {
  modbus_t *ctx;
  uint16_t tab_reg[64];
  int rc;

  ctx = modbus_new_tcp_pi("primo.home.arpa", "502");
  if (ctx == NULL) {
    fprintf(stderr, "Unable to create context: %s\n", modbus_strerror(errno));
    return -1;
  }

  if (modbus_connect(ctx) == -1) {
    fprintf(stderr, "Connection failed: %s\n", modbus_strerror(errno));
    modbus_free(ctx);
    return -1;
  }

  if (modbus_set_slave(ctx, 1) == -1) {
    fprintf(stderr, "Invalid slave ID: %s\n", modbus_strerror(errno));
    modbus_close(ctx);
    modbus_free(ctx);
    return -1;
  }

  modbus_set_debug(ctx, 1);

  for (int i = 0; i < 1000; ++i) {
    /* Read two holding registers starting at address 40000. */
    rc = modbus_read_registers(ctx, 40000, 2, tab_reg);
    if (rc == -1) {
      fprintf(stderr, "%s\n", modbus_strerror(errno));
      modbus_close(ctx);
      modbus_free(ctx);
      return -1;
    }

    /* Each register holds two ASCII characters, high byte first. */
    char buf[5] = {0};
    for (int j = 0; j < rc; j++) {
      printf("reg[%d]=%d (0x%X)\n", j, tab_reg[j], tab_reg[j]);
      buf[j * 2] = (tab_reg[j] >> 8) & 0xFF;
      buf[j * 2 + 1] = tab_reg[j] & 0xFF;
    }
    printf("%d, %s\n", i, buf);
  }

  modbus_close(ctx);
  modbus_free(ctx);

  return 0;
}
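
The register-to-string step in the loop above can be pulled out into a helper. This is only a sketch: `regs_to_ascii` is a hypothetical name, not a libmodbus function, and it assumes two ASCII characters per register with the high byte first, as the "SunS" output suggests.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of libmodbus): unpack Modbus registers
 * into a NUL-terminated ASCII string, two characters per register,
 * high byte first. Returns the number of characters written (excluding
 * the NUL), or -1 if the output buffer is too small. */
int regs_to_ascii(const uint16_t *regs, size_t nregs, char *out, size_t out_size)
{
    if (out_size < nregs * 2 + 1) {
        return -1;
    }
    for (size_t i = 0; i < nregs; i++) {
        out[i * 2]     = (char)((regs[i] >> 8) & 0xFF); /* high byte */
        out[i * 2 + 1] = (char)(regs[i] & 0xFF);        /* low byte */
    }
    out[nregs * 2] = '\0';
    return (int)(nregs * 2);
}
```

For the two registers 0x5375 and 0x6E53 this yields "SunS".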

libmodbus output with debug mode enabled

[00][01][00][00][00][06][01][03][9C][40][00][02]
Waiting for a confirmation...
<00><01><00><00><00><07><01><03><04><53><75><6E><53>
reg[0]=21365 (0x5375)
reg[1]=28243 (0x6E53)
0, SunS
...
[00][48][00][00][00][06][01][03][9C][40][00][02]
Waiting for a confirmation...
<00><48><00><00><00><07><01><03><04><53><75><6E><53>
reg[0]=21365 (0x5375)
reg[1]=28243 (0x6E53)
71, SunS
[00][49][00][00][00][06][01][03][9C][40][00][02]
Waiting for a confirmation...
ERROR Connection timed out: select
Connection timed out
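
For readers unfamiliar with the debug dump: each bracketed request is a Modbus TCP frame, that is, a 7-byte MBAP header (transaction ID, protocol ID, length, unit ID) followed by the function code, start address, and register count. A minimal sketch of decoding such a request, where `struct read_request` and `parse_read_request` are hypothetical names and not part of libmodbus:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical struct: fields of a Modbus TCP "read holding registers" request. */
struct read_request {
    uint16_t transaction_id; /* MBAP: echoed back by the server */
    uint16_t protocol_id;    /* MBAP: 0 for Modbus */
    uint16_t length;         /* MBAP: number of bytes that follow this field */
    uint8_t  unit_id;        /* MBAP: slave/unit address */
    uint8_t  function;       /* PDU: 0x03 = read holding registers */
    uint16_t address;        /* PDU: first register address */
    uint16_t count;          /* PDU: number of registers */
};

/* Read a big-endian 16-bit value. */
static uint16_t be16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

/* Hypothetical helper: parse a 12-byte read-holding-registers request.
 * Returns 0 on success, -1 if the frame has the wrong size or function code. */
int parse_read_request(const uint8_t *frame, size_t len, struct read_request *req)
{
    if (len != 12 || frame[7] != 0x03) {
        return -1;
    }
    req->transaction_id = be16(frame);
    req->protocol_id    = be16(frame + 2);
    req->length         = be16(frame + 4);
    req->unit_id        = frame[6];
    req->function       = frame[7];
    req->address        = be16(frame + 8);
    req->count          = be16(frame + 10);
    return 0;
}
```

Applied to the last request in the log, this gives transaction ID 73 (0x49), unit 1, function 3, address 40000 (0x9C40), and count 2. The request itself is well formed; only the reply is missing.
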
@mhei
Contributor

mhei commented Oct 31, 2024

I get frequent connection timeouts when reading multiple registers in a row.

Unfortunately, this is not unusual behavior and I have often observed it on many devices.

I am not sure if this is an issue with my inverter or with libmodbus.

I bet this is a problem with your inverter. I recommend double-checking with a GUI tool of your choice to rule out the library.

You did not say which inverter you are using. Maybe you can share a link to the datasheet so that people can comment on similar behavior they have observed.

Also, I cannot see a small sleep between the queries in your test program above. I have often seen devices that need a short delay between queries because their UART and/or Modbus implementation is really poor.

@ahpohl
Author

ahpohl commented Oct 31, 2024

I am using a Fronius Primo 4.0 and a self-written C++ wrapper library around libmodbus together with Froniusd, a little daemon that reads the data from the inverter. I tried adding a delay between register queries, but the extra delay doesn't improve the reliability of the connection.

The inverter has both a serial cable and a network cable attached, and the data readout over a serial Modbus RTU connection has been running stably since last year with only occasional timeouts. Recently I added TCP support to Froniusd to make it more usable for other people who do not want to connect an extra serial cable. But the Modbus TCP connection is almost unusable, both in Froniusd and in the little demo program I posted above.

I'll try a Modbus GUI as you suggested to see if in principle the inverter Modbus TCP implementation is usable or not. I also believe this is probably not related to libmodbus. I could try disabling the cloud connection to the solar.web API while reading the data with Modbus TCP. Maybe the little embedded processor in the inverter cannot handle both connections simultaneously very well.

EDIT: disabling the solar.web connection makes no difference. The above program still terminates with a connection timeout message after 100-200 register reads.

@ahpohl
Author

ahpohl commented Oct 31, 2024

Tried QModMaster 0.5.2 (libmodbus 3.1.4) with a scan rate of 1000 ms. I got 210 successful reads and 17 timeout and invalid data errors. This is about the same error rate I am getting with my code.

@ahpohl
Author

ahpohl commented Nov 1, 2024

I tested the communication again, but this time without libmodbus. A simple Python test program written with pymodbus works without timeouts with both the sync and async TCP clients. This is the async test code:

import asyncio
from pymodbus.client import AsyncModbusTcpClient

async def run():
  client = AsyncModbusTcpClient('primo.home.arpa')
  await client.connect()
  for x in range(1000):
    rr = await client.read_holding_registers(address=40000, count=2, slave=1)
    print(x, [hex(y) for y in rr.registers])
  client.close()

if __name__ == "__main__":
  asyncio.run(run())

Output:

0 ['0x5375', '0x6e53']
...
999 ['0x5375', '0x6e53']

The loop runs 1000 times without interruption. No extra delays were added in the code example, but I guess that internally pymodbus adds delays at various stages when reading or writing holding registers.

Do you have a suggestion where to add the delay when using libmodbus? I tried adding a 100 ms delay after each read inside the loop, but the timeouts still happen.

  for (int i = 0; i < 1000; ++i) {
    rc = modbus_read_registers(ctx, 40000, 2, tab_reg);
    if (rc == -1) {
      fprintf(stderr, "%s\n", modbus_strerror(errno));
      return -1;
    }
    usleep(100000);
  }

@karlp
Contributor

karlp commented Nov 1, 2024

At this stage I'd be scoping the bus lines. You're looking not only at the inter-frame time but also, depending on your hardware, at the time between DE being asserted and the data being transmitted, and between the end of your data and DE being de-asserted.

@ahpohl
Author

ahpohl commented Nov 2, 2024

Setting the response timeout to something greater than 500 ms seems to solve the issue. Reading register 40000 in a loop 1000 times, as I did before, gives the following error counts:

Timeout:  300 ms, errors: 707
Timeout:  400 ms, errors: 112
Timeout:  500 ms, errors: 5
Timeout:  600 ms, errors: 0
Timeout:  700 ms, errors: 0
Timeout:  800 ms, errors: 0
Timeout:  900 ms, errors: 0
Timeout: 1000 ms, errors: 0

Below 300 ms the error rate is 100 % and it cannot read the register at all. Without setting the response timeout the error rate is comparable to a response timeout of 400 ms. What is the behavior of the libmodbus library when not setting the response timeout at all?
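
The table can also be summarized mechanically. A small sketch, with `struct sweep_point` and `smallest_clean_timeout_ms` as hypothetical names, that picks the smallest tested timeout with zero errors:

```c
#include <stddef.h>

/* Hypothetical type: one row of the sweep, a tested response timeout
 * and the number of failed reads observed with it. */
struct sweep_point {
    unsigned timeout_ms;
    unsigned errors;
};

/* Return the smallest tested timeout that produced zero errors,
 * or 0 if every tested timeout had at least one error. */
unsigned smallest_clean_timeout_ms(const struct sweep_point *pts, size_t n)
{
    unsigned best = 0;
    for (size_t i = 0; i < n; i++) {
        if (pts[i].errors == 0 && (best == 0 || pts[i].timeout_ms < best)) {
            best = pts[i].timeout_ms;
        }
    }
    return best;
}
```

For the figures above this returns 600. A value used in practice should probably sit somewhat above that, since 1000 reads is a small sample.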

@ahpohl
Author

ahpohl commented Nov 3, 2024

Correct me if I am wrong, but calling modbus_get_response_timeout() returns 500000 usec. So when no response timeout is set explicitly, the library uses 500 ms, which is just slightly too short for the Fronius Primo. I now use 800 ms (to give a little headroom) and the communication with the inverter is rock solid.
I also enabled link and protocol error recovery with modbus_set_error_recovery() to automatically flush the buffers after the recovery period and/or reset the connection, depending on the error condition.
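
A sketch of how these settings might be applied. In libmodbus 3.1.x, modbus_set_response_timeout() takes separate seconds and microseconds arguments, and as far as I understand the documentation the microseconds part must stay below 1000000, so a value such as 1000 ms has to be split. `timeout_ms_to_sec_usec` is a hypothetical helper name, and the libmodbus calls are shown only in a comment so the block compiles without the library:

```c
#include <stdint.h>

/* Hypothetical helper: split a timeout in milliseconds into whole
 * seconds and the remaining microseconds. */
void timeout_ms_to_sec_usec(uint32_t ms, uint32_t *sec, uint32_t *usec)
{
    *sec  = ms / 1000;
    *usec = (ms % 1000) * 1000;
}

/* Possible usage with an existing modbus_t *ctx (assumes libmodbus 3.1.x,
 * not compiled here):
 *
 *   uint32_t sec, usec;
 *   timeout_ms_to_sec_usec(800, &sec, &usec);
 *   if (modbus_set_response_timeout(ctx, sec, usec) == -1) {
 *       fprintf(stderr, "%s\n", modbus_strerror(errno));
 *   }
 *   modbus_set_error_recovery(ctx,
 *       MODBUS_ERROR_RECOVERY_LINK | MODBUS_ERROR_RECOVERY_PROTOCOL);
 */
```

With 800 ms the helper yields 0 s and 800000 usec; with 1000 ms it yields 1 s and 0 usec.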

Thanks for your initial help. I guess you can close this issue now.

@mhei
Contributor

mhei commented Nov 3, 2024

Yes, I can confirm that if you do not set a response timeout yourself, the default value of 500 ms is used.

Thanks for sharing your analysis and insights, much appreciated.

@karlp
Contributor

karlp commented Nov 4, 2024

FWIW, Fronius should be providing figures on what their max response time is. Here's an example from another vendor, in their Modbus registers document, explaining it explicitly:
[image: Modbus timing diagram from that vendor's document]
(500 ms makes for a kinda slow, "traditional" device IMO, but such is life)

@ahpohl
Author

ahpohl commented Nov 4, 2024

All I have is this manual of the Fronius Datamanager, which is freely available on the Fronius website. But it doesn't contain Modbus timing diagrams like the one shown above.

On page 65, however, it contains the following statement: "Execute the queries with a timeout of at least 10 seconds. Queries at millisecond intervals can lead to long response times."

Anyway, I am glad that I found a solution and libmodbus works great in my project now.
