re-connect at every mqtt command #50

Closed
m-reuter opened this issue Jan 8, 2022 · 18 comments

@m-reuter

m-reuter commented Jan 8, 2022

Hi,

thanks for the great work! I noticed that my /var/log/messages file gets spammed with MQTT disconnect and reconnect messages. The reason is in this function:

def pushValues(self, vars=None):

which connects to the broker for every single command, and I am querying the HPSU every few seconds with multiple commands. Shouldn't the connection be established once at the beginning and then remain active? That is how my other devices behave.

@segaura

segaura commented Jan 22, 2022

There are two different things:

  • the option of reusing the same MQTT connection over and over, which has pros and cons worth discussing
  • the fact that these events are logged even though you may not be interested in them

I pushed a version that includes log level management. These connection-related events are logged at INFO level, so if you set the minimum logged level to WARNING, they never reach your log file.
I personally am doing exactly this with the --log_level WARNING option.
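
For reference, this is just Python's standard logging threshold at work; a minimal sketch of the idea (the actual pyHPSU implementation may differ):

    import logging

    # Minimal sketch of the idea behind --log_level WARNING, not the exact
    # pyHPSU code: records below the configured threshold are simply dropped.
    logging.basicConfig(level=logging.WARNING)
    log = logging.getLogger("pyhpsu")

    log.info("Connected to MQTT broker")    # INFO: filtered out at WARNING level
    log.warning("MQTT broker unreachable")  # WARNING: still written to the log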

Now you can find it in the testing branch or in my fork ... the problem is that I made so many changes that you have to adjust a few things if you want to use those versions.

@Spanni26
Owner

Spanni26 commented Jan 22, 2022 via email

@m-reuter
Author

Thanks. I was not worried about the pyHPSU logs. What was problematic is that the MQTT broker (mosquitto), running as a Home Assistant add-on, by default logs all those details to syslog. I figured out how to shut off most messages there, but the problem is these error messages, which get generated when this client re-connects without closing the previous connection first:
Feb 20 12:28:08 raspberrypi d44fe278ca71[844]: 1645359978: Client rotex_hpsu already connected, closing old connection.

So I think we should either always close the connection properly before re-connecting, or better:

re-use the existing connection, check if it is lost, and only then re-connect.
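
Something along these lines, as a rough sketch only (attribute names such as brokerhost, brokerport, prefix and qos are borrowed from the plugin code, the rest is hypothetical): with paho-mqtt, connecting once and starting the background network loop keeps the connection alive and lets the library re-connect on its own when the broker drops it.

    import paho.mqtt.client as mqtt

    # Rough sketch, not the actual pyHPSU plugin: connect once, keep the
    # connection alive, and let paho's network loop handle re-connects.
    class MqttOutput:
        def __init__(self, brokerhost, brokerport=1883, prefix="rotex", qos=0):
            self.prefix = prefix
            self.qos = int(qos)
            self.client = mqtt.Client()
            self.client.connect(brokerhost, port=brokerport)
            self.client.loop_start()  # background thread; re-connects if the link drops

        def pushValues(self, vars=None):
            for r in vars or []:
                topic = self.prefix + "/" + r['name'] if self.prefix else r['name']
                self.client.publish(topic, payload=r['resp'], qos=self.qos)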

@segaura

segaura commented Feb 20, 2022

Oh, only now do I understand, sorry.
The MQTT output plugin code can easily be refactored to mitigate this issue.
As you pointed out, this is the problematic section:

    def pushValues(self, vars=None):
        
        #self.msgs=[]
        for r in vars:
            self.client.connect(self.brokerhost, port=self.brokerport)
            msgs=[]
            if self.prefix:
                ret=self.client.publish(self.prefix + "/" + r['name'],payload=r['resp'], qos=int(self.qos))
                topic=self.prefix + "/" + r['name']
            else:
                ret=self.client.publish(r['name'],payload=r['resp'], qos=int(self.qos))
                topic=r['name']
            msg={'topic':topic,'payload':r['resp'], 'qos':self.qos, 'retain':False}
            self.client.disconnect()

A first improvement is to move connect and disconnect outside of the loop, like this:

    def pushValues(self, vars=None):
        
        self.client.connect(self.brokerhost, port=self.brokerport)

        #self.msgs=[]
        for r in vars:
            msgs=[]
            if self.prefix:
                ret=self.client.publish(self.prefix + "/" + r['name'],payload=r['resp'], qos=int(self.qos))
                topic=self.prefix + "/" + r['name']
            else:
                ret=self.client.publish(r['name'],payload=r['resp'], qos=int(self.qos))
                topic=r['name']
            msg={'topic':topic,'payload':r['resp'], 'qos':self.qos, 'retain':False}

        self.client.disconnect()

If you execute, let's say, 20 read commands every 30 seconds, you now create one connection every 30 seconds; before, 20 consecutive connections were created each time, and maybe some race conditions happened.

I tried this change in my fork and the log messages are gone (example grep inside my mosquitto container):

/ $  grep "already connected, closing old connection" /mosquitto/log/mosquitto.log
/ $

and a typical log block looks like this:

0: New client connected from 192.168.1.102:58009 as pyhpsu-24468 (p2, c1, k60, u'dummy-username').
0: Sending CONNACK to pyhpsu-24468 (0, 0)
0: Received PUBLISH from pyhpsu-24468 (d0, q0, r0, m0, 'rotex/t_hs', ... (5 bytes))
0: Received PUBLISH from pyhpsu-24468 (d0, q0, r0, m0, 'rotex/t_hs_set', ... (4 bytes))
0: Received PUBLISH from pyhpsu-24468 (d0, q0, r0, m0, 'rotex/water_pressure', ... (5 bytes))
0: Received PUBLISH from pyhpsu-24468 (d0, q0, r0, m0, 'rotex/t_dhw', ... (5 bytes))
0: Received PUBLISH from pyhpsu-24468 (d0, q0, r0, m0, 'rotex/t_dhw_set', ... (5 bytes))
0: Received PUBLISH from pyhpsu-24468 (d0, q0, r0, m0, 'rotex/t_return', ... (5 bytes))
0: Received PUBLISH from pyhpsu-24468 (d0, q0, r0, m0, 'rotex/flow_rate', ... (1 bytes))
0: Received PUBLISH from pyhpsu-24468 (d0, q0, r0, m0, 'rotex/t_hc', ... (5 bytes))
0: Received PUBLISH from pyhpsu-24468 (d0, q0, r0, m0, 'rotex/t_hc_set', ... (5 bytes))
0: Received PUBLISH from pyhpsu-24468 (d0, q0, r0, m0, 'rotex/posmix', ... (1 bytes))
0: Received PUBLISH from pyhpsu-24468 (d0, q0, r0, m0, 'rotex/mode_01', ... (1 bytes))
0: Received PUBLISH from pyhpsu-24468 (d0, q0, r0, m0, 'rotex/tvbh2', ... (5 bytes))
0: Received PUBLISH from pyhpsu-24468 (d0, q0, r0, m0, 'rotex/mode', ... (1 bytes))
0: Received PUBLISH from pyhpsu-24468 (d0, q0, r0, m0, 'rotex/pump', ... (1 bytes))
0: Received PUBLISH from pyhpsu-24468 (d0, q0, r0, m0, 'rotex/bpv', ... (3 bytes))
0: Received PUBLISH from pyhpsu-24468 (d0, q0, r0, m0, 'rotex/t_v1', ... (5 bytes))
0: Received PUBLISH from pyhpsu-24468 (d0, q0, r0, m0, 'rotex/t_vbh', ... (5 bytes))
0: Received PUBLISH from pyhpsu-24468 (d0, q0, r0, m0, 'rotex/t_r1', ... (5 bytes))
0: Received PUBLISH from pyhpsu-24468 (d0, q0, r0, m0, 'rotex/t_flow_cooling', ... (4 bytes))
0: Client pyhpsu-24468 closed its connection.
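
As a further idea, and just a sketch rather than what I actually committed: the msg dicts built in the loop already match the format expected by paho's publish.multiple() helper, which opens a single connection, publishes the whole batch and disconnects cleanly, so the per-message bookkeeping could be put to use instead of being thrown away.

    import paho.mqtt.publish as publish

    # Sketch only, not the committed change: collect the messages first, then
    # hand the whole batch to paho, which uses one connection for all of them.
    def pushValues(self, vars=None):
        msgs = []
        for r in vars or []:
            topic = self.prefix + "/" + r['name'] if self.prefix else r['name']
            msgs.append({'topic': topic, 'payload': r['resp'],
                         'qos': int(self.qos), 'retain': False})
        if msgs:
            publish.multiple(msgs, hostname=self.brokerhost, port=self.brokerport)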

@m-reuter
Author

Thanks, this should already help. I am currently pulling #44. So would you add this to testing?
When is the new MQTT stuff going to be merged from testing into master? I have been using that for a year now without problems. See https://github.com/m-reuter/ha-addons/tree/master/pyhpsu2mqtt

@segaura

segaura commented Feb 22, 2022

I can make a new pull request with the few commits I've added, including this last one we discussed here.

What I'm trying to add before creating it is compatibility of the mqttdaemon mode with systemd, i.e. handling exceptions correctly (e.g. terminating the parent process when a child thread fails, so that the systemd daemon can detect the problem). Are you experiencing similar issues when leaving pyHPSU running for days?
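
One possible approach, only a sketch of the idea and not necessarily what will end up in the pull request, is to install a threading excepthook that terminates the whole process with a non-zero exit code, so a systemd unit with Restart=on-failure notices the failure:

    import logging
    import os
    import threading

    # Sketch only: if any worker thread dies with an unhandled exception, log it
    # and kill the whole process so systemd sees a non-zero exit and restarts it.
    # Requires Python 3.8+ for threading.excepthook.
    def _exit_on_thread_failure(args):
        logging.error("Thread %s failed: %r",
                      getattr(args.thread, "name", "?"), args.exc_value)
        os._exit(1)  # hard exit; other threads would otherwise keep the process alive

    threading.excepthook = _exit_on_thread_failure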

@m-reuter
Author

I did not notice anything (in /var/log/messages). Should I look at another specific log file?
I leave it running for months.

@segaura

segaura commented Feb 22, 2022

OK, in this case we can consolidate the current codebase.
I created pull request #51, still on the testing branch.

My issues are related to the influxdb output plugin: it reports that influxdb is sometimes not available, and this makes pyHPSU crash in a way that systemd is not able to detect. I will investigate further on my own, thank you.

@Spanni26
Owner

Spanni26 commented Feb 23, 2022 via email

@m-reuter
Author

@segaura I upgraded to this version (#51) and see constant crashing and restarting of my Docker container. Not sure why. I am currently reverting to #44, which worked before, to see whether it is a build issue (I also switched to a newer build environment on Travis) or whether it is caused by the difference between the two versions discussed here.

@m-reuter
Author

It's also happening with the old version now, so it is probably a build or dependency issue on my side.

@m-reuter
Author

It turned out to be a dependency issue: python-can==4.0.0 does not seem to work (see #52). Forcing the older version 3.3.4, it is now running, so far without problems. The re-connect race conditions did not appear again. I will keep it running for a few days, and if all goes well, we can close this and also consider merging testing into the master branch, I think.

@m-reuter
Author

It is still running stably, without any more MQTT re-connects or crashes.

@segaura

segaura commented Feb 25, 2022

Cool.
After you told me that it has been running without crashing for months, I looked deeper at my logs and found that the issue is with an MQTT-unrelated plugin, the InfluxDB one. Possibly one you are not using.
The problem is that it tries to catch and handle exceptions, but sometimes a second exception happens while the first one is being handled.
Simply removing the exception handling solved the issue (presumably because, in case of an exception, systemd restarts the process).
Other output plugins, like the MQTT one, do not handle exceptions.
I'll test it for a few weeks and then contribute a new pull request with this InfluxDB plugin fix.
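
To illustrate the pattern with hypothetical code (this is not the real plugin, just the shape of the bug): if the recovery path inside the except block can itself fail, the second exception escapes the handler anyway, only from a less obvious place; letting the first error propagate gives systemd a clean non-zero exit to react to.

    # Hypothetical illustration of the pattern, not the actual InfluxDB plugin code.
    class BrokenWriter:
        def write(self, points):
            try:
                self._send(points)    # first failure: influxdb not available
            except Exception:
                self._reconnect()     # if this raises too, it escapes the handler
                self._send(points)

        def _send(self, points):
            raise ConnectionError("influxdb not available")

        def _reconnect(self):
            raise ConnectionError("influxdb still not available")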

@m-reuter
Author

@segaura great. And you are right, I am not using the InfluxDB plugin.

@svde

svde commented Feb 25, 2022

After my server running mosquitto reboots for software updates, the Raspberry Pi running pyHPSU isn't able to send commands (for instance rotex/command/heat_slope) anymore. Reading values keeps working. After restarting the hpsu service, sending commands works again. Nothing is mentioned in the logfile (LOG_LEVEL_STRING = WARNING).

@m-reuter
Author

@svde I started a new issue for you, as it seems unrelated to the re-connect race condition discussed and fixed here. Or has this only happened recently, after the fix for this issue, and worked correctly before?

@svde

svde commented Feb 25, 2022

@svde I started a new issue for you, as it seems unrelated to the re-connect race condition discussed and fixed here. Or has this only happened recently, after the fix for this issue, and worked correctly before?

It's probably unrelated.

Thanks for opening the new issue.
