This repository has been archived by the owner on Jun 24, 2021. It is now read-only.

IRCd listening sockets are erroneously inherited by libratbox helper processes (e.g. bandb) on illumos (SunOS 5.11) #291

Open
janicez opened this issue Oct 10, 2019 · 8 comments

@janicez

janicez commented Oct 10, 2019

So far, this has only been reproduced on a fork of 3.5.7. It will be tested on a clean 3.5.7 work tree, and this bug is not to be considered valid until such time as it has been reproduced on clean 3.5.7.

@janicez
Author

janicez commented Oct 10, 2019

Reproduced on 3.5.7.

13:06:53 peri141  -- | /home/ellenor/.local/charybdis-3.5/etc/ircd.conf :Rehashing
13:06:53 peri141  -- | hades.arpa: *** Notice -- [email protected]{ellenor2000} is rehashing server config file
13:06:53 peri141  -- | hades.arpa: *** Notice -- Cannot bind for listener on port 14005: Address already in use
13:06:53 peri141  -- | hades.arpa: *** Notice -- Cannot bind for listener on port 14005: Address already in use
13:06:53 peri141  -- | hades.arpa: *** Notice -- Cannot bind for listener on port 14004: Address already in use
13:06:53 peri141  -- | hades.arpa: *** Notice -- Cannot bind for listener on port 14004: Address already in use
13:06:53 peri141  -- | hades.arpa: *** Notice -- Cannot bind for listener on port 14003: Address already in use
13:06:53 peri141  -- | hades.arpa: *** Notice -- Cannot bind for listener on port 14003: Address already in use
13:06:53 peri141  -- | hades.arpa: *** Notice -- Cannot bind for listener on port 14002: Address already in use
13:06:53 peri141  -- | hades.arpa: *** Notice -- Cannot bind for listener on port 14002: Address already in use
13:06:53 peri141  -- | hades.arpa: *** Notice -- Cannot bind for listener on port 14001: Address already in use
13:06:53 peri141  -- | hades.arpa: *** Notice -- Cannot bind for listener on port 14001: Address already in use
13:06:53 peri141  -- | hades.arpa: *** Notice -- Cannot bind for listener on port 14000: Address already in use
13:06:53 peri141  -- | hades.arpa: *** Notice -- Cannot bind for listener on port 14000: Address already in use
13:06:53 peri141  -- | hades.arpa: *** Notice -- Cannot bind for listener on port 14105: Address already in use
13:06:53 peri141  -- | hades.arpa: *** Notice -- Cannot bind for listener on port 14105: Address already in use
13:06:53 peri141  -- | hades.arpa: *** Notice -- Cannot bind for listener on port 14104: Address already in use
13:06:53 peri141  -- | hades.arpa: *** Notice -- Cannot bind for listener on port 14104: Address already in use
13:06:53 peri141  -- | hades.arpa: *** Notice -- Cannot bind for listener on port 14103: Address already in use
13:06:53 peri141  -- | hades.arpa: *** Notice -- Cannot bind for listener on port 14103: Address already in use
13:06:53 peri141  -- | hades.arpa: *** Notice -- Cannot bind for listener on port 14102: Address already in use
13:06:53 peri141  -- | hades.arpa: *** Notice -- Cannot bind for listener on port 14102: Address already in use
13:06:53 peri141  -- | hades.arpa: *** Notice -- Cannot bind for listener on port 14101: Address already in use
13:06:53 peri141  -- | hades.arpa: *** Notice -- Cannot bind for listener on port 14101: Address already in use
13:06:53 peri141  -- | hades.arpa: *** Notice -- Cannot bind for listener on port 14100: Address already in use
13:06:53 peri141  -- | hades.arpa: *** Notice -- Cannot bind for listener on port 14100: Address already in use

lsof output for the ports ircd opens, showing that bandb (PID 4085) holds the listening sockets:

bandb    4085 ellenor  257u  IPv4 0xfffffe01f72e9840      0t0  TCP *:14005 (LISTEN)
bandb    4085 ellenor  258u  IPv6 0xfffffe01f76d4800      0t0  TCP *:14005 (LISTEN)
bandb    4085 ellenor  259u  IPv4 0xfffffe0f2b171040      0t0  TCP *:14004 (LISTEN)
bandb    4085 ellenor  260u  IPv6 0xfffffe01fab40880      0t0  TCP *:14004 (LISTEN)
bandb    4085 ellenor  261u  IPv4 0xfffffe01f8793800      0t0  TCP *:14003 (LISTEN)
bandb    4085 ellenor  262u  IPv6 0xfffffe01f801a040      0t0  TCP *:14003 (LISTEN)
bandb    4085 ellenor  263u  IPv4 0xfffffe01ed796800      0t0  TCP *:14002 (LISTEN)
bandb    4085 ellenor  264u  IPv6 0xfffffe01ff15b840      0t0  TCP *:14002 (LISTEN)
bandb    4085 ellenor  265u  IPv4 0xfffffe01f7e5f000      0t0  TCP *:14001 (LISTEN)
bandb    4085 ellenor  266u  IPv6 0xfffffe0f2fafe880      0t0  TCP *:14001 (LISTEN)
bandb    4085 ellenor  267u  IPv4 0xfffffe01f72f1880      0t0  TCP *:14000 (LISTEN)
bandb    4085 ellenor  268u  IPv6 0xfffffe01e8598100      0t0  TCP *:14000 (LISTEN)
bandb    4085 ellenor  269u  IPv4 0xfffffe01f5089800      0t0  TCP *:14105 (LISTEN)
bandb    4085 ellenor  270u  IPv6 0xfffffe01f5089080      0t0  TCP *:14105 (LISTEN)
bandb    4085 ellenor  271u  IPv4 0xfffffe01f50ab840      0t0  TCP *:14104 (LISTEN)
bandb    4085 ellenor  272u  IPv6 0xfffffe01f50ab0c0      0t0  TCP *:14104 (LISTEN)
bandb    4085 ellenor  273u  IPv4 0xfffffe0f31051880      0t0  TCP *:14103 (LISTEN)
bandb    4085 ellenor  274u  IPv6 0xfffffe01fee657c0      0t0  TCP *:14103 (LISTEN)
bandb    4085 ellenor  275u  IPv4 0xfffffe01fee65040      0t0  TCP *:14102 (LISTEN)
bandb    4085 ellenor  276u  IPv6 0xfffffe01f87a3840      0t0  TCP *:14102 (LISTEN)
bandb    4085 ellenor  277u  IPv4 0xfffffe01f6b4e880      0t0  TCP *:14101 (LISTEN)
bandb    4085 ellenor  278u  IPv6 0xfffffe0f2b1717c0      0t0  TCP *:14101 (LISTEN)
bandb    4085 ellenor  279u  IPv4 0xfffffe0f31051100      0t0  TCP *:14100 (LISTEN)
bandb    4085 ellenor  280u  IPv6 0xfffffe01f5f83780      0t0  TCP *:14100 (LISTEN)

aaronmdjones self-assigned this on Oct 12, 2019
@aaronmdjones
Contributor

I'd hazard a guess that O_CLOEXEC isn't being set somewhere (or isn't being respected, if it is). I'll have a look into this, but without a system to test and reproduce on, I can't promise anything.
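For context, a minimal sketch of how the close-on-exec state of a descriptor can be inspected. It uses plain POSIX calls rather than libratbox, and the function names `fd_is_cloexec` and `fresh_socket_is_cloexec` are hypothetical, chosen only for illustration. A socket created with plain `socket()` is not marked close-on-exec, so it stays open in any program the process later exec()s unless the flag is set explicitly.

```c
#include <fcntl.h>
#include <sys/socket.h>
#include <unistd.h>

/* Returns 1 if fd is marked close-on-exec, 0 if it is not, -1 on error. */
int fd_is_cloexec(int fd)
{
    int flags = fcntl(fd, F_GETFD);
    if (flags == -1)
        return -1;
    return (flags & FD_CLOEXEC) ? 1 : 0;
}

/*
 * Creates a TCP socket the plain way (no SOCK_CLOEXEC, no fcntl),
 * reports its close-on-exec state, and closes it again.
 * Hypothetical helper for illustration only.
 */
int fresh_socket_is_cloexec(void)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd == -1)
        return -1;
    int result = fd_is_cloexec(fd);
    close(fd);
    return result;
}
```

If `fresh_socket_is_cloexec()` reports 0, a helper started with fork() and exec() would inherit such a socket, which is consistent with the lsof output above.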

aaronmdjones changed the title from "[for an obsolete version] On illumos (SunOS 5.11), another ircd process (so far only bandb) usurps ircd's sockets upon /rehash, rendering the ircd completely gimped and requiring manual termination of bandb and ssld." to "IRCd listening sockets are erroneously inherited by libratbox helper processes (e.g. bandb) on Illumos (SunOS 5.11)" on Oct 12, 2019
@janicez
Author

janicez commented Oct 12, 2019

Shall I give you a shell account on my illumos box?

@janicez
Author

janicez commented Oct 12, 2019

And yes: I just ran find | xargs grep over the source code of my fork, and O_CLOEXEC is not set anywhere.

@janicez
Author

janicez commented Oct 12, 2019

Would it be idiomatic ratbox coding to dig into an rb_fde_t, or are those values meant to be treated as black boxes? Pre-publish edit: it appears that int rb_get_fd() is the way to get the fd out of an F. Good.

I'm considering adding a hack to my 3.5.7 fork (and possibly sending it back to mainline 3.5.7 as a pull request, if it's idiomatic) that uses fcntl F_SETFD to set FD_CLOEXEC on the listeners and on the sockets returned by accept() on them.

janicez changed the title from "IRCd listening sockets are erroneously inherited by libratbox helper processes (e.g. bandb) on Illumos (SunOS 5.11)" to "IRCd listening sockets are erroneously inherited by libratbox helper processes (e.g. bandb) on illumos (SunOS 5.11)" on Oct 12, 2019
@janicez
Author

janicez commented Oct 12, 2019

By the way, a lowercase i in illumos is correct title case, because the brand is in lowercase.

@janicez
Author

janicez commented Oct 12, 2019

By the way, @aaronmdjones, incorporating the fix you suggest seems to work on my illumos system.

 $ lsof -i TCP:14100                                                                       
COMMAND   PID    USER   FD   TYPE             DEVICE SIZE/OFF NODE NAME
weechat  6267 ellenor   15u  IPv4 0xfffffe01feb01780      0t0  TCP perihelion.local:37663->perihelion.local:14100 (ESTABLISHED)
ircd    22066 ellenor  291u  IPv4 0xfffffe01f6b4e880      0t0  TCP *:14100 (LISTEN)
ircd    22066 ellenor  292u  IPv6 0xfffffe01ea298800      0t0  TCP *:14100 (LISTEN)
ssld    22098 ellenor    7u  IPv4 0xfffffe01ee9f2000  0t20712  TCP perihelion.local:14100->perihelion.local:37663 (ESTABLISHED)
 $ : I will shortly do a rehash. Once I have, I will show the output of that lsof command again.
 $ lsof -i TCP:14100                                                                            
COMMAND   PID    USER   FD   TYPE             DEVICE SIZE/OFF NODE NAME
weechat  6267 ellenor   15u  IPv4 0xfffffe01feb01780      0t0  TCP perihelion.local:37663->perihelion.local:14100 (ESTABLISHED)
ircd    22066 ellenor  291u  IPv4 0xfffffe01f6b4e880      0t0  TCP *:14100 (LISTEN)
ircd    22066 ellenor  292u  IPv6 0xfffffe01ea298800      0t0  TCP *:14100 (LISTEN)
ssld    22098 ellenor    7u  IPv4 0xfffffe01ee9f2000  0t23529  TCP perihelion.local:14100->perihelion.local:37663 (ESTABLISHED)

I can even still reconnect.

Adding this line seems to be what fixed it:

fcntl (rb_get_fd(listener->F), F_SETFD, fcntl(rb_get_fd(listener->F), F_GETFD, 0) | FD_CLOEXEC);
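The one-liner above passes the result of F_GETFD straight into F_SETFD, so a failed F_GETFD (which returns -1) would go unnoticed. A slightly more defensive variant might look like the sketch below. `set_cloexec` is a hypothetical helper name, not something that exists in libratbox.

```c
#include <fcntl.h>

/*
 * Marks fd close-on-exec while preserving any other descriptor flags.
 * Returns 0 on success, -1 on failure.
 * Hypothetical helper; not part of libratbox.
 */
int set_cloexec(int fd)
{
    int flags = fcntl(fd, F_GETFD);
    if (flags == -1)
        return -1;  /* e.g. bad descriptor: do not feed -1 into F_SETFD */

    if (fcntl(fd, F_SETFD, flags | FD_CLOEXEC) == -1)
        return -1;

    return 0;
}
```

In the ircd this would presumably be called as set_cloexec(rb_get_fd(listener->F)), in the same place as the line above.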

janicez added a commit to asterIRC/IRCa that referenced this issue Oct 12, 2019
…d by amdj on charybdis-ircd/charybdis#291 for what I am calling the 'bandb bug'.

per @aaronmdjones, "I'd hazard a guess that O_CLOEXEC isn't being set somewhere (or isn't being respected, if it is). I'll have a look into this, but without a system to test and reproduce on, I can't promise anything."
janicez added a commit to janicez/charybdis that referenced this issue Oct 12, 2019
…s suggested by @aaronmdjones (unsure if the bug manifests on master because I wasn't able to compile the ircd)
@janicez
Author

janicez commented Oct 12, 2019

Curious. The same line doesn't seem to fix it in charybdis 4.

In addition, if I kill the ircd process (or even shut it down gracefully), neither bandb nor ssld shuts down correctly.
