Merge pull request #292 from martin-belanger/handle-lost-avahi-dcs
stafd: Add zeroconf-connections-persistence conf. parameter
martin-belanger authored Dec 6, 2022
2 parents 73332d6 + 233200e commit 523ce71
Showing 17 changed files with 574 additions and 54 deletions.
15 changes: 15 additions & 0 deletions NEWS.md
@@ -1,5 +1,20 @@
# STorage Appliance Services (STAS)

## Changes with release 2.1

* Bug fixes:
  * Immediately remove existing connections to Discovery Controllers (DC) discovered through zeroconf (mDNS) when they are added to `exclude=` in `stafd.conf`. Previously, adding DCs to `exclude=` only took effect on new connections and did not apply to existing connections.
  * When handling "key=value" pairs in the TXT field from Avahi, treat "keys" as case-insensitive.
  * Strip spaces from Discovery Log Page Entries (DLPE). Some DCs may append extra spaces to DLPEs (e.g. IP addresses with trailing spaces). The kernel driver does not expect extra spaces, so they need to be removed.
* Add new configuration parameters to `stafd.conf` and `stacd.conf` to provide parity with `nvme-cli`:
  * `nr-io-queues`, `nr-write-queues`, `nr-poll-queues`, `queue-size`, `reconnect-delay`, `ctrl-loss-tmo`, `duplicate-connect`, `disable-sqflow`
* Changes to `stafd.conf`:
  * Move `persistent-connections` from the `[Global]` section to a new section named `[Discovery controller connection management]`. `persistent-connections` is still recognized in the `[Global]` section, but that location will be deprecated over time.
  * Add new configuration parameter `zeroconf-connections-persistence` to section `[Discovery controller connection management]`. This parameter makes it possible to age out Discovery Controllers discovered through zeroconf (mDNS) that are no longer reachable, so that they get purged from the configuration.
* Add more configuration validation to identify invalid sections and options in configuration files (`stafd.conf` and `stacd.conf`).
* Improve dependencies in the meson build environment so that missing subprojects won't prevent distros from packaging `nvme-stas` (e.g. when invoking meson with the `--wrap-mode=nodownload` option).
* Improve Read-The-Docs documentation format.
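Two of the bug fixes above normalize data received from the network: TXT keys from Avahi and DLPE fields from DCs. The following is a minimal sketch of both, assuming the TXT entries have already been decoded from Avahi's byte arrays into `key=value` strings and that a DLPE is represented as a plain dict of field name to value. The function names `parse_txt` and `strip_dlpe` and the sample field names are hypothetical, not nvme-stas's actual code. Keys are lowercased because RFC 6763 (DNS-SD) specifies that TXT keys are compared case-insensitively; the same RFC says only the first occurrence of a repeated key is used.

```python
from typing import Dict, Iterable, Union


def parse_txt(entries: Iterable[str]) -> Dict[str, str]:
    """Parse DNS-SD TXT entries of the form "key=value".

    Keys are normalized to lowercase because they are case-insensitive.
    Only the first occurrence of a repeated key is kept (RFC 6763).
    """
    result: Dict[str, str] = {}
    for entry in entries:
        key, sep, value = entry.partition("=")
        key = key.strip().lower()
        if not sep or not key:
            # Assumption: boolean attributes (no "=") and empty keys are ignored.
            continue
        result.setdefault(key, value)
    return result


def strip_dlpe(dlpe: Dict[str, Union[str, int]]) -> Dict[str, Union[str, int]]:
    """Return a copy of a DLPE with leading/trailing spaces removed from string fields."""
    return {
        field: value.strip() if isinstance(value, str) else value
        for field, value in dlpe.items()
    }


if __name__ == "__main__":
    print(parse_txt(["NQN=nqn.2014-08.org.nvmexpress.discovery", "P=tcp"]))
    print(strip_dlpe({"traddr": "192.168.1.20   ", "trsvcid": "8009", "portid": 1}))
```

Non-string fields (such as a numeric port ID) are passed through unchanged, so the sketch can be applied to a whole entry without knowing its schema.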

## Changes with release 2.0

Because of incompatibilities between 1.1.6 and 1.2 (ref. `sticky-connections`), it was decided to skip release 1.2 and have a 2.0 release instead. Release 2.0 contains everything listed in 1.2 (below) plus the following:
27 changes: 22 additions & 5 deletions coverage.sh.in
@@ -143,10 +143,13 @@ cat > "${stafd_conf_fname}" <<'EOF'
[Global]
tron=true
kato=10
persistent-connections=false
ip-family=ipv6
johnny=be-good
[Discovery controller connection management]
persistent-connections=false
zeroconf-connections-persistence=1d
[Hello]
hello = bye
EOF
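This test configuration deliberately contains an unknown option (`johnny=be-good`) and an unknown section (`[Hello]`), which the new validation is meant to report. One way such entries could be flagged is sketched below with Python's `configparser`. The `SCHEMA` table and `find_invalid` are hypothetical: the schema only lists options that appear in this commit, so it is not the complete set that stafd accepts, and the messages are illustrative rather than stafd's actual log output.

```python
import configparser
from typing import Dict, List, Set

# Hypothetical, partial schema: only options that appear in this commit are
# listed, so it is NOT the complete set of options that stafd accepts.
SCHEMA: Dict[str, Set[str]] = {
    "Global": {
        "tron", "kato", "ip-family", "data-digest", "queue-size",
        "reconnect-delay", "ctrl-loss-tmo", "duplicate-connect",
        "disable-sqflow", "ignore-iface",
        "persistent-connections",  # deprecated location, still recognized
    },
    "Discovery controller connection management": {
        "persistent-connections", "zeroconf-connections-persistence",
    },
    "Service Discovery": {"zeroconf"},
    "Controllers": {"controller", "exclude"},
}


def find_invalid(text: str) -> List[str]:
    """Return a description of every unknown section and option in `text`."""
    # strict=False tolerates repeated keys such as multiple "controller=" lines.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.optionxform = str  # keep option names exactly as written
    parser.read_string(text)
    problems: List[str] = []
    for section in parser.sections():
        allowed = SCHEMA.get(section)
        if allowed is None:
            problems.append(f"invalid section [{section}]")
            continue
        for option in parser.options(section):
            if option not in allowed:
                problems.append(f"invalid option {option}= in [{section}]")
    return problems
```

Applied to the configuration written by this heredoc, the sketch reports `johnny=` in `[Global]` and the `[Hello]` section, and accepts everything else.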
@@ -206,7 +209,10 @@ log "Change stafd config: tron=true, persistent-connections=false, zeroconf=enab
cat > "${stafd_conf_fname}" <<'EOF'
[Global]
tron=true
[Discovery controller connection management]
persistent-connections=false
zeroconf-connections-persistence=0.5
[Service Discovery]
zeroconf=enabled
@@ -219,10 +225,13 @@ log "Change stafd config: ip-family=ipv4, kato=10, adding multiple controllers"
cat > "${stafd_conf_fname}" <<'EOF'
[Global]
tron=true
persistent-connections=false
ip-family=ipv4
kato=10
[Discovery controller connection management]
persistent-connections=false
zeroconf-connections-persistence=1:01
[Controllers]
controller = transport = tcp ; traddr = localhost ; ; ;
controller=transport=tcp;traddr=1.1.1.1
@@ -240,7 +249,7 @@ EOF
reload_cfg "stafd"


log "Change stacd config: tron=true, udev-rule=disabled, sticky-connections=disabled"
log "Change stacd config: tron=true, udev-rule=disabled, disconnect-scope=blah-blah, disconnect-trtypes=boing-boing"
cat > "${stacd_conf_fname}" <<'EOF'
[Global]
tron=true
@@ -346,7 +355,10 @@ log "Empty configuration and disable zeroconf for stafd"
cat > "${stafd_conf_fname}" <<'EOF'
[Global]
tron=true
[Discovery controller connection management]
persistent-connections=false
zeroconf-connections-persistence=0.5
[Service Discovery]
zeroconf=disabled
@@ -360,7 +372,10 @@ log "Add single controller (::1) and re-enable zeroconf for stafd"
cat > "${stafd_conf_fname}" <<'EOF'
[Global]
tron=true
[Discovery controller connection management]
persistent-connections=false
zeroconf-connections-persistence=-1
[Controllers]
controller=transport=tcp;traddr=::1;trsvcid=8009
@@ -424,18 +439,20 @@ log "Run unit test: test-gtimer"
PYTHONPATH=${PYTHON_PATH} coverage run --rcfile=.coveragerc ../test/test-gtimer.py
log "Run unit test: test-iputil"
PYTHONPATH=${PYTHON_PATH} coverage run --rcfile=.coveragerc ../test/test-iputil.py
log "Run unit test: test-version"
PYTHONPATH=${PYTHON_PATH} coverage run --rcfile=.coveragerc ../test/test-version.py
log "Run unit test: test-log"
PYTHONPATH=${PYTHON_PATH} coverage run --rcfile=.coveragerc ../test/test-log.py
log "Run unit test: test-nvme_options"
sudo PYTHONPATH=${PYTHON_PATH} coverage run --rcfile=.coveragerc ../test/test-nvme_options.py
log "Run unit test: test-service"
PYTHONPATH=${PYTHON_PATH} coverage run --rcfile=.coveragerc ../test/test-service.py
log "Run unit test: test-timeparse"
PYTHONPATH=${PYTHON_PATH} coverage run --rcfile=.coveragerc ../test/test-timeparse.py
log "Run unit test: test-transport_id"
PYTHONPATH=${PYTHON_PATH} coverage run --rcfile=.coveragerc ../test/test-transport_id.py
log "Run unit test: test-udev"
PYTHONPATH=${PYTHON_PATH} coverage run --rcfile=.coveragerc ../test/test-udev.py
log "Run unit test: test-version"
PYTHONPATH=${PYTHON_PATH} coverage run --rcfile=.coveragerc ../test/test-version.py

################################################################################
# Stop nvme target simulator
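The test configurations in this script now place `persistent-connections` in `[Discovery controller connection management]`, while NEWS.md notes that the option is still recognized in `[Global]`. A minimal lookup that honors both locations could look like the sketch below. The function name and the precedence rule (the new section wins when both are present) are assumptions for illustration, not necessarily stafd's behavior; the `False` fallback is the documented default.

```python
import configparser

NEW_SECTION = "Discovery controller connection management"
OLD_SECTION = "Global"  # deprecated location, still recognized


def persistent_connections(parser: configparser.ConfigParser) -> bool:
    """Look up persistent-connections, preferring the new section (assumption)."""
    for section in (NEW_SECTION, OLD_SECTION):
        # has_option() returns False (no exception) when the section is missing.
        if parser.has_option(section, "persistent-connections"):
            return parser.getboolean(section, "persistent-connections")
    return False  # documented default


if __name__ == "__main__":
    conf = configparser.ConfigParser()
    conf.read_string("[Global]\npersistent-connections=true\n")
    print(persistent_connections(conf))
```

Checking the new section first means a configuration migrated to the new layout is never overridden by a leftover `[Global]` entry.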
60 changes: 60 additions & 0 deletions doc/stacd.conf.xml
@@ -85,6 +85,66 @@
<xi:include href="standard-conf.xml" xpointer="kato"/>
<xi:include href="standard-conf.xml" xpointer="ip-family"/>

<varlistentry>
<term><varname>nr-io-queues=</varname></term>

<listitem>
<para>
Takes a value in the range 1...N. Overrides the
default number of I/O queues created by the driver.
</para>

<para>Note: This parameter is identical to that provided by nvme-cli.</para>
<para>
Default: Depends on kernel and other run
time factors (e.g. number of CPUs).
</para>
</listitem>
</varlistentry>

<varlistentry>
<term><varname>nr-write-queues=</varname></term>

<listitem>
<para>
Takes a value in the range 1...N. Adds additional
queues that will be used for write I/O.
</para>

<para>Note: This parameter is identical to that provided by nvme-cli.</para>

<para>
Default: Depends on kernel and other run
time factors (e.g. number of CPUs).
</para>
</listitem>
</varlistentry>

<varlistentry>
<term><varname>nr-poll-queues=</varname></term>

<listitem>
<para>
Takes a value in the range 1...N. Adds additional
queues that will be used for polling latency-sensitive
I/O.
</para>

<para>Note: This parameter is identical to that provided by nvme-cli.</para>

<para>
Default: Depends on kernel and other run
time factors (e.g. number of CPUs).
</para>
</listitem>
</varlistentry>

<xi:include href="standard-conf.xml" xpointer="queue-size"/>
<xi:include href="standard-conf.xml" xpointer="reconnect-delay"/>
<xi:include href="standard-conf.xml" xpointer="ctrl-loss-tmo"/>
<xi:include href="standard-conf.xml" xpointer="duplicate-connect"/>
<xi:include href="standard-conf.xml" xpointer="disable-sqflow"/>

<varlistentry>
<term><varname>ignore-iface=</varname></term>
<listitem>
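The option descriptions above give explicit ranges for several parameters (`queue-size` 16...1024, `reconnect-delay` of at least 1 second, `ctrl-loss-tmo` of -1 or more) and describe each one as identical to its nvme-cli counterpart. The sketch below validates a set of such values and renders them as `nvme connect` long options. `to_nvme_cli_args` is hypothetical; it assumes each configuration option maps onto the nvme-cli option of the same name and is not meant to describe how `stacd` applies these values internally.

```python
from typing import Dict, List, Optional, Tuple

# Ranges taken from the option descriptions; None means "no upper bound".
INT_RANGES: Dict[str, Tuple[int, Optional[int]]] = {
    "nr-io-queues": (1, None),
    "nr-write-queues": (1, None),
    "nr-poll-queues": (1, None),
    "queue-size": (16, 1024),
    "reconnect-delay": (1, None),
    "ctrl-loss-tmo": (-1, None),  # -1 = retry forever, 0 = do not retry
}
BOOL_OPTIONS = {"duplicate-connect", "disable-sqflow"}
TRUE_WORDS = {"1", "yes", "true", "on"}
FALSE_WORDS = {"0", "no", "false", "off"}


def to_nvme_cli_args(options: Dict[str, str]) -> List[str]:
    """Validate conf-file values and render them as `nvme connect` long options.

    Hypothetical helper: assumes each conf option maps to the nvme-cli
    option of the same name. Options left unset are simply not emitted,
    so the kernel/driver defaults apply.
    """
    args: List[str] = []
    for name, raw in options.items():
        if name in INT_RANGES:
            value = int(raw)
            low, high = INT_RANGES[name]
            if value < low or (high is not None and value > high):
                raise ValueError(f"{name}={raw} is out of range")
            args.append(f"--{name}={value}")
        elif name in BOOL_OPTIONS:
            word = raw.strip().lower()
            if word in TRUE_WORDS:
                args.append(f"--{name}")  # flag present only when enabled
            elif word not in FALSE_WORDS:
                raise ValueError(f"{name}={raw} is not a boolean")
        else:
            raise ValueError(f"unsupported option {name}")
    return args
```

Boolean options become bare flags only when enabled, mirroring how on/off switches are usually passed on a command line.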
88 changes: 73 additions & 15 deletions doc/stafd.conf.xml
@@ -85,21 +85,11 @@
<xi:include href="standard-conf.xml" xpointer="data-digest"/>
<xi:include href="standard-conf.xml" xpointer="kato"/>
<xi:include href="standard-conf.xml" xpointer="ip-family"/>

<varlistentry>
<term><varname>persistent-connections=</varname></term>
<listitem>
<para>
Takes a boolean argument. Whether connections to
Discovery Controllers (DC) are persistent. When
true, connections initiated by stafd will persist
even when stafd is stopped. When
<parameter>false</parameter>, <code>stafd</code>
will disconnect from all DCs it is connected to on
exit. Defaults to <parameter>false</parameter>.
</para>
</listitem>
</varlistentry>
<xi:include href="standard-conf.xml" xpointer="queue-size"/>
<xi:include href="standard-conf.xml" xpointer="reconnect-delay"/>
<xi:include href="standard-conf.xml" xpointer="ctrl-loss-tmo"/>
<xi:include href="standard-conf.xml" xpointer="duplicate-connect"/>
<xi:include href="standard-conf.xml" xpointer="disable-sqflow"/>

<varlistentry>
<term><varname>ignore-iface=</varname></term>
@@ -203,11 +193,79 @@
themselves over mDNS with the service type
<literal>_nvme-disc._tcp</literal>.
</para>
<para>
Defaults to <parameter>true</parameter>.
</para>
</listitem>
</varlistentry>
</variablelist>
</refsect2>

<refsect2>
<title>[Discovery controller connection management] section</title>

<para>
The following options are available in the
<literal>[Discovery controller connection management]</literal> section:
</para>

<varlistentry>
<term><varname>persistent-connections=</varname></term>
<listitem>
<para>
Takes a boolean argument. Whether connections to
Discovery Controllers (DC) are persistent. When
<parameter>true</parameter>, connections initiated by
<code>stafd</code> will persist even when
<code>stafd</code> is stopped. When
<parameter>false</parameter>, <code>stafd</code>
will disconnect from all DCs it is connected to on
exit.
</para>
<para>
Defaults to <parameter>false</parameter>.
</para>
</listitem>
</varlistentry>

<varlistentry>
<term><varname>zeroconf-connections-persistence=</varname></term>
<listitem>
<para>
Takes a unit-less value in seconds, or a time span value
such as "72hours" or "5days". A value of 0 means no
persistence. In other words, configuration acquired through
zeroconf (mDNS service discovery) will be removed
immediately when mDNS no longer reports the presence of
a Discovery Controller (DC) and connectivity to that DC
is lost. A value of -1 means that configuration acquired
through zeroconf will persist forever.
</para>

<para>
This parameter handles the case where a DC that was
discovered through mDNS service discovery no longer
advertises itself through mDNS and can no longer be
connected to. For example, the DC may have suffered a
catastrophic failure (e.g. a power surge) and need to be
replaced. In that case, the connection to that DC can
never be restored, and a replacement DC, which will
likely have a different NQN (or IP address), is needed.
The host has no way to determine that the old DC is not
coming back, nor that a newly discovered DC is the
replacement for the old one. For that reason, the host
needs a way to "age" zeroconf-acquired configuration and
remove it automatically after a certain amount of time.
This parameter provides that mechanism.
</para>

<para>
Defaults to <parameter>72hours</parameter>.
</para>
</listitem>
</varlistentry>
</refsect2>

<xi:include href="standard-conf.xml" xpointer="controller"/>
</refsect1>
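`zeroconf-connections-persistence` accepts a unit-less number of seconds or a time span such as "72hours" or "5days", with 0 meaning no persistence and -1 meaning forever; the test script also exercises values such as `1d`, `0.5`, `1:01` and `-1`. The following is a minimal, hypothetical parser for those forms, not the project's own time-parsing code. The unit spellings in `UNITS` and the reading of `1:01` as minutes:seconds (61 seconds) are assumptions.

```python
import math
import re
from typing import Optional

# Hypothetical unit table; the exact spellings stafd accepts may differ.
UNITS = {
    "s": 1, "sec": 1, "secs": 1, "second": 1, "seconds": 1,
    "m": 60, "min": 60, "mins": 60, "minute": 60, "minutes": 60,
    "h": 3600, "hr": 3600, "hrs": 3600, "hour": 3600, "hours": 3600,
    "d": 86400, "day": 86400, "days": 86400,
    "w": 604800, "week": 604800, "weeks": 604800,
}
SPAN = re.compile(r"(\d+(?:\.\d+)?)\s*([a-z]+)")          # one "<number><unit>"
SPANS = re.compile(r"(?:\d+(?:\.\d+)?\s*[a-z]+\s*)+")     # one or more of them


def parse_persistence(value: str) -> Optional[float]:
    """Convert a zeroconf-connections-persistence value to seconds.

    Returns None for -1 ("persist forever"); 0 means "no persistence".
    """
    text = value.strip().lower()
    try:
        seconds = float(text)  # unit-less value: seconds
    except ValueError:
        pass
    else:
        if seconds == -1:
            return None
        if not math.isfinite(seconds) or seconds < 0:
            raise ValueError(f"invalid persistence: {value!r}")
        return seconds
    if ":" in text:  # assumption: [H:]M:SS clock notation
        parts = text.split(":")
        if not 2 <= len(parts) <= 3 or not all(p.isdigit() for p in parts):
            raise ValueError(f"invalid persistence: {value!r}")
        total = 0
        for part in parts:
            total = total * 60 + int(part)
        return float(total)
    if not SPANS.fullmatch(text):
        raise ValueError(f"invalid persistence: {value!r}")
    total_seconds = 0.0
    for number, unit in SPAN.findall(text):
        if unit not in UNITS:
            raise ValueError(f"unknown time unit {unit!r} in {value!r}")
        total_seconds += float(number) * UNITS[unit]
    return total_seconds
```

Returning `None` for -1 keeps "forever" distinct from any finite number of seconds, so callers cannot accidentally compare elapsed time against a sentinel value.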

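The aging rule described for `zeroconf-connections-persistence` can be sketched as a small state tracker: a zeroconf-acquired DC starts aging only when mDNS no longer reports it *and* connectivity to it is lost, stops aging if either recovers, and is purged once it has been lost for at least the persistence period (immediately for 0, never for -1). `DcState`, `ZeroconfAger` and the string DC identifiers are hypothetical names for illustration; the injectable clock is there only to make the sketch testable.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class DcState:
    advertised: bool = True             # still reported by mDNS
    connected: bool = True              # connection to the DC is up
    lost_since: Optional[float] = None  # when both became False


class ZeroconfAger:
    """Track zeroconf-acquired DCs and purge the ones lost for too long."""

    def __init__(self, persistence: Optional[float],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._persistence = persistence  # seconds; None stands for -1 (forever)
        self._clock = clock
        self._dcs: Dict[str, DcState] = {}

    def update(self, dc_id: str, advertised: bool, connected: bool) -> None:
        state = self._dcs.setdefault(dc_id, DcState())
        state.advertised, state.connected = advertised, connected
        if advertised or connected:
            state.lost_since = None           # DC is back: stop aging it
        elif state.lost_since is None:
            state.lost_since = self._clock()  # start aging now

    def purge(self) -> List[str]:
        """Remove and return the DCs whose persistence period has expired."""
        if self._persistence is None:
            return []
        now = self._clock()
        expired = [dc_id for dc_id, s in self._dcs.items()
                   if s.lost_since is not None
                   and now - s.lost_since >= self._persistence]
        for dc_id in expired:
            del self._dcs[dc_id]
        return expired
```

In a daemon, `purge()` would typically be driven by a periodic timer, and the returned identifiers used to drop the corresponding configuration. A monotonic clock is used so that wall-clock adjustments cannot shorten or extend the persistence period.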
100 changes: 100 additions & 0 deletions doc/standard-conf.xml
@@ -90,6 +90,106 @@
</para>
</listitem>
</varlistentry>

<varlistentry id='queue-size'>
<term><varname>queue-size=</varname></term>

<listitem id='queue-size-text'>
<para>
Takes a value in the range 16...1024.
</para>

<para>
Overrides the default number of elements in the I/O queues
created by the driver. This option will be ignored for
discovery, but will be passed on to the subsequent connect
call.
</para>

<para>Note: This parameter is identical to that provided by nvme-cli.</para>

<para>
Defaults to <parameter>128</parameter>.
</para>
</listitem>
</varlistentry>

<varlistentry id='reconnect-delay'>
<term><varname>reconnect-delay=</varname></term>

<listitem id='reconnect-delay-text'>
<para>
Takes a value in the range 1 to N seconds.
</para>

<para>
Overrides the default delay before a reconnect is attempted
after a connection loss.
</para>

<para>Note: This parameter is identical to that provided by nvme-cli.</para>

<para>
Defaults to <parameter>10</parameter> (i.e. retry to connect every 10 seconds).
</para>
</listitem>
</varlistentry>

<varlistentry id='ctrl-loss-tmo'>
<term><varname>ctrl-loss-tmo=</varname></term>

<listitem id='ctrl-loss-tmo-text'>
<para>
Takes a value in the range -1, 0, ..., N seconds. -1 means
retry forever. 0 means do not retry.
</para>

<para>
Overrides the default controller loss timeout period (in seconds).
</para>

<para>Note: This parameter is identical to that provided by nvme-cli.</para>

<para>
Defaults to <parameter>600</parameter> seconds (10 minutes).
</para>
</listitem>
</varlistentry>

<varlistentry id='duplicate-connect'>
<term><varname>duplicate-connect=</varname></term>

<listitem id='duplicate-connect-text'>
<para>
Takes a boolean argument. Allows duplicate connections
between the same transport host and subsystem port.
</para>

<para>Note: This parameter is identical to that provided by nvme-cli.</para>

<para>
Defaults to <parameter>false</parameter>.
</para>
</listitem>
</varlistentry>

<varlistentry id='disable-sqflow'>
<term><varname>disable-sqflow=</varname></term>

<listitem id='disable-sqflow-text'>
<para>
Takes a boolean argument. Disables SQ flow control, omitting
head doorbell updates for submission queues when sending NVMe
completions.
</para>

<para>Note: This parameter is identical to that provided by nvme-cli.</para>

<para>
Defaults to <parameter>false</parameter>.
</para>
</listitem>
</varlistentry>
</variablelist>

<refsect2 id='controller'>
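Taken together, `reconnect-delay` and `ctrl-loss-tmo` bound how many reconnection attempts are made after a connection loss. The sketch below assumes the attempt count is `ctrl-loss-tmo` divided by `reconnect-delay`, rounded up, with a negative timeout meaning "retry forever"; this reflects our understanding of the Linux nvme-fabrics driver and is not stated in this commit. `max_reconnects` is a hypothetical helper. With the documented defaults (`reconnect-delay=10`, `ctrl-loss-tmo=600`), the formula gives 600 / 10 = 60 attempts.

```python
from typing import Optional


def max_reconnects(ctrl_loss_tmo: int, reconnect_delay: int) -> Optional[int]:
    """Approximate number of reconnect attempts before the controller is dropped.

    Returns None when ctrl_loss_tmo is negative (-1 = retry forever).
    Assumes attempts = ceil(ctrl_loss_tmo / reconnect_delay).
    """
    if reconnect_delay < 1:
        raise ValueError("reconnect-delay must be at least 1 second")
    if ctrl_loss_tmo < 0:
        return None
    return -(-ctrl_loss_tmo // reconnect_delay)  # integer ceiling division
```

Using integer ceiling division avoids floating-point rounding and matches the documented ranges, which are whole seconds.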