New instances in a rebooted sled are unable to reach existing instances in other sleds on their private IPs #5214

Closed
askfongjojo opened this issue Mar 7, 2024 · 10 comments · Fixed by #5568
Labels: known issue (To include in customer documentation and training)

askfongjojo commented Mar 7, 2024

I noticed this issue after running a bunch of scrimlet reboot tests on rack2. One of the instances in question happens to be on a scrimlet I rebooted at the tail end of the testing. It was, however, created at least an hour after the reboot happened, so it's unclear how it could be related.

Here are the instance details:

#  instance name      uuid                                  sled         external IP    private IP
1  prov-time-16c-32m  c856f03c-f45a-4288-94fa-c68b3a283482  BRM44220011  172.20.26.186  172.30.0.24
2  sbmysql-9          29e7866a-d504-4629-9ece-8bee96fbab73  BRM42220014  172.20.26.72   172.30.0.21

Instance 1 is able to reach all other instances in the subnet on their private IPs, except for instance 2:

ubuntu@vm-16c-32m:~$ ping 172.30.0.21
PING 172.30.0.21 (172.30.0.21) 56(84) bytes of data.
^C
--- 172.30.0.21 ping statistics ---
9 packets transmitted, 0 received, 100% packet loss, time 8179ms

ubuntu@vm-16c-32m:~$ ping 172.30.0.9
PING 172.30.0.9 (172.30.0.9) 56(84) bytes of data.
64 bytes from 172.30.0.9: icmp_seq=1 ttl=64 time=0.468 ms
64 bytes from 172.30.0.9: icmp_seq=2 ttl=64 time=0.345 ms
^C
--- 172.30.0.9 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1027ms
rtt min/avg/max/mdev = 0.345/0.406/0.468/0.061 ms

But it can reach instance 2 on its external IP:

ubuntu@vm-16c-32m:~$ ping 172.20.26.72
PING 172.20.26.72 (172.20.26.72) 56(84) bytes of data.
64 bytes from 172.20.26.72: icmp_seq=1 ttl=62 time=0.620 ms
64 bytes from 172.20.26.72: icmp_seq=2 ttl=62 time=0.405 ms
^C
--- 172.20.26.72 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1009ms
rtt min/avg/max/mdev = 0.405/0.512/0.620/0.107 ms

The same pattern holds for instance 2: it cannot reach instance 1's private IP, but it can reach other instances in the subnet as well as instance 1's external IP:

ubuntu@sbmysql9:~$ ping 172.30.0.24
PING 172.30.0.24 (172.30.0.24) 56(84) bytes of data.
^C
--- 172.30.0.24 ping statistics ---
10 packets transmitted, 0 received, 100% packet loss, time 9202ms

ubuntu@sbmysql9:~$ ping 172.30.0.9
PING 172.30.0.9 (172.30.0.9) 56(84) bytes of data.
64 bytes from 172.30.0.9: icmp_seq=1 ttl=64 time=0.430 ms
64 bytes from 172.30.0.9: icmp_seq=2 ttl=64 time=0.293 ms
64 bytes from 172.30.0.9: icmp_seq=3 ttl=64 time=0.376 ms
^C
--- 172.30.0.9 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2035ms
rtt min/avg/max/mdev = 0.293/0.366/0.430/0.056 ms

ubuntu@sbmysql9:~$ ping 172.20.26.186 
PING 172.20.26.186 (172.20.26.186) 56(84) bytes of data.
64 bytes from 172.20.26.186: icmp_seq=1 ttl=62 time=0.423 ms
64 bytes from 172.20.26.186: icmp_seq=2 ttl=62 time=0.385 ms
^C
--- 172.20.26.186 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1007ms
rtt min/avg/max/mdev = 0.385/0.404/0.423/0.019 ms

Instance 2 was rebooted (stopped/started) once after it was created. I didn't check the private-IP connectivity between the two events, so it's unclear whether the connectivity was there prior to the instance reboot.

Here are the OPTE firewall entries from opteadm. The list is VERY long and I haven't been able to interpret what it means, but I'm dumping it here in case it helps.

BRM42220014 # /opt/oxide/opte/bin/opteadm list-ports
LINK                             MAC ADDRESS              IPv4 ADDRESS     EPHEMERAL IPv4   FLOATING IPv4    IPv6 ADDRESS                             EXTERNAL IPv6                            FLOATING IPv6                            STATE   
opte3                            A8:40:25:F9:F8:1B        172.30.0.21      172.20.26.72                      None                                     None                                     None                                     running 

opte3-dump-layer-firewall.log
(Note: I was using the instances for some netperf and iperf3 tests, which is why there are a gazillion ports in use.)

@askfongjojo (Author)

The firewall entries of the OPTE port for instance 1 look a lot more normal.

BRM44220011 #  /opt/oxide/opte/bin/opteadm list-ports
LINK                             MAC ADDRESS              IPv4 ADDRESS     EPHEMERAL IPv4   FLOATING IPv4    IPv6 ADDRESS                             EXTERNAL IPv6                            FLOATING IPv6                            STATE   
opte0                            A8:40:25:FF:A6:83        172.30.2.5       None             172.20.26.3      None                                     None                                     None                                     running 
opte1                            A8:40:25:F3:E1:06        172.30.0.5       172.20.26.23                      None                                     None                                     None                                     running 
opte3                            A8:40:25:FA:29:E1        172.30.0.24      172.20.26.186                     None                                     None                                     None                                     running 
opte8                            A8:40:25:F7:09:D1        172.30.0.12      172.20.26.51                      None                                     None                                     None                                     running 

opte3-dump-layer-firewall-prov-time-16c-32m.log

FelixMcFelix (Contributor) commented Mar 7, 2024

So looking into the firewall stats on both sides using kstat -m xde -n opte3_firewall:

Instance 1:
module: xde                             instance: 0
name:   opte3_firewall                  class:    net
        add_rule_called                 0
        crtime                          10037.794454369
        flow_ttl                        60
        flows                           0
        in_deny                         0
        in_lft_full                     0
        in_lft_hit                      171
        in_lft_miss                     109
        in_rule_match                   109
        in_rule_nomatch                 0
        in_rules                        26
        lft_capacity                    8096
        out_deny                        0
        out_lft_full                    0
        out_lft_hit                     134
        out_lft_miss                    192
        out_rule_match                  0
        out_rule_nomatch                192
        out_rules                       0
        remove_rule_called              0
        set_rules_called                2
        snaptime                        155957.526347067

Instance 2:
module: xde                             instance: 0
name:   opte3_firewall                  class:    net
        add_rule_called                 0
        crtime                          10001.242892785
        flow_ttl                        60
        flows                           0
        in_deny                         0
        in_lft_full                     0
        in_lft_hit                      52
        in_lft_miss                     21
        in_rule_match                   21
        in_rule_nomatch                 0
        in_rules                        26
        lft_capacity                    8096
        out_deny                        0
        out_lft_full                    0
        out_lft_hit                     65
        out_lft_miss                    58
        out_rule_match                  0
        out_rule_nomatch                58
        out_rules                       0
        remove_rule_called              0
        set_rules_called                2
        snaptime                        48804.923530470

It doesn't look like a firewalling issue, which is supported by the default (DEF) deny inbound action having 0 hits. (I've opened oxidecomputer/opte#468 about compressing this output.)

Taking a look at the V2P mappings (opteadm dump-v2p) under VNI 1508093, it looks like BRM42220014 is missing some entries:

kyle@KyleOxide scraps % git diff sled8.log sled16.log
diff --git a/sled8.log b/sled16.log
index 0eddfba..e41d136 100644
--- a/sled8.log
+++ b/sled16.log
@@ -4,15 +4,9 @@ VPC 1508093
 IPv4 mappings
 ----------------------------------------------------------------------
 VPC IP                   VPC MAC ADDR      UNDERLAY IP
-172.30.0.6               A8:40:25:FA:A2:20 fd00:1122:3344:105::1
-172.30.0.8               A8:40:25:FD:E4:2F fd00:1122:3344:105::1
 172.30.0.9               A8:40:25:FB:4B:4C fd00:1122:3344:106::1
 172.30.0.10              A8:40:25:FC:D7:FF fd00:1122:3344:106::1
-172.30.0.11              A8:40:25:F0:F6:95 fd00:1122:3344:105::1
 172.30.0.12              A8:40:25:F7:09:D1 fd00:1122:3344:103::1
-172.30.0.13              A8:40:25:F2:DD:C6 fd00:1122:3344:10a::1
-172.30.0.14              A8:40:25:FA:A8:4D fd00:1122:3344:101::1
-172.30.0.15              A8:40:25:F8:1C:AC fd00:1122:3344:106::1
 172.30.0.16              A8:40:25:F3:7D:A8 fd00:1122:3344:106::1
 172.30.0.17              A8:40:25:FB:E7:50 fd00:1122:3344:10a::1
 172.30.0.18              A8:40:25:F1:A9:EA fd00:1122:3344:105::1
@@ -21,10 +15,6 @@ VPC IP                   VPC MAC ADDR      UNDERLAY IP
 172.30.0.21              A8:40:25:F9:F8:1B fd00:1122:3344:108::1
 172.30.0.22              A8:40:25:F0:1C:50 fd00:1122:3344:105::1
 172.30.0.23              A8:40:25:F4:C8:59 fd00:1122:3344:109::1
-172.30.0.24              A8:40:25:FA:29:E1 fd00:1122:3344:103::1
-172.30.0.25              A8:40:25:F9:D1:DB fd00:1122:3344:105::1
-172.30.0.26              A8:40:25:F1:A3:87 fd00:1122:3344:105::1
-172.30.0.27              A8:40:25:FB:14:E4 fd00:1122:3344:10a::1
 192.168.32.5             A8:40:25:F7:90:73 fd00:1122:3344:101::1
 192.168.32.6             A8:40:25:FD:AD:A7 fd00:1122:3344:106::1
 192.168.32.7             A8:40:25:FC:E0:AE fd00:1122:3344:106::1
@@ -39,11 +29,11 @@ VPC IP                   VPC MAC ADDR      UNDERLAY IP
 192.168.32.16            A8:40:25:F3:F3:4C fd00:1122:3344:10b::1
 192.168.32.17            A8:40:25:F9:B2:26 fd00:1122:3344:109::1
 192.168.32.18            A8:40:25:F8:46:69 fd00:1122:3344:106::1
+192.168.32.19            A8:40:25:F8:55:40 fd00:1122:3344:103::1
 192.168.32.20            A8:40:25:F0:B4:99 fd00:1122:3344:106::1
 192.168.32.21            A8:40:25:F8:3D:31 fd00:1122:3344:10a::1
 192.168.32.22            A8:40:25:F0:B0:86 fd00:1122:3344:101::1
 192.168.32.23            A8:40:25:F2:F5:1C fd00:1122:3344:109::1
-192.168.32.24            A8:40:25:FA:D3:B7 fd00:1122:3344:108::1

 IPv6 mappings
 ----------------------------------------------------------------------

Specifically, BRM42220014 (sled 16) does not have a mapping on your VPC from 172.30.0.24 to fd00:1122:3344:103::1/BRM44220011 (sled 8). But the only OPTE-recorded drops I'm seeing are on the "gateway" layer – I'd expect them to show up on the overlay layer when the VPC lookup for the destination fails. E.g.:

https://github.com/oxidecomputer/opte/blob/b85995f92ae94cdc78b97b0a610c69e103e00423/lib/oxide-vpc/src/engine/overlay.rs#L291-L318
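
For context, here is a much simplified sketch (not the actual OPTE code linked above) of why a missing V2P entry matters: the overlay layer looks up the destination's VPC-private IP to find the underlay (sled) address to tunnel to, and when no mapping exists there is nowhere to send the packet, so it is dropped. All type and function names below are illustrative only:

```rust
use std::collections::HashMap;
use std::net::{Ipv4Addr, Ipv6Addr};

/// Illustrative stand-in for OPTE's virtual-to-physical (V2P) table: maps a
/// guest's VPC-private IPv4 address to the underlay IPv6 address of the sled
/// currently hosting that guest.
struct V2pTable {
    mappings: HashMap<Ipv4Addr, Ipv6Addr>,
}

enum Disposition {
    /// Encapsulate (Geneve) and send to this underlay address.
    TunnelTo(Ipv6Addr),
    /// No mapping exists, so the packet cannot be delivered and is dropped --
    /// the behavior the two instances in this issue are seeing.
    Deny,
}

impl V2pTable {
    fn route(&self, dst: Ipv4Addr) -> Disposition {
        match self.mappings.get(&dst) {
            Some(underlay) => Disposition::TunnelTo(*underlay),
            None => Disposition::Deny,
        }
    }
}

fn main() {
    // Sled 16's table is missing 172.30.0.24, so traffic from instance 2 to
    // instance 1's private IP has no underlay destination and is denied.
    let table = V2pTable { mappings: HashMap::new() };
    match table.route("172.30.0.24".parse().unwrap()) {
        Disposition::TunnelTo(addr) => println!("tunnel to {addr}"),
        Disposition::Deny => println!("deny: no V2P mapping for destination"),
    }
}
```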

askfongjojo (Author) commented Mar 7, 2024

Here are the most recent start times of the sled-agent and dendrite services on BRM42220014 (sled 16):

BRM42220014 # svcs sled-agent
STATE          STIME    FMRI
online         21:53:17 svc:/oxide/sled-agent:default

root@oxz_switch1:~# svcs dendrite
STATE          STIME    FMRI
online         21:53:59 svc:/oxide/dendrite:default

BRM44220011 has not been rebooted and its sled-agent has been running since the last rack update:

BRM44220011 # svcs sled-agent
STATE          STIME    FMRI
online         1986     svc:/oxide/sled-agent:default

Instance 2 was created after the scrimlet/service restarts:

select id, time_created, time_state_updated, time_deleted from vmm where instance_id = '29e7866a-d504-4629-9ece-8bee96fbab73';
                   id                  |         time_created          |      time_state_updated       |         time_deleted
---------------------------------------+-------------------------------+-------------------------------+--------------------------------
  092e837e-d5af-4a44-835f-2fd56859166b | 2024-03-07 00:30:18.721837+00 | 2024-03-07 00:30:30.829001+00 | NULL
  f57b7848-ef16-4493-ad66-5435f0e74ac8 | 2024-03-07 00:00:59.858202+00 | 2024-03-07 00:27:43.759553+00 | 2024-03-07 00:27:45.674282+00

Instance 1 was created before the scrimlet reboots and remained up and running during the scrimlet/service restarts:

select id, time_created, time_state_updated, time_deleted from vmm where instance_id = 'c856f03c-f45a-4288-94fa-c68b3a283482';
                   id                  |         time_created          |      time_state_updated       |         time_deleted
---------------------------------------+-------------------------------+-------------------------------+--------------------------------
  0933dcfb-da57-4122-af70-485403a5cfbd | 2024-03-02 05:02:53.455631+00 | 2024-03-05 07:05:44.386334+00 | 2024-03-05 07:05:45.051707+00
  cd2e027c-78f8-4a3e-bcea-89a1c83cd295 | 2024-03-05 18:41:40.765275+00 | 2024-03-05 18:41:52.520954+00 | NULL
(2 rows)

@askfongjojo (Author)

I've checked that the v2p entries highlighted as missing on BRM42220014 correspond to instances created prior to the reboot. So the issue is more broadly a failure to backfill v2p entries that existed prior to the sled reboot. This seems to be an area for an RPW (reliable persistent workflow), so I'm reassigning the ticket to @internet-diglett.

askfongjojo modified the milestones: 7, 8 (Mar 8, 2024)
askfongjojo changed the title from "A certain pair of instances are unable to reach each other on private IPs" to "New instances in a rebooted sled are unable to reach existing instances in other sleds on their private IPs" (Mar 8, 2024)
askfongjojo (Author) commented Mar 8, 2024

To be clear, this issue is not a regression and has always been there because v2p mappings are created only during an instance start event. The saga/push approach is a linear way of broadcasting information and doesn't account for exceptions such as sled reboot/panic and sled outage (#4259). The issue is masked to some extent because we usually stop all running instances prior to planned sled reboots or let them fail (and eventually get destroyed) otherwise. We/customers could have run into it in the past during random sled panics but worked around it unknowingly by stopping/starting the unreachable instances.
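
As an illustration of the direction such a fix takes (all names below are hypothetical, not the actual Omicron implementation): instead of a saga pushing mappings once at instance start, a background reconciliation task repeatedly reads the full desired set of V2P mappings from the database and pushes it to every sled, so a sled that rebooted and lost its in-memory OPTE state converges back to the correct mappings on the next pass:

```rust
use std::net::{Ipv4Addr, Ipv6Addr};
use std::time::Duration;

/// One desired virtual-to-physical mapping, as recorded in the control plane.
#[derive(Clone)]
struct V2pMapping {
    vpc_ip: Ipv4Addr,
    vpc_mac: [u8; 6],
    underlay_ip: Ipv6Addr,
}

/// Hypothetical interfaces; the real system talks to CockroachDB and the
/// sled-agent HTTP API instead.
trait Datastore {
    /// The complete desired state: mappings for every running instance NIC.
    fn all_v2p_mappings(&self) -> Vec<V2pMapping>;
}

trait SledClient {
    /// Idempotently install the full mapping set on one sled.
    fn set_v2p_mappings(&self, mappings: &[V2pMapping]) -> Result<(), String>;
}

/// Unlike a saga that fires once at instance start, this loop repeatedly
/// converges every sled toward the desired state, so a sled that rebooted
/// (and lost its in-memory OPTE state) gets its mappings backfilled.
fn v2p_reconciler(db: &dyn Datastore, sleds: &[Box<dyn SledClient>]) {
    loop {
        let desired = db.all_v2p_mappings();
        for sled in sleds {
            if let Err(e) = sled.set_v2p_mappings(&desired) {
                // A failed push is simply retried on the next pass rather
                // than leaving the sled permanently out of date.
                eprintln!("v2p push to sled failed, will retry: {e}");
            }
        }
        std::thread::sleep(Duration::from_secs(30));
    }
}
```

The instance start/stop sagas can then simply trigger an activation of such a task (as the TODO list in the fix commit below notes) rather than configuring mappings directly.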

askfongjojo added the "known issue" (To include in customer documentation and training) label (Mar 9, 2024)
@davepacheco (Collaborator)

For the record: I asked whether this was a blocker for R8 (for delivery of "add sled"). We determined that it's likely not. That's because in R8 we'd be doing "add sled" during the upgrade maintenance window. Because of the way updates work today, all instances would be started after that point (even those that had been running prior to the window). So we shouldn't run into this just because of "add sled".

@internet-diglett (Contributor)

@davepacheco that seems correct. I don't see this causing any issues in that scenario.

askfongjojo modified the milestones: 8, 9 (May 7, 2024)
internet-diglett added a commit that referenced this issue May 22, 2024
TODO
---
- [x] Extend db view to include probe v2p mappings
- [x] Update sagas to trigger rpw activation instead of directly
configuring v2p mappings
- [x] Test that the `delete` functionality cleans up v2p mappings

Related
---
Resolves #5214 
Resolves #4259 
Resolves #3107

- [x] Depends on oxidecomputer/opte#494
- [x] Depends on oxidecomputer/meta#409
- [x] Depends on oxidecomputer/maghemite#244

---------

Co-authored-by: Levon Tarver <[email protected]>
@internet-diglett (Contributor)

@morlandi7 this should be resolved, but I've left it open until someone verifies that the work done in #5568 has actually resolved this issue on dogfood.

@askfongjojo (Author)

I've checked that the issue is not reproducible on rack2 (which has #5568). I'll repeat my verifications once another related fix in this area (#5845) has landed.

@askfongjojo (Author)

Confirmed that the issue can be closed.
