Internal routing issues on arcus #1308

Open
millingw opened this issue Feb 21, 2024 · 6 comments

Comments

@millingw
Collaborator

We are seeing internal connectivity issues within arcus between OpenStack nodes and the object storage service.

From an arcus OpenStack VM, the following download fails:

$ wget https://object.arcus.openstack.hpc.cam.ac.uk/swift/v1/AUTH_e216e6b502134b6185380be6ccd0bf09/archive/zeppelin-0.10.1-gaia-dmp-0.1.tar.gz
--2024-02-21 09:39:46--  https://object.arcus.openstack.hpc.cam.ac.uk/swift/v1/AUTH_e216e6b502134b6185380be6ccd0bf09/archive/zeppelin-0.10.1-gaia-dmp-0.1.tar.gz
Resolving object.arcus.openstack.hpc.cam.ac.uk (object.arcus.openstack.hpc.cam.ac.uk)... 128.232.222.148, 128.232.222.24
Connecting to object.arcus.openstack.hpc.cam.ac.uk (object.arcus.openstack.hpc.cam.ac.uk)|128.232.222.148|:443... failed: No route to host.
Connecting to object.arcus.openstack.hpc.cam.ac.uk (object.arcus.openstack.hpc.cam.ac.uk)|128.232.222.24|:443... failed: No route to host.

However, the download works fine when issued from an external VM (in this case, an EIDF OpenStack VM):

# wget https://object.arcus.openstack.hpc.cam.ac.uk/swift/v1/AUTH_e216e6b502134b6185380be6ccd0bf09/archive/zeppelin-0.10.1-gaia-dmp-0.1.tar.gz
--2024-02-21 11:02:20--  https://object.arcus.openstack.hpc.cam.ac.uk/swift/v1/AUTH_e216e6b502134b6185380be6ccd0bf09/archive/zeppelin-0.10.1-gaia-dmp-0.1.tar.gz
Resolving object.arcus.openstack.hpc.cam.ac.uk (object.arcus.openstack.hpc.cam.ac.uk)... 128.232.222.24, 128.232.222.148
Connecting to object.arcus.openstack.hpc.cam.ac.uk (object.arcus.openstack.hpc.cam.ac.uk)|128.232.222.24|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1716996866 (1.6G) [application/gzip]
Saving to: 'zeppelin-0.10.1-gaia-dmp-0.1.tar.gz'

zeppelin-0.10.1-gaia-dmp-0.1.tar.gz          100%[===========================================================================================>]   1.60G   111MB/s    in 16s     

2024-02-21 11:02:37 (100 MB/s) - 'zeppelin-0.10.1-gaia-dmp-0.1.tar.gz' saved [1716996866/1716996866]

Downloads from external sites to the arcus VM appear unaffected. For example, the following download works fine on the arcus VM:

wget https://downloads.apache.org/zeppelin/zeppelin-0.11.0/zeppelin-0.11.0-bin-all.tgz

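Therefore, we think there are currently internal routing errors within the arcus service: traffic from an arcus VM to an arcus public address fails, while traffic between arcus and the outside world works.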
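To repeat this check for every address the object store name resolves to, a small TCP probe can be run from the arcus VM and from an external host and the results compared. This is a minimal sketch that is not part of the original report: `probe` and `probe_all` are hypothetical helper names, and it assumes Python 3 with only the standard library. It separates "no route to host" (EHOSTUNREACH, as in the wget output above) from a timeout or a refused connection, since those usually point at different layers.

```python
import errno
import socket


def probe(addr: str, port: int, timeout: float = 5.0) -> str:
    """Attempt one TCP connection to addr:port and classify the outcome."""
    try:
        # create_connection opens the socket; the with block closes it again.
        with socket.create_connection((addr, port), timeout=timeout):
            return "connected"
    except socket.timeout:
        # No reply within our own timeout (compare the ssh "Connection timed out").
        return "timed out"
    except OSError as exc:
        if exc.errno == errno.EHOSTUNREACH:
            # Same condition wget/curl report as "No route to host".
            return "no route to host"
        if exc.errno == errno.ETIMEDOUT:
            # Kernel-level connect timeout.
            return "timed out"
        if exc.errno == errno.ECONNREFUSED:
            # Host reachable, but nothing listening on that port.
            return "refused"
        return f"error: {exc}"


def probe_all(host: str, port: int, timeout: float = 5.0) -> dict:
    """Resolve host and probe each returned address separately."""
    results = {}
    for *_, sockaddr in socket.getaddrinfo(host, port, type=socket.SOCK_STREAM):
        addr = sockaddr[0]
        results[addr] = probe(addr, port, timeout)
    return results
```

For example, `probe_all("object.arcus.openstack.hpc.cam.ac.uk", 443)` returns a dict mapping each resolved address to its outcome. Probing each address individually, rather than letting the resolver pick one, mirrors what wget does above when it tries both 128.232.222.148 and 128.232.222.24.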

@millingw
Collaborator Author

Details of the VM we've experienced this on:

#  openstack \
        --os-cloud "${cloudname:?}" \
        server list

...

c76e93d4-3709-427c-880d-a4d3a33e6935 | iris-gaia-blue-20240221-zeppelin | ACTIVE | iris-gaia-blue-20240221-internal-network=10.10.3.8, 128.232.226.23 | gaia-dmp-fedora-cloud-38-1.6 | gaia.vm.cclake.54vcpu |
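The Networks column in this listing combines the VM's internal (fixed) address with its floating IP. As a hedged sketch, assuming the single-network `name=addr, addr` layout shown in the row above, the hypothetical helper `split_networks` below separates the two with Python's standard `ipaddress` module, which makes it easy to pick out the floating IP to test from another project.

```python
import ipaddress


def split_networks(field: str) -> dict:
    """Split a Networks cell such as 'net=10.10.3.8, 128.232.226.23'.

    Assumes one network per cell in the 'name=addr, addr' form.
    """
    name, _, addrs = field.partition("=")
    result = {"network": name.strip(), "private": [], "public": []}
    for raw in addrs.split(","):
        addr = ipaddress.ip_address(raw.strip())
        # RFC 1918 ranges such as 10.0.0.0/8 count as private.
        result["private" if addr.is_private else "public"].append(str(addr))
    return result
```

Applied to the row above, the internal-network address 10.10.3.8 lands under `private` and 128.232.226.23 under `public`.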

@stvoutsin
Collaborator

Potentially related: it seems that network traffic between different projects in OpenStack fails when using their floating IPs.
Info can be found here:
#1304

@Zarquan
Collaborator

Zarquan commented Feb 22, 2024

Corresponding Cambridge HPC support ticket:
https://ucam-rcs.atlassian.net/servicedesk/customer/portal/4/HPCSSUP-67058

@Zarquan
Collaborator

Zarquan commented Feb 22, 2024

Connection fails when trying to ssh from a VM in one project on Arcus (iris-gaia-green) to a VM in another project on Arcus (iris-gaia-data) using the target VM's public IP address (128.232.222.153).

Source VM:

  • project: de5ddc6b4d1e445bb73e45c7b8971673 (iris-gaia-green)
  • server: 76e46802-d35e-4018-8dd7-c6ea302a74af

Target VM:

  • project: e216e6b502134b6185380be6ccd0bf09 (iris-gaia-data)
  • server: 6556a1f3-3182-4d97-8013-01de1c081c95
  • address: 128.232.222.153
hostname

    iris-gaia-green-20231027-zeppelin

host data.gaia-dmp.uk

    data.gaia-dmp.uk is an alias for iris-gaia-data.duckdns.org.
    iris-gaia-data.duckdns.org has address 128.232.222.153

ssh -v data.gaia-dmp.uk

    OpenSSH_8.0p1, OpenSSL 1.1.1d FIPS  10 Sep 2019
    ....
    debug1: Connecting to data.gaia-dmp.uk [128.232.222.153] port 22.
    debug1: connect to address 128.232.222.153 port 22: Connection timed out
    ssh: connect to host data.gaia-dmp.uk port 22: Connection timed out

@stvoutsin
Collaborator

Connection also fails when trying to connect via HTTP from a VM in one Arcus project (iris-gaia-data) to a VM in a different Arcus project (iris-gaia-red) using its floating IP:

IP of VM on iris-gaia-red: 128.232.226.64

From source VM (on iris-gaia-data):

curl http://128.232.226.64
curl: (7) Failed to connect to 128.232.226.64 port 80: No route to host

From local machine (outside Arcus):

    curl http://128.232.226.64
          <html>
          <head><title>301 Moved Permanently</title></head>
          <body>
          <center><h1>301 Moved Permanently</h1></center>
          <hr><center>nginx/1.24.0</center>
          </body>

@Zarquan
Collaborator

Zarquan commented Mar 4, 2024

@millingw Can you check whether this is now fixed?
If it is, then we need to update the corresponding ticket on the Cambridge HPC system.
https://ucam-rcs.atlassian.net/servicedesk/customer/portal/4/HPCSSUP-67058
