Provider closing leases randomly on pod restart. #270

Closed
Zblocker64 opened this issue Dec 6, 2024 · 2 comments

Comments

@Zblocker64

Describe the bug
The provider pod deletes deployment pods without closing the corresponding leases on chain, for no apparent reason.

To Reproduce
Steps to reproduce the behavior:
Rebooted the node to perform maintenance. Scaled the provider pod down before shutting down, then scaled it back up after the reboot, at which point all pods were running. Once the node had caught up, the provider produced the logs below.

Expected behavior
The provider pod should return to normal operation and resume bidding, rather than closing active leases whose pods are still running.

Defaulted container "provider" out of: provider, init (init)

WARNING: apt does not have a stable CLI interface. Use with caution in scripts.

16 packages can be upgraded. Run 'apt list --upgradable' to see them.

WARNING: apt does not have a stable CLI interface. Use with caution in scripts.

ca-certificates is already the newest version (20210119).
mawk is already the newest version (1.3.4.20200120-2).
mawk set to manually installed.
The following additional packages will be installed:
  libbrotli1 libcurl4 libjq1 libldap-2.4-2 libncurses6 libnghttp2-14 libonig5
  libpsl5 libreadline8 librtmp1 libsasl2-2 libsasl2-modules-db libssh2-1
  readline-common
Suggested packages:
  readline-doc
Recommended packages:
  libldap-common libgpm2 publicsuffix libsasl2-modules
The following NEW packages will be installed:
  bc curl jq libbrotli1 libcurl4 libjq1 libldap-2.4-2 libncurses6
  libnghttp2-14 libonig5 libpsl5 libreadline8 librtmp1 libsasl2-2
  libsasl2-modules-db libssh2-1 readline-common
debconf: delaying package configuration, since apt-utils is not installed
0 upgraded, 17 newly installed, 0 to remove and 16 not upgraded.
Need to get 2494 kB of archives.
After this operation, 5943 kB of additional disk space will be used.
Selecting previously unselected package readline-common.
(Reading database ... 6987 files and directories currently installed.)
Preparing to unpack .../00-readline-common_8.1-1_all.deb ...
Unpacking readline-common (8.1-1) ...
Selecting previously unselected package libreadline8:amd64.
Preparing to unpack .../01-libreadline8_8.1-1_amd64.deb ...
Unpacking libreadline8:amd64 (8.1-1) ...
Selecting previously unselected package libncurses6:amd64.
Preparing to unpack .../02-libncurses6_6.2+20201114-2+deb11u2_amd64.deb ...
Unpacking libncurses6:amd64 (6.2+20201114-2+deb11u2) ...
Selecting previously unselected package bc.
Preparing to unpack .../03-bc_1.07.1-2+b2_amd64.deb ...
Unpacking bc (1.07.1-2+b2) ...
Selecting previously unselected package libbrotli1:amd64.
Preparing to unpack .../04-libbrotli1_1.0.9-2+b2_amd64.deb ...
Unpacking libbrotli1:amd64 (1.0.9-2+b2) ...
Selecting previously unselected package libsasl2-modules-db:amd64.
Preparing to unpack .../05-libsasl2-modules-db_2.1.27+dfsg-2.1+deb11u1_amd64.deb ...
Unpacking libsasl2-modules-db:amd64 (2.1.27+dfsg-2.1+deb11u1) ...
Selecting previously unselected package libsasl2-2:amd64.
Preparing to unpack .../06-libsasl2-2_2.1.27+dfsg-2.1+deb11u1_amd64.deb ...
Unpacking libsasl2-2:amd64 (2.1.27+dfsg-2.1+deb11u1) ...
Selecting previously unselected package libldap-2.4-2:amd64.
Preparing to unpack .../07-libldap-2.4-2_2.4.57+dfsg-3+deb11u1_amd64.deb ...
Unpacking libldap-2.4-2:amd64 (2.4.57+dfsg-3+deb11u1) ...
Selecting previously unselected package libnghttp2-14:amd64.
Preparing to unpack .../08-libnghttp2-14_1.43.0-1+deb11u2_amd64.deb ...
Unpacking libnghttp2-14:amd64 (1.43.0-1+deb11u2) ...
Selecting previously unselected package libpsl5:amd64.
Preparing to unpack .../09-libpsl5_0.21.0-1.2_amd64.deb ...
Unpacking libpsl5:amd64 (0.21.0-1.2) ...
Selecting previously unselected package librtmp1:amd64.
Preparing to unpack .../10-librtmp1_2.4+20151223.gitfa8646d.1-2+b2_amd64.deb ...
Unpacking librtmp1:amd64 (2.4+20151223.gitfa8646d.1-2+b2) ...
Selecting previously unselected package libssh2-1:amd64.
Preparing to unpack .../11-libssh2-1_1.9.0-2+deb11u1_amd64.deb ...
Unpacking libssh2-1:amd64 (1.9.0-2+deb11u1) ...
Selecting previously unselected package libcurl4:amd64.
Preparing to unpack .../12-libcurl4_7.74.0-1.3+deb11u14_amd64.deb ...
Unpacking libcurl4:amd64 (7.74.0-1.3+deb11u14) ...
Selecting previously unselected package curl.
Preparing to unpack .../13-curl_7.74.0-1.3+deb11u14_amd64.deb ...
Unpacking curl (7.74.0-1.3+deb11u14) ...
Selecting previously unselected package libonig5:amd64.
Preparing to unpack .../14-libonig5_6.9.6-1.1_amd64.deb ...
Unpacking libonig5:amd64 (6.9.6-1.1) ...
Selecting previously unselected package libjq1:amd64.
Preparing to unpack .../15-libjq1_1.6-2.1_amd64.deb ...
Unpacking libjq1:amd64 (1.6-2.1) ...
Selecting previously unselected package jq.
Preparing to unpack .../16-jq_1.6-2.1_amd64.deb ...
Unpacking jq (1.6-2.1) ...
Setting up libpsl5:amd64 (0.21.0-1.2) ...
Setting up libbrotli1:amd64 (1.0.9-2+b2) ...
Setting up libnghttp2-14:amd64 (1.43.0-1+deb11u2) ...
Setting up libsasl2-modules-db:amd64 (2.1.27+dfsg-2.1+deb11u1) ...
Setting up librtmp1:amd64 (2.4+20151223.gitfa8646d.1-2+b2) ...
Setting up libncurses6:amd64 (6.2+20201114-2+deb11u2) ...
Setting up libsasl2-2:amd64 (2.1.27+dfsg-2.1+deb11u1) ...
Setting up libssh2-1:amd64 (1.9.0-2+deb11u1) ...
Setting up readline-common (8.1-1) ...
Setting up libonig5:amd64 (6.9.6-1.1) ...
Setting up libjq1:amd64 (1.6-2.1) ...
Setting up libreadline8:amd64 (8.1-1) ...
Setting up bc (1.07.1-2+b2) ...
Setting up libldap-2.4-2:amd64 (2.4.57+dfsg-3+deb11u1) ...
Setting up jq (1.6-2.1) ...
Setting up libcurl4:amd64 (7.74.0-1.3+deb11u14) ...
Setting up curl (7.74.0-1.3+deb11u14) ...
Processing triggers for libc-bin (2.31-13+deb11u10) ...
curl is /usr/bin/curl
jq is /usr/bin/jq
awk is /usr/bin/awk
bc is /usr/bin/bc
+ apt update

WARNING: apt does not have a stable CLI interface. Use with caution in scripts.

Hit:1 http://deb.debian.org/debian bullseye InRelease
Hit:2 http://deb.debian.org/debian-security bullseye-security InRelease
Hit:3 http://deb.debian.org/debian bullseye-updates InRelease
Reading package lists...
Building dependency tree...
Reading state information...
16 packages can be upgraded. Run 'apt list --upgradable' to see them.
+ apt -yqq install curl jq bc netcat ca-certificates

WARNING: apt does not have a stable CLI interface. Use with caution in scripts.

bc is already the newest version (1.07.1-2+b2).
ca-certificates is already the newest version (20210119).
jq is already the newest version (1.6-2.1).
curl is already the newest version (7.74.0-1.3+deb11u14).
The following additional packages will be installed:
  libbsd0 libmd0 netcat-openbsd
The following NEW packages will be installed:
  libbsd0 libmd0 netcat netcat-openbsd
debconf: delaying package configuration, since apt-utils is not installed
0 upgraded, 4 newly installed, 0 to remove and 16 not upgraded.
Need to get 187 kB of archives.
After this operation, 404 kB of additional disk space will be used.
Selecting previously unselected package libmd0:amd64.
(Reading database ... 7145 files and directories currently installed.)
Preparing to unpack .../libmd0_1.0.3-3_amd64.deb ...
Unpacking libmd0:amd64 (1.0.3-3) ...
Selecting previously unselected package libbsd0:amd64.
Preparing to unpack .../libbsd0_0.11.3-1+deb11u1_amd64.deb ...
Unpacking libbsd0:amd64 (0.11.3-1+deb11u1) ...
Selecting previously unselected package netcat-openbsd.
Preparing to unpack .../netcat-openbsd_1.217-3_amd64.deb ...
Unpacking netcat-openbsd (1.217-3) ...
Selecting previously unselected package netcat.
Preparing to unpack .../netcat_1.10-46_all.deb ...
Unpacking netcat (1.10-46) ...
Setting up libmd0:amd64 (1.0.3-3) ...
Setting up libbsd0:amd64 (0.11.3-1+deb11u1) ...
Setting up netcat-openbsd (1.217-3) ...
update-alternatives: using /bin/nc.openbsd to provide /bin/nc (nc) in auto mode
Setting up netcat (1.10-46) ...
Processing triggers for libc-bin (2.31-13+deb11u10) ...
+ type curl
curl is /usr/bin/curl
+ type jq
jq is /usr/bin/jq
+ type nc
nc is /bin/nc
++ echo http://akash-node-1:26657
++ cut -d: -f2
++ cut -d/ -f3
+ solo_ip=akash-node-1
++ echo http://akash-node-1:26657
++ cut -d: -f3
++ cut -d/ -f1
+ port=26657
+ [[ http://akash-node-1:26657 != \h\t\t\p\:\/\/\a\k\a\s\h\-\n\o\d\e\-\1\:\2\6\6\5\7 ]]
++ curl -s http://akash-node-1:26657/status
++ jq -r .result.sync_info.catching_up
+ [[ false == \f\a\l\s\e ]]
++ curl -s http://akash-node-1:26657/status
++ jq -r .result.sync_info.latest_block_time
+ DATE_AKASH=2024-12-06T22:09:04.62514059Z
++ date +%s --date 2024-12-06T22:09:04.62514059Z
+ TS_AKASH=1733522944
++ date +%s
+ TS=1733522954
++ echo '1733522954 - 1733522944'
++ bc
+ DIFF=10
+ [[ 10 -gt 30 ]]
+ [[ 10 -lt -30 ]]
Last block Akash RPC http://akash-node-1:26657 seen was 10 seconds ago => OK
+ echo 'Last block Akash RPC http://akash-node-1:26657 seen was 10 seconds ago => OK'
++ provider-services keys show akash1ut3m97h62tty06qdq9lds85r34dxe3snjj0xfe -a
+ PROVIDER_ADDRESS=akash1ut3m97h62tty06qdq9lds85r34dxe3snjj0xfe
+ [[ -z akash1ut3m97h62tty06qdq9lds85r34dxe3snjj0xfe ]]
+ CERT_SYMLINK=/root/.akash/akash1ut3m97h62tty06qdq9lds85r34dxe3snjj0xfe.pem
+ CERT_REAL_PATH=/config/provider.pem
+ rm -vf /root/.akash/akash1ut3m97h62tty06qdq9lds85r34dxe3snjj0xfe.pem
removed '/root/.akash/akash1ut3m97h62tty06qdq9lds85r34dxe3snjj0xfe.pem'
+ ln -sv /config/provider.pem /root/.akash/akash1ut3m97h62tty06qdq9lds85r34dxe3snjj0xfe.pem
'/root/.akash/akash1ut3m97h62tty06qdq9lds85r34dxe3snjj0xfe.pem' -> '/config/provider.pem'
+ GEN_NEW_CERT=1
+ [[ -f /config/provider.pem ]]
++ cat /config/provider.pem
++ openssl x509 -serial -noout
++ cut -d= -f2
+ LOCAL_CERT_SN=180E9E9FA2FC8826
++ echo 'obase=10; ibase=16; 180E9E9FA2FC8826'
++ bc
+ LOCAL_CERT_SN_DECIMAL=1733497315055667238
++ AKASH_OUTPUT=json
++ provider-services query cert list --owner akash1ut3m97h62tty06qdq9lds85r34dxe3snjj0xfe --state valid --serial 1733497315055667238 --reverse
++ jq -r '.certificates[0].certificate.state'
Provider certificate serial number: 180E9E9FA2FC8826, status on chain: valid
+ REMOTE_CERT_STATUS=valid
+ echo 'Provider certificate serial number: 180E9E9FA2FC8826, status on chain: valid'
+ [[ -z 180E9E9FA2FC8826 ]]
+ [[ valid != \v\a\l\i\d ]]
+ AKASH_OUTPUT=json
+ provider-services query cert list --owner akash1ut3m97h62tty06qdq9lds85r34dxe3snjj0xfe --state valid --reverse
+ jq -r '.certificates[0].certificate.cert'
+ openssl base64 -A -d
+ openssl x509 -checkend 604800 -noout
+ rc=0
+ [[ 0 -ne 0 ]]
+ openssl x509 -checkend 604800 -noout -in /config/provider.pem
+ rc=0
+ [[ 0 -ne 0 ]]
+ [[ 1 -eq 0 ]]
I[2024-12-06|22:09:15.176] using in cluster kube config                 cmp=provider
D[2024-12-06|22:09:15.355] service being found via autodetection        cmp=provider service=hostname-operator
I[2024-12-06|22:09:15.358] dns discovery success                        cmp=provider cmp=service-discovery-agent addrs="[{Target:operator-hostname.akash-services.svc.cluster.local. Port:8080 Priority:0 Weight:100}]" portName=rest service-name=operator-hostname namespace=akash-services
D[2024-12-06|22:09:15.358] satisfying pending requests                  cmp=provider cmp=service-discovery-agent qty=1
I[2024-12-06|22:09:15.359] check result                                 cmp=provider operator=hostname status=200
I[2024-12-06|22:09:15.359] ready                                        cmp=provider cmp=waiter waitable="<*hostname.client 0xc0005c3200>"
I[2024-12-06|22:09:15.359] all waitables ready                          cmp=provider cmp=waiter
I[2024-12-06|22:09:15.362] starting with existing reservations          module=provider-cluster cmp=provider cmp=service cmp=inventory-service qty=4
I[2024-12-06|22:09:15.363] dialing inventory operator at operator-inventory.akash-services.svc.cluster.local:8081 cmp=provider inventory=(MISSING)
D[2024-12-06|22:09:15.365] cluster resources dump={"nodes":[{"name":"4090-server","allocatable":{"cpu":200000,"gpu":6,"memory":412494336000,"storage_ephemeral":1504968452759},"available":{"cpu":189100,"gpu":2,"memory":321924632576,"storage_ephemeral":1247270414999}}],"total_allocatable":{"cpu":200000,"gpu":6,"memory":412494336000,"storage_ephemeral":1504968452759,"storage":{"beta3":2651719467008}},"total_available":{"cpu":189100,"gpu":2,"memory":321924632576,"storage_ephemeral":1247270414999,"storage":{"beta3":2651719467008}}} module=provider-cluster cmp=provider cmp=service cmp=inventory-service
E[2024-12-06|22:09:15.365] adjust inventory for pending reservation     module=provider-cluster cmp=provider cmp=service cmp=inventory-service error="insufficient capacity"
E[2024-12-06|22:09:15.365] adjust inventory for pending reservation     module=provider-cluster cmp=provider cmp=service cmp=inventory-service error="insufficient capacity"
D[2024-12-06|22:09:15.366] found existing hostname                      module=provider-cluster cmp=provider cmp=service hostname=edad4oj2bpc9df7o50qtdjqka0.ingress.4090.akashgpu.com id=akash18wt6gwlnhxmpkynhe34czqvfl0376kn20lgu3e/19235757/1/1/akash1ut3m97h62tty06qdq9lds85r34dxe3snjj0xfe
D[2024-12-06|22:09:15.366] found existing hostname                      module=provider-cluster cmp=provider cmp=service hostname=hehcqbjs7hfdleglbc1lo78s7k.ingress.4090.akashgpu.com id=akash1r6fms9zf0tcsue6806kjmtch2cjpcx44ksk6kd/19234820/1/1/akash1ut3m97h62tty06qdq9lds85r34dxe3snjj0xfe
D[2024-12-06|22:09:15.366] found existing hostname                      module=provider-cluster cmp=provider cmp=service hostname=psoev893vldmr0dr8ur195qtvo.ingress.4090.akashgpu.com id=akash1r6fms9zf0tcsue6806kjmtch2cjpcx44ksk6kd/19234834/1/1/akash1ut3m97h62tty06qdq9lds85r34dxe3snjj0xfe
D[2024-12-06|22:09:15.366] found existing hostname                      module=provider-cluster cmp=provider cmp=service hostname=v0arkstged9l1fc7d0cotnndvk.ingress.4090.akashgpu.com id=akash1uranz3l3wfuzs7hpxt8nqwr5v050l2xfv0rsak/19236041/1/1/akash1ut3m97h62tty06qdq9lds85r34dxe3snjj0xfe
D[2024-12-06|22:09:20.359] cluster resources dump={"nodes":[],"total_allocatable":{"cpu":0,"gpu":0,"memory":0,"storage_ephemeral":0},"total_available":{"cpu":0,"gpu":0,"memory":0,"storage_ephemeral":0}} module=provider-cluster cmp=provider cmp=service cmp=inventory-service
E[2024-12-06|22:09:20.359] adjust inventory for pending reservation     module=provider-cluster cmp=provider cmp=service cmp=inventory-service error="insufficient capacity"
E[2024-12-06|22:09:20.359] adjust inventory for pending reservation     module=provider-cluster cmp=provider cmp=service cmp=inventory-service error="insufficient capacity"
E[2024-12-06|22:09:20.359] adjust inventory for pending reservation     module=provider-cluster cmp=provider cmp=service cmp=inventory-service error="insufficient capacity"
E[2024-12-06|22:09:20.359] adjust inventory for pending reservation     module=provider-cluster cmp=provider cmp=service cmp=inventory-service error="insufficient capacity"
D[2024-12-06|22:09:20.359] cluster resources dump={"nodes":[],"total_allocatable":{"cpu":0,"gpu":0,"memory":0,"storage_ephemeral":0},"total_available":{"cpu":0,"gpu":0,"memory":0,"storage_ephemeral":0}} module=provider-cluster cmp=provider cmp=service cmp=inventory-service
E[2024-12-06|22:09:20.359] adjust inventory for pending reservation     module=provider-cluster cmp=provider cmp=service cmp=inventory-service error="insufficient capacity"
E[2024-12-06|22:09:20.359] adjust inventory for pending reservation     module=provider-cluster cmp=provider cmp=service cmp=inventory-service error="insufficient capacity"
E[2024-12-06|22:09:20.359] adjust inventory for pending reservation     module=provider-cluster cmp=provider cmp=service cmp=inventory-service error="insufficient capacity"
E[2024-12-06|22:09:20.359] adjust inventory for pending reservation     module=provider-cluster cmp=provider cmp=service cmp=inventory-service error="insufficient capacity"
I[2024-12-06|22:09:20.362] dialing inventory operator at operator-inventory.akash-services.svc.cluster.local:8081 cmp=provider inventory=(MISSING)
D[2024-12-06|22:09:20.363] cluster resources dump={"nodes":[{"name":"4090-server","allocatable":{"cpu":200000,"gpu":6,"memory":412494336000,"storage_ephemeral":1504968452759},"available":{"cpu":189100,"gpu":2,"memory":321924632576,"storage_ephemeral":1247270414999}}],"total_allocatable":{"cpu":200000,"gpu":6,"memory":412494336000,"storage_ephemeral":1504968452759,"storage":{"beta3":2651719467008}},"total_available":{"cpu":189100,"gpu":2,"memory":321924632576,"storage_ephemeral":1247270414999,"storage":{"beta3":2651719467008}}} module=provider-cluster cmp=provider cmp=service cmp=inventory-service
E[2024-12-06|22:09:20.363] adjust inventory for pending reservation     module=provider-cluster cmp=provider cmp=service cmp=inventory-service error="insufficient capacity"
E[2024-12-06|22:09:20.363] adjust inventory for pending reservation     module=provider-cluster cmp=provider cmp=service cmp=inventory-service error="insufficient capacity"
E[2024-12-06|22:09:29.317] error querying open orders:                  module=bidengine-service cmp=provider err="post failed: Post "http://akash-node-1:26657": EOF"
E[2024-12-06|22:09:29.317] finding existing orders                      module=bidengine-service cmp=provider err="post failed: Post "http://akash-node-1:26657": EOF"
E[2024-12-06|22:09:29.317] creating bidengine service                   module=provider-service cmp=provider err="post failed: Post "http://akash-node-1:26657": EOF"
D[2024-12-06|22:09:29.317] shutting down                                module=provider-service cmp=provider cmp=balance-checker
D[2024-12-06|22:09:29.317] draining deployment managers...              module=provider-cluster cmp=provider cmp=service qty=4
D[2024-12-06|22:09:29.317] shutting down                                module=provider-cluster cmp=provider cmp=service cmp=deployment-manager lease=akash1r6fms9zf0tcsue6806kjmtch2cjpcx44ksk6kd/19234820/1/1/akash1ut3m97h62tty06qdq9lds85r34dxe3snjj0xfe manifest-group=akash
D[2024-12-06|22:09:29.317] shutting down                                module=provider-cluster cmp=provider cmp=service cmp=inventory-service
D[2024-12-06|22:09:29.317] shutdown complete                            module=provider-cluster cmp=provider cmp=service cmp=inventory-service
D[2024-12-06|22:09:29.317] shutting down                                module=provider-cluster cmp=provider cmp=service cmp=deployment-manager lease=akash1r6fms9zf0tcsue6806kjmtch2cjpcx44ksk6kd/19234834/1/1/akash1ut3m97h62tty06qdq9lds85r34dxe3snjj0xfe manifest-group=akash
D[2024-12-06|22:09:29.317] shutting down                                module=provider-cluster cmp=provider cmp=service cmp=deployment-manager lease=akash1uranz3l3wfuzs7hpxt8nqwr5v050l2xfv0rsak/19236041/1/1/akash1ut3m97h62tty06qdq9lds85r34dxe3snjj0xfe manifest-group=akash
D[2024-12-06|22:09:29.317] shutting down                                module=provider-cluster cmp=provider cmp=service cmp=deployment-manager lease=akash18wt6gwlnhxmpkynhe34czqvfl0376kn20lgu3e/19235757/1/1/akash1ut3m97h62tty06qdq9lds85r34dxe3snjj0xfe manifest-group=akash
E[2024-12-06|22:09:29.634] lease query failed                           module=provider-cluster cmp=provider cmp=service cmp=deployment-manager lease=akash18wt6gwlnhxmpkynhe34czqvfl0376kn20lgu3e/19235757/1/1/akash1ut3m97h62tty06qdq9lds85r34dxe3snjj0xfe manifest-group=akash err=(MISSING)
E[2024-12-06|22:09:30.206] lease query failed                           module=provider-cluster cmp=provider cmp=service cmp=deployment-manager lease=akash1r6fms9zf0tcsue6806kjmtch2cjpcx44ksk6kd/19234820/1/1/akash1ut3m97h62tty06qdq9lds85r34dxe3snjj0xfe manifest-group=akash err=(MISSING)
E[2024-12-06|22:09:30.226] lease query failed                           module=provider-cluster cmp=provider cmp=service cmp=deployment-manager lease=akash1uranz3l3wfuzs7hpxt8nqwr5v050l2xfv0rsak/19236041/1/1/akash1ut3m97h62tty06qdq9lds85r34dxe3snjj0xfe manifest-group=akash err=(MISSING)
E[2024-12-06|22:09:30.229] lease query failed                           module=provider-cluster cmp=provider cmp=service cmp=deployment-manager lease=akash1r6fms9zf0tcsue6806kjmtch2cjpcx44ksk6kd/19234834/1/1/akash1ut3m97h62tty06qdq9lds85r34dxe3snjj0xfe manifest-group=akash err=(MISSING)
D[2024-12-06|22:09:30.295] read from runch during shutdown              module=provider-cluster cmp=provider cmp=service cmp=deployment-manager lease=akash18wt6gwlnhxmpkynhe34czqvfl0376kn20lgu3e/19235757/1/1/akash1ut3m97h62tty06qdq9lds85r34dxe3snjj0xfe manifest-group=akash
D[2024-12-06|22:09:30.295] waiting on dm.wg                             module=provider-cluster cmp=provider cmp=service cmp=deployment-manager lease=akash18wt6gwlnhxmpkynhe34czqvfl0376kn20lgu3e/19235757/1/1/akash1ut3m97h62tty06qdq9lds85r34dxe3snjj0xfe manifest-group=akash
I[2024-12-06|22:09:30.295] shutting down unclean, running teardown now  module=provider-cluster cmp=provider cmp=service cmp=deployment-manager lease=akash18wt6gwlnhxmpkynhe34czqvfl0376kn20lgu3e/19235757/1/1/akash1ut3m97h62tty06qdq9lds85r34dxe3snjj0xfe manifest-group=akash
D[2024-12-06|22:09:30.297] purged ips                                   module=provider-cluster cmp=provider cmp=service cmp=deployment-manager lease=akash18wt6gwlnhxmpkynhe34czqvfl0376kn20lgu3e/19235757/1/1/akash1ut3m97h62tty06qdq9lds85r34dxe3snjj0xfe manifest-group=akash
D[2024-12-06|22:09:30.306] purged hostnames                             module=provider-cluster cmp=provider cmp=service cmp=deployment-manager lease=akash18wt6gwlnhxmpkynhe34czqvfl0376kn20lgu3e/19235757/1/1/akash1ut3m97h62tty06qdq9lds85r34dxe3snjj0xfe manifest-group=akash
I[2024-12-06|22:09:30.309] shutdown complete                            module=provider-cluster cmp=provider cmp=service cmp=deployment-manager lease=akash18wt6gwlnhxmpkynhe34czqvfl0376kn20lgu3e/19235757/1/1/akash1ut3m97h62tty06qdq9lds85r34dxe3snjj0xfe manifest-group=akash
D[2024-12-06|22:09:30.309] hostnames released                           module=provider-cluster cmp=provider cmp=service cmp=deployment-manager lease=akash18wt6gwlnhxmpkynhe34czqvfl0376kn20lgu3e/19235757/1/1/akash1ut3m97h62tty06qdq9lds85r34dxe3snjj0xfe manifest-group=akash
D[2024-12-06|22:09:30.309] sending manager into channel                 module=provider-cluster cmp=provider cmp=service cmp=deployment-manager lease=akash18wt6gwlnhxmpkynhe34czqvfl0376kn20lgu3e/19235757/1/1/akash1ut3m97h62tty06qdq9lds85r34dxe3snjj0xfe manifest-group=akash
D[2024-12-06|22:09:30.309] manager done                                 module=provider-cluster cmp=provider cmp=service lease=akash18wt6gwlnhxmpkynhe34czqvfl0376kn20lgu3e/19235757/1/1/akash1ut3m97h62tty06qdq9lds85r34dxe3snjj0xfe
D[2024-12-06|22:09:30.588] read from runch during shutdown              module=provider-cluster cmp=provider cmp=service cmp=deployment-manager lease=akash1r6fms9zf0tcsue6806kjmtch2cjpcx44ksk6kd/19234820/1/1/akash1ut3m97h62tty06qdq9lds85r34dxe3snjj0xfe manifest-group=akash
D[2024-12-06|22:09:30.588] waiting on dm.wg                             module=provider-cluster cmp=provider cmp=service cmp=deployment-manager lease=akash1r6fms9zf0tcsue6806kjmtch2cjpcx44ksk6kd/19234820/1/1/akash1ut3m97h62tty06qdq9lds85r34dxe3snjj0xfe manifest-group=akash
I[2024-12-06|22:09:30.589] shutting down unclean, running teardown now  module=provider-cluster cmp=provider cmp=service cmp=deployment-manager lease=akash1r6fms9zf0tcsue6806kjmtch2cjpcx44ksk6kd/19234820/1/1/akash1ut3m97h62tty06qdq9lds85r34dxe3snjj0xfe manifest-group=akash
D[2024-12-06|22:09:30.589] read from runch during shutdown              module=provider-cluster cmp=provider cmp=service cmp=deployment-manager lease=akash1uranz3l3wfuzs7hpxt8nqwr5v050l2xfv0rsak/19236041/1/1/akash1ut3m97h62tty06qdq9lds85r34dxe3snjj0xfe manifest-group=akash
D[2024-12-06|22:09:30.589] waiting on dm.wg                             module=provider-cluster cmp=provider cmp=service cmp=deployment-manager lease=akash1uranz3l3wfuzs7hpxt8nqwr5v050l2xfv0rsak/19236041/1/1/akash1ut3m97h62tty06qdq9lds85r34dxe3snjj0xfe manifest-group=akash
I[2024-12-06|22:09:30.589] shutting down unclean, running teardown now  module=provider-cluster cmp=provider cmp=service cmp=deployment-manager lease=akash1uranz3l3wfuzs7hpxt8nqwr5v050l2xfv0rsak/19236041/1/1/akash1ut3m97h62tty06qdq9lds85r34dxe3snjj0xfe manifest-group=akash
D[2024-12-06|22:09:30.589] read from runch during shutdown              module=provider-cluster cmp=provider cmp=service cmp=deployment-manager lease=akash1r6fms9zf0tcsue6806kjmtch2cjpcx44ksk6kd/19234834/1/1/akash1ut3m97h62tty06qdq9lds85r34dxe3snjj0xfe manifest-group=akash
D[2024-12-06|22:09:30.589] waiting on dm.wg                             module=provider-cluster cmp=provider cmp=service cmp=deployment-manager lease=akash1r6fms9zf0tcsue6806kjmtch2cjpcx44ksk6kd/19234834/1/1/akash1ut3m97h62tty06qdq9lds85r34dxe3snjj0xfe manifest-group=akash
I[2024-12-06|22:09:30.589] shutting down unclean, running teardown now  module=provider-cluster cmp=provider cmp=service cmp=deployment-manager lease=akash1r6fms9zf0tcsue6806kjmtch2cjpcx44ksk6kd/19234834/1/1/akash1ut3m97h62tty06qdq9lds85r34dxe3snjj0xfe manifest-group=akash
D[2024-12-06|22:09:30.591] purged ips                                   module=provider-cluster cmp=provider cmp=service cmp=deployment-manager lease=akash1r6fms9zf0tcsue6806kjmtch2cjpcx44ksk6kd/19234820/1/1/akash1ut3m97h62tty06qdq9lds85r34dxe3snjj0xfe manifest-group=akash
D[2024-12-06|22:09:30.591] purged ips                                   module=provider-cluster cmp=provider cmp=service cmp=deployment-manager lease=akash1uranz3l3wfuzs7hpxt8nqwr5v050l2xfv0rsak/19236041/1/1/akash1ut3m97h62tty06qdq9lds85r34dxe3snjj0xfe manifest-group=akash
D[2024-12-06|22:09:30.592] purged ips                                   module=provider-cluster cmp=provider cmp=service cmp=deployment-manager lease=akash1r6fms9zf0tcsue6806kjmtch2cjpcx44ksk6kd/19234834/1/1/akash1ut3m97h62tty06qdq9lds85r34dxe3snjj0xfe manifest-group=akash
D[2024-12-06|22:09:30.601] purged hostnames                             module=provider-cluster cmp=provider cmp=service cmp=deployment-manager lease=akash1uranz3l3wfuzs7hpxt8nqwr5v050l2xfv0rsak/19236041/1/1/akash1ut3m97h62tty06qdq9lds85r34dxe3snjj0xfe manifest-group=akash
D[2024-12-06|22:09:30.602] purged hostnames                             module=provider-cluster cmp=provider cmp=service cmp=deployment-manager lease=akash1r6fms9zf0tcsue6806kjmtch2cjpcx44ksk6kd/19234834/1/1/akash1ut3m97h62tty06qdq9lds85r34dxe3snjj0xfe manifest-group=akash
D[2024-12-06|22:09:30.602] purged hostnames                             module=provider-cluster cmp=provider cmp=service cmp=deployment-manager lease=akash1r6fms9zf0tcsue6806kjmtch2cjpcx44ksk6kd/19234820/1/1/akash1ut3m97h62tty06qdq9lds85r34dxe3snjj0xfe manifest-group=akash
I[2024-12-06|22:09:30.603] shutdown complete                            module=provider-cluster cmp=provider cmp=service cmp=deployment-manager lease=akash1uranz3l3wfuzs7hpxt8nqwr5v050l2xfv0rsak/19236041/1/1/akash1ut3m97h62tty06qdq9lds85r34dxe3snjj0xfe manifest-group=akash
D[2024-12-06|22:09:30.603] hostnames released                           module=provider-cluster cmp=provider cmp=service cmp=deployment-manager lease=akash1uranz3l3wfuzs7hpxt8nqwr5v050l2xfv0rsak/19236041/1/1/akash1ut3m97h62tty06qdq9lds85r34dxe3snjj0xfe manifest-group=akash
D[2024-12-06|22:09:30.603] sending manager into channel                 module=provider-cluster cmp=provider cmp=service cmp=deployment-manager lease=akash1uranz3l3wfuzs7hpxt8nqwr5v050l2xfv0rsak/19236041/1/1/akash1ut3m97h62tty06qdq9lds85r34dxe3snjj0xfe manifest-group=akash
D[2024-12-06|22:09:30.603] manager done                                 module=provider-cluster cmp=provider cmp=service lease=akash1uranz3l3wfuzs7hpxt8nqwr5v050l2xfv0rsak/19236041/1/1/akash1ut3m97h62tty06qdq9lds85r34dxe3snjj0xfe
I[2024-12-06|22:09:30.609] shutdown complete                            module=provider-cluster cmp=provider cmp=service cmp=deployment-manager lease=akash1r6fms9zf0tcsue6806kjmtch2cjpcx44ksk6kd/19234834/1/1/akash1ut3m97h62tty06qdq9lds85r34dxe3snjj0xfe manifest-group=akash
D[2024-12-06|22:09:30.609] hostnames released                           module=provider-cluster cmp=provider cmp=service cmp=deployment-manager lease=akash1r6fms9zf0tcsue6806kjmtch2cjpcx44ksk6kd/19234834/1/1/akash1ut3m97h62tty06qdq9lds85r34dxe3snjj0xfe manifest-group=akash
D[2024-12-06|22:09:30.609] sending manager into channel                 module=provider-cluster cmp=provider cmp=service cmp=deployment-manager lease=akash1r6fms9zf0tcsue6806kjmtch2cjpcx44ksk6kd/19234834/1/1/akash1ut3m97h62tty06qdq9lds85r34dxe3snjj0xfe manifest-group=akash
D[2024-12-06|22:09:30.609] manager done                                 module=provider-cluster cmp=provider cmp=service lease=akash1r6fms9zf0tcsue6806kjmtch2cjpcx44ksk6kd/19234834/1/1/akash1ut3m97h62tty06qdq9lds85r34dxe3snjj0xfe
I[2024-12-06|22:09:30.611] shutdown complete                            module=provider-cluster cmp=provider cmp=service cmp=deployment-manager lease=akash1r6fms9zf0tcsue6806kjmtch2cjpcx44ksk6kd/19234820/1/1/akash1ut3m97h62tty06qdq9lds85r34dxe3snjj0xfe manifest-group=akash
D[2024-12-06|22:09:30.611] hostnames released                           module=provider-cluster cmp=provider cmp=service cmp=deployment-manager lease=akash1r6fms9zf0tcsue6806kjmtch2cjpcx44ksk6kd/19234820/1/1/akash1ut3m97h62tty06qdq9lds85r34dxe3snjj0xfe manifest-group=akash
Error: creating bidengine service: post failed: Post "http://akash-node-1:26657": EOF
D[2024-12-06|22:09:30.611] sending manager into channel                 module=provider-cluster cmp=provider cmp=service cmp=deployment-manager lease=akash1r6fms9zf0tcsue6806kjmtch2cjpcx44ksk6kd/19234820/1/1/akash1ut3m97h62tty06qdq9lds85r34dxe3snjj0xfe manifest-group=akash
D[2024-12-06|22:09:30.611] manager done                                 module=provider-cluster cmp=provider cmp=service lease=akash1r6fms9zf0tcsue6806kjmtch2cjpcx44ksk6kd/19234820/1/1/akash1ut3m97h62tty06qdq9lds85r34dxe3snjj0xfe
I[2024-12-06|22:09:30.611] shutdown complete                            module=provider-service cmp=provider
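Before starting the provider, the init script traced above checks that the RPC node is usable: it reads `/status`, requires `result.sync_info.catching_up` to be `false`, and accepts the node only if `latest_block_time` is within 30 seconds of the local clock (both `-gt 30` and `-lt -30` fail the check). The following is a minimal Python sketch of that check, assuming the `/status` JSON layout seen in the trace; the function names and the `fetch_status` helper are hypothetical and not part of provider-services.

```python
import json
import urllib.request
from datetime import datetime, timezone

# Same threshold as the traced init script: reject the node if its last block
# is more than 30 seconds behind or ahead of the local clock.
MAX_SKEW_SECONDS = 30


def block_age_seconds(latest_block_time: str, now_ts: int) -> int:
    """Seconds between now_ts and latest_block_time, truncated like `date +%s`."""
    # Drop the trailing 'Z' and the nanosecond fraction, then parse as UTC.
    whole_seconds = latest_block_time.rstrip("Z").split(".")[0]
    block_dt = datetime.strptime(whole_seconds, "%Y-%m-%dT%H:%M:%S")
    block_ts = int(block_dt.replace(tzinfo=timezone.utc).timestamp())
    return now_ts - block_ts


def is_node_fresh(status: dict, now_ts: int, max_skew: int = MAX_SKEW_SECONDS) -> bool:
    """Mirror the init script: node not catching up and last block within max_skew."""
    sync_info = status["result"]["sync_info"]
    if sync_info["catching_up"]:
        return False
    diff = block_age_seconds(sync_info["latest_block_time"], now_ts)
    return -max_skew <= diff <= max_skew


def fetch_status(rpc_url: str, timeout: float = 5.0) -> dict:
    """Hypothetical helper: GET <rpc_url>/status and decode the JSON body."""
    with urllib.request.urlopen(f"{rpc_url}/status", timeout=timeout) as resp:
        return json.load(resp)
```

With the values from this trace (block time `2024-12-06T22:09:04.62514059Z`, local time `1733522954`), `block_age_seconds` yields 10, matching `DIFF=10`. Note that the check passed at 22:09:14, yet the bidengine's POST to the same endpoint returned EOF at 22:09:29, so a single passing check does not by itself show that the RPC stays reachable once the provider starts.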
@Zblocker64

Zblocker64 commented Dec 6, 2024

Pods before scaling up the provider pod:

NAMESPACE                                       NAME                                                    READY   STATUS      RESTARTS         AGE
48rq9ebfa9v5lvap5spdaevccmdh8j546jasg1pkrptde   app-79889b9755-f4626                                    1/1     Running     0                2m5s
akash-services                                  akash-node-1-0                                          1/1     Running     6 (2m39s ago)    2d4h
akash-services                                  operator-hostname-6867ddc55d-knt9f                      1/1     Running     7 (2m39s ago)    2d4h
akash-services                                  operator-inventory-569b9bff4b-kng7j                     1/1     Running     3 (2m39s ago)    29h
akash-services                                  operator-inventory-hardware-discovery-4090-server       1/1     Running     0                2m24s
cert-manager                                    cert-manager-69fc948d8d-m78v6                           1/1     Running     7 (2m39s ago)    2d1h
cert-manager                                    cert-manager-cainjector-6ccd76f4-rcfs7                  1/1     Running     7 (2m39s ago)    2d1h
cert-manager                                    cert-manager-webhook-69bf986648-dffw6                   1/1     Running     7 (2m39s ago)    2d1h
ingress-nginx                                   ingress-nginx-controller-j9hdx                          1/1     Running     6 (2m39s ago)    2d4h
inqg5ubru59dtu30ap9c4uu87lk0uguh3r06eos7ot6f4   app-74fb9d99dd-djqxf                                    1/1     Running     0                2m5s
iti0732jp1a0ba5qfuvrctdjok3teeg6im9t1b2d9vtsu   app-cc7c4b555-5z66k                                     1/1     Running     0                2m5s
kube-system                                     calico-kube-controllers-65dcc554ff-xdstd                1/1     Running     13 (2m39s ago)   2d15h
kube-system                                     calico-node-dzf9q                                       1/1     Running     12 (2m39s ago)   2d15h
kube-system                                     coredns-7b98449c4-lbdb8                                 1/1     Running     13 (2m39s ago)   2d15h
kube-system                                     local-path-provisioner-595dcfc56f-8xj5r                 1/1     Running     13 (2m39s ago)   2d15h
kube-system                                     metrics-server-cdcc87586-66nnq                          1/1     Running     13 (2m39s ago)   2d15h
nvidia-device-plugin                            nvdp-nvidia-device-plugin-hp75z                         1/1     Running     0                2m1s
rl6ut492fuavhgm96uqjlgnb89o8v9d18jd9cfoihgcmm   app-f68f945fb-2twlv                                     1/1     Running     0                2m5s
rook-ceph                                       csi-cephfsplugin-7pjrs                                  3/3     Running     9 (2m39s ago)    42h
rook-ceph                                       csi-cephfsplugin-provisioner-5b59749b6b-xvrsh           6/6     Running     18 (2m39s ago)   42h
rook-ceph                                       csi-rbdplugin-pg62j                                     3/3     Running     9 (2m39s ago)    42h
rook-ceph                                       csi-rbdplugin-provisioner-85c6dc4d6b-pnd8m              6/6     Running     18 (2m39s ago)   42h
rook-ceph                                       rook-ceph-crashcollector-4090-server-7bdbb9cdc7-l2gsp   1/1     Running     3 (2m39s ago)    42h
rook-ceph                                       rook-ceph-exporter-4090-server-6d5d8cf65f-kxs4b         1/1     Running     3 (2m39s ago)    42h
rook-ceph                                       rook-ceph-mgr-a-56bdff98b6-hrkdj                        2/2     Running     6 (2m39s ago)    42h
rook-ceph                                       rook-ceph-mon-a-7d6979dcb8-4x4wj                        2/2     Running     6 (2m39s ago)    42h
rook-ceph                                       rook-ceph-operator-7dbdc67799-m42kd                     1/1     Running     3 (2m39s ago)    42h
rook-ceph                                       rook-ceph-osd-0-7d79d844b6-lglp5                        2/2     Running     6 (2m39s ago)    42h
rook-ceph                                       rook-ceph-osd-1-7f848fcd4b-zmhvn                        2/2     Running     6 (2m39s ago)    42h
rook-ceph                                       rook-ceph-osd-2-56b478d6c7-mg5pl                        2/2     Running     6 (2m39s ago)    42h
rook-ceph                                       rook-ceph-osd-3-7d4b859574-j6qw4                        2/2     Running     6 (2m39s ago)    42h
rook-ceph                                       rook-ceph-osd-prepare-4090-server-sldg7                 0/1     Completed   0                84s
rook-ceph                                       rook-ceph-tools-59d5f5f9f5-q8tdp                        1/1     Running     3 (2m39s ago)    42h

Pods after scaling up the provider pod:

NAMESPACE              NAME                                                    READY   STATUS      RESTARTS         AGE
akash-services         akash-node-1-0                                          1/1     Running     6 (6m11s ago)    2d4h
akash-services         akash-provider-0                                        1/1     Running     1 (50s ago)      5m2s
akash-services         operator-hostname-6867ddc55d-knt9f                      1/1     Running     7 (6m11s ago)    2d4h
akash-services         operator-inventory-569b9bff4b-kng7j                     1/1     Running     3 (6m11s ago)    29h
akash-services         operator-inventory-hardware-discovery-4090-server       1/1     Running     0                5m56s
cert-manager           cert-manager-69fc948d8d-m78v6                           1/1     Running     7 (6m11s ago)    2d1h
cert-manager           cert-manager-cainjector-6ccd76f4-rcfs7                  1/1     Running     7 (6m11s ago)    2d1h
cert-manager           cert-manager-webhook-69bf986648-dffw6                   1/1     Running     7 (6m11s ago)    2d1h
ingress-nginx          ingress-nginx-controller-j9hdx                          1/1     Running     6 (6m11s ago)    2d4h
kube-system            calico-kube-controllers-65dcc554ff-xdstd                1/1     Running     13 (6m11s ago)   2d15h
kube-system            calico-node-dzf9q                                       1/1     Running     12 (6m11s ago)   2d15h
kube-system            coredns-7b98449c4-lbdb8                                 1/1     Running     13 (6m11s ago)   2d15h
kube-system            local-path-provisioner-595dcfc56f-8xj5r                 1/1     Running     13 (6m11s ago)   2d15h
kube-system            metrics-server-cdcc87586-66nnq                          1/1     Running     13 (6m11s ago)   2d15h
nvidia-device-plugin   nvdp-nvidia-device-plugin-hp75z                         1/1     Running     0                5m33s
rook-ceph              csi-cephfsplugin-7pjrs                                  3/3     Running     9 (6m11s ago)    42h
rook-ceph              csi-cephfsplugin-provisioner-5b59749b6b-xvrsh           6/6     Running     18 (6m11s ago)   42h
rook-ceph              csi-rbdplugin-pg62j                                     3/3     Running     9 (6m11s ago)    42h
rook-ceph              csi-rbdplugin-provisioner-85c6dc4d6b-pnd8m              6/6     Running     18 (6m11s ago)   42h
rook-ceph              rook-ceph-crashcollector-4090-server-7bdbb9cdc7-l2gsp   1/1     Running     3 (6m11s ago)    42h
rook-ceph              rook-ceph-exporter-4090-server-6d5d8cf65f-kxs4b         1/1     Running     3 (6m11s ago)    42h
rook-ceph              rook-ceph-mgr-a-56bdff98b6-hrkdj                        2/2     Running     6 (6m11s ago)    42h
rook-ceph              rook-ceph-mon-a-7d6979dcb8-4x4wj                        2/2     Running     6 (6m11s ago)    42h
rook-ceph              rook-ceph-operator-7dbdc67799-m42kd                     1/1     Running     3 (6m11s ago)    42h
rook-ceph              rook-ceph-osd-0-7d79d844b6-lglp5                        2/2     Running     6 (6m11s ago)    42h
rook-ceph              rook-ceph-osd-1-7f848fcd4b-zmhvn                        2/2     Running     6 (6m11s ago)    42h
rook-ceph              rook-ceph-osd-2-56b478d6c7-mg5pl                        2/2     Running     6 (6m11s ago)    42h
rook-ceph              rook-ceph-osd-3-7d4b859574-j6qw4                        2/2     Running     6 (6m11s ago)    42h
rook-ceph              rook-ceph-osd-prepare-4090-server-sldg7                 0/1     Completed   0                4m56s
rook-ceph              rook-ceph-tools-59d5f5f9f5-q8tdp                        1/1     Running     3 (6m11s ago)    42h```
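Comparing the two listings by namespace shows which workloads disappeared. This is a minimal sketch that assumes each listing is plain-text `kubectl get pods -A` output held in a string. `namespaces` and `missing_namespaces` are hypothetical helpers.

```python
from __future__ import annotations


def namespaces(listing: str) -> set[str]:
    """Collect the NAMESPACE column from `kubectl get pods -A` output."""
    result: set[str] = set()
    for line in listing.splitlines():
        fields = line.split()
        # Skip blank lines and the header row.
        if not fields or fields[0] == "NAMESPACE":
            continue
        result.add(fields[0])
    return result


def missing_namespaces(before: str, after: str) -> set[str]:
    """Namespaces that had pods in `before` but have none in `after`."""
    return namespaces(before) - namespaces(after)
```

Running this on the two listings above should return the four tenant namespaces: 48rq9ebfa9v5lvap5spdaevccmdh8j546jasg1pkrptde, inqg5ubru59dtu30ap9c4uu87lk0uguh3r06eos7ot6f4, iti0732jp1a0ba5qfuvrctdjok3teeg6im9t1b2d9vtsu and rl6ut492fuavhgm96uqjlgnb89o8v9d18jd9cfoihgcmm. Every system namespace is present in both listings.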

@andy108369
Contributor

andy108369 commented Dec 20, 2024

Thanks @Zblocker64.
You can close this as a duplicate of #17 (comment).
