diff --git a/dev/README.md b/dev/README.md index 7aee77391..e7c1f8f81 100644 --- a/dev/README.md +++ b/dev/README.md @@ -29,9 +29,17 @@ This container is a PostgreSQL DB. DB data is kept in a volume, persistent acros This container hosts the frontend UI for end-users. -### ssh-host +### receiver -This container hosts a minimal SSH server, usefull for uploading Zimfarm artifacts locally during tests. +This container hosts a customized SSH server, used to receive Zimfarm ZIMs coming from workers. + +### worker_mgr + +This container is the main worker container, responsible for starting tasks. It is commented out by default. + +### task_worker + +This container is a sample task executor. It is commented out by default. ## Instructions @@ -42,7 +50,7 @@ cd dev docker compose -p zimfarm up -d ``` -## Setup Postgresql DB +### Setup Postgresql DB If this is your first run or if you made any schema change, you need to set/update the DB schema before having all containers OK. @@ -67,7 +75,7 @@ alembic check Note that to run integration tests, we use a separate DB, you hence have to set/update the DB schema as well. Just do the same as above with the backend-tests container (instead of the backend-tools) -## Restart the backend +### Restart the backend The backend might typically fail if the DB schema is not up-to-date, or if you create some nasty bug while modifying the code. @@ -78,7 +86,7 @@ docker restart zf_backend Other containers might be restarted the same way. -## Browse the web UI +### Browse the web UI To develop: open [the development web UI](http://localhost:8002). This version has hot reload of UI code changes. @@ -86,7 +94,7 @@ To test build version: open [the web UI](http://localhost:8001) in your favorite You can login with username `admin` and password `admin`. 
-## Run tests +### Run backend tests Do not forget to set/update the test DB schema @@ -125,4 +133,40 @@ docker-compose setup is ready (and supposing that your local DB is up-to-date): ```sh cd dispatcher/backend docker run --network zimfarm_default -v "$(pwd)/docs/schemaspy.properties:/schemaspy.properties" -v "$(pwd)/docs/schemaspy:/output" schemaspy/schemaspy:latest -``` \ No newline at end of file +``` + +### Create a test worker + +In order to test the worker manager and the task worker, and more generally anything that involves a worker, you will need a test worker. + +In most situations the worker manager does not need to be running, but you do need both a worker user in the Zimfarm and an associated private/public key pair. + +A useful script that performs the whole test worker creation is at `contrib/create_worker.sh`: call it once to create a `test_worker` user, the associated worker object, and upload a test public key. You will then be able to assign tasks to this worker in the UI, and use this test worker for running the worker manager and the task worker. + +Once this is done, you can start the worker manager simply by uncommenting the `worker_mgr` container in `docker-compose.yml`. + +**Important:** Beware that once you start the worker manager, any pending task will be automatically started by the worker manager. You might want to clear the pending tasks list before starting the worker manager. + +### Mark a task as started + +Through the UI, it is easy to create a requested task for your test worker. However, if you do not want to run the worker manager because you do not want the task to really proceed, it gets complicated to fake the start of this requested task, i.e. to mark the fact that the test worker manager has reserved this requested task. + +A useful script is at `contrib/start_first_req_task.sh`: this will mark the first task in the pipe (oldest one) as reserved for the test worker, and hence transform the requested task into a task. 
You can call it several times to reserve several tasks. The script displays the whole task, including its id. + +### Tweak receiver configuration + +The receiver is responsible for receiving ZIMs, logs and artifacts created by the task worker. It is a modified SSH server which performs authentication against the Zimfarm DB. + +In order to use it with a task manager, you have to create one directory per warehouse path (or at least create the ones for the tasks you will run). + +A useful script has been added to the dev stack to create these directories: + +```sh +docker exec -it zf_receiver /contrib/create-warehouse-paths.sh +``` + +### Test a task manager + +You can start a task manager manually by requesting a task in the UI and marking it as started (see above). + +Once the task is reserved for the `test_worker`, you can modify the `task_worker` container `command` in `docker-compose.yml` with the ID of this task, uncomment the `task_worker` section and start it. \ No newline at end of file diff --git a/dev/contrib/create_worker.sh b/dev/contrib/create_worker.sh new file mode 100755 index 000000000..5d4cd7973 --- /dev/null +++ b/dev/contrib/create_worker.sh @@ -0,0 +1,71 @@ +#!/bin/bash + +# Call it once to create a `test_worker`: +# - retrieve an admin token +# - create the `test_worker` user +# - create the associated worker object +# - upload a test public key. 
+# +# To be used to have a "real" test worker for local development, typically to start +# a worker manager or a task manager or simply assign tasks to a worker in the UI/API + +set -e + +echo "Retrieving admin access token" + +ZF_ADMIN_TOKEN="$(curl -s -X 'POST' \ + 'http://localhost:8000/v1/auth/authorize' \ + -H 'accept: application/json' \ + -H 'Content-Type: application/x-www-form-urlencoded' \ + -d 'username=admin&password=admin' \ + | jq -r '.access_token')" + +echo "Create test_worker user" + +curl -s -X 'POST' \ + 'http://localhost:8000/v1/users/' \ + -H 'accept: */*' \ + -H 'Content-Type: application/json' \ + -H "Authorization: Bearer $ZF_ADMIN_TOKEN" \ + -d '{ + "role":"worker", + "username": "test_worker", + "email":"test_worker@acme.com", + "password":"test_worker" +}' + +echo "Retrieving test_worker access token" + +ZF_USER_TOKEN="$(curl -s -X 'POST' \ + 'http://localhost:8000/v1/auth/authorize' \ + -H 'accept: application/json' \ + -H 'Content-Type: application/x-www-form-urlencoded' \ + -d 'username=test_worker&password=test_worker' \ + | jq -r '.access_token')" + +echo "Worker check-in (will create it since missing)" + +curl -s -X 'PUT' \ + 'http://localhost:8000/v1/workers/test_worker/check-in' \ + -H 'accept: */*' \ + -H 'Content-Type: application/json' \ + -H "Authorization: Bearer $ZF_USER_TOKEN" \ + -d '{ + "username": "test_worker", + "cpu": 3, + "memory": 1024, + "disk": 0, + "offliners": [ + "zimit" + ] +}' + +echo "Add public key to test_worker" + +curl -X POST http://localhost:8000/v1/users/test_worker/keys \ + -H 'accept: */*' \ + -H "Authorization: Bearer $ZF_USER_TOKEN" \ + -H 'Content-Type: application/json; charset=utf-8' \ + -d '{"name": "test_key", "key": 
"AAAAB3NzaC1yc2EAAAADAQABAAABAQCn2r5IZSJp02FBAYSZBQRdOBKBK2VOErdrBCZm5Ig3hDKQuxq38+W5CJ2JUJU+LQm//uenm58scGlEtk5+w5SjObjzK8Qx6JeRhAiZ8xpyydSoUIvd0ARD9OKwdiQFqVlLPlOyrdIpQ2vRESdwzhe0f7EYUwgKzBw5k0foxQsGxTiztY/ugWJ8Jso5WOxXwzEw4cSnGhdrehqLphlZanr54wj5oTcrj/vJHlpbxkYzFMc2Zgj81GdIV4yP3H1yX4ySK8VkDPOCczHacdRnHw4u8Vgf6wS6Zy3iMpvuGu7BJkwNoTXvmVV5BXUm6GAMSQTAPcw5T8M+eXjSAnriGDAL"}' + +echo "DONE" \ No newline at end of file diff --git a/dev/contrib/start_first_req_task.sh b/dev/contrib/start_first_req_task.sh new file mode 100755 index 000000000..8c0ba7024 --- /dev/null +++ b/dev/contrib/start_first_req_task.sh @@ -0,0 +1,40 @@ +#!/bin/bash + +# Call as many times as necessary to transition the first (oldest) requested task in the +# database into a reserved task, assigned to the `test_worker` worker. +# +# Displays the whole task JSON. + +set -e + +echo "Retrieving access token" + +ZF_ADMIN_TOKEN="$(curl -s -X 'POST' \ + 'http://localhost:8000/v1/auth/authorize' \ + -H 'accept: application/json' \ + -H 'Content-Type: application/x-www-form-urlencoded' \ + -d 'username=admin&password=admin' \ + | jq -r '.access_token')" + +echo "Get last requested task" + +FIRST_TASK_ID="$(curl -s -X 'GET' \ + 'http://localhost:8000/v1/requested-tasks/' \ + -H 'accept: application/json' \ + -H "Authorization: Bearer $ZF_ADMIN_TOKEN" \ + | jq -r '.items[0]._id')" + +if [ "$FIRST_TASK_ID" = "null" ]; then + echo "No pending requested task. Exiting script." + exit 1 +fi + +echo "Start task (i.e. 
mark it as started)" + +curl -s -X 'POST' \ + "http://localhost:8000/v1/tasks/$FIRST_TASK_ID?worker_name=test_worker" \ + -H 'accept: application/json' \ + -H "Authorization: Bearer $ZF_ADMIN_TOKEN" \ + -d '' + +echo "DONE" \ No newline at end of file diff --git a/dev/docker-compose.yml b/dev/docker-compose.yml index 4f4e5e2f3..d7b4f2b5c 100644 --- a/dev/docker-compose.yml +++ b/dev/docker-compose.yml @@ -28,9 +28,11 @@ services: JWT_SECRET: DH8kSxcflUVfNRdkEiJJCn2dOOKI3qfw POSTGRES_URI: postgresql+psycopg://zimfarm:zimpass@postgresdb:5432/zimfarm ALEMBIC_UPGRADE_HEAD_ON_START: "1" - ARTIFACTS_UPLOAD_URI: scp://root@ssh-host:22/artifacts/ - LOGS_UPLOAD_URI: scp://root@ssh-host:22/logs/ - ZIM_UPLOAD_URI: scp://root@ssh-host:22/zims/ + # upload artifacts, logs and zim to receiver for simplicity + ARTIFACTS_UPLOAD_URI: sftp://uploader@receiver:22/logs/ # reusing logs dir, kind of a hack + LOGS_UPLOAD_URI: sftp://uploader@receiver:22/logs/ + ZIM_UPLOAD_URI: sftp://uploader@receiver:22/zim/ + ZIMCHECK_OPTION: --all depends_on: - postgresdb frontend-ui: @@ -81,13 +83,47 @@ services: POSTGRES_URI: postgresql+psycopg://zimfarm:zimpass@postgresdb:5432/zimtest depends_on: - postgresdb - ssh-host: - build: - context: ssh-host - container_name: zf_ssh_host + receiver: + build: ../receiver + container_name: zf_receiver ports: - - 127.0.0.1:8022:22 + - 127.0.0.1:8222:22 + volumes: + - ./receiver/create-warehouse-paths.sh:/contrib/create-warehouse-paths.sh + environment: + - ZIMFARM_WEBAPI=http://backend:8000/v1 + depends_on: + - backend + + # # uncomment this only if you want to run a worker manager + # worker_mgr: + # build: + # context: ../workers + # dockerfile: manager-Dockerfile + # container_name: zf_worker_mgr + # depends_on: + # - backend + # command: worker-manager --webapi-uri 'http://backend:8000/v1' --username test_worker --name test_worker + # volumes: + # - /var/run/docker.sock:/var/run/docker.sock + # - ./test_worker-identity/id_rsa:/etc/ssh/keys/zimfarm + # # 
uncomment this only if you want to run a 'standalone' task worker + # # you have to modify the in the command with a real requested task + # task_worker: + # build: + # context: ../workers + # dockerfile: task-Dockerfile + # container_name: zf_task_worker + # depends_on: + # - backend + # command: task-worker --webapi-uri 'http://backend:8000/v1' --username test_worker --task-id + # volumes: + # - /var/run/docker.sock:/var/run/docker.sock + # - ./test_worker-identity/id_rsa:/etc/ssh/keys/zimfarm + # environment: + # - DEBUG=1 + # - DOCKER_NETWORK=zimfarm_default volumes: pg_data_zimfarm: diff --git a/dev/receiver/create-warehouse-paths.sh b/dev/receiver/create-warehouse-paths.sh new file mode 100755 index 000000000..df44a47f1 --- /dev/null +++ b/dev/receiver/create-warehouse-paths.sh @@ -0,0 +1,47 @@ +#!/bin/bash + +set -e + +mkdir -p \ + /jail/zim/freecodecamp \ + /jail/zim/gutenberg \ + /jail/zim/ifixit \ + /jail/zim/mooc \ + /jail/zim/other \ + /jail/zim/phet \ + /jail/zim/stack_exchange \ + /jail/zim/ted \ + /jail/zim/videos \ + /jail/zim/vikidia \ + /jail/zim/wikibooks \ + /jail/zim/wikihow \ + /jail/zim/wikinews \ + /jail/zim/wikipedia \ + /jail/zim/wikiquote \ + /jail/zim/wikisource \ + /jail/zim/wikiversity \ + /jail/zim/wikivoyage \ + /jail/zim/wiktionary \ + /jail/zim/zimit + +chmod 777 \ + /jail/zim/freecodecamp \ + /jail/zim/gutenberg \ + /jail/zim/ifixit \ + /jail/zim/mooc \ + /jail/zim/other \ + /jail/zim/phet \ + /jail/zim/stack_exchange \ + /jail/zim/ted \ + /jail/zim/videos \ + /jail/zim/vikidia \ + /jail/zim/wikibooks \ + /jail/zim/wikihow \ + /jail/zim/wikinews \ + /jail/zim/wikipedia \ + /jail/zim/wikiquote \ + /jail/zim/wikisource \ + /jail/zim/wikiversity \ + /jail/zim/wikivoyage \ + /jail/zim/wiktionary \ + /jail/zim/zimit \ No newline at end of file diff --git a/dev/ssh-host/Dockerfile b/dev/ssh-host/Dockerfile deleted file mode 100644 index d0d32ca13..000000000 --- a/dev/ssh-host/Dockerfile +++ /dev/null @@ -1,44 +0,0 @@ -FROM alpine:3 - -# 
Install SSH server -RUN apk update && apk add --no-cache openssh-server - -# Create SSH directory and set permissions -RUN mkdir /var/run/sshd -RUN chmod 0755 /var/run/sshd - -# Copy SSH host keys to container -COPY ssh_host_rsa_key /etc/ssh/ssh_host_rsa_key - -# Copy test client SSH public key to the container -COPY id_rsa.pub /root/.ssh/authorized_keys - -RUN mkdir -p \ - /root/artifacts \ - /root/logs \ - /root/zims/freecodecamp \ - /root/zims/gutenberg \ - /root/zims/ifixit \ - /root/zims/mooc \ - /root/zims/other \ - /root/zims/phet \ - /root/zims/stack_exchange \ - /root/zims/ted \ - /root/zims/videos \ - /root/zims/vikidia \ - /root/zims/wikibooks \ - /root/zims/wikihow \ - /root/zims/wikinews \ - /root/zims/wikipedia \ - /root/zims/wikiquote \ - /root/zims/wikisource \ - /root/zims/wikiversity \ - /root/zims/wikivoyage \ - /root/zims/wiktionary \ - /root/zims/zimit - -# Expose SSH port -EXPOSE 22 - -# Start SSH server -CMD ["/usr/sbin/sshd", "-D"] \ No newline at end of file diff --git a/dev/ssh-host/known_hosts b/dev/ssh-host/known_hosts deleted file mode 100644 index bdd5f2f79..000000000 --- a/dev/ssh-host/known_hosts +++ /dev/null @@ -1 +0,0 @@ -ssh-host ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABgQCunjxQLDBHohVN13u/fMQNlHeM+QQK6N6LRNo6XO/Y08CKIh6s7YFmZdPWqslOilLdGQM8z8ShZd/WrCjGEbanYQI2c5TgduLd6CIjL3t07otLtf8KqXZCDBBtOvvlTUWHJQk1hN3STdfw+VNjdOMhLeWTtCAyO2zgJQTfkqrrjVul+m3ykAOQw4ULLpTefrZ2/qpcbJaDl8qoortpMuhHExxvMgJJhyx0cOLHA7UdEgEr+2BfMTj6BznB+udREYTFDFTgkDexHzdpptphtO2HCqyY06z8lacdOPw5mXe0Ilfr/EFDhtkk4i8MsdRLhaqtkzvw914t/4yZcFBDd1DNQzMcBK4W8WZUNeArsB14/UMhACFj2QUIGyxa8yoawQ5G8EaEf0Djg1MP+gnTFb2fu9vXBEO0Bu/TYOIfs9W9iKN5aw7NvupulCcO1eTQED3k5QIKeautKL42hnPCnL8SQQsS2JPRRzXtarjLIghog1chirQNFkfNiNjsa2ltOp0= diff --git a/dev/ssh-host/ssh_host_rsa_key b/dev/ssh-host/ssh_host_rsa_key deleted file mode 100644 index 5516b92e5..000000000 --- a/dev/ssh-host/ssh_host_rsa_key +++ /dev/null @@ -1,39 +0,0 @@ ------BEGIN OPENSSH PRIVATE KEY----- 
-b3BlbnNzaC1rZXktdjEAAAAABG5vbmUAAAAEbm9uZQAAAAAAAAABAAABlwAAAAdzc2gtcn -NhAAAAAwEAAQAAAYEArp48UCwwR6IVTdd7v3zEDZR3jPkECujei0TaOlzv2NPAiiIerO2B -ZmXT1qrJTopS3RkDPM/EoWXf1qwoxhG2p2ECNnOU4Hbi3egiIy97dO6LS7X/Cql2QgwQbT -r75U1FhyUJNYTd0k3X8PlTY3TjIS3lk7QgMjts4CUE35Kq641bpfpt8pADkMOFCy6U3n62 -dv6qXGyWg5fKqKK7aTLoRxMcbzICSYcsdHDixwO1HRIBK/tgXzE4+gc5wfrnURGExQxU4J -A3sR83aabaYbTthwqsmNOs/JWnHTj8OZl3tCJX6/xBQ4bZJOIvDLHUS4WqrZM78PdeLf+M -mXBQQ3dQzUMzHASuFvFmVDXgK7AdeP1DIQAhY9kFCBssWvMqGsEORvBGhH9A44NTD/oJ0x -W9n7vb1wRDtAbv02DiH7PVvYijeWsOzb7qbpQnDtXk0BA95OUCCnmrrSi+NoZzwpy/EkEL -EtiT0Uc17Wq4yyIIaINXIYq0DRZHzYjY7GtpbTqdAAAFoKgeDg2oHg4NAAAAB3NzaC1yc2 -EAAAGBAK6ePFAsMEeiFU3Xe798xA2Ud4z5BAro3otE2jpc79jTwIoiHqztgWZl09aqyU6K -Ut0ZAzzPxKFl39asKMYRtqdhAjZzlOB24t3oIiMve3Tui0u1/wqpdkIMEG06++VNRYclCT -WE3dJN1/D5U2N04yEt5ZO0IDI7bOAlBN+SquuNW6X6bfKQA5DDhQsulN5+tnb+qlxsloOX -yqiiu2ky6EcTHG8yAkmHLHRw4scDtR0SASv7YF8xOPoHOcH651ERhMUMVOCQN7EfN2mm2m -G07YcKrJjTrPyVpx04/DmZd7QiV+v8QUOG2STiLwyx1EuFqq2TO/D3Xi3/jJlwUEN3UM1D -MxwErhbxZlQ14CuwHXj9QyEAIWPZBQgbLFrzKhrBDkbwRoR/QOODUw/6CdMVvZ+729cEQ7 -QG79Ng4h+z1b2Io3lrDs2+6m6UJw7V5NAQPeTlAgp5q60ovjaGc8KcvxJBCxLYk9FHNe1q -uMsiCGiDVyGKtA0WR82I2OxraW06nQAAAAMBAAEAAAGACdCno7sEILal22kCEe6dp4TBno -kttr5Fqg7d9FteePHYF/uYfVBhTmPpXx7c7177juV1xuCH1SmghhTJuu5qdaiQgwaGpwJP -uLjwWEl2N0mkR0Zs1kjVtpsufjFLUOWBw7mrdZhpDoXlHiypiQTcMnR9u8prZ99qvIOgLT -/1fwWEUgVMUk7RgHzY+Nqur/3v3Cru4QCSikWJNObmwWBE6Z/ToJVvRvpD36yrtpOJBeAJ -9FKuJVOjN/yZfMORZn9lPu0HTbkyWw2hYN7A6I2oXsHFuCz1VDnFdjv4xHCBFwPu/mzXvG -RkhYQHr5YzXohn+SJUBCvTqGgl/PXvTrD8aw5SQmBINEZXtBlL9a4UKiPmtS5QcOICa4ps -eODzAiY01jTntb0uJh9lJTL5QK9Aq2W0VJHRL8cQFiyMYAw0vL1l2Lk/XTomcxAyW98ABd -pbFZxNfbkHd6YbQQ7UEZLUnvcs7+OXBe0lLtrM38TNKlcqVepHPel0RJKu9zU8iEa5AAAA -wQDn6RUSt5jY7cWd8hbmomtwZYdcSqo9ySkUVINGi13qNBmIwtLK4rMV5zng65tNZFeOjh -646dPVBzyxfR9I2hior0OMcswzNfLXZx90AJAEzrpoFmxQS59yP2GOgBY8SSDo6V5Bgm61 -qMyAMxWk9+qoqGpKCqvbIY5Oq0tEhv/VF3TTipvcqWw7jg+3Yopw/4sG0XtFiQa3hqC3LH 
-zxeoW3U4su94M+NExwoXOGsUoDHyunJUWiY83RrK/G1Pf7zzAAAADBAO48ES7Dt8rUxiqQ -fUsWzdL2lX1ASgYqkpmfSkivV93L9kL+eRTHGzov1EStezI8Gx8NBpzpE7f8z6wftlvlya -dCnq2hmQJLVDTIZlbgT6AQ9VBs7FZP0ZmRSP49uV8JY/dEOqi+o6xaDHflsPF/tx5pC6KV -WPxgoqrkmnwt7vgwmBd0vY75rgdxsEW/S1B8AvU43eG1Bi4HaqRHDeS2ffOCOI4dOkzU1z -MyFIpIn0Sx7qXlRLdppu9JSz9ssCZR1QAAAMEAu6O4TlW+3IuQOTkpNPKHtS40VW9Rk4De -995+ktFQFRbYeHptJWAs+abbTZfV6XBNsMx9z6C1x2J9v7/O7/lqGMyPCBsUn//FrtjA+l -GUDCutEYiMewKsZpMzEY5XGbGV4srSHazrlYgVDi1MjWSo5zYG3EloJe/wrew26y76mREM -yA9u+SLVJ3NBtFYtTkf7WecH2xtT2H6uedT+QeisVFKFpzdwIvyNUswgqS7S6vu8QL3nRM -7twwb3BPpu4eGpAAAAI2Jlbm9pdEBVYnVudHUtMjIwNC1qYW1teS1hbWQ2NC1iYXNlAQID -BAUGBw== ------END OPENSSH PRIVATE KEY----- diff --git a/dev/ssh-host/id_rsa b/dev/test_worker-identity/id_rsa similarity index 100% rename from dev/ssh-host/id_rsa rename to dev/test_worker-identity/id_rsa diff --git a/dev/ssh-host/id_rsa.pub b/dev/test_worker-identity/id_rsa.pub similarity index 100% rename from dev/ssh-host/id_rsa.pub rename to dev/test_worker-identity/id_rsa.pub diff --git a/dnscache/Dockerfile b/dnscache/Dockerfile index 381c77188..acdcbf031 100644 --- a/dnscache/Dockerfile +++ b/dnscache/Dockerfile @@ -13,6 +13,7 @@ RUN chmod +x /usr/local/bin/entrypoint.sh # --no-daemon and --keep-in-foreground are similar # but no-deamon has debug enabled (and thus starts with useful output) + ENTRYPOINT ["entrypoint.sh"] CMD ["dnsmasq", "--no-daemon", "--user=root", "--conf-file=/etc/dnsmasq.conf", "--resolv-file=/etc/resolv.dnsmasq", "--domain-needed", "--bogus-priv", "--no-hosts", "--cache-size=1500", "--neg-ttl=600", "--no-poll"] diff --git a/receiver/Dockerfile b/receiver/Dockerfile index b3ca15d47..90eb68270 100644 --- a/receiver/Dockerfile +++ b/receiver/Dockerfile @@ -1,10 +1,10 @@ -FROM python:3.8-buster +FROM python:3.12-slim-bookworm LABEL zimfarm=true LABEL org.opencontainers.image.source https://github.com/openzim/zimfarm # system dependencies RUN apt-get update -y \ - && apt-get install -y 
--no-install-recommends openssh-sftp-server openssh-server wget cron parallel \ + && apt-get install -y --no-install-recommends openssh-sftp-server openssh-server wget cron parallel build-essential \ && apt-get clean \ && rm -rf /var/lib/apt/lists/* @@ -19,10 +19,10 @@ RUN printf "allowscp\nallowsftp\n" > /etc/rssh.conf WORKDIR / # setup a chroot jail at /jail -RUN wget -nv https://olivier.sessink.nl/jailkit/jailkit-2.21.tar.gz -RUN tar xf jailkit-2.21.tar.gz -RUN cd jailkit-2.21 && ./configure && make && make install -RUN rm -rf /jailkit-2.21 && rm -f jailkit-2.21.tar.gz +RUN wget -nv https://olivier.sessink.nl/jailkit/jailkit-2.23.tar.gz +RUN tar xf jailkit-2.23.tar.gz +RUN cd jailkit-2.23 && ./configure && make && make install +RUN rm -rf /jailkit-2.23 && rm -f jailkit-2.23.tar.gz # patch ini file RUN sed -i.bak -e '116d' /etc/jailkit/jk_init.ini RUN printf "\n[rssh]\npaths = /bin/rssh, /etc/rssh.conf\n" >> /etc/jailkit/jk_init.ini diff --git a/receiver/apps/get_zimfarm_key.py b/receiver/apps/get_zimfarm_key.py index 66b24db68..b8b920455 100755 --- a/receiver/apps/get_zimfarm_key.py +++ b/receiver/apps/get_zimfarm_key.py @@ -84,7 +84,7 @@ def fetch_public_keys_for(username, raw_fingerprint): if __name__ == "__main__": if len(sys.argv) != 3: - logger.error(f"Usage: {sys.argv[1]} ") + logger.error(f"Usage: {sys.argv[0]} ") sys.exit(1) sys.exit(print_keys_for(*sys.argv[1:])) diff --git a/receiver/apps/requirements.txt b/receiver/apps/requirements.txt index 566083cb6..2c24336eb 100644 --- a/receiver/apps/requirements.txt +++ b/receiver/apps/requirements.txt @@ -1 +1 @@ -requests==2.22.0 +requests==2.31.0 diff --git a/uploader/CONTRIBUTING.md b/uploader/CONTRIBUTING.md index f5cb3e485..7209d242d 100644 --- a/uploader/CONTRIBUTING.md +++ b/uploader/CONTRIBUTING.md @@ -47,7 +47,7 @@ It should succeed. If you run the same test a second time, it will fail due to c Following test upload should succeed. 
``` -docker run -it --rm -v $PWD:/data -v $PWD/../dev/ssh-host/id_rsa:/etc/ssh/keys/id_rsa -v $PWD/../dev/ssh-host/known_hosts:/etc/ssh/known_hosts --network zimfarm_default local-zf-uploader uploader --file /data/CONTRIBUTING.md --upload-uri scp://root@ssh-host:22/CONTRIBUTING.md --move +docker run -it --rm -v $PWD:/data -v $PWD/../dev/test_worker-identity/id_rsa:/etc/ssh/keys/id_rsa --network zimfarm_default local-zf-uploader uploader --file /data/CONTRIBUTING.md --upload-uri scp://uploader@receiver:22/logs/CONTRIBUTING.md --move ``` ## SFTP test @@ -55,5 +55,5 @@ docker run -it --rm -v $PWD:/data -v $PWD/../dev/ssh-host/id_rsa:/etc/ssh/keys/i Following test upload should succeed. ``` -docker run -it --rm -v $PWD:/data -v $PWD/../dev/ssh-host/id_rsa:/etc/ssh/keys/id_rsa -v $PWD/../dev/ssh-host/known_hosts:/etc/ssh/known_hosts --network zimfarm_default local-zf-uploader uploader --file /data/CONTRIBUTING.md --upload-uri sftp://root@ssh-host:22/CONTRIBUTING.md +docker run -it --rm -v $PWD/CONTRIBUTING.md:/data/CONTRIBUTING.md -v $PWD/../dev/test_worker-identity/id_rsa:/etc/ssh/keys/id_rsa --network zimfarm_default local-zf-uploader uploader --file /data/CONTRIBUTING.md --upload-uri sftp://uploader@receiver:22/logs/CONTRIBUTING.md ``` \ No newline at end of file diff --git a/workers/app/common/constants.py b/workers/app/common/constants.py index b747ebf57..4234d04ea 100644 --- a/workers/app/common/constants.py +++ b/workers/app/common/constants.py @@ -21,8 +21,8 @@ TASK_WORKER_IMAGE = ( os.getenv("TASK_WORKER_IMAGE") or "ghcr.io/openzim/zimfarm-task-worker:latest" ) -DNSCACHE_IMAGE = os.getenv("DNSCACHE_IMAGE") or "ghcr.io/openzim/dnscache:1.0.1" -UPLOADER_IMAGE = os.getenv("UPLOADER_IMAGE") or "ghcr.io/openzim/uploader:1.2" +DNSCACHE_IMAGE = os.getenv("DNSCACHE_IMAGE") or "ghcr.io/openzim/dnscache:latest" +UPLOADER_IMAGE = os.getenv("UPLOADER_IMAGE") or "ghcr.io/openzim/uploader:latest" CHECKER_IMAGE = os.getenv("CHECKER_IMAGE") or "ghcr.io/openzim/zim-tools:3.3.0" 
MONITOR_IMAGE = os.getenv("MONITOR_IMAGE") or "ghcr.io/openzim/zimfarm-monitor:latest" diff --git a/workers/app/common/zim.py b/workers/app/common/zim.py deleted file mode 100644 index 9bad89d09..000000000 --- a/workers/app/common/zim.py +++ /dev/null @@ -1,26 +0,0 @@ -import base64 -import pathlib -from typing import Any, Dict - -from zimscraperlib.zim import Archive - - -def get_zim_info(fpath: pathlib.Path) -> Dict[str, Any]: - zim = Archive(fpath) - payload = { - "id": str(zim.uuid), - "counter": zim.counters, - "article_count": zim.article_counter, - "media_count": zim.media_counter, - "size": fpath.stat().st_size, - "metadata": zim.metadata, - } - for size in zim.get_illustration_sizes(): - payload["metadata"].update( - { - f"Illustration_{size}x{size}": base64.standard_b64encode( - zim.get_illustration_item(size).content - ).decode("ASCII") - } - ) - return payload diff --git a/workers/app/emitter.py b/workers/app/emitter.py deleted file mode 100644 index 1d13fff79..000000000 --- a/workers/app/emitter.py +++ /dev/null @@ -1,41 +0,0 @@ -#!/usr/bin/env python3 -# -*- coding: utf-8 -*- -# vim: ai ts=4 sts=4 et sw=4 nu - -""" zmq relay tester: emmit random messages to the `internal` channel """ - -import logging -import os -import random -import time - -import zmq - -SOCKET_URI = os.getenv("SOCKET_URI", "tcp://192.168.1.13:5000") -EVENTS = os.getenv("EVENTS", "requested-task,task-event").split(",") - -logger = logging.getLogger("emitter") - -if not logger.hasHandlers(): - logger.setLevel(logging.DEBUG) - handler = logging.StreamHandler() - handler.setFormatter(logging.Formatter("[%(asctime)s: %(levelname)s] %(message)s")) - logger.addHandler(handler) - - -def main(): - context = zmq.Context() - socket = context.socket(zmq.PUB) - - logger.info(f"connecting to {SOCKET_URI}…") - socket.connect(SOCKET_URI) - - while True: - message = "{} {}".format(random.choice(EVENTS), random.randint(0, 1000)) - logger.info(f"[SENDING] {message}") - socket.send_string(message) - 
time.sleep(random.randint(5, 20)) - - -if __name__ == "__main__": - main() diff --git a/workers/app/listener.py b/workers/app/listener.py deleted file mode 100644 index 826d4335e..000000000 --- a/workers/app/listener.py +++ /dev/null @@ -1,39 +0,0 @@ -#!/usr/bin/env python3 -# -*- coding: utf-8 -*- -# vim: ai ts=4 sts=4 et sw=4 nu - -""" zmq relay tester: listens to topic on relay's public channel """ - -import logging -import os - -import zmq - -SOCKET_URI = os.getenv("SOCKET_URI", "tcp://localhost:6000") -EVENTS = os.getenv("EVENTS", "requested-task,task-event").split(",") -logger = logging.getLogger("listener") - -if not logger.hasHandlers(): - logger.setLevel(logging.DEBUG) - handler = logging.StreamHandler() - handler.setFormatter(logging.Formatter("[%(asctime)s: %(levelname)s] %(message)s")) - logger.addHandler(handler) - - -def main(): - context = zmq.Context() - socket = context.socket(zmq.SUB) - - logger.info(f"connecting to {SOCKET_URI}…") - socket.connect(SOCKET_URI) - for event in EVENTS: - logger.debug(f"subscribing to topic `{event}`") - socket.setsockopt_string(zmq.SUBSCRIBE, event) - - while True: - received_string = socket.recv_string() - logger.info(f"[INCOMING] {received_string}") - - -if __name__ == "__main__": - main() diff --git a/workers/app/task/worker.py b/workers/app/task/worker.py index e7daea29d..1431f7a74 100644 --- a/workers/app/task/worker.py +++ b/workers/app/task/worker.py @@ -38,7 +38,7 @@ ) from common.utils import format_key, format_size from common.worker import BaseWorker -from common.zim import get_zim_info +from task.zim import get_zim_info SLEEP_INTERVAL = 60 # nb of seconds to sleep before watching PENDING = "pending" diff --git a/workers/app/task/zim.py b/workers/app/task/zim.py new file mode 100644 index 000000000..9a8606d91 --- /dev/null +++ b/workers/app/task/zim.py @@ -0,0 +1,121 @@ +from __future__ import annotations + +import base64 +import io +import pathlib +from collections import namedtuple +from typing import 
Any, Dict, Optional + +from libzim import Archive + + +def get_zim_info(fpath: pathlib.Path) -> Dict[str, Any]: + zim = Archive(fpath) + payload = { + "id": str(zim.uuid), + "counter": counters(zim), + "article_count": zim.article_count, + "media_count": zim.media_count, + "size": fpath.stat().st_size, + "metadata": { + key: get_text_metadata(zim, key) + for key in zim.metadata_keys + if not key.startswith("Illustration_") + }, + } + for size in zim.get_illustration_sizes(): + payload["metadata"].update( + { + f"Illustration_{size}x{size}": base64.standard_b64encode( + zim.get_illustration_item(size).content + ).decode("ASCII") + } + ) + return payload + + +# Code below is duplicated from python-scraperlib, in order to depend only on +# python-libzim in the task manager, and not the whole python-scraperlib and all its +# dependencies + +MimetypeAndCounter = namedtuple("MimetypeAndCounter", ["mimetype", "value"]) +CounterMap = Dict[ + type(MimetypeAndCounter.mimetype), type(MimetypeAndCounter.value) # pyright: ignore +] + + +def get_text_metadata(zim: Archive, name: str) -> str: + """Decoded value of a text metadata""" + return zim.get_metadata(name).decode("UTF-8") + + +def getline(src: io.StringIO, delim: Optional[str] = None) -> tuple[bool, str]: + """C++ stdlib getline() ~clone + + Reads `src` until it finds `delim`. 
+ returns whether src is EOF and the extracted string (delim excluded)""" + output = "" + if not delim: + return True, src.read() + + char = src.read(1) + while char: + if char == delim: + break + output += char + char = src.read(1) + return char == "", output + + +def counters(zim: Archive) -> dict[str, int]: + try: + return parseMimetypeCounter(get_text_metadata(zim, "Counter")) + except RuntimeError: # pragma: no cover (no ZIM avail to test itl) + return {} # pragma: no cover + + +def readFullMimetypeAndCounterString( + src: io.StringIO, +) -> tuple[bool, str]: + """read a single mimetype-and-counter string from source + + Returns whether the source is EOF and the extracted string (or empty one)""" + params = "" + eof, mtcStr = getline(src, ";") # pyright: ignore + if mtcStr.find("=") == -1: + while params.count("=") != 2: # noqa: PLR2004 + eof, params = getline(src, ";") # pyright: ignore + if params.count("=") == 2: # noqa: PLR2004 + mtcStr += ";" + params + if eof: + break + return eof, mtcStr + + +def parseASingleMimetypeCounter(string: str) -> MimetypeAndCounter: + """MimetypeAndCounter from a single mimetype-and-counter string""" + k: int = string.rfind("=") + if k != len(string) - 1: + mimeType = string[:k] + counter = string[k + 1 :] # noqa: E203 + try: + return MimetypeAndCounter(mimeType, int(counter)) + except ValueError: + pass # value is not castable to int + return MimetypeAndCounter("", 0) + + +def parseMimetypeCounter( + counterData: str, +) -> CounterMap: + """Mapping of MIME types with count for each from ZIM Counter metadata string""" + counters = {} + ss = io.StringIO(counterData) + eof = False + while not eof: + eof, mtcStr = readFullMimetypeAndCounterString(ss) + mtc = parseASingleMimetypeCounter(mtcStr) + if mtc.mimetype: + counters.update([mtc]) + ss.close() + return counters diff --git a/workers/manager-Dockerfile b/workers/manager-Dockerfile index 75879c390..b7178c3e1 100644 --- a/workers/manager-Dockerfile +++ 
b/workers/manager-Dockerfile @@ -1,7 +1,7 @@ -FROM python:3.8-buster +FROM python:3.12-slim-bookworm LABEL zimfarm=true LABEL org.opencontainers.image.source https://github.com/openzim/zimfarm - + WORKDIR /usr/src COPY manager-requirements.txt requirements.txt diff --git a/workers/manager-requirements.txt b/workers/manager-requirements.txt index 32eb446a3..dcc9f275c 100644 --- a/workers/manager-requirements.txt +++ b/workers/manager-requirements.txt @@ -1,7 +1,6 @@ -zmq -requests>=2.26,<3.0 -docker>=6.0.1,<7.0 -psutil>=5.8.0,<6.0 -humanfriendly>=9.2,<10.0 -PyJWT>=2.4.0,<3.0 +requests==2.31.0 +docker==7.0.0 +psutil==5.9.8 +humanfriendly==10.0 +PyJWT==2.8.0 paramiko==2.11.0 diff --git a/workers/task-Dockerfile b/workers/task-Dockerfile index 7d69f31c4..9d523b587 100644 --- a/workers/task-Dockerfile +++ b/workers/task-Dockerfile @@ -1,4 +1,4 @@ -FROM python:3.8-buster +FROM python:3.12-slim-bookworm LABEL zimfarm=true LABEL org.opencontainers.image.source https://github.com/openzim/zimfarm diff --git a/workers/task-requirements.txt b/workers/task-requirements.txt index 972575294..fb7a482c9 100644 --- a/workers/task-requirements.txt +++ b/workers/task-requirements.txt @@ -1,9 +1,9 @@ -requests>=2.26,<3.0 -docker>=6.0.1,<7.0 -psutil>=5.8.0,<6.0 -humanfriendly>=9.2,<10.0 -PyJWT>=2.4.0,<3.0 +requests==2.31.0 +docker==7.0.0 +psutil==5.9.8 +humanfriendly==10.0 +PyJWT==2.8.0 kiwixstorage==0.6 -ujson>=5.1.0,<5.2 -zimscraperlib>=1.4.1,<1.5 +ujson==5.9.0 +libzim==3.4.0 paramiko==2.11.0