Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

zimwriterfs: created zim-file triggers download dialog when trying to open any extensionless html page in Kiwix Desktop #371

Closed
ilyaigpetrov opened this issue Oct 3, 2023 · 23 comments · Fixed by #374

Comments

@ilyaigpetrov
Copy link

I've created a small zim package that includes an extensionless welcome html page.
If I try to open it in kiwix-desktop_x86_64_2.3.1-4.appimage, it triggers a download dialog instead of rendering the page.

# ls example.com
hello.html  main_page  zim_favicon.png
# cat example.com/main_page 
<!DOCTYPE html>
<html>
  <head>
    <meta charset="UTF-8">
    <title>TEST</title>
  </head>
  <body>
Hello from the main page.
  </body>
</html>
# zimwriterfs --welcome=main_page --illustration=zim_favicon.png --language=eng --title=example.com --description=nodescription --name=examplename -L longdescriptionhere --creator=https://github.com/ballerburg9005/wget-2-zim --publisher=wget-2-zim /root/wget-2-zim/example.com /root/wget-2-zim/example.com.zim

zim-main-page-download-triggered

Expected: main_page must be rendered as html.
Actual: it is offered to be downloaded as file.

If I download https://www.mirrorservice.org/sites/download.kiwix.org/zim/wikipedia/wikipedia_ru_chemistry_mini_2023-08.zim and dump its content then I see it has extensionless html files such as A/Яд which are rendered correctly by Kiwix Desktop. I want to reproduce the same behavior for my zim-file.

My higher objective is to save a media-wiki site to a zim.

Thanks.

@mgautierfr
Copy link
Collaborator

You are proposed to download the main_page because the mimetype is not supported.
The mimetype is set by zimwriterfs based on first the extension and then the content (if first method didn't help).

A file -i main_page gives a mimetype text/html so it should be good. I don't know what is the issue here.

My higher objective is to save a media-wiki site to a zim.

You have mwoffliner for that.

@ilyaigpetrov
Copy link
Author

I don't know what is the issue here.

Could you, please, try reproducing the wrong behavior and decide if it's a bug or a mistake on my side?

@ilyaigpetrov
Copy link
Author

example.com.zip

@mgautierfr
Copy link
Collaborator

It works on my side. See example.com.zip
(Rename to example.com.zim)

--

What is the output of file -i main_page on your computer ?
We use the magic library and the configuration data from your computer. We may have a miss configuration here.

@ilyaigpetrov
Copy link
Author

ilyaigpetrov commented Oct 3, 2023

# file -i main_page 
main_page: text/html; charset=us-ascii
# zimwriterfs -V
zim-tools 3.2.0

libzim 8.2.1
+ libzstd 1.5.2
+ liblzma 5.2.6
+ libxapian 1.4.22
+ libicu 58.2.0

Your zim file works as expected.
I launch zimwriterfs from your official docker with this wrapper script:
sudo docker run --rm -it -v /root:/root ghcr.io/openzim/zim-tools zimwriterfs $@.

@ilyaigpetrov
Copy link
Author

Works correctly on my machine for version 3.1.0 from repos of Linux Mint XFCE Edition:

# zimwriterfs -V
zim-tools 3.1.0

libzim 7.2.0
+ libzstd 1.4.8
+ liblzma 5.2.5
+ libxapian 1.4.18
+ libicu 70.1.0
# lsb_release -a
No LSB modules are available.
Distributor ID: Linuxmint
Description:    Linux Mint 21.2
Release:        21.2
Codename:       victoria
# uname -a
Linux X220-LINUX 5.15.0-84-generic #93-Ubuntu SMP Tue Sep 5 17:16:10 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

I'm going to stick with 3.1.0 for now.

@mgautierfr
Copy link
Collaborator

This is probably a missing package/library in docker.

@rgaudin
Copy link
Member

rgaudin commented Oct 4, 2023

The docker image uses the static build binary of zim-tools. As far as I understand, those binaries should be self-sufficient so if there's a runtime dependency to libmagic, it's a packaging bug and it should be staticaly linked. Adding libmagic would not work with those binaries.

@kelson42
Copy link
Contributor

kelson42 commented Oct 4, 2023

@rgaudin @mgautierfr Might that be that libmagic relies on some kind of data file?

@rgaudin
Copy link
Member

rgaudin commented Oct 4, 2023

no, file -i example.com/main_page on that container properly returns the HTML mime type

@kelson42
Copy link
Contributor

kelson42 commented Oct 4, 2023

@rgaudin OK, weird...

@mgautierfr
Copy link
Collaborator

@rgaudin Their is no file in ghcr.io/openzim/zim-tools:3.2.0-2
How did you test it ?

@mgautierfr
Copy link
Collaborator

I confirm the issue:

Running a zimwriterfs in the docker create a example.com.zim broken:

$ zimdump list --details example.com.zim
path: hello.html
* title:          hello.html
* idx:            0
* type:           item
* mime-type:      text/html
* item size:      26
path: main_page
* title:          main_page
* idx:            1
* type:           item
* mime-type:      application/octet-stream
* item size:      146
path: zim_favicon.png
* title:          zim_favicon.png
* idx:            2
* type:           item
* mime-type:      image/png
* item size:      264

Workaround :

  • Found the magic.mgc file on your computer : /usr/share/misc/magic.mgc on mine (Fedora) (man file is your friend here)
  • Run the docker image with options: -v /usr/share/misc/magic.mgc:/root/magic.mgc -e MAGIC=/root/magic.mgc ...

Then

$ zimdump list --details example.com.zim
path: hello.html
* title:          hello.html
* idx:            0
* type:           item
* mime-type:      text/html
* item size:      26
path: main_page
* title:          TEST
* idx:            1
* type:           item
* mime-type:      text/html
* item size:      146
path: zim_favicon.png
* title:          zim_favicon.png
* idx:            2
* type:           item
* mime-type:      image/png
* item size:      264

We need to add the magic.mgc file (from which source ?) into the image and set the MAGIC env var.

@kelson42
Copy link
Contributor

kelson42 commented Oct 6, 2023

If this is Debian based, apt-file will give the answer.

@rgaudin
Copy link
Member

rgaudin commented Oct 6, 2023

mgc is already included with libmagic, in /usr/share/misc/magic.mgc (see this which is why file returns the correct value from inside the container (see my previous comment).

The issue is that our use of libmagic is not defaulting to that folder as does file.

Simply running container with -e MAGIC=/usr/share/misc/magic.mgc does the trick.

This can be fixed easily in the Dockerfile but the fact that we rely on this file at runtime (with varying results depending on it being present) while using a static binary is disturbing. Can we acknowledge this?

@kelson42
Copy link
Contributor

kelson42 commented Oct 6, 2023

@rgaudin Yes, would be better to include it in the binary.

rgaudin added a commit that referenced this issue Oct 6, 2023
Added libmagic explicitly as a way to document that this is a required runtime dependency
and because the `MAGIC` environ is meant for it but it was already included indirectly.
@rgaudin
Copy link
Member

rgaudin commented Oct 6, 2023

OK, created #372 in case we want a short-term fix

@kelson42
Copy link
Contributor

kelson42 commented Oct 7, 2023

Maybe the solution would be to check if magic.mgc is available at start?

@rgaudin
Copy link
Member

rgaudin commented Oct 7, 2023

Maybe the solution would be to check if magic.mgc is available at start?

What if it's not? Exit? Only solves the unpredictability of results. Still requires a runtime library (and be presented as such).

@kelson42
Copy link
Contributor

kelson42 commented Oct 8, 2023

Maybe the solution would be to check if magic.mgc is available at start?

What if it's not? Exit? Only solves the unpredictability of results. Still requires a runtime library (and be presented as such).

At this stage, I would:

  • DONE: Fix our container image
  • TODO: Document weakness in README, with wokaround
  • TODO: Properly exit if magic.mgc not available (and used?)

Ultimatively:

  • TODO: embedd the magic.mgc in the binary

... But maybe the ultimate solution is overkill...

@rgaudin
Copy link
Member

rgaudin commented Oct 9, 2023

Yes, it's more of a principle issue than a practical one.

I am surprised that zimwriterfs -V does not reference libmagic though. It is one of our dependencies and we do include it in the binary. It should thus be exposed as such.
Another half-way solution would be to include the mgc file in the zim-tools tarball.

@mgautierfr
Copy link
Collaborator

Note that magic.mgc file is not a library. It is a database which is a "compiled" version of the magic file which is a concatenation of all files in https://github.com/file/file/tree/master/magic/Magdir (roughly).

What if it's not? Exit? Only solves the unpredictability of results. Still requires a runtime library (and be presented as such).

magic.mgc is not strictly necessary to zimwriterfs. If it is not present, zimwriterfs can still run and create a (valid) zim file. It will simply use the application/octet-stream mimetype for file it doesn't recognize the extension. (Which is also some kind of "valid" zim file).

I would say we should print a warning or at least propose an option to continue even if the file is not present.

embedd the magic.mgc in the binary

This can be a solution yes.
But for now (we can change that) we include file only in libraries, not binaries.
So it would means we include it in libzim and I think it is definitively overkill.

I am surprised that zimwriterfs -V does not reference libmagic though. It is one of our dependencies and we do include it in the binary. It should thus be exposed as such.

Indeed.


I think that :

  • Update readme and zimwriterfs -V.
  • Stopping the execution with a clear message if magic.mgc is not found.
  • Add a option like --skip-libmagic-check to allow running zimwriterfs without magic.mgc

is good enough.

@rgaudin
Copy link
Member

rgaudin commented Oct 9, 2023

Works for me 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants