Calling a wkhtmltopdf microservice instead of a subprocess? #178

Open
mrenoch opened this issue Feb 8, 2021 · 8 comments

Comments

@mrenoch

mrenoch commented Feb 8, 2021

Hi,

Thanks for all your hard work on this django plugin! We use it heavily and it always delivers.

I am wondering if you have ever considered calling a microservice instead of shelling out to a subprocess on the webserver? I recently encountered https://imti.co/webpage-to-pdf-microservice/, which is an open source packaged version of wkhtmltopdf. We want to remove all of the PDF rendering dependencies from the webserver, and this approach looked viable.

Has anyone ever attempted modifying django-wkhtmltopdf to use a microservice instead of a local binary? Would you take a PR if we attempted that?

PS - we're also interested in exploring caching strategies.

best

@mrenoch
Author

mrenoch commented Feb 27, 2021

Hi @maxpeterson - Do you know if this project is still active? Are you accepting PRs or planning new releases? Thanks!

@maxpeterson
Member

Hi @mrenoch I am not actively maintaining this. I am no longer using it for any projects so I am afraid it doesn’t get much attention.

As to your original questions, as far as I know no one has attempted to use a microservice rather than calling the binary.

It sounds like an interesting idea, but without proper investigation I am not sure whether this library is a good starting point or not.

If you can leverage the existing “wrapping code” and provide an option to call a microservice in place of the binary, without breaking compatibility with the binary, then there could be value in using this library.

@mrenoch
Author

mrenoch commented Mar 2, 2021

Thanks @maxpeterson!

Do you know if this project is being actively maintained? There hasn't been much activity and I wonder about its future. It seems like there are many forks, and maybe it "just works", but I am curious about where it is headed.

Apart from the microservice idea, it seems like this library may also be a good place to introduce a caching layer. I am trying to judge whether that work is also interesting to anyone here.

@maxpeterson
Member

This project is not actively maintained. Most of the development was done almost a decade ago by the team at Incuna; since then it has received steady support and maintenance from the wider community.

There is no longer an Incuna team to maintain it and my time is limited.

I am not sure how actively it is used, but https://pepy.tech/project/django-wkhtmltopdf suggests it is still fairly widely used, so it would be a shame not to find a way to keep it going.

If you are willing to help maintain it then I would be grateful for the help. Likewise, if you want to attempt to add your microservice and caching ideas then I would be happy to help get them merged and released.

I may be a bit delayed in responding, but always feel free to @ me.

@maxpeterson
Member

I should also mention that @johnraz has helped a lot with maintenance

@pinoatrome

pinoatrome commented Mar 21, 2021

Hi all, I am leading a team that uses this library on a number of projects with great satisfaction.

I've just finished getting wkhtmltopdf to run in a separate container (remote from Django), invoked via gRPC.
The initial driver for this implementation was to reduce the size of the Django image by moving all the binaries related to PDF creation to a different container (exposed as a microservice).

This is the (simple) idea:
the PDFTemplateView uses a custom version of the PDFTemplateResponse that invokes the endpoint if one is defined in the Django settings (remote invocation), and otherwise follows the usual path (local invocation).

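import logging

from django.conf import settings
from wkhtmltopdf.views import PDFTemplateResponse, PDFTemplateView

logger = logging.getLogger(__name__)


class SPDFTemplateResponse(PDFTemplateResponse):

    @property
    def rendered_content(self):
        # Use the remote service only when an endpoint is configured.
        endpoint = getattr(settings, 'WKHTMLTOPDF_ENDPOINT', None)
        if not endpoint:
            logger.debug('grpc service endpoint missing in settings: will create PDF with local wkhtmltopdf binary')
            return super().rendered_content
        logger.debug(f'rendering content via grpc service @ {endpoint}')
        ....


# Defined after the response class so the name exists when it is referenced.
class SPDFTemplateView(PDFTemplateView):
    response_class = SPDFTemplateResponse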

Then, after the templates are rendered (in RenderedFile), the contents of the files are passed to a client that invokes the remote service and receives the bytes of the PDF file.

        ...
        input_file = RenderedFile(
            template=input_template,
            context=context,
            request=request
        )
        ...
        output = self.cmd_options.pop('output', None)
        try:
            content = client.transform(endpoint, input_file.filename, cmd_options=cmd_options,
                                       header=header_filename, footer=footer_filename, cover=cover_filename)
            if output:
                with open(output, 'wb') as pdf_fp:
                    pdf_fp.write(content)
            return bytes(content)
        except exceptions.SDPFRenderClientException as e:
            logger.error(f'error from PDF endpoint: {e}')
            raise

There is a problem with RenderedFile: the rendered template contains local references to static and media files. For instance, href="/static/css/project.css" is transformed to href="file:///home/user/project/root_static_dir/css/project.css".

This happens when render_to_temporary_file() calls make_absolute_paths().

The Django container and the microservice container are unlikely to share the same absolute paths (and requiring that would be too limiting). I guess it is a matter of passing a flag to RenderedFile so that render_to_temporary_file calls make_absolute_paths for a local invocation and skips it for a remote one:

    content = smart_text(content)
    # For a remote invocation this should happen on the server creating the PDF,
    # not on the client side when rendering the templates.
    content = make_absolute_paths(content)
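
A minimal sketch of that flag idea, for illustration only. The `prepare_html` helper, the `remote` flag and the `/srv/app/static/` root are assumptions, and the `make_absolute_paths` below is a simplified stand-in, not the library's implementation (which works from the Django settings).

```python
import re

QUOTE = "([\"'])"  # opening quote of an attribute value


def make_absolute_paths(content, static_url="/static/", static_root="/srv/app/static/"):
    """Simplified stand-in: rewrite quoted URLs under static_url to file:// URLs."""
    pattern = re.compile(QUOTE + re.escape(static_url))
    # A function replacement avoids backslash-escape surprises in the path.
    return pattern.sub(lambda m: m.group(1) + "file://" + static_root, content)


def prepare_html(content, remote=False):
    """Return the HTML to hand to wkhtmltopdf.

    For a remote invocation the URLs are left untouched, because the
    service has its own filesystem layout and should resolve them itself.
    """
    if remote:
        return content
    return make_absolute_paths(content)
```

With `remote=True` the markup reaches the service unchanged, so the rewriting step can run there against the service's own static directory.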

Caching the common parts (header, footer, cover) could be straightforward: once they are cached, the client could pass the cache keys instead of the actual content, and the service could fetch the content from the cache.
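
One way this could look, as a rough sketch: key each common part by a hash of its rendered content and send only the key. `PartCache`, `build_request` and the `*_key` field names are hypothetical, and the in-memory dict stands in for whatever cache both containers can reach.

```python
import hashlib


class PartCache:
    """In-memory stand-in for a cache shared by the client and the service."""

    def __init__(self):
        self._store = {}

    def put(self, html):
        """Store a rendered part and return its content-derived key."""
        data = html.encode("utf-8")
        key = hashlib.sha256(data).hexdigest()
        self._store.setdefault(key, data)
        return key

    def get(self, key):
        """Return the cached bytes for key, or None if it is unknown."""
        return self._store.get(key)


def build_request(cache, body_html, header_html=None, footer_html=None, cover_html=None):
    """Send the body inline and the common parts as cache keys."""
    request = {"body": body_html.encode("utf-8")}
    parts = (("header", header_html), ("footer", footer_html), ("cover", cover_html))
    for name, html in parts:
        if html is not None:
            request[name + "_key"] = cache.put(html)
    return request
```

Because the key is derived from the content, two documents with an identical header share one cache entry, and a changed header automatically gets a new key.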

Sorry for the long post.

Feel free to contact me for any follow up.
Ciao

@mrenoch
Author

mrenoch commented Mar 22, 2021

Hi @pinoatrome!

Thanks for reaching out. I am leading a team that also relies extensively on wk for our main product line. I would love to talk more sometime and learn about your roadmap. Perhaps we can team up and revive this project, under a proper project account.

cheers!
/Jonah

@pinoatrome

Hi Jonah, thanks for your interest.
Your proposal sounds great: feel free to contact me, my personal email should be visible on GitHub.

As for the microservice implementation, it goes into beta testing tonight now that the absolute path issue is solved: for a remote call, the content of the rendered template (plus header, footer and cover) is not saved to a temporary file but streamed to the service, so no change to RenderedFile is needed (it is simply not used).
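
A rough sketch of what streaming without temporary files could look like. Everything here is an assumption for illustration: `iter_chunks`, `render_remote` and the `send` transport are hypothetical names, and the actual gRPC client is not shown in this thread.

```python
def iter_chunks(parts, chunk_size=64 * 1024):
    """Yield (part_name, bytes_chunk) pairs for each rendered HTML part."""
    for name, html in parts.items():
        if html is None:
            continue  # optional part (header, footer or cover) not supplied
        data = html.encode("utf-8")
        for start in range(0, len(data), chunk_size):
            yield name, data[start:start + chunk_size]


def render_remote(send, body, header=None, footer=None, cover=None):
    """Stream in-memory HTML to the service and return the PDF bytes.

    `send` is a hypothetical transport: it consumes an iterator of chunks
    and returns the PDF content (with gRPC this could wrap a
    client-streaming call).
    """
    parts = {"body": body, "header": header, "footer": footer, "cover": cover}
    return bytes(send(iter_chunks(parts)))
```

Nothing is written to disk on the Django side, so the rendered HTML never passes through a temporary file or make_absolute_paths().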

Ciao
Giuseppe
