More implementations of script and job API #189

Open
bertfrees opened this issue Apr 18, 2023 · 5 comments
@bertfrees
Member

bertfrees commented Apr 18, 2023

Now that the job and script APIs have been reworked and made more generic (XProc agnostic), the opportunity is there to add other backends.

1. Pipeline 1 backend

A first idea is to add a Pipeline 1 backend. It would provide an alternative to porting scripts from Pipeline 1 to Pipeline 2, allow us to discontinue the Pipeline 1 GUI, and give access to Pipeline 1 scripts in the new Pipeline 2 GUI.

2. Web server backend

A second idea is to add a backend that dispatches jobs to one or more Pipeline web servers. It is a combination of two older ideas and one fresh use case:

  1. Unified Java API. The idea of unifying the "direct" Java API and the Java client library API (which connects to a server over HTTP), originally launched by Jostein, has been around for a while. The benefit is that users could easily switch between the two ways of calling Pipeline 2, and that we would simplify our code base and eliminate duplication.

  2. Scaling. The idea of being able to scale Pipeline came from MTM, and they have probably already implemented something for their specific needs (but haven't shared it). My idea for addressing this request was a web server that connects and dispatches to multiple other web servers, manages the pool of servers based on the overall load, and balances jobs according to the load of each server.

  3. A new use case is a project I’m doing that uses Pipeline as a Java library and that needs to run jobs one at a time (one job per invocation of the tool), but ideally without the overhead of starting and stopping the engine each time. The solution could be to fire up a web server that keeps running in the background, even after the process ends, and to connect to it on each invocation. There are, however, a number of hurdles with this, and I would much prefer a reusable component that does it for me:

    • The Java API (after the rework) is more streamlined and richer than the Java client APIs (both the pipeline-clientlib-java/pipeline-clientlib-httpclient and pipeline-clientlib-jaxb versions).
    • As a user I don't want to be bothered with managing the web server process. It could be done automatically behind the scenes.
    • I don't even need to know that the implementation is based on an HTTP server. It doesn't matter how it is implemented.
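
To make the "unified API" idea a bit more concrete, here is a minimal sketch of how calling code could stay the same while the backend changes. All names (`JobService`, `InProcessJobService`, `RemoteJobService`, the `example-script` id and the server address) are hypothetical and are not the actual Pipeline 2 API; the engine and HTTP calls are stubbed out.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical unified entry point; NOT the actual Pipeline 2 job API. */
interface JobService {
    /** Submits a job for the given script and returns an opaque job id. */
    String submit(String scriptId, Map<String, String> options);

    /** Returns a coarse status string for a previously submitted job. */
    String status(String jobId);
}

/** Backend that would run jobs inside the calling JVM (engine call stubbed out). */
class InProcessJobService implements JobService {
    private final Map<String, String> jobs = new ConcurrentHashMap<>();
    private final AtomicInteger counter = new AtomicInteger();

    @Override
    public String submit(String scriptId, Map<String, String> options) {
        String id = "local-" + counter.incrementAndGet();
        jobs.put(id, "SUCCESS"); // placeholder: a real backend would run the script here
        return id;
    }

    @Override
    public String status(String jobId) {
        return jobs.getOrDefault(jobId, "UNKNOWN");
    }
}

/** Backend that would forward jobs to a web server (HTTP calls stubbed out). */
class RemoteJobService implements JobService {
    private final String baseUrl; // assumed address of a running web server
    private final Map<String, String> jobs = new ConcurrentHashMap<>();
    private final AtomicInteger counter = new AtomicInteger();

    RemoteJobService(String baseUrl) {
        this.baseUrl = baseUrl;
    }

    @Override
    public String submit(String scriptId, Map<String, String> options) {
        String id = baseUrl + "/jobs/" + counter.incrementAndGet();
        jobs.put(id, "QUEUED"); // placeholder: a real backend would send the job request here
        return id;
    }

    @Override
    public String status(String jobId) {
        return jobs.getOrDefault(jobId, "UNKNOWN");
    }
}

class UnifiedApiDemo {
    /** The only place that knows which backend is in use. */
    static JobService create(String backend) {
        switch (backend) {
            case "local":
                return new InProcessJobService();
            case "remote":
                return new RemoteJobService("http://localhost:8181/ws"); // illustrative address
            default:
                throw new IllegalArgumentException("unknown backend: " + backend);
        }
    }

    public static void main(String[] args) {
        JobService service = create(args.length > 0 ? args[0] : "local");
        String id = service.submit("example-script", Map.of("source", "book.xml"));
        System.out.println(id + " -> " + service.status(id));
    }
}
```

The point of the sketch is only that the code in `main` does not change when the backend does; whether the real job and script APIs could be implemented this way is an open question.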

The new component I'm thinking about would provide all of the above features. It would implement the script and job APIs and would have the following additional configuration parameters:

  • whether to automatically fire up web servers or connect to existing ones
  • whether to fire up servers using Docker or the normal way
  • connection info (addresses, authentication info)
  • whether to use fixed port numbers or automatically assign free ports
  • location where to store PID and port number info of launched web servers
  • locations of home directories of launched web servers (only when persistence is needed)
  • whether to automatically stop servers when they have no more jobs in their queue or force the user to stop them through an API call
  • load balancing settings
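
As a rough illustration of the parameters listed above, they could be grouped in a single settings object with a builder. This is only a sketch: the class name `WebServerBackendConfig`, its fields, the defaults and the `pipeline-servers` directory name are all assumptions made for the example, and authentication info is left out for brevity.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical settings object; every name and default here is illustrative only. */
final class WebServerBackendConfig {
    enum LaunchMode { DOCKER, NATIVE }
    enum Balancing { ROUND_ROBIN, LEAST_LOADED }

    final boolean autoStartServers; // fire up servers, or only connect to existing ones
    final LaunchMode launchMode;    // Docker or "the normal way"
    final List<String> addresses;   // servers to connect to
    final int fixedPort;            // 0 means: assign a free port automatically
    final Path stateDir;            // where PID and port info of launched servers is stored
    final Path serverHomeDir;       // null when no persistence is needed
    final boolean autoStopWhenIdle; // stop servers once their queue is empty
    final Balancing balancing;

    private WebServerBackendConfig(Builder b) {
        autoStartServers = b.autoStartServers;
        launchMode = b.launchMode;
        addresses = Collections.unmodifiableList(new ArrayList<>(b.addresses));
        fixedPort = b.fixedPort;
        stateDir = b.stateDir;
        serverHomeDir = b.serverHomeDir;
        autoStopWhenIdle = b.autoStopWhenIdle;
        balancing = b.balancing;
    }

    static Builder builder() {
        return new Builder();
    }

    static final class Builder {
        private boolean autoStartServers = true;
        private LaunchMode launchMode = LaunchMode.NATIVE;
        private final List<String> addresses = new ArrayList<>();
        private int fixedPort = 0;
        private Path stateDir = Paths.get(System.getProperty("java.io.tmpdir"), "pipeline-servers");
        private Path serverHomeDir = null;
        private boolean autoStopWhenIdle = true;
        private Balancing balancing = Balancing.LEAST_LOADED;

        Builder autoStartServers(boolean v) { autoStartServers = v; return this; }
        Builder launchMode(LaunchMode v) { launchMode = v; return this; }
        Builder address(String v) { addresses.add(v); return this; }
        Builder fixedPort(int v) { fixedPort = v; return this; }
        Builder stateDir(Path v) { stateDir = v; return this; }
        Builder serverHomeDir(Path v) { serverHomeDir = v; return this; }
        Builder autoStopWhenIdle(boolean v) { autoStopWhenIdle = v; return this; }
        Builder balancing(Balancing v) { balancing = v; return this; }

        /** Rejects combinations that cannot work before any server is contacted. */
        WebServerBackendConfig build() {
            if (!autoStartServers && addresses.isEmpty()) {
                throw new IllegalStateException("no address given and auto-start is disabled");
            }
            if (fixedPort < 0 || fixedPort > 65535) {
                throw new IllegalArgumentException("port out of range: " + fixedPort);
            }
            return new WebServerBackendConfig(this);
        }
    }
}
```

With these assumed defaults, `WebServerBackendConfig.builder().build()` would describe the "one job per invocation" use case (start a server automatically on a free port and stop it when idle), while `builder().autoStartServers(false).address(...)` would describe connecting to a pool that is managed elsewhere.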
@bertfrees
Member Author

CC @rdeltour @josteinaj

@josteinaj
Member

josteinaj commented Apr 18, 2023

Maybe this would make it possible to include the nordic scripts again as well. The new version of the nordic scripts is not run using Pipeline 2, but there's still an HTTP API, so that could be invoked in the background.

For the nordic scripts we're planning to "wrap" other APIs when validating books. By that, I mean that we combine validation reports from different tools so that you don't have to invoke them all separately.

We always validate according to the nordic guidelines. But we also currently include Epubcheck and Ace in the Docker image, and use those for additional validation. But I'd like those to run as separate Docker containers (preferably official Docker images) and invoked through an API. It would make it easier to upgrade to newer versions of Ace and Epubcheck without creating a new version of the nordic validator.

Similarly, we would have other organization-specific or shared validators that could be plugged in. For instance a MathML validator and an audio book validator, which could run as separate Docker containers and be developed as separate projects.
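
The "wrapping" described above can be sketched as a validator that delegates to other validators and merges their reports. Everything here is hypothetical: `BookValidator`, `StubValidator` and `CombinedValidator` are names invented for the example, and the stub returns canned messages where a real implementation would call a tool's API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical common interface for any validation tool. */
interface BookValidator {
    String name();

    /** Returns one message per issue found; an empty list means the book passed. */
    List<String> validate(String bookPath);
}

/** Stand-in for a tool reached through an API; it just returns canned messages. */
final class StubValidator implements BookValidator {
    private final String name;
    private final List<String> cannedIssues;

    StubValidator(String name, String... cannedIssues) {
        this.name = name;
        this.cannedIssues = Arrays.asList(cannedIssues);
    }

    @Override
    public String name() {
        return name;
    }

    @Override
    public List<String> validate(String bookPath) {
        return cannedIssues;
    }
}

/** Runs every plugged-in validator and merges the results into a single report. */
final class CombinedValidator implements BookValidator {
    private final List<BookValidator> delegates;

    CombinedValidator(BookValidator... delegates) {
        this.delegates = Arrays.asList(delegates);
    }

    @Override
    public String name() {
        return "combined";
    }

    @Override
    public List<String> validate(String bookPath) {
        List<String> report = new ArrayList<>();
        for (BookValidator validator : delegates) {
            for (String issue : validator.validate(bookPath)) {
                // Prefix each message so the reader can tell which tool reported it.
                report.add("[" + validator.name() + "] " + issue);
            }
        }
        return report;
    }
}
```

Because the combined validator implements the same interface as its delegates, a new validator could be plugged in without changing the caller.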

@egli
Member

egli commented Apr 19, 2023

I'm not totally sure what you're asking for with this issue. I guess you're throwing out ideas to get feedback.

  • I like the idea that you make the Java API more complete and unified. OTOH a REST API is quite easy to use from any language (Python or Clojure), so personally I'm more interested in a complete REST API.
  • Pipeline 1 backend: I'm not too sure about that. I can see that it would make porting Pipeline 1 scripts easier. However, from a maintenance point of view it is much better to have a uniform code base where all interaction with scripts is done in one way, namely the Pipeline2 way. As far as I understand there aren't that many Pipeline1 scripts that we really want to port, so I think we should port them properly and in the future only have one way to invoke scripts.
  • Web server backend: I'm not sure I totally understand what you are after. Do you want load balancing? Then why don't you use an off-the-shelf proven existing load balancer? Or, since we're heavily into Docker images anyway, why not go all the way and run the pipeline inside a Kubernetes cluster? That will take care of load balancing, virtual networking, basically all your bullet points if properly configured.
    I know Kubernetes is a whole new pair of shoes, but to me at least, it sounds like you are trying to implement something which might end up like a poor implementation of Kubernetes.

@bertfrees
Member Author

> personally I'm more interested in a complete REST API

Are there things missing from the REST API in your opinion?

> As far as I understand there aren't that many Pipeline1 scripts that we really want to port, so I think we should port them properly and in the future only have one way to invoke scripts.

I agree, and in the long run this is still the goal. The problem is that we've been saying for years that we are going to port scripts, but it doesn't happen (or only very slowly). Lack of resources, lack of incentives, lack of help from the community, ... the reasons vary. Also, unless we do the porting very carefully, with maximum reuse of existing code, there is the risk that we make things worse from a maintenance point of view. We already have a lot of conversion code that is not uniform. (Despite the efforts we put into it, this remains the case.) Without uniformity in the existing code base, it's easy to make things worse when you add new code.

> Do you want load balancing?

Yes, but load balancing is just one of the things I mentioned. I want the ability to connect to a web server (that was possibly fired up automatically) from Java. This opens up several possibilities. Load balancing is one part.
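
For the load balancing part, if some of the dispatch logic did end up on the Java side, it could be as small as picking the server with the fewest queued jobs. The following is a minimal sketch under that assumption; `ServerSlot` and `LeastLoadedDispatcher` are hypothetical names, and the queue length is tracked locally rather than queried from a real server.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical view of one server in the pool: its address and its queued jobs. */
final class ServerSlot {
    final String address;
    int queuedJobs;

    ServerSlot(String address) {
        this.address = address;
    }
}

/** Sends each new job to the server that currently has the fewest queued jobs. */
final class LeastLoadedDispatcher {
    private final List<ServerSlot> pool = new ArrayList<>();

    synchronized void addServer(String address) {
        pool.add(new ServerSlot(address));
    }

    /** Picks the least loaded server, records the new job on it and returns its address. */
    synchronized String dispatch() {
        if (pool.isEmpty()) {
            throw new IllegalStateException("no servers in pool");
        }
        ServerSlot target = pool.get(0);
        for (ServerSlot slot : pool) {
            if (slot.queuedJobs < target.queuedJobs) {
                target = slot; // on a tie, the server that was added earlier is kept
            }
        }
        target.queuedJobs++;
        return target.address;
    }

    /** Called when a job on the given server has finished. */
    synchronized void jobFinished(String address) {
        for (ServerSlot slot : pool) {
            if (slot.address.equals(address) && slot.queuedJobs > 0) {
                slot.queuedJobs--;
                return;
            }
        }
    }
}
```

An off-the-shelf balancer or an orchestrator may well make this unnecessary; the sketch only shows how little Java-side logic the simplest case would need.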

> Then why don't you use an off-the-shelf proven existing load balancer? [...] it sounds like you are trying to implement something which might end up like a poor implementation of Kubernetes

I'm not saying I want to do things from scratch. Looking at Kubernetes for the scaling part is a great idea. Perhaps Kubernetes can take care of everything we want for scaling so that we only need to connect to one server. Or perhaps we need some more logic on the Java end. The crucial point in my proposal is that I want a unified (REST and Java) API so that people can easily switch to a different backend if they want to optimize or scale their application, without the added complexity and duplication of effort.

@egli
Member

egli commented Apr 19, 2023

> Are there things missing from the REST API in your opinion?

No, I'm happy with the REST API. I'm just saying that for me the REST API is more important than the Java API.

I agree with the rest of your argument.

3 participants