More implementations of script and job API #189
Maybe this would make it possible to include the nordic scripts again as well. The new version of the nordic scripts is not run using Pipeline 2, but there's still an HTTP API, so that could be invoked in the background. For the nordic scripts we're planning to "wrap" other APIs when validating books. By that, I mean that we combine validation reports from different tools so that you don't have to invoke them all separately. We always validate according to the nordic guidelines, but we also currently include Epubcheck and Ace in the Docker image and use those for additional validation. I'd like those to run as separate Docker containers (preferably official Docker images) invoked through an API. That would make it easier to upgrade to newer versions of Ace and Epubcheck without creating a new version of the nordic validator. Similarly, we would have other organization-specific or shared validators that could be plugged in, for instance a MathML validator and an audio book validator, which could run as separate Docker containers and be developed as separate projects.
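As a rough sketch of how such a containerized validator could be invoked in the background, the following Java snippet POSTs an EPUB to a validator over HTTP. The endpoint URL, content type, and the shape of the returned report are assumptions for illustration, not an existing API:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Path;

public class ValidatorClient {

    private final HttpClient http = HttpClient.newHttpClient();
    private final URI endpoint; // e.g. a hypothetical http://epubcheck:8080/validate

    public ValidatorClient(URI endpoint) {
        this.endpoint = endpoint;
    }

    // POST the EPUB to the validator container and return its report.
    public String validate(Path epub) throws Exception {
        HttpRequest request = HttpRequest.newBuilder(endpoint)
                .header("Content-Type", "application/epub+zip")
                .POST(HttpRequest.BodyPublishers.ofFile(epub))
                .build();
        return http.send(request, HttpResponse.BodyHandlers.ofString()).body();
    }
}
```

Combining reports would then amount to calling one such client per tool (Epubcheck, Ace, the nordic validator, ...) and merging the returned reports into a single document.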
I'm not totally sure what you're asking for with this issue. I guess you're throwing out ideas to get feedback.
Are there things missing from the REST API in your opinion?
I agree, and in the long run this is still the goal. The problem is that we've been saying for years that we are going to port scripts, but it doesn't happen (or only very slowly). Lack of resources, lack of incentives, lack of help from the community, ... the reasons vary. Also, unless we do the porting very carefully, with maximum reuse of existing code, there is a risk that we make things worse from a maintenance point of view. We already have a lot of conversion code that is not uniform. (Despite the efforts we have put into it, this remains the case.) Without uniformity in the existing code base, it's easy to make things worse when adding new code.
Yes, but load balancing is just one of the things I mentioned. What I want is the ability to connect from Java to a web server (possibly one that was fired up automatically). This opens up several possibilities, of which load balancing is only one.
I'm not saying I want to do things from scratch. Looking at Kubernetes for the scaling part is a great idea. Perhaps Kubernetes can take care of everything we want for scaling, so that we only need to connect to one server. Or perhaps we need some more logic on the Java end. The crucial point in my proposal is that I want a unified (REST and Java) API, so that people can easily switch to a different backend if they want to optimize or scale their application, without added complexity and duplication of effort.
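To make the idea of a swappable backend concrete, here is a minimal sketch of what such a unified Java API could look like. All names are hypothetical and do not correspond to actual Pipeline 2 interfaces:

```java
import java.net.URI;
import java.util.Map;

// Hypothetical unified job API: client code depends only on this interface.
interface JobExecutor {
    // Runs the given script and returns a job id.
    String execute(String script, Map<String, String> options);
}

// Backend that calls the engine in the same JVM (the "direct" Java API).
class DirectExecutor implements JobExecutor {
    @Override
    public String execute(String script, Map<String, String> options) {
        // A real implementation would invoke the engine directly here.
        return "job-local-1";
    }
}

// Backend that dispatches to a Pipeline web server over HTTP.
class RemoteExecutor implements JobExecutor {
    private final URI server;
    RemoteExecutor(URI server) {
        this.server = server;
    }
    @Override
    public String execute(String script, Map<String, String> options) {
        // A real implementation would POST a job request to the server
        // and return the id of the created job.
        return "job-remote-1@" + server;
    }
}
```

Client code written against `JobExecutor` could then switch between in-process execution and a (possibly load-balanced) remote server by changing a single constructor call.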
No, I'm happy with the REST API. I'm just saying that for me the REST API is more important than the Java API. I agree with the rest of your argument.
Now that the job and script APIs have been reworked and made more generic (XProc agnostic), the opportunity is there to add other backends.
1. Pipeline 1 backend
A first idea is to add a Pipeline 1 backend. This would provide an alternative to porting scripts from Pipeline 1 to Pipeline 2: it would allow us to discontinue the Pipeline 1 GUI while giving access to Pipeline 1 scripts in the new Pipeline 2 GUI.
2. Web server backend
A second idea is to add a backend that dispatches jobs to one or more Pipeline web servers. It is a combination of two older ideas and one fresh use case:
Unified Java API. The idea to unify the "direct" Java API and the Java client library API for connecting to a server over HTTP, originally launched by Jostein, has been around for a while. The benefit is that users can easily switch between the two ways of calling Pipeline 2, and that we could simplify our code base and eliminate duplication.
Scaling. The request to be able to scale Pipeline came from MTM, who have probably implemented something for their specific needs already (but haven't shared it). My idea for addressing this request was a web server that connects and dispatches to multiple other web servers, manages the pool of servers, and balances jobs based on the load of each server (see the sketch below).
A new use case is a project I'm doing that uses Pipeline as a Java library and that needs to run jobs one at a time (one job per invocation of the tool), but ideally without the overhead of starting and stopping the engine each time. The solution could be to fire up a web server that keeps running in the background, even after the process ends, and to connect to it each time. There are however a number of hurdles with this, and I would much prefer a reusable component that does it for me.
The new component I'm thinking about would provide all of the above features. It would implement the script and job APIs and would expose a number of additional configuration parameters.
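As an illustration of the scaling item above, a load-balancing backend could be layered on top of the same hypothetical `JobExecutor` interface from the earlier sketch. This version simply picks the pooled server with the fewest running jobs; a real component would also manage the pool itself:

```java
import java.net.URI;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Dispatches each job to the pooled server with the fewest running jobs.
class LoadBalancingExecutor implements JobExecutor {

    private final Map<URI, AtomicInteger> running = new ConcurrentHashMap<>();

    LoadBalancingExecutor(List<URI> servers) {
        servers.forEach(s -> running.put(s, new AtomicInteger()));
    }

    @Override
    public String execute(String script, Map<String, String> options) {
        // Pick the server with the lowest current load.
        URI target = running.entrySet().stream()
                .min(Comparator.comparingInt(e -> e.getValue().get()))
                .orElseThrow()
                .getKey();
        running.get(target).incrementAndGet();
        try {
            return new RemoteExecutor(target).execute(script, options);
        } finally {
            running.get(target).decrementAndGet();
        }
    }
}
```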