A Document Worker can be used to alter the field names and values contained within the metadata of a document supplied to it. Each Document Worker implementation must be configured with instructions specifying when to change document metadata and what to change it to. An example use case for a Document Worker is given below, followed by a brief overview of the modules that make up this repository.
Each running instance of a Document Worker can be configured independently, so multiple Document Workers can operate simultaneously while performing different functions. For this example, assume that a use case requires checking the metadata fields of documents for a specific value, field, or encoding type. When a match is found, the worker decides how to respond based on the logic defined in its implementation. Suppose the worker is looking for a field called JOB_TITLE that contains the value C-Level-Exec. Once the worker has searched the metadata of the document and found this field and value, it applies the custom logic defined in its implementation: if JOB_TITLE is equal to C-Level-Exec, add a new field called IGNORE_FILE to the metadata and set its value to true.

The Document Worker then passes a result object back to the Document Worker converter. This DocumentWorkerResult object contains a map of field names to lists of DocumentWorkerFieldValue objects, each of which holds a value and an encoding.

The Document Worker has two actions, add and replace, which together cover three goals: adding, replacing, and removing metadata. The add action can be used both to add values to an existing field and to add a new field to a document's metadata. The replace action can be used either to replace an existing field's data values with information specified by the worker, or to remove metadata from a document. To remove a field, the Document Worker passes back the field name with the replace action and a data value of null; the converter recognises that replacing a field with null is a signal to delete the field, and deletes it.

New metadata fields or values added to a document can then be used by other workers for decision making. In this example, other workers can use the new IGNORE_FILE field to decide to ignore documents linked to a person whose JOB_TITLE is C-Level-Exec.
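The example above can be sketched as two plain functions: one standing in for the worker's custom logic, and one standing in for the converter applying the returned changes. This is a minimal sketch under stated assumptions, not the library's API: metadata is modelled as a plain object mapping field names to arrays of { value, encoding } entries (mirroring the map of field names to DocumentWorkerFieldValue lists), the "utf8" encoding label is assumed, and the names evaluateDocument, applyChanges and the change shape { field, action, values } are hypothetical.

```javascript
// Hypothetical model: metadata maps a field name to an array of
// { value, encoding } entries, mirroring DocumentWorkerFieldValue lists.

// Worker-side custom logic (hypothetical): flag C-Level-Exec documents.
function evaluateDocument(metadata) {
  const titles = metadata.JOB_TITLE || [];
  const isExec = titles.some((fv) => fv.value === "C-Level-Exec");
  if (!isExec) {
    return []; // no changes requested
  }
  // Request a new IGNORE_FILE field set to "true" using the add action.
  // The "utf8" encoding label is an assumption for illustration.
  return [
    { field: "IGNORE_FILE", action: "add", values: [{ value: "true", encoding: "utf8" }] },
  ];
}

// Converter-side handling of the returned changes (hypothetical sketch).
// Returns a new metadata object; the input is not mutated.
function applyChanges(metadata, changes) {
  const result = { ...metadata };
  for (const change of changes) {
    if (change.action === "add") {
      // add: append values to an existing field, or create the field.
      result[change.field] = (result[change.field] || []).concat(change.values);
    } else if (change.action === "replace") {
      if (change.values === null) {
        // replace with null: the signal to delete the field.
        delete result[change.field];
      } else {
        // replace: overwrite the field's existing values.
        result[change.field] = change.values.slice();
      }
    } else {
      throw new Error("Unknown action: " + change.action);
    }
  }
  return result;
}
```

For a document whose JOB_TITLE is C-Level-Exec, applyChanges(doc, evaluateDocument(doc)) yields metadata that additionally contains IGNORE_FILE; passing a change of { field: "JOB_TITLE", action: "replace", values: null } removes JOB_TITLE entirely.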
The following illustration shows the communication between a Document Worker and Policy using RabbitMQ:
This project defines the public classes that facilitate the creation of mutable and read-only document objects. The project can be found in worker-document-utility.
This project defines public base classes. The project can be found in worker-document.
This project is used for centralizing dependency information for a Document Worker. The project can be found in worker-document-framework.
This library defines public interfaces to assist with the implementation of a Document Worker. The project can be found in worker-document-interface.
This is the shared library defining public classes that constitute the worker interface to be used by consumers of the Document Worker. The project can be found in worker-document-shared.
This project is used to define the format of a document that a Document Worker can accept. The project can be found in worker-document-schema.
This project is used to validate documents against the format that a Document Worker can accept. The project can be found in worker-document-validator.
This project contains an implementation of the testing framework that supports integration testing of the Document Worker. The project can be found in worker-document-testing.
A worker JavaScript file does not need to define all of the event handlers below.
An event handler is triggered only if the script being executed defines the corresponding function.
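The rule that only defined handlers are triggered can be illustrated with a small dispatcher. This is a hypothetical sketch of how a host might behave, not the worker's actual implementation; fireEvent and the exampleScript object holding the handler functions are assumptions.

```javascript
// Hypothetical host-side dispatcher: call a handler only if the script defines it.
function fireEvent(script, handlerName, eventObject) {
  const handler = script[handlerName];
  if (typeof handler !== "function") {
    return false; // handler not defined: the event is skipped
  }
  handler(eventObject);
  return true;
}

// Records which handlers actually ran.
const calls = [];

// A script that defines only one of the optional handlers.
const exampleScript = {
  onProcessDocument() {
    calls.push("onProcessDocument");
  },
};
```

Here fireEvent(exampleScript, "onProcessTask", {}) does nothing and returns false, while fireEvent(exampleScript, "onProcessDocument", {}) runs the handler and returns true.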
/* global thisScript */
function onProcessTask(e)
{
// e.application (read-only)
// e.task (read-only)
// e.rootDocument (read-only)
}
This is the first function called by the worker on the task message.
This function is passed a TaskEventObject as an argument.
The structure of the TaskEventObject is below.
{
"type" : "object",
"properties" : {
"application" : { "type" : "object"},
"task" : { "type" : "object"},
"rootDocument" : { "type" : "object"}
}
}
For more details of the TaskEventObject, refer to the Java implementation of the class TaskEventObject.java.
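To make the structure above concrete, the following sketch checks an event object against a schema of this shape. It is a hypothetical helper, not a full JSON Schema validator: conformsTo is an assumed name, it supports only the "object" and "boolean" types used by the event schemas in this section, and it treats every listed property as required, which is stricter than JSON Schema's default.

```javascript
// The TaskEventObject structure, copied from the schema above.
const taskEventSchema = {
  type: "object",
  properties: {
    application: { type: "object" },
    task: { type: "object" },
    rootDocument: { type: "object" },
  },
};

// Hypothetical helper: checks that every listed property is present
// and has the declared type. Supports only "object" and "boolean".
function conformsTo(value, schema) {
  if (schema.type === "boolean") {
    return typeof value === "boolean";
  }
  if (schema.type === "object") {
    if (typeof value !== "object" || value === null) {
      return false;
    }
    const props = schema.properties || {};
    return Object.keys(props).every((name) => conformsTo(value[name], props[name]));
  }
  throw new Error("Unsupported schema type: " + schema.type);
}
```

The same helper applies to the CancelableDocumentEventObject and ErrorEventObject schemas, whose extra cancel and handled properties are booleans.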
function onBeforeProcessDocument(e)
{
// e.application (read-only)
// e.task (read-only)
// e.rootDocument (read-only)
// e.document (read-only)
// e.cancel (writable) (default: false)
}
This event is triggered after onProcessTask and before the document is processed.
This function is passed a CancelableDocumentEventObject as an argument.
The structure of the CancelableDocumentEventObject is below.
{
"type" : "object",
"properties" : {
"application" : { "type" : "object"},
"task" : { "type" : "object"},
"rootDocument" : { "type" : "object"},
"document" : { "type" : "object"},
"cancel" : { "type" : "boolean"}
}
}
Set e.cancel = true to cancel processing of the document. This flag determines whether that individual document is processed by the worker.
If the cancellation flag is set to true, onProcessDocument and onAfterProcessDocument are not triggered; only onAfterProcessTask is triggered.
For more details of the CancelableDocumentEventObject, refer to the Java implementation of the class CancelableDocumentEventObject.java.
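The handler ordering described above, including the effect of e.cancel, can be sketched as a small simulation for a task with a single document. This is a hypothetical model written to mirror the rules stated in this section, not the worker's implementation: runLifecycle, makeRecordingScript, and the plain objects standing in for the event objects are assumptions, and treating the root document as the document being processed is a simplification.

```javascript
// Hypothetical simulation of the handler order for one task with one document.
// Handlers the script does not define are skipped.
function runLifecycle(script, task, rootDocument) {
  const fire = (name, e) => {
    if (typeof script[name] === "function") {
      script[name](e);
    }
  };

  const taskEvent = { application: {}, task, rootDocument };
  fire("onProcessTask", taskEvent);

  // Stand-in for CancelableDocumentEventObject: cancel defaults to false.
  const beforeEvent = { application: {}, task, rootDocument, document: rootDocument, cancel: false };
  fire("onBeforeProcessDocument", beforeEvent);

  if (!beforeEvent.cancel) {
    const docEvent = { application: {}, task, rootDocument, document: rootDocument };
    fire("onProcessDocument", docEvent);
    fire("onAfterProcessDocument", docEvent);
  }

  // Always triggered, regardless of the cancellation flag.
  fire("onAfterProcessTask", taskEvent);
}

// Builds a script that records which handlers ran and sets e.cancel as given.
function makeRecordingScript(cancel) {
  const log = [];
  const script = { log };
  for (const name of ["onProcessTask", "onProcessDocument", "onAfterProcessDocument", "onAfterProcessTask"]) {
    script[name] = () => log.push(name);
  }
  script.onBeforeProcessDocument = (e) => {
    log.push("onBeforeProcessDocument");
    e.cancel = cancel; // e.cancel = true skips this document
  };
  return script;
}
```

With makeRecordingScript(true), only onProcessTask, onBeforeProcessDocument and onAfterProcessTask run; with makeRecordingScript(false), all five handlers run in order.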
function onProcessDocument(e)
{
// e.application (read-only)
// e.task (read-only)
// e.rootDocument (read-only)
// e.document (read-only)
}
This function is called after onBeforeProcessDocument, provided cancellation was not requested.
This function is passed a DocumentEventObject as an argument.
The structure of the DocumentEventObject is below.
{
"type" : "object",
"properties" : {
"application" : { "type" : "object"},
"task" : { "type" : "object"},
"rootDocument" : { "type" : "object"},
"document" : { "type" : "object"}
}
}
For more details of the DocumentEventObject, refer to the Java implementation of the class DocumentEventObject.
function onAfterProcessDocument(e)
{
// e.application (read-only)
// e.task (read-only)
// e.rootDocument (read-only)
// e.document (read-only)
}
This function is called once processing of the document has completed successfully.
This function is passed a DocumentEventObject as an argument.
The structure of the DocumentEventObject is explained in the onProcessDocument section.
function onAfterProcessTask(e)
{
// e.application (read-only)
// e.task (read-only)
// e.rootDocument (read-only)
}
This is the last function called by the worker on the task message.
This function is passed a TaskEventObject as an argument.
The structure of the TaskEventObject is explained in the onProcessTask section.
This event is always triggered, regardless of the value of the cancellation flag.
function onError(errorEvent)
{
// errorEvent.application (read-only)
// errorEvent.task (read-only)
// errorEvent.rootDocument (read-only)
// errorEvent.error (read-only)
// errorEvent.handled (writable) (default: false)
}
This function is called when a failure occurs in the worker that is not handled by the worker code. In chained workers, this allows processing of the document to continue.
This function is passed an ErrorEventObject as an argument.
The structure of the ErrorEventObject is below.
{
"type" : "object",
"properties" : {
"application" : { "type" : "object"},
"task" : { "type" : "object"},
"rootDocument" : { "type" : "object"},
"error" : { "type" : "object"},
"handled" : { "type" : "boolean"}
}
}
For more details of the ErrorEventObject, refer to the Java implementation of the class ErrorEventObject.
Set errorEvent.handled = true to indicate that the error was handled. If the error is not handled by the event handler, the change log section of the document is updated with the failure details.
The failure details, consisting of failureId and failureMessage, are recorded under the addFailure section of the parent document and of each sub-document separately.
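The handled flag and its fallback can be sketched for a single document as follows. The onError handler shows the intended use of errorEvent.handled; everything else is a hypothetical simulation, not the worker's implementation: errorEvent.error is modelled as a JavaScript Error, the "SoftError" error name is an invented example of a recoverable error, and reportError with its failures array stands in for the addFailure section that the real worker populates in Java.

```javascript
// Example handler: treat a hypothetical "SoftError" as handled.
function onError(errorEvent) {
  // errorEvent.error is modelled here as a JavaScript Error (assumption).
  if (errorEvent.error && errorEvent.error.name === "SoftError") {
    errorEvent.handled = true; // suppress failure reporting for this error
  }
}

// Hypothetical host: records failure details only when the error is not handled.
// The failures array is a stand-in for the document's addFailure section.
function reportError(script, task, rootDocument, error, failureId) {
  const errorEvent = { application: {}, task, rootDocument, error, handled: false };
  if (typeof script.onError === "function") {
    script.onError(errorEvent);
  }
  if (!errorEvent.handled) {
    rootDocument.failures = rootDocument.failures || [];
    rootDocument.failures.push({ failureId, failureMessage: error.message });
  }
  return errorEvent.handled;
}
```

In this model, an error the handler recognises leaves the document untouched, while any other error, or any error when no onError handler is defined, adds a { failureId, failureMessage } entry.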