{
  "image": "base64_encoded_string",
  "guid": "optional unique id",
  "category": "string",
  "task_type": "string",
  "fields": ["string"]
}
- image: Base64-encoded image string
- guid: Optional unique identifier for the request
- category: Document/domain category (e.g., "bank_cheque", "proof_of_funds")
- task_type: Type of task ("extraction", "validation", or "classification")
- fields: Array of fields to process or return, depending on the task_type
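A minimal sketch of building such a request in Python; the URL, route, and file name are placeholders, not the service's actual endpoint:

```python
import base64
import json
import uuid

import requests  # assumed HTTP client; any client works

# Read and base64-encode the document image
with open("cheque.png", "rb") as f:
    encoded_image = base64.b64encode(f.read()).decode("utf-8")

payload = {
    "image": encoded_image,
    "guid": str(uuid.uuid4()),        # optional unique id
    "category": "bank_cheque",
    "task_type": "extraction",
    "fields": ["name", "net_amount"],  # fields to extract/return
}

# Placeholder URL and route; adjust to your deployment.
response = requests.post("http://localhost:8000/predict", json=payload)
print(json.dumps(response.json(), indent=2))
```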
Examples:
- For classification, the model always returns class and confidence.
- For validation, the model always returns is_valid, reason, and confidence.
- For extraction, the model returns every field specified in the config. To receive only a subset of what the model captures, pass the required fields in the outputs attribute, as in the request examples below.
- Extraction:
  {
    "image": "base64_encoded_string",
    "category": "bank_cheque",
    "task_type": "extraction",
    "outputs": ["name", "net_amount"]
  }
- Validation:
  {
    "image": "base64_encoded_string",
    "category": "proof_of_funds",
    "task_type": "validation",
    "outputs": ["is_valid", "reason"]
  }
- Classification:
  {
    "image": "base64_encoded_string",
    "category": "proof_of_funds",
    "task_type": "classification",
    "outputs": ["class", "confidence"]
  }
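To illustrate the outputs filtering described above, here is a minimal sketch; the function name and result shape are assumptions for illustration, not the service's actual code:

```python
from typing import Any

def filter_outputs(result: dict[str, Any], outputs: list[str] | None) -> dict[str, Any]:
    """Return only the requested output fields; pass everything through if none are given."""
    if not outputs:
        return result
    return {key: value for key, value in result.items() if key in outputs}

# Example: the extractor captured more fields than the caller asked for
extracted = {"name": "Jane Doe", "net_amount": "1,250.00", "currency": "USD"}
print(filter_outputs(extracted, ["name", "net_amount"]))
# {'name': 'Jane Doe', 'net_amount': '1,250.00'}
```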
official_check:
  validation:
    processor: llm
    handler: gemini-1.5-pro
    extra_kwargs:
    prompt_template: doc_validation_v1.txt
    invalid: ['blank', 'blurred', 'dirty']
  classification:
  extraction:
    processor: microservice
    handler: http://myextservice/predict/
    extra_kwargs: {'entity': 'payee'}
    prompt_template:
    fields: ['account_no', 'routing_no']
proof_of_funds:
  validation:
    processor: custom
    handler: DocValidationProcessor
    extra_kwargs: {'model': 'pof_val_v1.pt'}
    prompt_template:
    classes: ['valid', 'invalid']
  classification:
    processor: llm
    handler: gemini-pro-vision
    extra_kwargs:
    prompt_template: doc_classification_v2.txt
    classes: ['bank statement', 'payslip', 'invoice']
  extraction:
    processor: llm
    handler: gemini-pro-vision
    extra_kwargs:
    prompt_template: doc_extraction_v2.txt
    fields: ['name', 'document_date', 'net_amount', 'currency']
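A minimal sketch of how such a config could be read and resolved for a single request; the file name, dataclass, and lookup logic are assumptions for illustration, not the project's actual loader:

```python
from dataclasses import dataclass, field
from typing import Any

import yaml  # PyYAML, assumed available


@dataclass
class TaskConfig:
    processor: str
    handler: str
    extra_kwargs: dict[str, Any] = field(default_factory=dict)
    prompt_template: str | None = None
    fields: list[str] = field(default_factory=list)
    classes: list[str] = field(default_factory=list)


def load_task_config(path: str, category: str, task_type: str) -> TaskConfig:
    """Look up the processor settings for a (category, task_type) pair."""
    with open(path) as f:
        raw = yaml.safe_load(f)
    entry = raw[category][task_type] or {}
    # Keep only known, non-empty keys (e.g. "prompt_template:" with no value is dropped)
    known = {k: v for k, v in entry.items()
             if k in TaskConfig.__dataclass_fields__ and v is not None}
    return TaskConfig(**known)


cfg = load_task_config("config.yaml", "proof_of_funds", "extraction")
print(cfg.processor, cfg.handler, cfg.fields)
```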
src/ocrorchestrator
├── artifacts
│ ├── base.py
│ ├── __init__.py
│ ├── model.py
│ └── prompt.py
├── config
│ └── app_config.py
├── datamodels
│ └── api_io.py
├── deps.py
├── __init__.py
├── main.py
├── manager.py
├── managers
│ ├── artifact.py
│ └── processor.py
├── processors
│ ├── base.py
│ ├── factory.py
│ ├── gradio.py
│ ├── __init__.py
│ ├── llm.py
│ ├── microservice.py
│ └── pytorch.py
├── repos
│ ├── base.py
│ ├── factory.py
│ ├── gcs.py
│ ├── __init__.py
│ └── local.py
├── routers.py
└── utils
├── constants.py
├── img.py
├── logging.py
├── misc.py
└── ml.py
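The layout suggests one processor per backend (llm, microservice, pytorch, gradio) behind a common interface, created by a factory. A hedged sketch of what processors/base.py and processors/factory.py might look like; the names and signatures below are assumptions inferred from the tree, not the actual source:

```python
from abc import ABC, abstractmethod
from typing import Any


class BaseProcessor(ABC):
    """Common interface each backend-specific processor implements."""

    def __init__(self, handler: str, **extra_kwargs: Any) -> None:
        self.handler = handler
        self.extra_kwargs = extra_kwargs

    @abstractmethod
    def process(self, image_b64: str, task_type: str) -> dict[str, Any]:
        """Run the task against the encoded image and return its outputs."""


class ProcessorFactory:
    """Creates and caches one processor per (category, task_type) from the config."""

    _registry: dict[str, type[BaseProcessor]] = {}
    _instances: dict[tuple[str, str], BaseProcessor] = {}

    @classmethod
    def register(cls, name: str, processor_cls: type[BaseProcessor]) -> None:
        cls._registry[name] = processor_cls

    @classmethod
    def get(cls, category: str, task_type: str, cfg: dict[str, Any]) -> BaseProcessor:
        key = (category, task_type)
        if key not in cls._instances:
            processor_cls = cls._registry[cfg["processor"]]
            cls._instances[key] = processor_cls(cfg["handler"], **(cfg.get("extra_kwargs") or {}))
        return cls._instances[key]

    @classmethod
    def clear(cls) -> None:
        """Drop all cached processors, e.g. on a config refresh."""
        cls._instances.clear()
```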
When a refresh is triggered (e.g., a prompt file is updated or a new model is deployed):
- ArtifactManager clears its cached prompts and models, then reinitializes
- ProcessorFactory clears its existing processors
- AppConfig updates its internal configuration
The ProcessorFactory reinitializes by:
- Getting the updated configuration from ArtifactManager
- Iterating through the configuration
- Creating new processor instances based on the updated configuration
This ensures that all components are using the most up-to-date configuration, prompts, and models after a refresh. This design allows for dynamic updates to the system's configuration and processing logic without requiring a full restart of the application. It's particularly useful in environments where configurations or models might need to be updated frequently or on-the-fly.
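A hedged sketch of that refresh sequence; the class and method names mirror the description above but are assumptions, not the actual implementation:

```python
def refresh(artifact_manager, processor_factory, app_config) -> None:
    """Rebuild cached state after a prompt, model, or config update (illustrative only)."""
    # ArtifactManager clears its cached prompts and models, then reinitializes
    artifact_manager.clear()
    artifact_manager.initialize()

    # ProcessorFactory clears its existing processors
    processor_factory.clear()

    # AppConfig updates its internal configuration
    app_config.reload()

    # ProcessorFactory rebuilds processors from the updated configuration
    updated_config = artifact_manager.get_config()
    for category, tasks in updated_config.items():
        for task_type, task_cfg in tasks.items():
            if task_cfg:  # skip empty task entries such as "classification:"
                processor_factory.get(category, task_type, task_cfg)
```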