Document Classification Activity

The Document Classification activity uses a machine learning model to determine the type of a given document image, such as an invoice, ID, or contract. It analyzes the visual structure and content of the file and assigns a category based on a pre-trained model and a specified confidence threshold.

Field	Description	Requirement
Model file name	The full file path or identifier of the pre-trained document classification model.	Required
Image file name	The file path of the document image or PDF page to be classified.	Required
Threshold	The minimum confidence score (between 0 and 1) required to validate the classification.	Optional
Charset	The character encoding used for processing text within the document.	Optional
Response variable name	The name of the variable where the identified document type and confidence score will be stored.	Required

Action Types & Examples

Automated Document Sorting

Identifying the type of incoming scanned documents to route them to specific sub-processes.

Format: Image file name: "C:\Scans\Doc_001.png"
Example Result: {"document_type": "Identity_Card", "confidence": 0.98}
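
As an illustration of the routing step, the sketch below parses a result shaped like the example above and picks a sub-process. It assumes the response variable holds a JSON string with the document_type and confidence keys shown in the example result. The ROUTES table, the sub-process names, and the route_document helper are hypothetical and not part of the activity.

```python
import json

# Hypothetical routing table: document type label -> sub-process name.
# Neither the labels nor the sub-process names come from the product.
ROUTES = {
    "Identity_Card": "KYC_Verification",
    "Invoice": "Accounts_Payable",
}


def route_document(result_json: str, default: str = "Manual_Review") -> str:
    """Pick a sub-process for a classification result given as a JSON string."""
    result = json.loads(result_json)
    # Unknown or missing labels fall back to the default sub-process.
    return ROUTES.get(result.get("document_type"), default)


target = route_document('{"document_type": "Identity_Card", "confidence": 0.98}')
print(target)
```

Sending unrecognized labels to a manual review queue is one possible design choice; a process could just as well stop with an error.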

Implementation Examples

Field Setup

Model file name: C:\Robusta\Models\DocClassifier.onnx
Image file name: {{LastDownloadedFile}}
Threshold: 0.80
Response variable name: docTypeResult
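
Before running the activity, it can help to check the field values against the requirements in the field table. The following is a minimal sketch of such a check, assuming the fields are available as a plain dictionary keyed by field name. The validate_fields helper and the error messages are hypothetical and are not provided by the activity.

```python
# Fields marked "Required" in the field table.
REQUIRED_FIELDS = ("Model file name", "Image file name", "Response variable name")


def validate_fields(fields: dict) -> list:
    """Return a list of problems found in the field values (empty if none)."""
    errors = []
    for name in REQUIRED_FIELDS:
        value = fields.get(name)
        if value is None or not str(value).strip():
            errors.append(f"{name} is required")

    # Threshold is optional, but when given it must be a number in [0, 1].
    threshold = fields.get("Threshold")
    if threshold is not None:
        try:
            value = float(threshold)
        except (TypeError, ValueError):
            errors.append("Threshold must be a number")
        else:
            if not 0.0 <= value <= 1.0:
                errors.append("Threshold must be between 0 and 1")
    return errors


setup = {
    "Model file name": r"C:\Robusta\Models\DocClassifier.onnx",
    "Image file name": "{{LastDownloadedFile}}",
    "Threshold": "0.80",
    "Response variable name": "docTypeResult",
}
print(validate_fields(setup))
```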

Execution Parameters

Process: The activity runs the image through the model. If the confidence is at least the 0.80 threshold, the resulting label is stored in docTypeResult.
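
The threshold decision can be sketched as follows. This is an illustration only, not the activity's internal logic: the accepted_label helper is hypothetical, the result is assumed to be a dictionary with document_type and confidence keys, and both the treatment of a confidence exactly equal to the threshold and the return value for a rejected result are assumptions.

```python
from typing import Optional


def accepted_label(result: dict, threshold: float = 0.80) -> Optional[str]:
    """Return the document type if its confidence reaches the threshold, else None."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0 and 1")
    # Assumption: a confidence exactly equal to the threshold is accepted,
    # since the threshold is described as a minimum score.
    if result["confidence"] >= threshold:
        return result["document_type"]
    # Assumption: a result below the threshold yields no label.
    return None


print(accepted_label({"document_type": "Identity_Card", "confidence": 0.98}))
print(accepted_label({"document_type": "Invoice", "confidence": 0.42}))
```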

Technical Notes

This activity requires a trained classification model, so a model must be trained before the activity can be used. The support department can assist with training your own model. The activity is optimized for image-based inputs; convert PDFs to images, or target specific pages, where necessary.