Document Classification Activity
The Document Classification activity uses machine learning models to identify and categorize the type of a given document image. It analyzes the visual structure and content of a file (such as invoices, IDs, or contracts) to determine its category based on a pre-trained model and a specified confidence threshold.
| Field | Description | Requirement |
|---|---|---|
| Model file name | The full file path or identifier of the pre-trained document classification model. | Required |
| Image file name | The file path of the document image or PDF page to be classified. | Required |
| Threshold | The minimum confidence score (between 0 and 1) required to validate the classification. | Optional |
| Charset | The character encoding used for processing text within the document. | Optional |
| Response variable name | The name of the variable where the identified document type and confidence score will be stored. | Required |
Action Types & Examples
Automated Document Sorting
Identifying the type of incoming scanned documents to route them to specific sub-processes.
- Format: Image file name: "C:\Scans\Doc_001.png"
- Example Result: {"document_type": "Identity_Card", "confidence": 0.98}
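As a rough illustration of the routing step, the sketch below parses a result shaped like the example above and picks a sub-process by document type. The routing table, the sub-process names, and the `route_document` helper are hypothetical and are not part of the activity; only the `document_type` and `confidence` keys come from the example result.

```python
import json

# Hypothetical routing table: document type -> sub-process name.
# These names are illustrative assumptions, not values defined by the activity.
ROUTES = {
    "Identity_Card": "KYC_Verification",
    "Invoice": "Accounts_Payable",
}
# Assumed fallback for document types that have no dedicated sub-process.
DEFAULT_ROUTE = "Manual_Review"


def route_document(result_json: str) -> str:
    """Return the sub-process name for a classification result given as JSON text."""
    result = json.loads(result_json)
    return ROUTES.get(result["document_type"], DEFAULT_ROUTE)


example = '{"document_type": "Identity_Card", "confidence": 0.98}'
print(route_document(example))
```

Keeping the mapping in a single table means a new document type can be routed by adding one entry, and anything unrecognized falls through to the assumed fallback route.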
Implementation Examples
Field Setup
- Model file name: C:\Robusta\Models\DocClassifier.onnx
- Image file name: {{LastDownloadedFile}}
- Threshold: 0.80
- Response variable name: docTypeResult
Execution Parameters
- Process: The activity classifies the input image with the model. If the confidence exceeds 0.80, the resulting label is stored in docTypeResult.
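The threshold check described above can be sketched as follows. This is a minimal illustration and not the activity's actual implementation: the `apply_threshold` helper is hypothetical, and returning `None` when the confidence does not exceed the threshold is an assumption, since the behavior below the threshold is not specified here.

```python
from typing import Optional


def apply_threshold(label: str, confidence: float, threshold: float) -> Optional[str]:
    """Return the label only if confidence strictly exceeds the threshold."""
    # The Threshold field is described as a score between 0 and 1.
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0 and 1")
    # Assumption: a result at or below the threshold yields no label (None).
    return label if confidence > threshold else None


# Mirrors the field setup: threshold 0.80, result kept in docTypeResult.
doc_type_result = apply_threshold("Identity_Card", 0.98, 0.80)
print(doc_type_result)
```

The comparison is strict to match the wording "exceeds", so a confidence of exactly 0.80 would not pass in this sketch.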
Technical Notes
This activity requires a trained model, so a model must be trained before the activity can be used. Contact the support department for assistance with training your own model. The activity is optimized for image-based inputs; convert PDFs to images, or target specific pages, where necessary.