Document Classification Activity

The Document Classification activity uses a pre-trained machine learning model to classify a document image by type. It analyzes the visual structure and content of a file (such as an invoice, ID, or contract) and assigns a category, subject to a specified confidence threshold.

| Field | Description | Requirement |
|---|---|---|
| Model file name | The full file path or identifier of the pre-trained document classification model. | Required |
| Image file name | The file path of the document image or PDF page to be classified. | Required |
| Threshold | The minimum confidence score (between 0 and 1) required to validate the classification. | Optional |
| Charset | The character encoding used for processing text within the document. | Optional |
| Response variable name | The name of the variable where the identified document type and confidence score will be stored. | Required |
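The field rules above can be checked before a run. The following is a minimal Python sketch, not part of the product: the `ClassificationFields` class and its `validate` method are hypothetical names, and the checks only mirror the Required/Optional column and the 0 to 1 range stated for Threshold.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ClassificationFields:
    """Hypothetical container mirroring the activity's fields."""
    model_file_name: str
    image_file_name: str
    response_variable_name: str
    threshold: Optional[float] = None  # optional field
    charset: Optional[str] = None      # optional field

    def validate(self) -> List[str]:
        """Return a list of problems; an empty list means the setup looks valid."""
        errors = []
        required = (
            ("Model file name", self.model_file_name),
            ("Image file name", self.image_file_name),
            ("Response variable name", self.response_variable_name),
        )
        for label, value in required:
            if not value or not value.strip():
                errors.append(f"{label} is required")
        # Threshold is optional, but when given it must lie between 0 and 1.
        if self.threshold is not None and not 0.0 <= self.threshold <= 1.0:
            errors.append("Threshold must be between 0 and 1")
        return errors
```

Calling `validate()` on a setup with an empty image file name and a threshold of 1.5 would report both problems, which makes misconfiguration visible before the activity runs.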

Action Types & Examples

Automated Document Sorting

Identifying the type of incoming scanned documents to route them to specific sub-processes.

  • Format: Image file name: "C:\Scans\Doc_001.png"
  • Example Result: {"document_type": "Identity_Card", "confidence": 0.98}
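The routing step described above can be sketched in Python. This is an illustration only: it assumes the result is available as a JSON string shaped like the example result, and the `ROUTES` mapping, the sub-process names, and the `route_document` function are hypothetical.

```python
import json
from typing import Mapping

# Hypothetical mapping from document type to the sub-process that handles it.
ROUTES = {
    "Identity_Card": "kyc_subprocess",
    "Invoice": "accounts_payable_subprocess",
}


def route_document(result_json: str,
                   routes: Mapping[str, str],
                   fallback: str = "manual_review") -> str:
    """Pick a sub-process name from a classification result.

    Unknown or missing document types go to the fallback route.
    """
    result = json.loads(result_json)
    return routes.get(result.get("document_type"), fallback)


example = '{"document_type": "Identity_Card", "confidence": 0.98}'
print(route_document(example, ROUTES))  # prints "kyc_subprocess"
```

Sending unrecognized types to a fallback route keeps unexpected documents from being silently dropped.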

Implementation Examples

Field Setup

  • Model file name: C:\Robusta\Models\DocClassifier.onnx
  • Image file name: {{LastDownloadedFile}}
  • Threshold: 0.80
  • Response variable name: docTypeResult

Execution Parameters

  • Process: The activity runs the image referenced by {{LastDownloadedFile}} through the model. If the confidence exceeds 0.80, the resulting document type and confidence score are stored in docTypeResult.
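The threshold check in this step can be sketched as follows. The `accept_label` function is hypothetical, and two details are assumptions rather than documented behavior: the comparison is strict (following the word "exceeds"), and a result below the threshold is represented as `None`.

```python
from typing import Optional


def accept_label(label: str, confidence: float,
                 threshold: float = 0.80) -> Optional[str]:
    """Return the label only when the confidence exceeds the threshold.

    Assumption: strict comparison, and None for a rejected result.
    """
    return label if confidence > threshold else None


print(accept_label("Identity_Card", 0.98))  # prints "Identity_Card"
print(accept_label("Identity_Card", 0.62))  # prints "None"
```

If a score exactly equal to the threshold should also pass, the comparison would be `>=` instead.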

Technical Notes

This activity requires a trained model, so train one before using it. The support department can assist with training a custom model. The activity is optimized for image-based inputs: convert PDFs to images, or target specific pages, where necessary.