Read Text Activity

The Read Text activity extracts text from a specified PDF file. It can read an entire document, a range of pages, or a defined rectangular area within a page, and stores the result in a designated variable.

| Field | Description | Requirement |
|---|---|---|
| Pdf name | The reference name for the PDF file to be processed. | Required |
| Start page | The starting page number for text extraction. | Optional |
| End page | The ending page number for text extraction. | Optional |
| Result variable name | The name of the variable that will store the extracted text. | Required |
| X-coordinate | The X-coordinate (in pixels) of the top-left corner of the rectangular area to read. Used together with Y-coordinate, Height, and Width to define a specific region for text extraction. | Optional |
| Y-coordinate | The Y-coordinate (in pixels) of the top-left corner of the rectangular area to read. Used together with X-coordinate, Height, and Width to define a specific region for text extraction. | Optional |
| Height | The height (in pixels) of the rectangular area to read. Used together with X-coordinate, Y-coordinate, and Width to define a specific region for text extraction. | Optional |
| Width | The width (in pixels) of the rectangular area to read. Used together with X-coordinate, Y-coordinate, and Height to define a specific region for text extraction. | Optional |
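
The field rules above can be checked before a process is run. The following Python sketch is illustrative only and is not part of the product: the function name `check_read_text_fields` is hypothetical, and two of its rules are assumptions rather than documented behavior, namely that the four region fields must be supplied together and that Start page should not exceed End page.

```python
from typing import Any, List, Mapping

# Field names are taken from the table; the rules below are a sketch.
REQUIRED_FIELDS = ("Pdf name", "Result variable name")
REGION_FIELDS = ("X-coordinate", "Y-coordinate", "Height", "Width")


def check_read_text_fields(fields: Mapping[str, Any]) -> List[str]:
    """Return a list of problems in a field setup (empty if none found)."""
    problems: List[str] = []

    # Required fields must be present and non-empty.
    for name in REQUIRED_FIELDS:
        if fields.get(name) in (None, ""):
            problems.append(f"missing required field: {name}")

    # Assumption: a region is only meaningful when all four fields are set.
    supplied = [name for name in REGION_FIELDS if fields.get(name) is not None]
    if supplied and len(supplied) != len(REGION_FIELDS):
        missing = [name for name in REGION_FIELDS if name not in supplied]
        problems.append("incomplete region, missing: " + ", ".join(missing))

    # Assumption: the page range should run forwards.
    start, end = fields.get("Start page"), fields.get("End page")
    if start is not None and end is not None and start > end:
        problems.append("Start page is after End page")

    return problems
```

An empty list means the setup passes these checks. How the activity itself reacts to a partially defined region is not stated here, so treat the second rule as a conservative guess.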

Action Types & Examples

Text Extraction Result

  • Format: string
  • Example Result: "This is the extracted text from the PDF document."

Implementation Examples

Field Setup

  • Pdf name: ${RobustaPdf}
  • Start page: 1
  • End page: 3
  • Result variable name: readRobustaPdf
  • X-coordinate: 120
  • Y-coordinate: 250
  • Height: 50
  • Width: 100

Execution Parameters

  • Not applicable for this activity.

Technical Notes

The coordinate system for defining the reading area (X-coordinate, Y-coordinate, Height, Width) typically originates from the top-left corner of the PDF page, with X increasing to the right and Y increasing downwards. Ensure that the specified coordinates and dimensions fall within the bounds of the PDF page to prevent errors or partial reads.
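
The bounds check described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the activity's own implementation: the `Region` type and the functions `fits_on_page` and `clip_to_page` are hypothetical, the origin is taken to be the top-left corner as described, and the page size used in the usage note is an assumed value in the same units as the coordinates.

```python
from typing import NamedTuple, Optional


class Region(NamedTuple):
    x: float       # left edge, measured from the left side of the page
    y: float       # top edge, measured from the top of the page
    width: float
    height: float


def fits_on_page(region: Region, page_width: float, page_height: float) -> bool:
    """True if the region has positive size and lies fully inside the page."""
    return (
        region.x >= 0
        and region.y >= 0
        and region.width > 0
        and region.height > 0
        and region.x + region.width <= page_width
        and region.y + region.height <= page_height
    )


def clip_to_page(
    region: Region, page_width: float, page_height: float
) -> Optional[Region]:
    """Return the part of the region that overlaps the page, or None."""
    left = max(region.x, 0)
    top = max(region.y, 0)
    right = min(region.x + region.width, page_width)
    bottom = min(region.y + region.height, page_height)
    if right <= left or bottom <= top:
        return None  # the region lies entirely outside the page
    return Region(left, top, right - left, bottom - top)
```

With the example setup (X 120, Y 250, Width 100, Height 50) and an assumed 612 by 792 page, `fits_on_page` returns `True`. Moving X to 560 pushes the right edge past the page, so `fits_on_page` returns `False` and `clip_to_page` yields `Region(560, 250, 52, 50)`, which shows how an out-of-bounds region could lead to a partial read.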