Convert Tables to HTML Activity

The Convert Tables to HTML activity enables the extraction and conversion of table data from PDF files into HTML format. This activity processes specified pages within a PDF, identifies table structures, and outputs the data as a clean HTML file.

| Field | Description | Requirement |
|---|---|---|
| Pdf Name | The reference name for the PDF file to be processed. | Required |
| Table Type | Specifies the structure of the table to be converted. | Required |
| Start Page | The starting page number within the PDF document for table extraction. | Optional |
| End Page | The ending page number within the PDF document for table extraction. | Optional |
| Maximum Font Size | The maximum font size to consider during table detection. | Optional |
| Ignore Line Count | The number of initial lines from the start page to exclude from the HTML output. | Optional |
| Output Path | The full directory path and filename for the generated HTML file. | Required |
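The required and optional fields above can be modeled as a small pre-flight check before a flow is run. The following is a minimal sketch in Python, not part of the product: the class name `ConvertTablesToHtmlParams`, its attribute names, and the rules that page numbers start at 1 and that End Page must not precede Start Page are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ConvertTablesToHtmlParams:
    """Hypothetical container mirroring the activity's fields (not a product API)."""

    pdf_name: str                              # Required
    table_type: str                            # Required
    output_path: str                           # Required
    start_page: Optional[int] = None           # Optional
    end_page: Optional[int] = None             # Optional
    maximum_font_size: Optional[float] = None  # Optional
    ignore_line_count: Optional[int] = None    # Optional

    def validate(self) -> List[str]:
        """Return a list of problems; an empty list means the setup looks consistent."""
        problems = []
        # The three fields marked Required must not be blank.
        for field_name in ("pdf_name", "table_type", "output_path"):
            if not getattr(self, field_name).strip():
                problems.append(f"{field_name} is required")
        # Assumption: pages are numbered from 1, as in the documented example.
        if self.start_page is not None and self.start_page < 1:
            problems.append("start_page must be 1 or greater")
        if self.end_page is not None and self.end_page < 1:
            problems.append("end_page must be 1 or greater")
        # Assumption: a page range must run forwards.
        if (self.start_page is not None and self.end_page is not None
                and self.end_page < self.start_page):
            problems.append("end_page must not precede start_page")
        if self.ignore_line_count is not None and self.ignore_line_count < 0:
            problems.append("ignore_line_count must not be negative")
        return problems
```

Calling `validate()` on a setup with Pdf Name, Table Type, and Output Path filled in and a forward page range returns an empty list; any other result names the field to correct.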

Action Types & Examples

BASIC

  • Format: String
  • Example Result: "BASIC"

COMPLEX

  • Format: String
  • Example Result: "COMPLEX"

STRIPLESS

  • Format: String
  • Example Result: "STRIPLESS"
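Since the three table types are plain strings, a script that prepares values for this activity may want to reject anything else early. Below is a minimal sketch in Python; the `TableType` enum and the `parse_table_type` helper are hypothetical names, and the case-insensitive matching is a convenience of this sketch rather than documented behavior of the activity.

```python
from enum import Enum


class TableType(str, Enum):
    """The three documented Table Type values."""

    BASIC = "BASIC"
    COMPLEX = "COMPLEX"
    STRIPLESS = "STRIPLESS"


def parse_table_type(value: str) -> TableType:
    """Map input text to a documented table type, ignoring case and outer spaces."""
    try:
        return TableType(value.strip().upper())
    except ValueError:
        allowed = ", ".join(t.value for t in TableType)
        raise ValueError(
            f"Unknown table type {value!r}; expected one of: {allowed}"
        ) from None
```

For example, `parse_table_type(" basic ")` yields `TableType.BASIC`, while an unlisted value raises a `ValueError` that names the allowed options.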

Implementation Examples

Field Setup

  • Pdf Name: ${RobustaPdf}
  • Table Type: BASIC
  • Start Page: 1
  • End Page: 1
  • Output Path: C:\Robusta\robusta.html

Execution Parameters

  • Pdf Name: ${RobustaPdf}
  • Table Type: BASIC
  • Start Page: 1
  • End Page: 1
  • Output Path: C:\Robusta\robusta.html
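Once the activity has written the HTML file, its contents can be checked outside the flow. The sketch below uses only Python's standard `html.parser` module and assumes the generated file contains ordinary `<table>`, `<tr>`, `<td>`, and `<th>` elements; the exact markup the activity emits is not documented here, and the names `TableRowCollector` and `read_table_rows` are hypothetical.

```python
from html.parser import HTMLParser


class TableRowCollector(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # completed rows, each a list of cell strings
        self._row = None    # cells of the row currently being read
        self._cell = None   # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None  # discard any cell left unclosed in this row


def read_table_rows(html_text):
    """Return all table rows found in html_text as lists of cell strings."""
    parser = TableRowCollector()
    parser.feed(html_text)
    parser.close()
    return parser.rows
```

To inspect a generated file, read it as text (for instance with `open(path, encoding="utf-8").read()`, where the encoding is also an assumption) and pass the result to `read_table_rows`.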

Technical Notes

When the BASIC table type is selected, strikethrough lines may appear in the output. If they are not wanted, change the Table Type to COMPLEX.