Convert Tables to HTML Activity

The Convert Tables to HTML activity extracts table data from a PDF file and converts it to HTML. It processes the specified pages of the PDF, detects table structures, and writes the result to an HTML file.

Field	Description	Requirement
Pdf Name	The reference name for the PDF file to be processed.	Required
Table Type	The table structure to use for conversion: BASIC, COMPLEX, or STRIPLESS.	Required
Start Page	The starting page number within the PDF document for table extraction.	Optional
End Page	The ending page number within the PDF document for table extraction.	Optional
Maximum Font Size	The maximum font size to consider during table detection.	Optional
Ignore Line Count	The number of initial lines from the start page to exclude from the HTML output.	Optional
Output Path	The full directory path and filename for the generated HTML file.	Required

Table Types & Examples

BASIC

Format: String
Example Result: "BASIC"

COMPLEX

Format: String
Example Result: "COMPLEX"

STRIPLESS

Format: String
Example Result: "STRIPLESS"

Implementation Examples

Field Setup
- Pdf Name: ${RobustaPdf}
- Table Type: BASIC
- Start Page: 1
- End Page: 1
- Output Path: C:\Robusta\robusta.html
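The activity is configured through its fields rather than through code. As a hedged illustration only, the sketch below models the field setup above as a Python dataclass and checks it before a run. The class name `ConvertTablesToHtmlConfig`, the `validate` function, the snake_case field names, and the rules that pages are 1-based and that Start Page must not exceed End Page are assumptions made for this example; they are not part of the product's API.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical model of the activity's fields; not a product API.
VALID_TABLE_TYPES = {"BASIC", "COMPLEX", "STRIPLESS"}


@dataclass
class ConvertTablesToHtmlConfig:
    pdf_name: str                               # Required
    table_type: str                             # Required
    output_path: str                            # Required
    start_page: Optional[int] = None            # Optional
    end_page: Optional[int] = None              # Optional
    maximum_font_size: Optional[float] = None   # Optional
    ignore_line_count: Optional[int] = None     # Optional


def validate(cfg: ConvertTablesToHtmlConfig) -> List[str]:
    """Return a list of problems; an empty list means the setup looks valid."""
    problems: List[str] = []
    # Required fields must be non-empty.
    for name in ("pdf_name", "table_type", "output_path"):
        if not getattr(cfg, name):
            problems.append(f"{name} is required")
    # Table Type must be one of the documented values.
    if cfg.table_type and cfg.table_type not in VALID_TABLE_TYPES:
        problems.append(f"table_type must be one of {sorted(VALID_TABLE_TYPES)}")
    # Assumption: page numbers are 1-based and the range is inclusive.
    for name in ("start_page", "end_page"):
        value = getattr(cfg, name)
        if value is not None and value < 1:
            problems.append(f"{name} must be >= 1")
    if (cfg.start_page is not None and cfg.end_page is not None
            and cfg.start_page > cfg.end_page):
        problems.append("start_page must not exceed end_page")
    # A negative number of lines to ignore makes no sense.
    if cfg.ignore_line_count is not None and cfg.ignore_line_count < 0:
        problems.append("ignore_line_count must be >= 0")
    return problems


if __name__ == "__main__":
    # Mirrors the Field Setup example above.
    cfg = ConvertTablesToHtmlConfig(
        pdf_name="${RobustaPdf}",
        table_type="BASIC",
        output_path=r"C:\Robusta\robusta.html",
        start_page=1,
        end_page=1,
    )
    print(validate(cfg))  # → []
```

Returning a list of problems rather than raising on the first one lets a caller report every misconfigured field at once.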

Technical Notes

The BASIC table type may produce strikethrough lines in the output. If this is undesirable, use the COMPLEX table type instead.