The workflow Segment and enrich content automatically is used to segment individual documents from an object and enrich them with metadata.
Basics
In order to use the workflow, a method for automated segmentation must be configured. There are two methods to choose from: ML-based or pattern-based segmentation. A segmentation property must be defined for both methods.
ML-based segmentation
A detailed explanation of how to train AI models can be found here
- In the Objects view, click on the Add button and select Create object manually.
- Assign a name for the object in the display name field, e.g. “PDF segmentation (ML)”
- Click on the tab on the left.
- Click in the Template field and select the PDF segmentation (ML) template from the drop-down menu. The predefined values of the object are displayed in the JSON-EDITOR.
- Edit the values in the TEXT CONTENT area.
- After adjusting the settings, click on CREATE OBJECT.
- Switch to the properties view and search for the property on which the model is to be trained (example: chapter).
- Switch to the Relations tab and click on the add button.
- Select the relation has role and assign the role segmentation property (plus:SegmentationProperty) as the relation target.
- Assign the trained AI model with the relation “has assigned AI model” to the property with the role “segmentation property”.
The segment and format documents workflow can now be used.
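The two relations set in the steps above can be pictured as simple subject/relation/target triples. The following is only an illustrative sketch, not the application's data model or API: the relation names and the role identifier `plus:SegmentationProperty` are taken from the steps, `chapter` is the example property, and the model name `chapter-model` and the helper `ml_segmentation_ready` are hypothetical.

```python
# Relations as (subject, relation, target) triples.
# "chapter-model" is a hypothetical name for the trained AI model.
relations = {
    ("chapter", "has role", "plus:SegmentationProperty"),
    ("chapter", "has assigned AI model", "chapter-model"),
}

def ml_segmentation_ready(relations):
    """Return the properties that carry the segmentation role AND have a model assigned."""
    with_role = {
        subject
        for subject, relation, target in relations
        if relation == "has role" and target == "plus:SegmentationProperty"
    }
    with_model = {
        subject
        for subject, relation, _target in relations
        if relation == "has assigned AI model"
    }
    # Both relations must be present on the same property.
    return with_role & with_model

ready = ml_segmentation_ready(relations)
```

In this sketch a property counts as ready for ML-based segmentation only if both relations are present; a property with the role but without an assigned model is not returned.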
Pattern-based segmentation
- In the Objects view, click on the Add button and select Create object manually.
- Assign a name for the object in the display name field, e.g. “PDF segmentation (regular expression)”
- Click on the tab on the left.
- Click in the Template field and select the PDF segmentation (regular expression) template from the drop-down menu. The predefined values of the object are displayed in the JSON-EDITOR.
- Edit the values in the TEXT CONTENT area.
- After adjusting the settings, click on CREATE OBJECT.
- Switch to the properties view and search for the “Binary segmentation class” property.
- Switch to the Relations tab and click on the add button.
- Add the relation uses configuration and select the created configuration object (in this case “PDF segmentation (regular expression)”) as the relation target.
- Click CLOSE.
The changes are saved automatically, and pattern-based segmentation is configured according to the settings in the configuration object. The segment and format documents workflow can now be used.
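To illustrate the general idea behind pattern-based segmentation, the following minimal sketch splits plain text into segments wherever a line matches a regular expression. It does not reproduce the application's configuration object or its internal behavior: the heading pattern, the function name `segment_by_pattern`, and the sample text are assumptions chosen for illustration only.

```python
import re

# Hypothetical pattern: a segment starts at a numbered heading
# such as "1 Safety" or "2.1 Power".
HEADING = re.compile(r"^\d+(?:\.\d+)*[ \t]+\S.*$", re.MULTILINE)

def segment_by_pattern(text, pattern=HEADING):
    """Split text into (title, body) pairs at every line matching pattern.

    Text that appears before the first match is not part of any segment.
    """
    matches = list(pattern.finditer(text))
    segments = []
    for i, match in enumerate(matches):
        # A segment ends where the next heading starts, or at the end of the text.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        title = match.group(0).strip()
        body = text[match.end():end].strip()
        segments.append((title, body))
    return segments

sample = (
    "1 Safety\nWear gloves.\n"
    "2 Setup\nUnpack the device.\n"
    "2.1 Power\nConnect the cable.\n"
)
segments = segment_by_pattern(sample)
```

With the sample text, the sketch yields three segments titled "1 Safety", "2 Setup", and "2.1 Power". If a pattern is too narrow or too broad, segments are merged or split unexpectedly, which is why the segments should be checked later in the workflow.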
Workflow steps
The workflow consists of the following steps:
- Add content
- Detect segments
- Assign and approve metadata
- Generate iiRDS package
Step 1: Add objects
- Create a new project based on the workflow Segment and enrich content automatically.
- Add PDF documents that meet the requirements set in the configuration object.
- Click on the blue arrow to continue to the next workflow step.
Step 2: Recognize segments and metadata
In this step, the segmentation and the recognition of the metadata are carried out automatically.
Note: No user intervention is necessary.
Click on the blue arrow to continue to the next workflow step.
Step 3: Check metadata
- Check the segments and adjust them if necessary, e.g. with small corrections via manual segmentation or by adjusting or extending the configuration object.
- Check the metadata on the objects and remove any assignments that are incorrect.
Step 4: Generate iiRDS package
- The iiRDS package is generated automatically and is then available for download.
- Click on the button to download the iiRDS package.
- If you return to a previous workflow step and change something, the generation of the iiRDS package can be executed again. This ensures that the changes are taken into account in the iiRDS package. Click on the button to run the generation again.