Extractors

This help page explains the "Extractors" method for metadata extraction.

Basics

Extractors work according to the principle of “pattern matching”. A regular expression is used to define a search pattern, e.g. a specific string value and format: “Serial number: xx-xxx-xxx”.

plusmeta searches the text for passages that match the pattern and extracts the specified part, e.g. 12-345-678, and writes it into the metadata field. Typical examples of this method of metadata extraction are number sequences such as serial numbers or dates.
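The principle can be illustrated with a minimal Python sketch. It is not plusmeta's implementation; the sample text and the exact pattern are assumptions chosen to mirror the "Serial number: xx-xxx-xxx" format described above.

```python
import re

# Hypothetical sample text, made up for this illustration.
text = "Device type: Pump X2\nSerial number: 12-345-678\nYear: 2021"

# Group 1 captures only the number, not the label in front of it.
pattern = re.compile(r"Serial number:\s(\d{2}-\d{3}-\d{3})")

match = pattern.search(text)
serial_number = match.group(1) if match else None
print(serial_number)  # 12-345-678
```

The label "Serial number:" anchors the search, while the capture group isolates the part that ends up in the metadata field.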

This function must be activated via a configuration object.

Create configuration object

  1. Open the Objects view.
  2. Click on the Add button to create a new object.
  3. Select Configuration object as the object type.
  4. Open the lower tab of the Create object dialogue.
  5. Select the template Regular expression.

    Note: If no template is selected during creation, it cannot be added later.

  6. In the JSON EDITOR, click on the line extractor and set it to true using the toggle switch.
  7. Click on the three dots in the extractorOptions line.
  8. Replace the regular expression in the line pattern with your own regular expression (e.g. “(serial number:)\s(\d{3})”).
  9. If required, activate line-based processing (multiLine) and case-insensitive matching (caseInsensitive), or deactivate the conversion of line breaks to spaces (convertNewlineToSpace).
  10. In the match field, enter the desired group whose content should appear in the metadata field.

    Note: To extract only the actual serial number in the example "(serial number:)\s(\d{3})", specify only group 2 in the match field.

  11. In the multiMatch field, select whether multiple values may be extracted (true) or only one (false).
  12. If required, remove one or more values from the list in the scope field.

    Note: The scope defines where the extractor is applied: in the text (text), in the titles (title), or in the properties with the role "Source for metadata recognition" (sources).

  13. Click on CREATE OBJECT.

Figure: Creating a configuration object for regular expressions in plusmeta.
Info: This help page is still under construction and currently not fully available in your language. If you have any questions about this topic, please contact our support via email: support@plusmeta.de
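
To show how the options from the steps above interact, the following Python sketch emulates them on a sample string. The field names are taken from the steps; everything else is an assumption made for illustration: the nesting of the dictionary, match given as a group number, the mapping of multiLine to Python's re.MULTILINE flag, and the helper run_extractor. The actual configuration object in the JSON EDITOR may be structured differently.

```python
import re

# Field names follow the steps above; nesting and value formats are assumed.
config = {
    "extractor": True,
    "extractorOptions": {
        "pattern": r"(serial number:)\s(\d{3})",
        "multiLine": False,
        "caseInsensitive": True,
        "convertNewlineToSpace": True,
        "match": 2,            # assumed: number of the group to extract
        "multiMatch": False,
        "scope": ["text", "title", "sources"],  # listed only, not evaluated here
    },
}

def run_extractor(config, content):
    """Emulate the options on one piece of content (hypothetical helper)."""
    if not config["extractor"]:
        return []
    opts = config["extractorOptions"]
    flags = 0
    if opts["multiLine"]:
        flags |= re.MULTILINE  # assumed equivalent of line-based processing
    if opts["caseInsensitive"]:
        flags |= re.IGNORECASE
    if opts["convertNewlineToSpace"]:
        content = content.replace("\n", " ")
    values = [m.group(opts["match"])
              for m in re.finditer(opts["pattern"], content, flags)]
    # multiMatch decides whether all hits or only the first one are returned.
    return values if opts["multiMatch"] else values[:1]

sample = "Serial Number:\n123 and serial number: 456"
result = run_extractor(config, sample)
print(result)  # ['123']
```

With multiMatch set to True, the same call would return both values. With convertNewlineToSpace set to False, the first occurrence would still match in Python because \s also matches a line break, so the option matters mainly for patterns that use a literal space.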

Use extractor

  1. Open the Properties view.
  2. Select the metadata to which you want to assign the configuration object.
  3. Click on the button to open the Edit properties dialogue.
    The Edit properties dialogue opens.
  4. Open the Relations tab.
  5. Click on the Add button to add a relation.
  6. Select the relation uses configuration from the drop-down list.
  7. In the uses configuration field, select the configuration object from the drop-down list.
  8. Switch to the Attributes tab.
  9. Click on the Add button, then select and activate the attribute auto assignment.
  10. Click CLOSE.
    The changes are saved automatically.

Figure: Activating an extractor in plusmeta.

Modify string

It is possible to modify the character string extracted by the regular expression with additional character strings. These can be prefixes and suffixes as well as inserted character strings between groups.
This can be used to clean up results or to add characters to the metadata value.

Example: An “S-” should always be written before the extracted serial number.

  1. Open the Objects view.
  2. Open the configuration object of type Regular expression.
  3. In the JSON EDITOR, in the line match, enter the prefix “S-” before the group that is to be matched.
  4. Click CLOSE.
    The changes will be saved automatically.

Figure: Modifying a string value.
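
The effect of such a prefix can be sketched in Python. This is only an illustration: the template notation below is Python's (\g<2> stands for group 2), and the notation expected in the match field in plusmeta may differ.

```python
import re

text = "serial number: 123"
pattern = re.compile(r"(serial number:)\s(\d{3})")

# Assumed template: the literal prefix "S-" followed by the content of group 2.
template = r"S-\g<2>"

match = pattern.search(text)
value = match.expand(template) if match else None
print(value)  # S-123
```

Suffixes and strings inserted between groups work the same way: literal characters in the template are copied as they are, and group references are replaced by the matched content.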

Auto extractors

Extractors can also be created without writing regular expressions by using the “Auto extractors” function. Auto extractors are created without configuration objects and are intended for metadata with units, e.g. height or weight.
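
How plusmeta builds Auto extractors internally is not described here. As a rough sketch of the idea, a unit can be turned into a search pattern for "number followed by unit"; the helper build_unit_pattern, the number format, and the sample text are assumptions for illustration only.

```python
import re

def build_unit_pattern(unit):
    """Build a pattern for a number followed by the given unit (hypothetical helper)."""
    # re.escape keeps units such as "m/s" from being read as regex syntax.
    # (?!\w) rejects longer words that merely start with the unit, e.g. "kgs".
    return re.compile(r"(\d+(?:[.,]\d+)?)\s?" + re.escape(unit) + r"(?!\w)")

text = "Weight: 12.5 kg, height: 180 cm"
weight = build_unit_pattern("kg").search(text)
height = build_unit_pattern("cm").search(text)
print(weight.group(1), height.group(1))  # 12.5 180
```

This also shows why the unit should be entered exactly as it appears in the content: a unit spelled differently in the documents would not be found by a pattern built this way.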

Create property

  1. Open the Properties view.
  2. Click the Add button and then click the button.
    The Create property dialogue opens.
  3. Assign the class metadata.
  4. Select the appropriate data type (number for pure numerical values, string value for words or mixed values).
  5. Assign a label.
  6. Click on CREATE.
    All your changes will be saved automatically.

Activate Auto extractors on property

  1. On the property, click the button to open the Edit properties dialogue.
    The Edit properties dialogue opens.
  2. Switch to the Attributes tab.
  3. Click on the Add button and select the attribute Unit.
  4. In the Unit field, enter the unit as it can be found in the content.
  5. Click on the Add button, then select and activate the attribute auto assignment.
  6. Click CLOSE.
    The changes are saved automatically.

Activate Auto extractors in project

  1. When creating a new project, open the project settings using the button.
    For existing projects, click on the settings button at the top left of the work view.
  2. Activate the Auto extractors toggle switch.
    The Auto extractors are used in the project.
