Rule-based assignment

This help page explains rule-based assignment in plusmeta.

Basics

Rule-based assignment (RB) is a method in plusmeta for metadata extraction.

Rule-based assignment matches the text with the list of values defined in the metadata. If a match is found, points are awarded. The match scoring the most points is assigned.

How many points are awarded depends on where the match was found. Besides the names (labels) of the list values, indicators can also be specified, and partial matches can score points as well. The frequency of hits is not taken into account.
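
The scoring idea can be illustrated with a short Python sketch. It is not the plusmeta implementation: the function names, the sample list values and the decision to sum points from title and text are assumptions made for illustration. The point values are the defaults listed in the parameter table on this page.

```python
# Default points, taken from the parameter table on this page.
SCORES = {
    ("label", "title"): 85,
    ("label", "text"): 55,
    ("label_part", "title"): 30,
    ("label_part", "text"): 10,
    ("indicator", "title"): 20,
    ("indicator", "text"): 10,
}


def score_value(label, indicators, title, text):
    """Return the points one list value collects for a document.

    Each kind of hit counts once per location, no matter how often it occurs.
    Summing the points of several hits is an assumption of this sketch.
    """
    points = 0
    for place, content in (("title", title), ("text", text)):
        words = content.split()
        if label in content:
            points += SCORES[("label", place)]
        elif any(part in words for part in label.split()):
            # Only part of a multi-word label occurs: partial match.
            points += SCORES[("label_part", place)]
        if any(indicator in content for indicator in indicators):
            points += SCORES[("indicator", place)]
    return points


def assign(values, title, text):
    """Pick the list value with the most points, or None if nothing matched."""
    scored = {
        label: score_value(label, indicators, title, text)
        for label, indicators in values.items()
    }
    best = max(scored, key=scored.get)
    return (best, scored) if scored[best] > 0 else (None, scored)


# Hypothetical list values (label -> indicators) and a sample document.
values = {
    "Operating manual": ["user guide", "instructions for use"],
    "Maintenance manual": ["service instructions"],
}
best, scored = assign(
    values,
    title="Operating manual for pump X",
    text="This user guide describes the pump. See the user guide for details.",
)
```

In this sample, "Operating manual" collects points for the label in the title and for one indicator in the text, while "Maintenance manual" only scores a partial label match in the title. The indicator "user guide" occurs twice in the text but is counted once.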

Figure: Rule-based scoring of hits in the text.

There are default values in plusmeta for the points awarded. However, they can also be configured individually.

Indicators

Indicators play an important role in rule-based assignment. Words or strings can be configured as indicators for specific metadata values. Indicators can be synonyms, alternative spellings, or other clue words that typically occur in the texts. If indicators are found in a text, these hits are also included in the scoring. plusmeta has default values for the scoring of indicator hits. The points awarded for indicators and indicator parts can also be configured individually.

Further information on indicators can be found here.

Customize rule-based assignment

Rule-based assignment is based on configurable rules. The configuration is done via a configuration object. If there is no specific configuration object, the default rules are applied.
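
The fallback to default rules can be pictured as an overlay of the configuration object on the defaults. The following Python sketch only illustrates that idea: the helper `effective_rules`, the flat JSON layout and the assumption that unspecified parameters keep their defaults are hypothetical, and the template provided by plusmeta may be structured differently. The parameter names and default values come from the parameter table on this page.

```python
import json

# Subset of the defaults listed in the parameter table on this page.
DEFAULTS = {
    "ignoreCase": False,
    "minSearchLength": 3,
    "scoreLabelInTitle": 85,
    "scoreLabelInText": 55,
    "scoreIndicatorInTitle": 20,
    "scoreIndicatorInText": 10,
}


def effective_rules(config_object=None):
    """Overlay a configuration object on the defaults (hypothetical helper)."""
    overrides = config_object or {}
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise KeyError(f"unknown parameters: {sorted(unknown)}")
    return {**DEFAULTS, **overrides}


# Assumed flat JSON layout of a configuration object.
custom = json.loads('{"ignoreCase": true, "scoreIndicatorInText": 15}')

rules = effective_rules(custom)   # custom values win, the rest stays default
fallback = effective_rules()      # no configuration object: defaults apply
```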

Create configuration object

A configuration object can be used to adjust the default values of the rule-based assignment.

  1. Open the Objects view.
  2. Click on the Add button to create a new object.
  3. Select Configuration object as the object type.
  4. Open the lower tab of the Create object dialogue.
  5. Select the template Rule-based prediction.

    Note: If no template is selected during creation, it cannot be added later.

  6. Adjust the desired values in the JSON editor.
  7. Click on CREATE OBJECT.

Figure: Configuration object for rule-based prediction.

Activate configuration object

  1. Open the Properties view.
  2. Select the metadata to which you want to assign the configuration object.
  3. Click on the button to open the Edit properties dialogue.
    The Edit properties dialogue opens.
  4. Open the Relations tab.
  5. Click on the Add button to add a relation.
  6. Select the relation uses configuration from the drop-down list.
  7. In the uses configuration field, select the configuration object from the drop-down list.
  8. Click CLOSE.
    All your changes will be saved automatically.

Figure: Activation of the configuration object for rule-based assignment.

Parameter configuration

Modifier | Value | Function | Default value
allowBlocklist | true / false | Allow or ignore the blocklist. Example: the word “is” and a company with the product range “IS”. | true
allowNumbersOnly | true / false | true = purely numerical values are also evaluated; false = purely numerical values are not evaluated. | true
ignoreCase | true / false | Ignore or consider upper and lower case. | false
minFuzzyMatchDistance | Percentage (without % character) | Specifies the minimum match percentage for fuzzy matches. The score of a fuzzy match is multiplied by this value, so fuzzy matches score lower. | 80
minSearchLength | Number ≥ 0 | Specifies the minimum length of the searched strings, e.g. indicators that only have 2 characters are not found if 3 is set. | 3
performanceMode | true / false | Issues fuzzy matches | false
rbThresholdMultipleMatches | % figure | Issues fuzzy matches | false
rbNgramSize | Number | Specifies how many words the word groups contain when the text is split into tokens, e.g. for “1” the tokens consist of single words, for “2” of word pairs, and so on. As soon as multi-word tokens are created (“2” or more), additional 1-word tokens are also created. If the value is negated, e.g. “-3”, every intermediate step is also generated (1-, 2- and 3-word tokens). | 0
scoreIndicatorInText | Score value 0 - x | Score for matches of an indicator in the text | 10
scoreIndicatorInTitle | Score value 0 - x | Score for matches of an indicator in the title | 20
scoreIndicatorInSource | Score value 0 - x | Score for matches of an indicator in a metadata source | 10
scoreIndicatorPartInTitle | Score value 0 - x | Score for matches of a part of an indicator in the title | 5
scoreIndicatorPartInText | Score value 0 - x | Score for matches of a part of an indicator in the text | 2
scoreIndicatorPartInSource | Score value 0 - x | Score for matches of a part of an indicator in a metadata source | 2
scoreLabelInText | Score value 0 - x | Score for matches of a label in the text | 55
scoreLabelInTitle | Score value 0 - x | Score for matches of a label in the title | 85
scoreLabelInSource | Score value 0 - x | Score for matches of a label in a metadata source | 85
scoreLabelPartInText | Score value 0 - x | Score for matches of a part of a label in the text | 10
scoreLabelPartInTitle | Score value 0 - x | Score for matches of a part of a label in the title | 30
scoreLabelPartInSource | Score value 0 - x | Score for matches of a part of a label in a metadata source | 30
tokenSplitPattern | RegEx | Regular expression for splitting the texts into tokens | (?:[^_\.,:"\[\]\(\)\s]+[\.,:]?)+
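
How tokenSplitPattern and rbNgramSize interact can be sketched in Python. This is an illustration, not plusmeta code: the function names are hypothetical, the sketch assumes that tokens are the matches of the regular expression (not the separators), and it treats the default 0 like 1 because the behaviour for 0 is not described here.

```python
import re

# Default tokenSplitPattern from the parameter table.
TOKEN_PATTERN = re.compile(r'(?:[^_\.,:"\[\]\(\)\s]+[\.,:]?)+')


def ngram_sizes(rb_ngram_size):
    """Translate rbNgramSize into the list of token lengths to generate."""
    if rb_ngram_size < 0:
        # Negated value: every intermediate step, e.g. -3 -> 1, 2, 3.
        return list(range(1, -rb_ngram_size + 1))
    if rb_ngram_size <= 1:
        # 1 -> single words; 0 is treated the same way (assumption).
        return [1]
    # 2 or more: n-word tokens plus additional 1-word tokens.
    return [1, rb_ngram_size]


def tokenize(text, rb_ngram_size=1):
    """Split text into words and build the word groups for rbNgramSize."""
    words = TOKEN_PATTERN.findall(text)
    tokens = []
    for n in ngram_sizes(rb_ngram_size):
        tokens += [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]
    return tokens


words = tokenize("Set valve (type A) to 3.5 bar")
triples = tokenize("Check the oil level", 3)
all_steps = tokenize("Check the oil level", -3)
```

With the default pattern, brackets are dropped and "3.5" stays a single token. For the value 3, the sketch yields the four single words plus "Check the oil" and "the oil level". For -3 it additionally yields the three word pairs.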
