Rule-based assignment

Content of this topic

Basics
Indicators
Customize rule-based assignment

Basics

Rule-based assignment (RB) is a method in plusmeta for metadata extraction.

Rule-based assignment compares the text with the list of values defined in the metadata. Each match is awarded points, and the list value that scores the most points is assigned.

The number of points depends on where the match was found. Besides the names (labels) of the list values, indicators can also be specified as search terms. Partial matches can score points as well. How often a term occurs is not taken into account.

Figure: Rule-based scoring of hits in the text.

There are default values in plusmeta for the points awarded. However, they can also be configured individually.
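
The scoring idea described above can be sketched in a few lines of Python. This is a simplified illustration, not plusmeta's implementation: the function names `score_value` and `assign` are hypothetical, plain case-sensitive substring matching stands in for the real matching, partial matches are left out, and adding up the points from the different locations is an assumption. The score values are the defaults listed under Parameter configuration.

```python
# Hypothetical sketch of location-based scoring; not plusmeta's actual code.
DEFAULT_SCORES = {
    "scoreLabelInTitle": 85,
    "scoreLabelInText": 55,
    "scoreIndicatorInTitle": 20,
    "scoreIndicatorInText": 10,
}


def score_value(label, indicators, title, text, scores=DEFAULT_SCORES):
    """Return the points one list value collects for a document."""
    points = 0
    # Each location counts once, no matter how often the term occurs there.
    if label in title:
        points += scores["scoreLabelInTitle"]
    if label in text:
        points += scores["scoreLabelInText"]
    if any(indicator in title for indicator in indicators):
        points += scores["scoreIndicatorInTitle"]
    if any(indicator in text for indicator in indicators):
        points += scores["scoreIndicatorInText"]
    return points


def assign(values, title, text):
    """Pick the list value with the highest score, or None if nothing matched."""
    scored = {
        label: score_value(label, indicators, title, text)
        for label, indicators in values.items()
    }
    best = max(scored, key=scored.get)
    return (best, scored[best]) if scored[best] > 0 else (None, 0)


# Example list values: label -> indicators (made-up sample data).
values = {"Pump": ["centrifugal"], "Valve": ["shut-off", "gate"]}
title = "Pump maintenance"
text = "Check the centrifugal Pump and the gate Valve."
print(assign(values, title, text))
```

In this example "Pump" collects points for its label in the title and in the text plus one indicator hit, so it outscores "Valve", which is only found in the text.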

Indicators

Indicators play an important role in rule-based assignment. Words or string values can be configured as indicators for a metadata value and are then used when that value is assigned. Indicators can be synonyms, alternative spellings, or other clue words that typically occur in the texts. If indicators are found in a text, these hits are included in the scoring. plusmeta has default values for the scoring of indicator hits. The points awarded for indicators and indicator parts can also be configured individually.

Further information on indicators can be found here.

Customize rule-based assignment

Rule-based assignment is based on configurable rules. The configuration is done via a configuration object. If there is no specific configuration object, the default rules are applied.

Create configuration object

A configuration object can be used to adjust the default values of the rule-based assignment.

Open the Objects view.
Click on the Add button to create a new object.
Select Configuration object as the object type.
Open the lower tab of the Create object dialogue.
Select the template Rule-based prediction.

Note: If no template is selected during creation, it cannot be added later.
Adjust the desired values in the JSON editor.
Click on CREATE OBJECT.

Figure: Configuration object for rule-based prediction.
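
As a rough illustration of adjusting values in the JSON editor, the following Python sketch merges a few overrides into the documented defaults and prints the result as JSON. The flat key/value layout and the helper `build_config` are assumptions made for this example; the actual structure is given by the Rule-based prediction template and may differ.

```python
import json

# Subset of the defaults from the parameter table in this topic.
DEFAULTS = {
    "ignoreCase": False,
    "minSearchLength": 3,
    "scoreLabelInTitle": 85,
    "scoreLabelInText": 55,
    "scoreIndicatorInTitle": 20,
    "scoreIndicatorInText": 10,
}


def build_config(overrides):
    """Merge overrides into the defaults, rejecting unknown parameter names."""
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise KeyError(f"Unknown parameters: {sorted(unknown)}")
    return {**DEFAULTS, **overrides}


# Assumed layout: one flat JSON object with parameter names as keys.
config = build_config({"ignoreCase": True, "scoreLabelInText": 40})
print(json.dumps(config, indent=2))
```

Rejecting unknown names catches typos such as a misspelled parameter before the text is pasted into the editor.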

Activate configuration object

Open the Properties view.
Select the metadata to which you want to assign the configuration object.
Click on the button to open the Edit properties dialogue.
The Edit properties dialogue opens.
Open the Relations tab.
Click on the Add button to add a relation.
Select the relation uses configuration from the drop-down list.
In the uses configuration field, select the configuration object from the drop-down list.
Click CLOSE.
All your changes will be saved automatically.

Figure: Activation of the configuration object for rule-based assignment.

Parameter configuration

Modifier	Value	Function	Default value
`allowBlocklist`	`true / false`	Allow or ignore the blocklist. Example: the word “is” versus a company with the product range “IS”.	`true`
`allowNumbersOnly`	`true / false`	`true` = purely numerical values are also evaluated; `false` = purely numerical values are not evaluated.	`true`
`ignoreCase`	`true / false`	Ignore or consider upper and lower case.	`false`
`minFuzzyMatchDistance`	Percentage (without % character)	Specifies the minimum match percentage for fuzzy matches. Fuzzy matches are multiplied by this value. As a result, the score of fuzzy matches is lower.	`80`
`minSearchLength`	Number ≥ 0	Specifies the minimum length of the searched character string values, e.g. indicators that only have 2 characters are not found if 3 is set.	`3`
`performanceMode`	`true / false`	Issues fuzzy matches	`false`
`rbThresholdMultipleMatches`	% figure	Issues fuzzy matches	`false`
`rbNgramSize`	Number	Specifies how many words the word groups contain when the text is split into tokens, e.g. for “1” the tokens consist of single words, for “2” of word pairs, and so on. As soon as multi-word tokens are created (“2” or more), additional 1-word tokens are also created. If the specification is negated, e.g. “-3”, then every intermediate step is also generated (1-, 2- and 3-word tokens).	`0`
`scoreIndicatorInText`	Score value 0 - x	Score for matches of an indicator in the text	`10`
`scoreIndicatorInTitle`	Score value 0 - x	Score for matches of an indicator in the title	`20`
`scoreIndicatorInSource`	Score value 0 - x	Score for matches of an indicator in a metadata source	`10`
`scoreIndicatorPartInTitle`	Score value 0 - x	Score for matches of a part of an indicator in the title	`5`
`scoreIndicatorPartInText`	Score value 0 - x	Score for matches of a part of an indicator in the text	`2`
`scoreIndicatorPartInSource`	Score value 0 - x	Score for matches of a part of an indicator in a metadata source	`2`
`scoreLabelInText`	Score value 0 - x	Score for matches of a label in the text	`55`
`scoreLabelInTitle`	Score value 0 - x	Score for matches of a label in the title	`85`
`scoreLabelInSource`	Score value 0 - x	Score for matches of a label in a metadata source	`85`
`scoreLabelPartInText`	Score value 0 - x	Score for matches of a part of a label in the text	`10`
`scoreLabelPartInTitle`	Score value 0 - x	Score for matches of a part of a label in the title	`30`
`scoreLabelPartInSource`	Score value 0 - x	Score for matches of a part of a label in a metadata source	`30`
`tokenSplitPattern`	RegEx	Regular expression for the splitting of the texts into tokens	`(?:[^_\.,:"\[\]\(\)\s]+[\.,:]?)+`
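
To make `tokenSplitPattern` and `rbNgramSize` more concrete, the following Python sketch applies the default pattern with the standard `re` module and builds word groups as described in the table. The helper names `tokenize`, `ngram_sizes` and `word_groups` are hypothetical, and treating the default `0` like `1` (single words only) is an assumption, since the table does not explain that value. Note that under this pattern a trailing period, comma or colon stays attached to a token; whether plusmeta strips it afterwards is not stated here.

```python
import re

# Default value of tokenSplitPattern from the table above.
TOKEN_SPLIT_PATTERN = r'(?:[^_\.,:"\[\]\(\)\s]+[\.,:]?)+'


def tokenize(text):
    """Split text into tokens using the default pattern."""
    return re.findall(TOKEN_SPLIT_PATTERN, text)


def ngram_sizes(rb_ngram_size):
    """Return the word-group sizes to generate for a given rbNgramSize."""
    n = abs(rb_ngram_size)
    if n <= 1:
        return [1]  # assumption: 0 behaves like 1
    if rb_ngram_size < 0:
        return list(range(1, n + 1))  # negated: every intermediate size
    return [1, n]  # multi-word groups plus additional 1-word tokens


def word_groups(words, rb_ngram_size):
    """Build the word groups (n-grams) for a list of single-word tokens."""
    groups = []
    for size in ngram_sizes(rb_ngram_size):
        for start in range(len(words) - size + 1):
            groups.append(" ".join(words[start:start + size]))
    return groups


words = tokenize("Replace the shut-off valve (type V2.1) now")
print(words)
print(word_groups(words[:4], 3))
print(word_groups(words[:4], -3))
```

With a size of `3`, the first four words yield the four single words plus two 3-word groups; with `-3`, the three 2-word groups are generated as well.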
