    Query Expression Transformation
    Mindbreeze Query Transformer Plugins

    Mindbreeze Query Transformation

    Mindbreeze provides a list of query transformation services for automatic modification of search queries for better search results.

    On the one hand, there are plugin-based extension points that can be loaded on demand into a Mindbreeze installation:

    • Synonym Transformer
    • Replacement Transformer

    On the other hand, there are integrated product features that make it easier to find the desired results (e.g. by enriching indexed documents with additional metadata):

    • “Did you mean?”
    • Entity Recognition
    • CSV Transformation

    Query Transformation Plugins

    To use a query transformation service, install it in your Mindbreeze installation by loading the corresponding plugin (the plugins are delivered in the “Mindbreeze Query Transformation Plugins.zip” package).

    The plugin also needs to be included in your Mindbreeze license.

    Synonym Transformer Plugin

    The SynonymTransformer plugin allows you to find search results by looking for different synonyms of a word. To do so, the query is transformed to search for every term in the synonym list.

    Usage: The synonyms can be defined in the Mindbreeze Management Center under "Search Experience" > "Synonyms". This setting is backed by a CSV file in which each set of synonyms is written on one line, with the terms separated by semicolons (;).

    Example of a small synonym.csv file:

    car;vehicle;automobile

    plane;airplane;aeroplane

    Example 1: a search for car sends the transformed query: car OR vehicle OR automobile

    Example 2: a search for plane sends the transformed query: plane OR airplane OR aeroplane

    Note: The term in the first column is matched against your query. Only single words without spaces are supported in the first column.
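The expansion described above can be pictured with a short Python sketch. It is not the plugin's implementation: the function names are hypothetical, the CSV content is the example from this section, and exact (case-sensitive) matching on the first column is an assumption.

```python
import csv
import io

# Stand-in for the synonym CSV file; the rows are the example from this section.
SYNONYM_CSV = "car;vehicle;automobile\nplane;airplane;aeroplane\n"


def load_synonyms(csv_text):
    """Map the first column of each row to the full list of terms in that row."""
    rows = csv.reader(io.StringIO(csv_text), delimiter=";")
    return {row[0]: row for row in rows if row and row[0]}


def expand_with_synonyms(term, synonyms):
    """Return the OR-combined query for a single search term."""
    # Only the first column is matched; any other term is passed through unchanged.
    return " OR ".join(synonyms.get(term, [term]))


synonyms = load_synonyms(SYNONYM_CSV)
print(expand_with_synonyms("car", synonyms))
print(expand_with_synonyms("vehicle", synonyms))
```

In this sketch, a search for vehicle is not expanded, because vehicle does not appear in the first column of any row.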

    Installation

    • Install the plugin with the Manager UI
    • Activate the plugin for every Index you want (with the Manager UI)
      • Switch to “Indices”-tab, activate “Advanced Settings”
      • Scroll down to the section “Query Transformation Services”
      • Select the “SynonymTransformer”-plugin and click “Add”
    • Add the path to the CSV-file containing the synonym definitions as “Custom Plugin Properties”
      • Add a new property with the name “SYNONYM_CSV_FILE_PATH”
      • And assign a value with the path to the CSV-file (either as local file system path or as network path appropriate for the used operating system)

    Example 1:  SYNONYM_CSV_FILE_PATH  C:\data\synonyms.csv

    Example 2:  SYNONYM_CSV_FILE_PATH  \\fileserver.mydomain.com\mes-config\synonyms.csv

    Finally, save the configuration changes and restart the Mindbreeze Node to propagate all changes.

    Note: Any change to the synonym CSV file is applied immediately and takes effect with the next search.

    Replacement Transformer Plugin

    The ReplacementTransformer plugin is often used to replace unreasonable search terms with better ones or even to disallow search terms.

    The main difference from the Synonym Transformer plugin is that the original query is actually replaced by a new one and does not appear in the reporting of search terms. The Replacement Transformer can therefore be used to hide results that users would otherwise find and show something else instead (e.g. to hide a legacy page and show the new version).

    Usage: The terms to be replaced can be defined in the Mindbreeze Management Center under "Search Experience" > "Replacements". This setting is backed by a CSV file in which the first column defines the term to be replaced. The following columns are taken as disjunctive (OR-combined) replacement values (if they are empty, the term is not searched for).
    Each search term to be replaced must be written on its own line, and the columns must be separated by semicolons (;).

    Example of a small replacement.csv file:

    car;mercedes;bmw;audi

    party

    Example 1: a search for car sends the transformed query: mercedes OR bmw OR audi

    Example 2: a search for party will not find any results as it is replaced by an “empty” search
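As a rough illustration of the two examples above, the following Python sketch models the replacement step. The function names are hypothetical and the sketch is not the plugin's implementation; it assumes exact matching on the first column and represents the “empty” search as an empty string.

```python
import csv
import io

# Stand-in for the replacement CSV file; the rows are the example from this section.
REPLACEMENT_CSV = "car;mercedes;bmw;audi\nparty\n"


def load_replacements(csv_text):
    """Map the first column of each row to the non-empty replacement values."""
    rows = csv.reader(io.StringIO(csv_text), delimiter=";")
    return {row[0]: [v for v in row[1:] if v] for row in rows if row and row[0]}


def replace_term(term, replacements):
    """Return the query that is sent instead of the original term."""
    if term not in replacements:
        return term  # no rule for this term: the query stays as it is
    # The replacement values are OR-combined; no values yields an empty query.
    return " OR ".join(replacements[term])


replacements = load_replacements(REPLACEMENT_CSV)
print(replace_term("car", replacements))
print(repr(replace_term("party", replacements)))
```

Unlike the synonym expansion, the original term (car) is not part of the resulting query.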

    Installation

    • Install the plugin with the Manager UI
    • Activate the plugin for every Index you want (with the Manager UI)
      • Switch to “Indices”-tab, activate “Advanced Settings”
      • Scroll down to the section “Query Transformation Services”
      • Select the “ReplacementTransformer”-plugin and click “Add”
    • Add the path to the CSV-file containing the replacement definitions as “Custom Plugin Properties”
      • Add a new property with the name “REPLACEMENT_CSV_FILE_PATH”
      • And assign a value with the path to the CSV-file (either as local file system path or as network path appropriate for the used operating system)

    Example 1:  REPLACEMENT_CSV_FILE_PATH  C:\data\replacements.csv

    Example 2:  REPLACEMENT_CSV_FILE_PATH  \\fileserver.x.y\config\replacements.csv

    Finally, save the configuration changes and restart the Mindbreeze Node to propagate all changes.

    Note: Any change to the replacement CSV file is applied immediately and takes effect with the next search.

    General Notes on Transformer Plugins (Replacement/Synonym)

    Note: If you are using both plugins (Synonym-Transformer and Replacement-Transformer) the Replacement-Transformer is applied first!

    The following screenshot displays the configuration of both plugins within the Mindbreeze Manager Interface.

    Note: Any change to the synonym or replacement CSV file is applied immediately and takes effect with the next search.
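The effect of the processing order can be sketched in Python. The two rules below are hypothetical (a replacement row auto;car and a synonym row car;vehicle;automobile), and the sketch assumes that the Synonym Transformer sees the output of the Replacement Transformer; it is an illustration, not the product's implementation.

```python
# Hypothetical rules, written as dictionaries instead of CSV files.
REPLACEMENTS = {"auto": ["car"]}                      # row: auto;car
SYNONYMS = {"car": ["car", "vehicle", "automobile"]}  # row: car;vehicle;automobile


def apply_rules(terms, rules):
    """Expand every term that has a rule; keep all other terms unchanged."""
    result = []
    for term in terms:
        result.extend(rules.get(term, [term]))
    return result


def transform(term):
    """Apply the replacement step first, then the synonym step."""
    replaced = apply_rules([term], REPLACEMENTS)
    return " OR ".join(apply_rules(replaced, SYNONYMS))


def transform_reversed(term):
    """For comparison only: synonyms first, then replacements."""
    expanded = apply_rules([term], SYNONYMS)
    return " OR ".join(apply_rules(expanded, REPLACEMENTS))


print(transform("auto"))
print(transform_reversed("auto"))
```

With the replacement applied first, the replaced term can still be expanded by a synonym rule; in the reversed order, the synonym rule for car would never see the query.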

    Stemmer transformer plugin

    The stemmer transformer plugin allows you to find search results by searching for different stems of a word based on linguistic characteristics of the defined language.

    Use: The basic algorithm to find suitable word stems is implemented in the supplied plugin. An additional dictionary with vocabularies of a specific language is available for the most common languages and is used to improve the search results.

    In addition, so-called transliterations can also be carried out with the help of the stemmer transformer. In the process, characters are rewritten using rules. Both the original term and the rewritten term are then taken into account in the search.

    Example:

    A search for leaf will find matches like leaf and leaves.

    Installation/configuration

    • Install the plugin (if not already installed)
    • Enable the plugin for each desired index using the Manager UI:
      • Go to the “Indices” tab and enable “Advanced Settings”
      • Scroll down to the section “Query Transformation Services”
      • Select the “StemmerTransformer” plugin and click “Add”

    • Configuring properties (depending on use)

    Languages: The languages of the stemmer. One or more languages are permitted. The languages must be separated by commas or line breaks.

    Path to vocabulary: A local path on the appliance to a vocabulary file. With a vocabulary, the query can be expanded to related word forms in addition to being reduced to stems (e.g. a search for “tree” should also find “trees”).

    Stemmer enabled: If checked, the stemmer is used.

    Case sensitive: If this option is checked, the reduction of the stems is carried out taking upper and lower case into account (case-sensitive). This can produce more precise – but also fewer – stems. Note: The stem extension vocabulary is always used with no regard to upper and lower case (case-insensitive).

    Auto detect language from query: The stemmer tries to derive the language from the search query.

    Variants Boosting Factor: Defines the boosting factor for variants. Variants are the terms that the stemmer generates. This includes root forms, expansions and transliterations. With this factor, for example, the priority of the variants can be reduced so that the meaning of the original value is retained.

    Transliterate all variants: This option allows the stemmer to expand the query to include all matching transliterations.

    TransliterationRule: Rules for rewriting strings in terms. The following rules can be used: http://icu-project.org/apiref/icu4j/com/ibm/icu/text/RuleBasedTransliterator.html

    Excluded Words Path: This option allows you to exclude certain words from stemming. To do so, create a text file with words you want to exclude (1 word per line) and configure the path of the text file in the option.

    Add Single Term Alternatives as Alias: With this option active, in case of query expressions of type “terms”, Synonyms for a single term are added as “alias” instead of “alternative” entries in the transformed query.

    Then save the changes and restart the Mindbreeze node so that the changes take effect.

    Use case: multilingual stemming

    If Mindbreeze is used with multiple languages, it makes sense to configure the stemmer transformer plugin for multiple languages to deliver matching search results for all languages used.

    The configuration option "Languages" can be used to configure several languages. The stemmer will then attempt to find stem forms in a search query for each configured language. All stem forms found for all configured languages are then used for the transformation.

    If different stem forms of different languages are used together, the search may become too fuzzy and deliver irrelevant search results. To counteract this behavior, you can use the configuration option "Auto detect language from query". If this option is active, a heuristic will be used to determine the language of the search query. Note: The heuristic only determines languages that are configured via the configuration option "Languages". The languages determined are then used for stemming. This means that only the specific language of a search query is used for stemming.

    The stemmer vocabulary must be adapted so that expanding the stem forms also works correctly with multiple languages. The stemmer vocabulary ("Path to Vocabulary") is an unsorted text file containing words and has one word in each line. The stemmer plugin reads this text file and creates stem forms for every single word and links the information about which words have the same stem form. This information is used in a search to expand the search term. For example, a search for “tree” should also find “trees.” The language used by the stemmer to find the stem forms in the vocabulary follows the same rules as those used to find the stem forms for a search term. All configured languages are used, or, if the configuration option "Auto detect language from query" is enabled, a heuristic is used to determine the language of a word in the vocabulary. We recommend expanding the vocabulary text file for each configured language. This can be done by simple concatenation – the words do not have to be sorted.
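The way a vocabulary is used to expand a search term can be sketched as follows. The `toy_stem` function is a deliberately naive stand-in and not the stemming algorithm of the plugin; all names are hypothetical. The sketch only illustrates the idea of linking vocabulary words that share a stem form.

```python
from collections import defaultdict


def toy_stem(word):
    """Naive stand-in for a stemmer: lowercase and drop a trailing 's'."""
    word = word.lower()
    return word[:-1] if word.endswith("s") and len(word) > 3 else word


def build_stem_index(vocabulary):
    """Link all vocabulary words that share the same stem form."""
    index = defaultdict(set)
    for word in vocabulary:
        index[toy_stem(word)].add(word.lower())
    return index


def expand(term, index):
    """Return the search term together with all vocabulary words of the same stem."""
    return sorted(index.get(toy_stem(term), set()) | {term.lower()})


# The vocabulary is an unsorted list with one word per entry.
vocabulary = ["trees", "house", "tree", "houses"]
index = build_stem_index(vocabulary)
print(expand("tree", index))
print(expand("garden", index))
```

A term that is not covered by the vocabulary (garden) is returned unchanged, which corresponds to the case where only the reduction to stems is available.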

    Limitations of the stemmer transformer plugin

    Stem forms vs. synonyms

    The stemmer uses a primitive algorithm to find stem forms of a word and expands the search query additionally with a vocabulary. However, this only covers minor variations of a word (a few changed letters). This functionality is very useful for the majority of search queries, but may not be sufficient in special cases.

    If the expansion of a word (e.g. “tree” to “trees”) is not working correctly, you can take the following measures:

    • If no vocabulary is being used, a vocabulary should be configured.
    • If an extensive vocabulary is already in use, we recommend including the corresponding word with synonyms in a synonym transformer. If the vocabulary were to be expanded, there would be no guarantee of success, since the existing vocabulary is usually very extensive and the stemmer uses a naive algorithm. If, however, you add a new synonym, you will definitely be able to achieve the desired effect.

    Known words that are difficult to stem

    There are some words for which the stemmer transformer cannot correctly determine the respective stem forms. Known examples in German are “Autos,” “Nudeln,” and “Kiwis.” If these words affect the search quality, it is advisable to use a synonym transformer.

    Term2DocumentBoost transformer plugin

    The Term2DocumentBoost plugin enables relevance tuning for search queries. It supports the following use cases:

    1. Increase the relevance of particular documents for certain search queries. For example, a search for “help” can be tailored so that documents with the keyword “documentation,” for instance, are assigned a higher relevance in this search.
    2. Generally increase the relevance of certain documents. For instance, all documents with the keyword “Mindbreeze” can be assigned a higher relevance.
    3. Increase the relevance for matching metadata. For example, if you search for any person (search term: “John Smith”), documents by this person (metadata: “Author”) can receive a higher relevance.
    4. Generally influence the entire relevance model. For instance, change the relevance factor “Term Frequency” to change the priority of the frequency of search hits in the document.

    Installation

    • Install the plugin using the Manager UI
    • Enable the plugin for each desired index using the Manager UI:
      • Go to the “Indices” tab and enable “Advanced Settings”
      • Scroll down to the section “Query Transformation Services”
      • Select the “Term2DocumentBoost” plugin and click “Add”
    • The plugin is configured via two files:
      • The "Term to Document Boost CSV File" is required for use cases 1, 2, and 3.
      • The "Default Relevance Options JSON File" is required for use case 4.
    • Configure the settings:
      • “Term to Document Boost CSV File Path”: path of the CSV file
      • “Default Relevance Options JSON File Path”: path of the JSON file

    Then save the changes and restart the Mindbreeze node so that the changes take effect.

    • Optional Settings:

    “Use Normalization”

    It is recommended to enable this setting.

    When enabled, the capitalization of search terms and relevant document strings is normalized and whitespace characters are combined into a single space character when matching the boosting rules.

    “Boost Quoted Terms”

    It is recommended to enable this setting.

    When enabled, search terms that are set under quotation marks (for an exact search) are also boosted. Otherwise, these search terms are ignored and excluded from boosting.

    Configuration

    General description of the Term to Document Boost CSV file format

    The CSV file contains one row for each boosting, which in turn contains the following columns:

    • Term: the search term
    • Metadata key: the name of the metadata property to which the boosting is to be applied
    • Pattern: a pattern that determines the value to be boosted
    • Boost: the boost factor
    • Query: Optional. Expanded configuration. See the Configuration via Query section

    Only DocumentInfo metadata (i.e. data that is either aggregatable or regexmatchable) can be used as property here. A list of these properties is available in the designer under "Filter".

    If several rules match at the same time, the rule with the largest boost factor is used. However, this behavior could change in future versions.

    Note: Any change in the CSV file is applied immediately and will be reflected in the next search.

    Calculation of the final value

    The final value with which the boost is performed is obtained by multiplying the configured boost factors. The different boost factors are defined with a numeric value, where some boost factors only have a numeric value and some have a numeric value and an exponent. While the base value of each boost factor is set by the application, the exponent can be set by the user. The value of the exponent can be set in the following settings:

    • Zone Boost Exponent
    • Term Boost Exponent
    • Doc Boost Exponent
    • Answer Doc Boost Exponent
    • Term Match Exponent
    • Term Boost IDF Exponent
    • Term Boost Zone Coverage Exponent

    For the exponent, a value between 0 and 1 can be set. A value of 0 disables the boost factor, because any base raised to the power of 0 yields 1. The following two examples illustrate this.

    Example 1:

    Based on the Boosting setting, the base has the value of 4. Now the Term Boost Exponent is set to 0.5. This results in the following calculation and the final value for the boost factor:

    $4^{0.5} = 2$

    This results in a final boost factor of 2.

    Example 2:

    Based on the Boosting setting, the base has the value of 10. Since the settings for the exponent (e.g. Zone Boost Exponent) have not been further configured, the exponent has a value of 0. This results in the following calculation:

    $10^{0} = 1$

    With this, the final boost factor has the value of 1, which disables the boost factor.
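The calculation from both examples can be reproduced with a few lines of Python. The function names are hypothetical; the sketch only restates the arithmetic described above (each base raised to its exponent, then all factors multiplied) and does not reflect the internal implementation.

```python
import math


def effective_factor(base, exponent):
    """Raise the base value of one boost factor to its configured exponent."""
    if not 0 <= exponent <= 1:
        raise ValueError("the exponent must be between 0 and 1")
    return base ** exponent


def final_boost(factors):
    """Multiply the effective values of all (base, exponent) pairs."""
    return math.prod(effective_factor(base, exp) for base, exp in factors)


print(effective_factor(4, 0.5))   # Example 1
print(effective_factor(10, 0))    # Example 2: an exponent of 0 disables the factor
print(final_boost([(4, 0.5), (10, 0)]))
```

Because a disabled factor contributes a value of 1, it has no influence on the product.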

    Recommended boost factors

    The recommended range for the boost factors is between 1 and 10. If a higher factor is used, other fine adjustments can be unintentionally influenced. A boost factor between 1 and 10 can be used in the following functions:

    • Zone Boosting
    • Document Boosting
    • Term Boosting

    For more information, see Mindbreeze Query Expression Transformation - Zone boosting (metadata boosting), Mindbreeze Query Expression Transformation - Document boosting (alternative to Term to Document Boost CSV) and Mindbreeze Query Expression Transformation - Term boosting (term and Ngram boosts).

    Quick start guide for configuration

    This chapter provides a general overview of the steps required to configure boosting. It is important to note that extensive testing is required to verify and adjust the configured boost factors. The following steps must be performed:

    • Configuration of Relevance Factors (if default values are not sufficient)
    • Optional configuration of Document Boosting
    • Optional configuration of Zone Boosting
    • Optional configuration of Term Boosting
    • Optional configuration of Additive Document Boosting

    The Relevance Factors represent the basic boosting configuration and are therefore essential for all use cases. For most use cases, the default values are sufficient. Since the Relevance Factors are global settings, changing them can affect other settings, including boosts that have already been configured. If a change of the default values is necessary, the new values must therefore be thoroughly tested. For more information on Relevance Factors, see Mindbreeze Query Expression Transformation - Relevance factors (term frequency, document frequency).

    After the Relevance Factors, boosting can be fine-tuned by configuring Document Boosting, Zone Boosting and Term Boosting. Document Boosting is recommended as the first additional configuration as it is the easiest to perform. It is important to test the settings thoroughly to check the position of the document in the search results. In addition, irrelevant searches should also be performed to check the position of the document in such cases. Depending on the results, the values set in Document Boosting must be readjusted.

    If there are metadata where the match should be weighted more heavily, Zone Boosting is recommended. If more complex configurations are required, Term Boosting should be used. Once all the required configurations have been carried out, the Personalised Relevancy Transformer can be used for additional fine tuning. The configuration of Additive Document Boosting is recommended according to the use case.

    Use case: increase the relevance of particular documents for certain search queries

    Example for a CSV file:

    Term;Metadata Key;Pattern;Boost

    help;title;portal help|intranet help;5

    When a user performs a search for help, documents containing the terms portal help or intranet help in the title will be boosted by a factor of 5.

    Use case: increase the relevance of particular documents

    Term;Metadata Key;Pattern;Boost

    ;extension;.*pdf;10

    Leave the "Term" column empty. The document is boosted regardless of the user’s search query. For example, any document with the extension “pdf” can be boosted up or down.
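The two CSV examples above can be modelled with the following Python sketch. It is an illustration under several assumptions and not the plugin's implementation: the function names are hypothetical, the pattern is treated as a regular expression that may match anywhere in the metadata value, terms are compared exactly, and a document without a matching rule receives a neutral factor of 1. If several rules match, the largest boost factor is used, as described in the general description of the file format.

```python
import csv
import io
import re

# The two example rules from this section, combined into one CSV text.
BOOST_CSV = """Term;Metadata Key;Pattern;Boost
help;title;portal help|intranet help;5
;extension;.*pdf;10
"""


def load_rules(csv_text):
    """Read the boosting rules; the first line is treated as the header."""
    return list(csv.DictReader(io.StringIO(csv_text), delimiter=";"))


def rule_applies(rule, search_term, document):
    """Check the Term column first, then the pattern on the metadata value."""
    # An empty Term column means the rule applies to every search query.
    if rule["Term"] and rule["Term"] != search_term:
        return False
    value = document.get(rule["Metadata Key"], "")
    return re.search(rule["Pattern"], value) is not None


def boost_for(search_term, document, rules):
    """Return the largest matching boost factor, or 1.0 if no rule matches."""
    boosts = [float(r["Boost"]) for r in rules if rule_applies(r, search_term, document)]
    return max(boosts, default=1.0)


rules = load_rules(BOOST_CSV)
help_page = {"title": "intranet help center", "extension": "html"}
manual = {"title": "user manual", "extension": "pdf"}
print(boost_for("help", help_page, rules))
print(boost_for("vacation", manual, rules))
```

The second call shows the rule with the empty "Term" column: the PDF document is boosted although the query has nothing to do with the rule.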

    Introduction to the Mindbreeze relevance model

    The Mindbreeze relevance model calculates a relevance count or rank for each result. This is also visible as metadata in Mindbreeze Export:

    This rank or relevance count is calculated using the following parameters. The higher the count, the more important the result.

    Recency

    The more recent a result is, the higher the relevance count will be.

    Term frequency

    The more often the searched term is matched in the current hit, the higher the relevance ranking will be.

    Term proximity

    The smaller the distance between the matches within a result, the more important the result is compared to results whose matches are further apart.

    Term inverse zone frequency

    If two documents have the same number of matches but one document contains many more different terms than the other, the document with the smaller number of other terms gets a higher rank.

    Common misunderstandings and misinterpretations

    It is important to note that boosting does not replace the relevance count; it only increases it multiplicatively. If the relevance count of a document is 20 and it is boosted by a factor of 2, the relevance is then 40. This can result in the following phenomenon. You want Result 2 to be in position 1:

    Result 1: Rank = 2000

    Result 2: Rank = 20

    If you boost Result 2 by 10, it will still be in position 2 just like before boosting:

    Result 1: Rank = 2000

    Result 2: Rank = 200

    You therefore need to boost Result 2 by a factor greater than 100, for example 101, in order to put it in the first position:

    Result 2: Rank = 2020

    Result 1: Rank = 2000
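
    The arithmetic of this example can be checked with a short Python sketch. The function names are illustrative; the only fact taken from the text is that boosting multiplies the relevance count.

```python
import math


def boosted_rank(rank, factor):
    """Boosting multiplies the relevance count; it does not replace it."""
    return rank * factor


def min_integer_factor_to_outrank(rank, competitor_rank):
    """Smallest whole-number factor that lifts `rank` strictly above `competitor_rank`."""
    return math.floor(competitor_rank / rank) + 1


print(boosted_rank(20, 10))                     # 200, still below 2000
print(min_integer_factor_to_outrank(20, 2000))  # 101
print(boosted_rank(20, 101))                    # 2020, now above 2000
```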

    Use case: increasing relevance for matching metadata/advanced configuration with query

    To achieve more flexibility with boosting, you can also add an additional "Query" column. Here you can specify a query directly with the Mindbreeze InSpire Query Language, which determines the documents to be boosted.

    Note: If you use the "Query" column, the "Metadata Key" and "Pattern" columns will be ignored.

    Example of query boosting within the MMC table editor:

    Term;Metadata Key;Pattern;Boost;Query

    help;;;3;"datasource/mes:key":"http://myweb.com/help-index.html"

    When a user searches for help, documents found with the query "datasource/mes:key": "http://myweb.com/help-index.html" are boosted by a factor of 3.

    Another possible use of query boosting is a search for a person's name, where the documents written by that person should be boosted.
    For this purpose, the document metadata "Author" is used. Boosting rules provide the variable {{query}}, which in this case holds the name of the person being searched for.
    A query can therefore be defined that finds the documents whose author matches the search term and boosts them.

    The variable {{query}} is used in the query column and is dynamically replaced by the search query during a search.

    Note: if you use the Query column, the Term column is also ignored.

    If the user searches for John Doe, all documents whose Author metadata is John Doe are boosted by a factor of 7 by the rule below.

    Term;Metadata Key;Pattern;Boost;Query

    ;;;7;Author:"{{query}}"

    Another useful variable is {{lang}}. With it, the documents that correspond to the language of the user's web browser can be boosted when the user starts a search.
    This requires a document metadata item in which the language of the document is stored.
    To provide it, the LanguageDetector plugin needs to be configured. The LanguageDetector recognises the language in which a document is written and sets the corresponding language metadata (for example, en or de).

    The variable {{lang}} contains the language that the web browser sends with the query.

    Note: Here the Term column is ignored as well if the Query column is used.

    For example, if a LanguageDetector plugin is configured to create a metadata item called detectedLanguage, the rule below boosts all documents that match the web browser's language by a factor of 7.

    Term;Metadata Key;Pattern;Boost;Query

    ;detectedLanguage;{{lang}};7;

    The following variables are supported:

    Name

    Description

    {{query}}

    The current search query

    {{lang}}

    The language in the User-Context (e.g. en) (if available)

    {{country}}

    The country in the User-Context (e.g. US) (if available)

    {{usercontext_language}}

    The language code in the User-Context (e.g. en-US) (if available).

    {{session_<<key>>}}

    Any value from the Session of the User-Context. e.g. {{session_mycustomkey}} results in mycustomvalue (if present in the Session properties).

    {{identity_name}}

    Name of the user in the User-Context. (e.g. john.doe) (if available)

    {{identity_<<key>>}}

    Any values from the user's Identity. e.g. {{identity_mail}} results in john.doe@example.com (if present in the Identity properties).

    {{usercontext_<<key>>}}

    Any value from the User-Context. e.g. {{usercontext_mycustomkey}} results in mycustomvalue (if available in the User-Context properties).

    If there is no value for a variable, the associated boosting will not be applied.
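
    The substitution behaviour described above can be pictured with the following Python sketch. It is not the product's implementation: the function expand_query and the context dictionary are hypothetical, and the queries detectedLanguage:{{lang}} and region:{{country}} are made up for illustration. The sketch only mirrors two statements from the text: variables are replaced at search time, and a rule is not applied when a variable has no value.

```python
import re

# Matches placeholders such as {{query}} or {{session_mycustomkey}}.
VARIABLE = re.compile(r"\{\{(\w+)\}\}")


def expand_query(template, context):
    """Replace {{name}} placeholders with values from `context`.

    Returns None when a referenced variable has no value, so the caller
    can skip the associated boosting rule.
    """
    missing = [name for name in VARIABLE.findall(template) if not context.get(name)]
    if missing:
        return None
    return VARIABLE.sub(lambda match: context[match.group(1)], template)


# Hypothetical values for one search request.
context = {"query": "John Doe", "lang": "en"}
print(expand_query('Author:"{{query}}"', context))         # Author:"John Doe"
print(expand_query("detectedLanguage:{{lang}}", context))  # detectedLanguage:en
print(expand_query("region:{{country}}", context))         # None
```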

    Use case: general influence of the relevance model

    You can generally adjust all parameters of the relevance model. This is done via the Default Relevance Options JSON file.

    It is not advisable to edit this JSON file manually. Instead, use the item "Relevance" under the menu item "Search Experience" in the Management Center.

    Note: These parameters are a fundamental part of the relevance model; small changes can have a major impact on the order of the search results. It is possible that the boosting factors in the CSV will have to be adjusted at a later time.

    The following sections describe which parameters can be adjusted.

    For more information, see:

    • Mindbreeze InSpire Configuration Manual, Indices tab
    • Manual api.v2.search Interface Description

    Relevance factors (term frequency, document frequency)

    • The individual entries determine how strongly each relevance parameter influences the relevance ranking. The value of each factor is its relative (percentage) share.

    Serial

    The influence of recency (document date mes:date) on the relevance. Documents from the last two years (24 months) are considered “recent”. Anything older than two years is generally treated as not recent.

    Term frequency

    Absolute frequency of words

    Doc frequency

    Relative frequency of words in the document – TF-IDF

    Term proximity

    Distance between the hit terms in the text

    Term inverse zone frequency

    Maximum relative frequency of words in individual zones – max TF-IZF

    Zone boost exponent

    Influence of document property boosting on relevance ranking (0 means it will be ignored)

    Term boost exponent

    Influence of search term boosting on relevance ranking (0 means it will be ignored)

    Doc boost exponent

    Influence of mes:boost property on relevance ranking (0 means it will be ignored)

    Term match exponent

    Influence of the matching of terms (interesting for the OR function) on relevance ranking (0 means it will be ignored)

    Constant

    A constant component. It is particularly useful if Term boosting, Document boosting or Zone boosting is used exclusively and you do not want to use the remaining components (e.g. Term proximity, Serial).

    Term boost IDF exponent

    IDF = inverse document frequency. This option controls how strongly the frequency of a term across many documents affects the calculation of the term boost. The higher the exponent, the more strongly less frequent words are weighted compared with frequent words; the lower the exponent, the smaller this effect. 0 means that this option will be ignored.

    Zone boosting (metadata boosting)

    Zone boosting is another way to change the order of the search results. Boost factors can be configured for so-called zones. A zone is simply a piece of document metadata. If you want documents that are found on the basis of a certain metadata item to be ranked higher in the search results, you can define a boost factor for this metadata (= zone). In the above example, documents found on the basis of the metadata “Author” are classified as more relevant by a factor of 1.05. Valid values of the boost factor are real numbers greater than or equal to one (≥ 1.0), written with the decimal separator “.”.

    Document boosting (alternative to Term to Document Boost CSV)

    Using “Document boosting”, you can also change the relevance of certain documents. Among the documents found by a search query, the relevance of all documents that match the “Query Expr” is changed by the “Boost factor”. In the above example, documents that originate from the author “Legend User” are rated more relevant by a factor of 1.1.

    Valid values of the Boost factor are:

    • To decrease weighting: real numbers greater than zero and less than one (> 0.0 ∧ < 1.0) with decimal separator “.”
    • To increase weighting: real numbers greater than one (> 1) with decimal separator “.”
    • The Boost factor 1 has no impact

    Term boosting (term and Ngram boosts)

    Term boost factor

    Boost factor for exact matches (1.0)

    Ngram boost factor

    Boost factor for partial word matches (1.0). This option is only relevant if the following settings are enabled in the Management Center under “Configuration” -> “Client Services” -> “Enable Character NGRAMs” (“Advanced Settings” must be enabled). This option is already enabled by default.

    Congruence boost factor

    Boost factor for character congruence (e.g. “a” vs. “ä”). This option is only relevant if the following settings are enabled in the Management Center under “Configuration” -> “Client Services” -> “Query Expansion for Diacritic Term Variants” (“Advanced Settings” must be enabled). This option is already enabled by default.

    Distance boost reduction

    Boost decrease for each change = Edit distance (e.g. “Mindbreze” vs. “Mindbreeze”). This option is only relevant if the following settings are enabled in the Management Center under “Configuration” -> “Client Services” -> “Enable Query Expansion for Similar Term” (“Advanced Settings” must be enabled). However, this option is enabled by default.

    Disable Terms Position Boost Reduction

    If set to "true", "Terms Position Boost Step Size Reduction" and "Terms Position Boost Maximum Reduction" are deactivated.

    Terms Position Boost Maximum Reduction

    Maximum value by which the boosting of a term can be reduced.
    Values: 0.0 – 1.0 (Default 0.2)

    Example:
    See the following example of the option "Terms Position Boost Step Size Reduction".

    Note: "Terms Position Boost Maximum Reduction" works only if Optional Terms is enabled in Client Services (enabled by default).

    Terms Position Boost Step Size Reduction

    Step size by which each following value is reduced.
    Values: 0.0 – 1.0 (Default 0.05)

    Example: With a step size of 0.1, "Terms Position Boost Maximum Reduction" = 0.2 and the search input "My name is John", the resulting term boosting is:
    My = 1.0
    name = 0.9
    is = 0.8
    John = 0.8

    Note: “Terms Position Boost Step Size Reduction” works only if Optional Terms is enabled in Client Services (enabled by default).
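
    The interaction of the two options can be reproduced with a small Python sketch. The formula is inferred from the example above and is an assumption, not a documented algorithm: each following term is reduced by one more step, and the total reduction is capped at the maximum reduction. The function name position_boosts is hypothetical.

```python
def position_boosts(terms, step=0.05, max_reduction=0.2):
    """Return (term, boost) pairs for a query.

    Assumed formula: boost = 1.0 - min(position * step, max_reduction),
    with positions counted from 0. The defaults follow the documented
    default values of the two options.
    """
    boosts = []
    for position, term in enumerate(terms):
        reduction = min(position * step, max_reduction)
        # Round to hide floating-point noise such as 0.30000000000000004.
        boosts.append((term, round(1.0 - reduction, 10)))
    return boosts


print(position_boosts("My name is John".split(), step=0.1, max_reduction=0.2))
# [('My', 1.0), ('name', 0.9), ('is', 0.8), ('John', 0.8)]
```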

    Other Relevance Options

    Use Additive Document Boosting (Recommended)

    Defines the boosting strategy for multiple boostings of one document. By default, Additive Document Boosting is enabled, which considers all boostings on a document for calculating relevance. If the setting is disabled, only the highest boosting is used to calculate relevance.

    Use Additive Zone Boosting (Experimental)

    Defines the boosting strategy for multiple zone boostings. By default, Additive Zone Boosting is disabled, and only the highest matching zone boosting is considered. If Additive Zone Boosting is enabled, all matching zone boostings are considered for calculating relevance.

    Export Dump

    The current dump can be saved and downloaded as an Excel file by clicking the button "Download Dump".

    MetadataQueryTransformer Plugin

    (formerly "MetadataTransformer" plugin) This plugin manipulates search queries for metadata searches. It is intended for users who search with colon notation (e.g. name:John) but do not mean the metadata "name". The plugin is configured with a CSV file consisting of rules.

    Installation

    • Install the plugin with the Manager UI
    • Activate the plugin for each desired index using the Manager UI:
      • Switch to the "Indices" tab and activate "Advanced Settings".
      • Scroll down to the "Query Transformation Services" section.
      • Select the "MetadataQueryTransformer" plugin and click "Add".

    Configuration

    The following parameters can be configured:

    "Path to Label transformation CSV"

    Path to CSV file (see next section)

    "Asterisk Expansion Vocabulary File"

    Path to vocabulary file (see next section)

    "Asterisk Expansion Max Results"

    Maximum number of words that the asterisk symbol expands.

    Label Transformation CSV Syntax

    This file contains the transformation rules, one rule per line. Each line has two or more columns separated by semicolons, and there is no header row. Meaning of the columns:

    Label

    Name of the label in the search query to which this rule applies. The asterisk symbol (*) can also be used for any name.

    Rule type

    "PHRASE", "NEAR", "IGNORE", "REGEX_PATTERN" or "ASTERISK_PATTERN"

    Options

    Depending on the rule type

    Basically, a search is performed directly in the metadata and an alternative search condition is added.

    Note: For the REGEX_PATTERN or ASTERISK_PATTERN types, the property searched should be regexmatchable or aggregatable. This can be defined in the Category Descriptor or in the Index Configuration.

    "PHRASE"

    Creates a phrase search (normal search).

    e.g.: rule name;PHRASE , search for "name:John" finds documents with "name John" in the content

    "NEAR"

    Creates a near search, the distance can be defined via an option.

    e.g. rule temperature;NEAR;3 , search for "temperature:20" finds documents with "the temperature is about 20 degrees" in content

    "IGNORE"

    Creates a Neutral Search that does not return any results itself.

    e.g. rule operation;IGNORE

    This rule allows selective exceptions when a default transformation has previously been defined for all labels by means of *.

    "ASTERISK_PATTERN"

    Transforms a metadata search into an asterisk pattern search, synonyms can be defined via options.

    e.g. rule number;ASTERISK_PATTERN;id;nb , search for "number:A42*" finds documents whose property "id" or "nb" begins with A42.

    "REGEX_PATTERN"

    Creates a Regex pattern search, synonyms can be defined via options

    e.g. rule number;REGEX_PATTERN;id;nb , search for "number:A.*" finds documents whose property "id" or "nb" matches the regular expression A.*.
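
    The file format described above can be pictured with the following Python sketch, which parses rule lines and looks up the rule for a label. It is an illustration only: the names Rule, parse_rules and rule_for are hypothetical, and the lookup order (a rule for the exact label takes precedence over the * rule) is an assumption based on the description of IGNORE.

```python
from typing import NamedTuple

RULE_TYPES = {"PHRASE", "NEAR", "IGNORE", "REGEX_PATTERN", "ASTERISK_PATTERN"}


class Rule(NamedTuple):
    label: str
    rule_type: str
    options: tuple


def parse_rules(text):
    """Parse 'label;TYPE;option;...' lines into a dict of Rule objects keyed by label."""
    rules = {}
    for line in text.splitlines():
        columns = [column.strip() for column in line.split(";")]
        if len(columns) < 2 or not columns[0]:
            continue  # skip blank or incomplete lines
        label, rule_type, *options = columns
        if rule_type not in RULE_TYPES:
            raise ValueError(f"unknown rule type: {rule_type}")
        rules[label] = Rule(label, rule_type, tuple(options))
    return rules


def rule_for(rules, label):
    """Assumed lookup order: the exact label first, then the '*' rule."""
    return rules.get(label) or rules.get("*")


rules = parse_rules("""
*;PHRASE
temperature;NEAR;3
operation;IGNORE
number;ASTERISK_PATTERN;id;nb
""")
print(rule_for(rules, "temperature").options)  # ('3',)
print(rule_for(rules, "operation").rule_type)  # IGNORE
print(rule_for(rules, "name").rule_type)       # PHRASE (falls back to *)
```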

    Vocabulary File Syntax

    Regardless of label transformation, the plugin also provides the ability to transform normal search terms containing asterisk symbols (*). These search terms are replaced by similar terms from a defined vocabulary.

    The "Vocabulary File" is a text file with terms, one term per line.

    For example, a vocabulary file with the following content:

    superprint

    printomatic

    fastprint

    a search for "*print" searches for the following terms: "superprint" and "fastprint". The term "printomatic" is not included because it does not end in "print".
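
    The expansion can be sketched in Python with the standard fnmatch module. This is an approximation for illustration: the function name expand_asterisk, the case-sensitive matching and the default of 10 for the maximum number of results are assumptions, and fnmatch also gives special meaning to ? and [...], which the plugin may not.

```python
from fnmatch import fnmatchcase


def expand_asterisk(term, vocabulary, max_results=10):
    """Return the vocabulary terms matching `term`, where * stands for any run of characters.

    `max_results` plays the role of "Asterisk Expansion Max Results".
    """
    matches = [word for word in vocabulary if fnmatchcase(word, term)]
    return matches[:max_results]


vocabulary = ["superprint", "printomatic", "fastprint"]
print(expand_asterisk("*print", vocabulary))                 # ['superprint', 'fastprint']
print(expand_asterisk("print*", vocabulary))                 # ['printomatic']
print(expand_asterisk("*print", vocabulary, max_results=1))  # ['superprint']
```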

    DotExtensionToLabeledTransformer Plugin

    This plugin makes it easier to search for a file extension. Search queries in the form “.pdf:searchterm” will be converted to the form “extension:pdf searchterm”.

    Example: A search for the term "Invoice" and the file extension "pdf" normally looks like this:

    "extension:pdf Invoice"

    With this plugin, the search can be simplified to:

    ".pdf:Invoice"
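
    The rewrite performed by the plugin can be approximated with a regular expression, as in the following Python sketch. The function name and the pattern are assumptions; the sketch only handles a query that consists of a single ".extension:searchterm" expression.

```python
import re

# A leading dot, the extension, a colon, then the search term.
DOT_EXTENSION = re.compile(r"^\.(\w+):(.+)$")


def dot_extension_to_labeled(query):
    """Rewrite '.pdf:Invoice' to 'extension:pdf Invoice'; leave other queries unchanged."""
    match = DOT_EXTENSION.match(query.strip())
    if not match:
        return query
    extension, term = match.groups()
    return f"extension:{extension} {term}"


print(dot_extension_to_labeled(".pdf:Invoice"))  # extension:pdf Invoice
print(dot_extension_to_labeled("Invoice"))       # Invoice
```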

    Installation

    • Install the plugin with the Manager UI
    • Activate the plugin for each desired index using the Manager UI:
      • Switch to the "Indices" tab and activate "Advanced Settings".
      • Scroll down to the "Query Transformation Services" section.
      • Select the "DotExtensionToLabeledTransformer" plugin and click "Add".
    • Finally, save the changes and restart the Mindbreeze Node for the changes to take effect.

    Configuration

    This plugin does not require any configuration.

    QueryExprLabelTranslation Plugin

    The plugin allows you to search for metadata using the labels of the user's own language. For example, the metadata with the ID "title" is translated as "Name" in German. Without this plugin, a search for documents with the name "Rechnung" requires the search query "title:Rechnung". With the QueryExprLabelTranslation plugin, the search query can be written with the translated label: "Name:Rechnung". The plugin translates the label "Name" back to "title", and the search query returns the desired results.

    Installation

    The QueryExprLabelTranslation plugin is already built-in and requires no installation.

    Configuration

    The QueryExprLabelTranslation plugin is active for each index by default and requires no configuration. The translations are loaded from the CategoryDescriptor by the metadatum tags.

    Additional Features

    Did you mean?

    If a search returns no results because a word in the search term was misspelled, Mindbreeze offers an alternative search term (based on internal index statistics and analysis) that would find better results. This feature is called “Did you mean?”.

    Entity Recognition

    Entity recognition can be used to extract metadata from the document content or from other metadata properties of the documents which may be used for more efficient searches afterwards.

    This topic is described in detail in “Documentation – Mindbreeze Inspire”. For details please read the documentation on “Indices tab”.

    CSV Transformation

    To extend indexed documents with additional metadata so that they are easier to find, the CSV transformation maps well-defined values to other value columns stored in a CSV file.

    This feature can be quite helpful to extend your index with technical terms, abbreviations, topics or even short descriptions for your documents in special use cases.

    Example: a city ZIP code directory

    ZIP;City;Province

    4020;Linz;Central Upper Austria
    1020;Vienna;Capital City of Austria
    9861;Krems;Forest Quarter
    4400;Steyr;Traun Quarter

    The first line of this sample CSV is the header line, which defines the column names used to map the data. The other lines contain the values for each mapping column. If you search for the term “quarter”, you will therefore find search results for the two cities Steyr and Krems.

    Another example would be the mapping of technical product data stored in a CSV file to the base articles on your web site. The mapping could be accomplished using the product ID extracted from the product web site and the CSV file contains a set of columns describing the article (product ID, category, price, dimensions, etc.).

    Configuration

    As this feature is part of the Mindbreeze base product, you do not have to install any additional plugins; you only have to configure it.

    • Switch to the “Indices” tab and activate “Advanced Settings”
    • Scroll down to the section “CSV Transformation”
    • Specify the path to the CSV file containing the data mappings (either as local file system path or as network path appropriate for the used operating system)
    • Example 1: CSV File Path: C:\data\csv-mappings.csv
    • Example 2: CSV File Path: \\fileserver.x.y\config\csv-mappings.csv

    For every metadata property (column) you want to extract from the CSV file add a new metadata definition with following property settings:

    • If Expression Matches: {{ZIP}} … this is the name of the mapping column in the CSV file (header name of the column containing the keys to map the documents)
    • In Property: customer_zipcode … this is the source document metadata property from the indexed document used to map the results (this could also be mes:key or any other property)
    • Name: City … this is the desired metadata name of the new property to extract (will be available for searching and, if listed in the CategoryDescriptor, also visible in the results)
    • Value: {{City}} … this is the name of the desired target column in the CSV file (header name of the column to be extracted)
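
    The effect of these four settings can be pictured with the following Python sketch, which uses the ZIP code CSV from the example above. It illustrates the mapping idea only and is not how the product implements it: the functions load_mapping and enrich and the sample document are hypothetical.

```python
import csv
import io

CSV_TEXT = """ZIP;City;Province
4020;Linz;Central Upper Austria
1020;Vienna;Capital City of Austria
9861;Krems;Forest Quarter
4400;Steyr;Traun Quarter
"""


def load_mapping(csv_text, key_column):
    """Index the CSV rows by the mapping column (here: ZIP)."""
    reader = csv.DictReader(io.StringIO(csv_text), delimiter=";")
    return {row[key_column]: row for row in reader}


def enrich(document, mapping, in_property, extractions):
    """Add new metadata to a document.

    in_property: document property whose value is looked up in the mapping.
    extractions: {new metadata name: CSV column to copy the value from}.
    """
    row = mapping.get(document.get(in_property))
    if row is None:
        return document  # no matching CSV row, document stays unchanged
    enriched = dict(document)
    for name, column in extractions.items():
        enriched[name] = row[column]
    return enriched


mapping = load_mapping(CSV_TEXT, key_column="ZIP")
doc = {"title": "Customer record", "customer_zipcode": "4400"}
print(enrich(doc, mapping, "customer_zipcode", {"City": "City", "Province": "Province"}))
# {'title': 'Customer record', 'customer_zipcode': '4400', 'City': 'Steyr', 'Province': 'Traun Quarter'}
```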

    Filtering in CSV Editors

    Saving the table data of a CSV Editor while a filter is active saves the whole table, including the records that the filter currently hides.

    You can filter by typing a case-insensitive search term, which is searched for in all columns. You can also narrow the search to a specific column by using the following search format: Columnname:searchterm. Both the column name and the search term are case-insensitive.

    Example 1: No filtering of the data

    Example 2: All records with Type "Snacks" are displayed
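
    The filter behaviour described in this section can be sketched in Python as follows. The function filter_rows and the sample rows are hypothetical, and substring matching is an assumption; the text only states that the search is case-insensitive and can be restricted to one column.

```python
def filter_rows(rows, search):
    """Filter table rows (dicts) by a search term or by 'column:term'.

    Column names and terms are compared case-insensitively.
    Substring matching is assumed.
    """
    search = search.strip().lower()
    if not search:
        return list(rows)  # no filtering
    column, separator, term = search.partition(":")
    result = []
    for row in rows:
        lowered = {name.lower(): str(value).lower() for name, value in row.items()}
        if separator and column in lowered:
            if term in lowered[column]:
                result.append(row)
        elif any(search in value for value in lowered.values()):
            result.append(row)
    return result


# Hypothetical table data.
rows = [
    {"Name": "Pretzels", "Type": "Snacks"},
    {"Name": "Lemonade", "Type": "Drinks"},
]
print(filter_rows(rows, "type:snacks"))  # [{'Name': 'Pretzels', 'Type': 'Snacks'}]
print(filter_rows(rows, "LEMON"))        # [{'Name': 'Lemonade', 'Type': 'Drinks'}]
```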
