Query Expression Transformation

Mindbreeze Query Transformer Plugins

Copyright ©

Mindbreeze GmbH, A-4020 Linz, .

All rights reserved. All hardware and software names used are registered trade names and/or registered trademarks of the respective manufacturers.

These documents are highly confidential. No rights to our software or our professional services, or results of our professional services, or other protected rights can be based on the handing over and presentation of these documents.

Distribution, publication or duplication is not permitted.

The term ‘user’ is used in a gender-neutral sense throughout the document.

Mindbreeze Query Transformation

Mindbreeze provides several query transformation services that automatically modify search queries to improve search results.

Some of these are plugin-based extension points that can be loaded into a Mindbreeze installation on demand:

  • Synonym Transformer
  • Replacement Transformer

Others are integrated product features that make it easier to find the desired results (e.g. by enriching indexed documents with additional metadata):

  • “Did you mean?”
  • Entity Recognition
  • CSV Transformation

Query Transformation Plugins

To use a query transformation service, install it into your Mindbreeze installation by loading the corresponding plugin. The plugins are delivered in the “Mindbreeze Query Transformation Plugins.zip” package.

The plugin also needs to be included in your Mindbreeze license.

Synonym Transformer Plugin

The SynonymTransformer plugin allows you to find search results by also looking for synonyms of a word. To do this, the query is transformed so that it searches for every term in the matching synonym list.

Usage: Synonyms are defined in a CSV file. Each line contains one group of synonyms, separated by semicolons (;).

Example of a small synonym.csv file:

car;vehicle;automobile

plane;airplane;aeroplane

Example 1: a search for car sends the transformed query: car OR vehicle OR automobile

Example 2: a search for plane sends the transformed query: plane OR airplane OR aeroplane

Note: The term in the first column is the one matched against your query. Only single words without spaces are supported in the first column.
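
The behaviour described above can be sketched in a few lines of Python. This is only an illustration of the documented examples, not the plugin's implementation: the function names `load_synonyms` and `expand_term` are hypothetical, and the exact, case-sensitive matching is an assumption.

```python
import csv
import io

# Sample data copied from the synonym.csv example above.
SYNONYM_CSV = "car;vehicle;automobile\nplane;airplane;aeroplane\n"


def load_synonyms(csv_text):
    """Map the first column of each line to all terms on that line."""
    table = {}
    for row in csv.reader(io.StringIO(csv_text), delimiter=";"):
        terms = [cell.strip() for cell in row if cell.strip()]
        if terms:
            table[terms[0]] = terms
    return table


def expand_term(term, table):
    """Return the OR-combined synonyms, or the term unchanged if it is not a key."""
    # Assumption: exact, case-sensitive match on the first column only.
    return " OR ".join(table.get(term, [term]))


synonyms = load_synonyms(SYNONYM_CSV)
print(expand_term("car", synonyms))      # car OR vehicle OR automobile
print(expand_term("vehicle", synonyms))  # vehicle (not in the first column)
```

The second call shows the consequence of the note: a term that appears only in a later column is not expanded.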

Installation

  • Install the plugin (either with the Manager UI or with the command-line tool mesextension)

mesextension --interface=plugin --type=archive --file=SynonymTransformer-<version>.zip install

  • Activate the plugin for every index that should use it (with the Manager UI)
    • Switch to the “Indices” tab and activate “Advanced Settings”
    • Scroll down to the section “Query Transformation Services”
    • Select the “SynonymTransformer” plugin and click “Add”
  • Add the path to the CSV file containing the synonym definitions under “Custom Plugin Properties”
    • Add a new property with the name “SYNONYM_CSV_FILE_PATH”
    • Assign a value with the path to the CSV file (either a local file system path or a network path appropriate for the operating system in use)

Example 1: name SYNONYM_CSV_FILE_PATH, value C:\data\synonyms.csv

Example 2: name SYNONYM_CSV_FILE_PATH, value \\fileserver.mydomain.com\mes-config\synonyms.csv

Finally, save the configuration changes and restart the Mindbreeze Node so that the changes take effect.

Note: Any change to the synonym CSV file is applied immediately and takes effect with the next search.

Replacement Transformer Plugin

The ReplacementTransformer plugin is often used to replace unsuitable search terms with better ones, or even to disallow search terms.

The main difference from the Synonym Transformer plugin is that the original query is actually replaced with a new one and does not appear in the reporting of search terms. The Replacement Transformer can therefore be used to hide search results that users would otherwise find and to show something else instead (e.g. to hide a legacy page and show the new version).

Usage: Replacement terms are defined in a CSV file. The first column defines the search term to be replaced, and the following columns are taken as disjunctive (OR-combined) replacement values (if they are empty, the term is not searched for).
Each search term to be replaced has to be written on its own line, and the columns have to be separated with a semicolon (;).

Example of a small replacement.csv file:

car;mercedes;bmw;audi

party

Example 1: a search for car sends the transformed query: mercedes OR bmw OR audi

Example 2: a search for party will not find any results as it is replaced by an “empty” search
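
The two examples above can be reproduced with the following Python sketch. It is not the plugin's code: `load_replacements` and `replace_term` are hypothetical names, matching is assumed to be exact and case-sensitive, and the "empty" search is represented here by an empty string.

```python
import csv
import io

# Sample data copied from the replacement.csv example above.
REPLACEMENT_CSV = "car;mercedes;bmw;audi\nparty\n"


def load_replacements(csv_text):
    """Map each first-column term to its (possibly empty) list of replacements."""
    table = {}
    for row in csv.reader(io.StringIO(csv_text), delimiter=";"):
        cells = [cell.strip() for cell in row]
        if cells and cells[0]:
            table[cells[0]] = [cell for cell in cells[1:] if cell]
    return table


def replace_term(term, table):
    """Return the replacement query; the original term is dropped when it is a key."""
    if term not in table:
        return term  # no rule: the query is left unchanged
    # Assumption: an empty string stands for the "empty" search.
    return " OR ".join(table[term])


replacements = load_replacements(REPLACEMENT_CSV)
print(replace_term("car", replacements))           # mercedes OR bmw OR audi
print(repr(replace_term("party", replacements)))   # ''
```

Unlike the synonym case, the original term ("car") is not part of the transformed query.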

Installation

  • Install the plugin (either with the Manager UI or with the command-line tool mesextension)

mesextension --interface=plugin --type=archive --file=ReplacementTransformer-<version>.zip install

  • Activate the plugin for every index that should use it (with the Manager UI)
    • Switch to the “Indices” tab and activate “Advanced Settings”
    • Scroll down to the section “Query Transformation Services”
    • Select the “ReplacementTransformer” plugin and click “Add”
  • Add the path to the CSV file containing the replacement definitions under “Custom Plugin Properties”
    • Add a new property with the name “REPLACEMENT_CSV_FILE_PATH”
    • Assign a value with the path to the CSV file (either a local file system path or a network path appropriate for the operating system in use)

Example 1: name REPLACEMENT_CSV_FILE_PATH, value C:\data\replacements.csv

Example 2: name REPLACEMENT_CSV_FILE_PATH, value \\fileserver.x.y\config\replacements.csv

Finally, save the configuration changes and restart the Mindbreeze Node so that the changes take effect.

Note: Any change to the replacement CSV file is applied immediately and takes effect with the next search.

General Notes on Transformer Plugins (Replacement/Synonym)

Note: If you are using both plugins (Synonym-Transformer and Replacement-Transformer) the Replacement-Transformer is applied first!

The following screenshot displays the configuration of both plugins within the Mindbreeze Manager Interface.


Stemmer Transformer Plugin

The StemmerTransformer plugin allows you to find search results by looking for different word stems of a word, based on the linguistic characteristics of the defined language.

Usage: The basic algorithm for finding appropriate word stems is implemented in the plugin as it is delivered. An additional dictionary of vocabularies for a specific language is also available for the most common languages and is used to improve the search results.

The Stemmer Transformer also supports transliterations. Characters are altered according to the transliteration rules. The altered and the original term are used in the search.

Example:

Example 1: a search for leaf will find matches like “leaf”, “leaves”, “leaving”

Example 2: a search for summary will find matches like “summary”, “summaries”, “summarise”

Installation/Configuration

  • Install the plugin (either with the Manager UI or with the command-line tool mesextension)

mesextension --interface=plugin --type=archive --file=StemmerTransformer-<version>.zip install

  • Activate the plugin for every index that should use it (with the Manager UI)
    • Switch to the “Indices” tab and activate “Advanced Settings”
    • Scroll down to the section “Query Transformation Services”
    • Select the “StemmerTransformer” plugin and click “Add”
  • Configure the properties according to your use case

Languages: the languages of the Stemmer

Path to vocabulary: a local path on the appliance containing a vocabulary. This is required to support not only the extension of the stems but also the expansion.

Stemmer enabled: toggles the state of the stemmer

Auto detect language from query: the stemmer tries to detect the language from the query terms.

Transliterate all variants: with this option, the stemmer expands the query with all matching transliterations.

TransliterationRule: rules for altering character sequences in terms. The following rules are supported: http://icu-project.org/apiref/icu4j/com/ibm/icu/text/RuleBasedTransliterator.html

Finally, save the configuration changes and restart the Mindbreeze Node so that the changes take effect.

Term2DocumentBoost Transformer Plugin

The Term2DocumentBoost plugin allows you to perform relevance tuning on search queries. You can use it for the following application use cases:

  1. To increase the relevance of certain documents for certain search queries. As an example, a search for "help" can be tailored so that documents with the keyword "documentation" will have higher relevance in this search.
  2. To generally increase the relevance of certain documents. For example, all documents containing the keyword "Mindbreeze" can get higher relevance.
  3. To increase the relevance of matching metadata. For example, if a given person is searched (search term: "John Doe"), documents by this person (metadatum: "author") can obtain higher relevance.
  4. To generally influence the entire relevance model. For instance, you can change the relevancy factor "Term Frequency" to change the priority of the frequency of the search hits in the document.

Installation

  • Install the plugin (either using the Manager UI or with the command-line tool mesextension)

mesextension --interface=plugin --type=archive --file=Term2DocumentBoost-<version>.zip install

  • Activate the plugin for each desired index using the Manager UI:
    • Navigate to the “Indices” tab and activate “Advanced Settings”
    • Scroll down to the “Query Transformation Services” section
    • Select the “Term2DocumentBoost” plugin and click “Add”
  • The plugin is configured via two files:
    • The “Term to Document Boost CSV File” is required for application use cases 1, 2, and 3 above.
    • The “Default Relevance Options JSON File” is required for application use case 4.
  • Configure these settings:
    • “Term to Document Boost CSV File Path”: path to the CSV file
    • “Default Relevance Options JSON File Path”: path to the JSON file

Finally, save the changes and restart the Mindbreeze Node so that the changes take effect.

Configuration

General description of the Term to Document Boost CSV file format

The CSV file contains one line for each boosting rule. Each line contains the following columns:

  • Term: the search term
  • Metadata key: the name of the metadata property to which the boosting is to be applied
  • Pattern: a pattern that determines the value to be boosted
  • Boost: the boost factor
  • Query: optional, for advanced configuration. See the section on extended configuration with Query below.

Only DocumentInfo metadata (that is, data that is either aggregatable or regex-matchable) can be used as the metadata key. A list of these properties is available in the designer under "Filters".

If multiple rules match at the same time, the rule with the largest boost factor is used. However, this behavior could change in future versions.

Note: Any change in the CSV file is applied immediately and will be considered in the next search.

You can easily edit the CSV file in the Management Center under the "Search Experience" menu item "Query Boostings".
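
The rule "largest boost factor wins" can be sketched as follows. This is an illustrative Python model only, not the plugin's implementation. The names `load_rules` and `effective_boost` are hypothetical, the "Query" column is ignored, and two things are assumptions: that the pattern is treated as a regular expression matched anywhere in the metadata value, and that a rule with an empty "Term" applies to every query.

```python
import csv
import io
import re

# Two sample rules in the column layout described above.
RULES_CSV = """Term;Metadata Key;Pattern;Boost
help;title;portal help|intranet help;5
;extension;.*pdf;10
"""


def load_rules(csv_text):
    """Read the rules into a list of dicts keyed by the header names."""
    return list(csv.DictReader(io.StringIO(csv_text), delimiter=";"))


def effective_boost(search_term, metadata, rules):
    """Return the largest boost among all matching rules, or None if none match."""
    boosts = []
    for rule in rules:
        # Assumption: an empty Term applies to every search query.
        term_ok = rule["Term"] == "" or rule["Term"] == search_term
        value = metadata.get(rule["Metadata Key"], "")
        # Assumption: the pattern is a regex searched within the value.
        if term_ok and re.search(rule["Pattern"], value):
            boosts.append(float(rule["Boost"]))
    return max(boosts) if boosts else None


rules = load_rules(RULES_CSV)
pdf_doc = {"title": "intranet help", "extension": "pdf"}
html_doc = {"title": "intranet help", "extension": "html"}
print(effective_boost("help", pdf_doc, rules))    # 10.0 (both rules match)
print(effective_boost("help", html_doc, rules))   # 5.0
print(effective_boost("other", html_doc, rules))  # None
```

As the text notes, the selection behaviour may change in future versions, so the `max` in this sketch should not be relied on.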

Using Variables within Pattern and Query

The following variables are available by default:

  • query: search term(s)
  • lang: client language without country (e.g. de, en, …)
  • country: client country, if available (e.g. AT, DE, US, …)
  • session_*: session properties
  • identity_*: identity properties

You can use these variables by referring to them via {{<name>}} within the “Pattern” and “Query” columns.

In the example below one can use {{lang}} to refer to the language of the client in order to boost documents that are written in that language by a factor, regardless of the query term. The property mes:lang is set if for instance the Language Detection Plugin is used.

Term;Metadata Key;Pattern;Boost

;mes:lang;{{lang}};1.5
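
The placeholder substitution can be pictured with a small Python sketch. It is not the product's implementation: `substitute` is a hypothetical helper, variable names are assumed to consist of letters, digits and underscores, and leaving unknown placeholders untouched is an assumption.

```python
import re

# Matches {{name}} where name consists of letters, digits and underscores.
PLACEHOLDER = re.compile(r"\{\{(\w+)\}\}")


def substitute(template, variables):
    """Replace each {{name}} with variables[name]; unknown names stay as they are."""
    return PLACEHOLDER.sub(
        lambda match: variables.get(match.group(1), match.group(0)),
        template,
    )


client = {"query": "John Doe", "lang": "de"}
print(substitute("{{lang}}", client))            # de
print(substitute('Author:"{{query}}"', client))  # Author:"John Doe"
print(substitute("{{country}}", client))         # {{country}}
```

With the rule above, a client whose language is "de" would therefore boost documents whose mes:lang value matches "de".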

Application use case: increase the relevance of certain documents for certain search queries

Example of a CSV file:

Term;Metadata Key;Pattern;Boost

help;title;portal help|intranet help;5

When a user searches for help, documents that contain the terms portal help or intranet help in the title are boosted by a factor of 5.

Application use case: increase the relevance of certain documents overall

Term;Metadata Key;Pattern;Boost

;extension;.*pdf;10

Leave the "Term" column blank. The document is then boosted regardless of the user's search query. For example, every document with the extension "pdf" can be boosted up or down.

Application use case: increase the relevance for matching metadata / extended configuration with Query

Alternatively, in order to have more flexibility in the boosting, you can add another "Query" column. Here, with the “Mindbreeze InSpire Query Language”, you can directly specify a query that determines the documents to be boosted.

Note: If you use the "Query" column, the "Metadata Key" and "Pattern" columns are ignored.

Example of a CSV file:

Term;Metadata Key;Pattern;Boost;Query

help;;;3;"datasource/mes:key:""http://myweb.com/help-index.html"""

When a user searches for help, documents that are found with the query datasource/mes:key:"http://myweb.com/help-index.html" are boosted with a factor of 3. Please note the correct treatment of special characters.
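
The special-character treatment in the example line follows the usual CSV convention: a field containing double quotes is wrapped in double quotes and each inner quote is doubled. The Python sketch below produces such a line with the standard `csv` module so that the quoting does not have to be done by hand. `boost_row` is a hypothetical helper, and it is an assumption that the plugin reads the file with the same quoting rules.

```python
import csv
import io


def boost_row(term, boost, query):
    """Format one rule line with empty Metadata Key and Pattern columns."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter=";", lineterminator="\n")
    writer.writerow([term, "", "", boost, query])
    return buffer.getvalue().rstrip("\n")


query = 'datasource/mes:key:"http://myweb.com/help-index.html"'
line = boost_row("help", 3, query)
print(line)
# help;;;3;"datasource/mes:key:""http://myweb.com/help-index.html"""

# Reading the line back yields the original, unescaped query.
assert next(csv.reader([line], delimiter=";"))[4] == query
```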

You can also use the placeholder {{query}} in the query. This placeholder is dynamically replaced by the search query when you search.

Note: if you use {{query}}, the Term column is also ignored.

Term;Metadata Key;Pattern;Boost;Query

;;;7;"Author:""{{query}}"""

If the search term is the exact name of an author, that author's documents are boosted by a factor of 7. For example, if a user searches for John Doe, documents that are found with the query Author:"John Doe" are boosted by a factor of 7.

Application use case: generally influencing the relevance model

You can perform an overall adjustment of all the parameters of the relevance model. This is done using the Default Relevance Options JSON file. You can customize the following types of parameters:

  • Relevance factors (term frequency, document frequency)
  • Zone boosting (metadata boosting)
  • Document boosting (alternative to Term to Document Boost CSV)
  • Term boosting (term and n-gram boosts)

It is not recommended to edit this JSON file manually. Instead, use the "Relevance" entry under the "Search Experience" menu item in the Management Center.

Note: These parameters are a fundamental component of the relevance model; minor changes can have a great impact on the order of search results. It is possible that the boosting factors in the CSV will have to be adjusted later.

For more information, see:

  • Configuration Mindbreeze InSpire manual, Indices tab
  • api.v2.search Interface Description manual

Additional Features

Did you mean?

If a search returns no results because the search term was misspelled, Mindbreeze suggests an alternative search term (based on internal index statistics and analysis) that is likely to find better results. This feature is called “Did you mean?”.

Entity Recognition

Entity recognition can be used to extract metadata from the document content or from other metadata properties of the documents which may be used for more efficient searches afterwards.

This topic is described in detail in “Documentation – Mindbreeze Inspire”. For details please read the documentation on “Indices tab”.

CSV Transformation

To extend indexed documents with additional metadata so that results are easier to find, the CSV transformation maps well-defined values to other value columns stored in a CSV file.

This feature can be quite helpful to extend your index with technical terms, abbreviations, topics or even short descriptions for your documents in special use cases.

Example: a city ZIP code directory

ZIP;City;Province

4020;Linz;Central Upper Austria
1020;Vienna;Capital City of Austria
9861;Krems;Forest Quarter
4400;Steyr;Traun Quarter

The first line of this sample CSV is the header line, which defines the column names used for the mapping. The other lines contain the values for each mapping column. For example, a search for the term “quarter” finds results for the two cities Steyr and Krems.

Another example would be the mapping of technical product data stored in a CSV file to the base articles on your web site. The mapping could be accomplished using the product ID extracted from the product web site and the CSV file contains a set of columns describing the article (product ID, category, price, dimensions, etc.).

Configuration

As this feature is part of the Mindbreeze base product, you do not have to install any additional plugins; you only have to configure it.

  • Switch to the “Indices” tab and activate “Advanced Settings”
  • Scroll down to the section “CSV Transformation”
  • Specify the path to the CSV file containing the data mappings (either a local file system path or a network path appropriate for the operating system in use)
  • Example 1: CSV File Path, value C:\data\csv-mappings.csv
  • Example 2: CSV File Path, value \\fileserver.x.y\config\csv-mappings.csv

For every metadata property (column) you want to extract from the CSV file add a new metadata definition with following property settings:

  • If Expression Matches: {{ZIP}} … the name of the mapping column in the CSV file (header name of the column containing the keys used to map the documents)
  • In Property: customer_zipcode … the source metadata property of the indexed document used for the mapping (this could also be mes:key or any other property)
  • Name: City … the desired name of the new metadata property to extract (it will be available for searching and, if listed in the categoryDescriptor, also visible in the results)
  • Value: {{City}} … the name of the desired target column in the CSV file (header name of the column to be extracted)
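
To make the four settings concrete, the following Python sketch models the mapping with the ZIP code directory from the example above. It is a simplified illustration, not the product's implementation: `load_mapping` and `enrich` are hypothetical names, the sample document is invented for the illustration, and an exact string comparison between the document property and the key column is an assumption.

```python
import csv
import io

# The ZIP code directory from the example above.
ZIP_CSV = """ZIP;City;Province
4020;Linz;Central Upper Austria
1020;Vienna;Capital City of Austria
9861;Krems;Forest Quarter
4400;Steyr;Traun Quarter
"""


def load_mapping(csv_text, key_column):
    """Index the CSV rows by the key column ("If Expression Matches")."""
    rows = csv.DictReader(io.StringIO(csv_text), delimiter=";")
    return {row[key_column]: row for row in rows}


def enrich(document, mapping, in_property, name, value_column):
    """Return a copy of the document with one extra property if its key is mapped."""
    enriched = dict(document)
    row = mapping.get(document.get(in_property))
    if row is not None:
        enriched[name] = row[value_column]
    return enriched


mapping = load_mapping(ZIP_CSV, "ZIP")
# Hypothetical indexed document carrying a customer_zipcode property.
document = {"title": "Invoice 17", "customer_zipcode": "4020"}
print(enrich(document, mapping, "customer_zipcode", "City", "City"))
# {'title': 'Invoice 17', 'customer_zipcode': '4020', 'City': 'Linz'}
```

A second metadata definition with Name "Province" and Value {{Province}} would correspond to a second `enrich` call with those column names.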