Query Expression Transformation

Mindbreeze Query Transformer Plugins

Copyright ©

Mindbreeze GmbH, A-4020 Linz, .

All rights reserved. All hardware and software names used are registered trade names and/or registered trademarks of the respective manufacturers.

These documents are highly confidential. No rights to our software or our professional services, or results of our professional services, or other protected rights can be based on the handing over and presentation of these documents.

Distribution, publication or duplication is not permitted.

The term ‘user‘ is used in a gender-neutral sense throughout the document.

Mindbreeze Query Transformation

Mindbreeze provides several query transformation services that automatically modify search queries to improve search results.

On the one hand, there are plugin-based extension points that can be loaded into a Mindbreeze installation on demand:

  • Synonym Transformer
  • Replacement Transformer

On the other hand, there are integrated product features that make it easier to find the desired results (e.g. by enriching indexed documents with additional metadata):

  • “Did you mean?”
  • Entity Recognition
  • CSV Transformation

Query Transformation Plugins

To use a query transformation service, install the corresponding plugin into your Mindbreeze installation. The plugins are delivered in the “Mindbreeze Query Transformation Plugins.zip” package.

The plugin also needs to be included in your Mindbreeze license.

Synonym Transformer Plugin

The SynonymTransformer plugin finds search results by also searching for synonyms of a word. To do so, it transforms the query to search for every term in the matching synonym list.

Usage: Synonyms are defined in a CSV file. Each line contains one group of synonyms, separated by semicolons (;).

Example of a small synonym.csv file:

car;vehicle;automobile
plane;airplane;aeroplane

Example 1: a search for car sends the transformed query: car OR vehicle OR automobile

Example 2: a search for plane sends the transformed query: plane OR airplane OR aeroplane

Note: Only the term in the first column is matched against your query. The first column supports only single words without spaces.
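The expansion described above can be sketched in Python. This is an illustration only, not the plugin's implementation: the names `load_synonyms` and `expand_query` are hypothetical, and the sketch assumes case-insensitive matching and whitespace-separated query words, neither of which is specified here.

```python
import csv
import io

# Sample content of a synonym CSV file (same as the example above).
SYNONYM_CSV = "car;vehicle;automobile\nplane;airplane;aeroplane\n"

def load_synonyms(csv_text):
    """Map the first column of each line to the full list of terms on that line."""
    table = {}
    for row in csv.reader(io.StringIO(csv_text), delimiter=";"):
        terms = [t.strip() for t in row if t.strip()]
        if terms:
            # Assumption: matching is case-insensitive.
            table[terms[0].lower()] = terms
    return table

def expand_query(query, table):
    """Replace each query word found in the first column by an OR group."""
    words = query.split()
    parts = []
    for word in words:
        terms = table.get(word.lower(), [word])
        group = " OR ".join(terms)
        # Parenthesise only when the group sits next to other query words.
        if len(terms) > 1 and len(words) > 1:
            group = "(" + group + ")"
        parts.append(group)
    return " ".join(parts)

table = load_synonyms(SYNONYM_CSV)
print(expand_query("car", table))      # car OR vehicle OR automobile
print(expand_query("vehicle", table))  # vehicle (only the first column is matched)
```

Because only the first column is used as the lookup key, a search for vehicle is left unchanged in this sketch.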

Installation

  • Install the plugin (either with the Manager UI or with the command-line tool mesextension):

mesextension --interface=plugin --type=archive --file=SynonymTransformer-<version>.zip install

  • Activate the plugin for every index you want (with the Manager UI)
    • Switch to the “Indices” tab and activate “Advanced Settings”
    • Scroll down to the section “Query Transformation Services”
    • Select the “SynonymTransformer” plugin and click “Add”
  • Add the path to the CSV file containing the synonym definitions as “Custom Plugin Properties”
    • Add a new property with the name “SYNONYM_CSV_FILE_PATH”
    • Assign the path to the CSV file as its value (either a local file system path or a network path appropriate for the operating system in use)

Example 1:  Name: SYNONYM_CSV_FILE_PATH, Value: C:\data\synonyms.csv

Example 2:  Name: SYNONYM_CSV_FILE_PATH, Value: \\fileserver.mydomain.com\mes-config\synonyms.csv

Finally, save the configuration changes and restart the Mindbreeze Node to propagate all changes.

Note: Any change to the synonym CSV file takes effect immediately and is applied to the next search.

Replacement Transformer Plugin

The ReplacementTransformer plugin is typically used to replace unsuitable search terms with better ones, or to suppress search terms entirely.

The main difference from the Synonym Transformer plugin is that the original query is actually replaced by a new one and does not appear in the search term reporting. The Replacement Transformer can therefore be used to hide search results that users would otherwise find and show something else instead (e.g. to hide a legacy page and show the new version).

Usage: Replacement terms are defined in a CSV file. The first column contains the search term to be replaced; the following columns are used as disjunctive (OR-combined) replacement values. If there are no replacement values, the term is not searched for.
Each search term to be replaced goes on its own line, and the columns are separated by semicolons (;).

Example of a small replacement.csv file:

car;mercedes;bmw;audi
party

Example 1: a search for car sends the transformed query: mercedes OR bmw OR audi

Example 2: a search for party will not find any results as it is replaced by an “empty” search
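The replacement behaviour from both examples can be sketched in Python as follows. The names `load_replacements` and `replace_query` are hypothetical, and the sketch assumes that the whole query is compared case-insensitively with the first column; the plugin's actual matching rules may differ.

```python
import csv
import io

# Sample content of a replacement CSV file (same as the example above).
REPLACEMENT_CSV = "car;mercedes;bmw;audi\nparty\n"

def load_replacements(csv_text):
    """Map each first-column term to its (possibly empty) list of replacements."""
    table = {}
    for row in csv.reader(io.StringIO(csv_text), delimiter=";"):
        cells = [c.strip() for c in row]
        if cells and cells[0]:
            table[cells[0].lower()] = [c for c in cells[1:] if c]
    return table

def replace_query(query, table):
    """Return the replacement query, None for a suppressed term, or the query unchanged."""
    key = query.strip().lower()
    if key not in table:
        return query
    replacements = table[key]
    if not replacements:
        return None  # no replacement values: the term is not searched for
    return " OR ".join(replacements)

table = load_replacements(REPLACEMENT_CSV)
print(replace_query("car", table))    # mercedes OR bmw OR audi
print(replace_query("party", table))  # None
```

Unlike the synonym expansion, the original term car is not part of the resulting query.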

Installation

  • Install the plugin (either with the Manager UI or with the command-line tool mesextension):

mesextension --interface=plugin --type=archive --file=ReplacementTransformer-<version>.zip install

  • Activate the plugin for every index you want (with the Manager UI)
    • Switch to the “Indices” tab and activate “Advanced Settings”
    • Scroll down to the section “Query Transformation Services”
    • Select the “ReplacementTransformer” plugin and click “Add”
  • Add the path to the CSV file containing the replacement definitions as “Custom Plugin Properties”
    • Add a new property with the name “REPLACEMENT_CSV_FILE_PATH”
    • Assign the path to the CSV file as its value (either a local file system path or a network path appropriate for the operating system in use)

Example 1:  Name: REPLACEMENT_CSV_FILE_PATH, Value: C:\data\replacements.csv

Example 2:  Name: REPLACEMENT_CSV_FILE_PATH, Value: \\fileserver.x.y\config\replacements.csv

Finally, save the configuration changes and restart the Mindbreeze Node to propagate all changes.

Note: Any change to the replacement CSV file takes effect immediately and is applied to the next search.

General Notes on Transformer Plugins (Replacement/Synonym)

Note: If you use both plugins (Synonym Transformer and Replacement Transformer), the Replacement Transformer is applied first.

The following screenshot displays the configuration of both plugins within the Mindbreeze Manager Interface.


Stemmer Transformer Plugin

The StemmerTransformer plugin finds search results by also searching for words that share the same stem, based on the linguistic characteristics of the configured language.

Usage: The basic stemming algorithm is built into the plugin as delivered. For the most common languages, an additional language-specific vocabulary is available and is used to improve the search results.

The Stemmer Transformer also supports transliterations. Characters are altered according to the transliteration rules. The altered and the original term are used in the search.

Example:

Example 1: a search for leaf will find matches like “leaf”, “leaves”, “leaving”

Example 2: a search for summary will find matches like “summary”, “summaries”, “summarise”

Installation/Configuration

  • Install the plugin (either with the Manager UI or with the command-line tool mesextension):

mesextension --interface=plugin --type=archive --file=StemmerTransformer-<version>.zip install

  • Activate the plugin for every index you want (with the Manager UI)
    • Switch to the “Indices” tab and activate “Advanced Settings”
    • Scroll down to the section “Query Transformation Services”
    • Select the “StemmerTransformer” plugin and click “Add”
  • Configure the properties according to your use case

Languages: the languages of the Stemmer

Path to vocabulary: a local path on the appliance containing a vocabulary. This is required to support not only the extension of the stems but also the expansion.

Stemmer enabled: toggles the state of the stemmer

Auto detect language from query: the stemmer tries to detect the language from the query terms.

Transliterate all variants: with this option, the stemmer expands the query with all matching transliterations.

TransliterationRule: rules for altering character sequences in terms. The following rules are supported: http://icu-project.org/apiref/icu4j/com/ibm/icu/text/RuleBasedTransliterator.html
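To illustrate how a transliteration expands a query, the following Python sketch applies simple character replacements and searches for both the original and the altered term. It is a simplification: plain string replacement stands in for ICU transliteration rules (which are considerably more expressive), and the rule list `GERMAN_RULES` and the function names are hypothetical.

```python
# Hypothetical replacement pairs; in ICU rule notation these would
# roughly correspond to rules such as "ü > ue;".
GERMAN_RULES = [("ä", "ae"), ("ö", "oe"), ("ü", "ue"), ("ß", "ss")]

def transliteration_variants(term, rules):
    """Return the original term, plus the altered term if any rule changed it."""
    altered = term
    for source, target in rules:
        altered = altered.replace(source, target)
    # Keep the original first; add the altered form only if it differs.
    return [term] if altered == term else [term, altered]

def expand_term(term, rules):
    """Build an OR query from the original term and its transliteration."""
    return " OR ".join(transliteration_variants(term, rules))

print(expand_term("müller", GERMAN_RULES))  # müller OR mueller
print(expand_term("linz", GERMAN_RULES))    # linz
```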

Finally, save the configuration changes and restart the Mindbreeze Node to propagate all changes.

Term2DocumentBoost Transformer Plugin

The Term2DocumentBoost plugin enables relevance tuning based on user queries. For example, a search for help can be tuned so that documents matching given keywords are boosted up or down.

Installation

  • Install the plugin (either with the Manager UI or with the command-line tool mesextension):

mesextension --interface=plugin --type=archive --file=Term2DocumentBoost-<version>.zip install

  • Activate the plugin for every index you want (with the Manager UI)
    • Switch to the “Indices” tab and activate “Advanced Settings”
    • Scroll down to the section “Query Transformation Services”
    • Select the “Term2DocumentBoost” plugin and click “Add”
  • Add the path to the CSV file containing the boostings via “Custom Plugin Properties”
    • Add a new property with the name “CSV_FILE_PATH”
    • Assign the path to the CSV file as its value (either a local file system path or a network path appropriate for the operating system in use)

Example 1 (Windows):  Name: CSV_FILE_PATH, Value: C:\data\term2documentboost.csv

Example 2 (Windows):  Name: CSV_FILE_PATH, Value: \\fileserver.mydomain.com\mes-config\term2documentboost.csv

Example 3 (Linux):  Name: CSV_FILE_PATH, Value: /data/term2documentboost.csv

Save the configuration changes and restart the Mindbreeze Node to propagate all changes.

CSV-File Format

The CSV file contains one line per boosting rule, consisting of:

  • the query term to match,
  • the name of the metadata property to apply the boosting to,
  • a pattern specifying the values to boost, and
  • the boost factor

Only DocumentInfo metadata (metadata that is aggregatable or regexmatchable) can be used here. A list of this metadata can be found in the Designer in the Filter list.

Example CSV file:

Term;Metadata Key;Pattern;Boost
help;title;portal help|intranet help;5

If a user searches for help, documents whose title contains portal help or intranet help are boosted by a factor of 5.

If the Term is empty, the document is boosted regardless of the user query. For example, documents from the data source Web can be boosted up or down for any query.

If multiple rules match, the rule with the highest boost factor is used. This may change in future versions.
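The rule selection described above can be sketched in Python. This is not the plugin's implementation; it rests on several assumptions that are not stated here: that the first line of the file is a header line, that the pattern is a regular expression matched case-insensitively, and that a document with no matching rule keeps a neutral factor of 1.0. The function names and the metadata key `datasource` in the second rule are hypothetical.

```python
import csv
import io
import re

BOOST_CSV = (
    "Term;Metadata Key;Pattern;Boost\n"
    "help;title;portal help|intranet help;5\n"
    ";datasource;Web;2\n"  # empty Term: applies to every query
)

def load_rules(csv_text):
    """Parse the boosting rules; the first line is assumed to be a header."""
    reader = csv.DictReader(io.StringIO(csv_text), delimiter=";")
    return [
        {
            "term": row["Term"].strip().lower(),
            "key": row["Metadata Key"].strip(),
            # Assumption: the pattern is a case-insensitive regular expression.
            "pattern": re.compile(row["Pattern"], re.IGNORECASE),
            "boost": float(row["Boost"]),
        }
        for row in reader
    ]

def boost_for(query, document, rules):
    """Return the highest boost among all matching rules (1.0 if none match)."""
    query_terms = set(query.lower().split())
    boosts = [
        rule["boost"]
        for rule in rules
        if (not rule["term"] or rule["term"] in query_terms)
        and rule["pattern"].search(str(document.get(rule["key"], "")))
    ]
    return max(boosts, default=1.0)

rules = load_rules(BOOST_CSV)
doc = {"title": "Intranet Help Center", "datasource": "Web"}
print(boost_for("help", doc, rules))      # 5.0 (both rules match, the maximum wins)
print(boost_for("vacation", doc, rules))  # 2.0 (only the rule with the empty Term matches)
```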

Note: Any change to the CSV file takes effect immediately and is applied to the next search.

Additional Features

Did you mean?

If a search returns no results because a word in the search term was misspelled, Mindbreeze suggests an alternative search term that would find better results, based on internal index statistics and analysis. This feature is called “Did you mean?”.

Entity Recognition

Entity recognition can be used to extract metadata from the document content or from other metadata properties of the documents. This metadata can then be used for more efficient searches.

This topic is described in detail in “Documentation – Mindbreeze Inspire”. For details please read the documentation on “Indices tab”.

CSV Transformation

CSV transformation extends indexed documents with additional metadata to make results easier to find. It maps well-defined values of a document to other value columns stored in a CSV file.

This feature can be quite helpful to extend your index with technical terms, abbreviations, topics or even short descriptions for your documents in special use cases.

Example: a city ZIP code directory

ZIP;City;Province

4020;Linz;Central Upper Austria
1020;Vienna;Capital City of Austria
9861;Krems;Forest Quarter
4400;Steyr;Traun Quarter

The first line of this sample CSV is the header line, which defines the column names used to map the data. The other lines contain the values for each mapping column. If you search for the term “quarter”, you will find search results for the two cities Steyr and Krems.
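As a rough illustration of why the term “quarter” leads to these two cities, the following Python sketch reads the sample CSV and lists the cities of all rows in which any column contains the search term. The function `cities_matching` is hypothetical, and the simple substring comparison merely stands in for the actual index search.

```python
import csv
import io

ZIP_DIRECTORY = """ZIP;City;Province
4020;Linz;Central Upper Austria
1020;Vienna;Capital City of Austria
9861;Krems;Forest Quarter
4400;Steyr;Traun Quarter
"""

def cities_matching(term, csv_text):
    """Return the City of every row in which any column contains the term."""
    needle = term.lower()
    reader = csv.DictReader(io.StringIO(csv_text), delimiter=";")
    return [
        row["City"]
        for row in reader
        if any(needle in value.lower() for value in row.values())
    ]

print(cities_matching("quarter", ZIP_DIRECTORY))  # ['Krems', 'Steyr']
```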

Another example would be mapping technical product data stored in a CSV file to the base articles on your web site. The mapping could use the product ID extracted from the product web page as the key, with the CSV file containing a set of columns that describe the article (product ID, category, price, dimensions, etc.).

Configuration

As this feature is part of the Mindbreeze base product, you do not have to install any additional plugins; you only have to configure it.

  • Switch to the “Indices” tab and activate “Advanced Settings”
  • Scroll down to the section “CSV Transformation”
  • Specify the path to the CSV file containing the data mappings (either a local file system path or a network path appropriate for the operating system in use)
  • Example 1:  CSV File Path: C:\data\csv-mappings.csv
  • Example 2:  CSV File Path: \\fileserver.x.y\config\csv-mappings.csv

For every metadata property (column) you want to extract from the CSV file, add a new metadata definition with the following property settings:

  • If Expression Matches: {{ZIP}} … this is the name of the mapping column in the CSV file (header name of the column containing the keys used to map the documents)
  • In Property: customer_zipcode … this is the source metadata property of the indexed document used to map the results (this could also be mes:key or any other property)
  • Name: City … this is the desired name of the new metadata property to extract (it will be available for searching and, if listed in the categoryDescriptor, also visible in the results)
  • Value: {{City}} … this is the name of the desired target column in the CSV file (header name of the column to be extracted)
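The effect of such a metadata definition can be sketched in Python. The function `enrich` and its parameter names are hypothetical and only mirror the four settings above; representing a document as a plain dictionary is likewise an assumption made for the illustration.

```python
import csv
import io

CSV_MAPPINGS = """ZIP;City;Province
4020;Linz;Central Upper Austria
1020;Vienna;Capital City of Austria
"""

def enrich(document, csv_text, key_column, source_property, new_name, value_column):
    """Add new_name to the document if its source_property matches key_column in the CSV."""
    lookup = {
        row[key_column]: row[value_column]
        for row in csv.DictReader(io.StringIO(csv_text), delimiter=";")
    }
    enriched = dict(document)  # leave the original document untouched
    value = lookup.get(str(document.get(source_property, "")))
    if value is not None:
        enriched[new_name] = value
    return enriched

doc = {"title": "Invoice 17", "customer_zipcode": "4020"}
# If Expression Matches: {{ZIP}}, In Property: customer_zipcode,
# Name: City, Value: {{City}}
print(enrich(doc, CSV_MAPPINGS, "ZIP", "customer_zipcode", "City", "City"))
# {'title': 'Invoice 17', 'customer_zipcode': '4020', 'City': 'Linz'}
```

A document whose customer_zipcode does not appear in the ZIP column is returned unchanged in this sketch.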