Copyright ©
Mindbreeze GmbH, A-4020 Linz, 2023.
All rights reserved. All hardware and software names used are brand names and/or trademarks of their respective manufacturers.
These documents are strictly confidential. The submission and presentation of these documents does not confer any rights to our software, our services and service outcomes, or any other protected rights. The dissemination, publication, or reproduction hereof is prohibited.
For ease of readability, gender differentiation has been waived. Corresponding terms and definitions apply within the meaning and intent of the equal treatment principle for both sexes.
This document covers the installation and configuration of the NLQA (Natural Language Question Answering) plugin and describes the general functionality of this service.
NLQA parses a natural language query into a lower-level search query, specifying the named entities to search for.
For example, a search for “Who is the head of mindbreeze academy?” returns the following results:
The plugin can be installed in the MMC by selecting and uploading the file “mindbreeze.plugins.nlqa.base” in the “Configuration” -> “Plugins” area.
To use the NLQA functionality, you must ensure that NER (Named Entity Recognition) is enabled for the used indices.
For more details on NER and how to enable it, see this documentation.
After installing the plugin, navigate to the “Services” section within the “Indices” tab and add a service by clicking “Add Service”. Then select “QueryServicePlugin.NLQA” from the “Service” drop-down menu. The “Display name” can be chosen freely.
Now configure the settings in the respective areas.
Legend:
Bind Port | Specifies the TCP port on which this Service will be accessible. It is important that the port is not already in use by another service (e.g. principal resolution, index or client service). |
Bind address | Specifies the IP address on which the service is accessible. By default (value not set), the IP address 0.0.0.0 (all IP addresses) is used. If the service should only be accessible on localhost, for example, this option must be set to 127.0.0.1. |
Backend Threads | The number of threads that process HTTP requests in parallel. |
Path to the Service Config JSON* | With this option, you must configure the path to your Service Config JSON file that contains the following entries: |
Path to the Query-Expr Templates* | This option defines the path to all Query-Expression templates that can be used by this service. |
Path to Environment Config JSON | With this option, you can configure the path to your Environment Config JSON file that contains optional key-value pairs of variables to be used in the pipeline config. |
URLs of the Indices to query* | With this option, you must configure the URLs of all indices to be searchable by the NLQA plugin, e.g. https://localhost:<index port number>/find |
Query Expression Transformation Project ID | The “Query Expression Transformation” pipeline is used for processing the natural language query and transforming it into a lower-level search query. This option allows you to specify the project to be used for this function by defining its project ID. The value default refers to your default project. |
Query Expression Transformation Pipeline ID | This option allows you to specify the Query Expression Transformation pipeline of your specified project by defining the pipeline ID to be used. If not set, the default Query Expression Transformation pipeline of your specified project will be used. |
Search Response Transformation Project ID | The “Search Response Transformation” pipeline is used for processing the answers returned by the index. This option allows you to specify the project to be used for this function by defining its project ID. The value default refers to your default project. |
Search Response Transformation Pipeline ID | This option allows you to specify the Search Response Transformation pipeline of your specified project by defining the pipeline ID to be used. If not set, the default Search Response Transformation pipeline of your specified project will be used. |
Metadata Sample Length | The maximum length (in bytes) of the metadata in the search result. |
Content Sample Length | The maximum length (in bytes) of the sample text. |
Max content bytes | The maximum amount of text (in bytes), counted from the beginning of the document, that is considered for answer extraction. Text beyond this limit is not used for answer extraction. |
Max Search Results | This option allows you to specify the number of results to be used to get the final answer. Default: 3 |
Path to JSON with custom Relevance Factors | This option allows you to specify the path to a custom JSON file that contains relevance factors. All configured relevance factors override the default values. More information about relevance factors can be found here. |
SampleText from Beginning | When enabled, the document content is passed to the answer extraction service starting from the beginning of the document. |
Detail Limit | Overrides the “Detail Limit” in the search request. If the override is not desired, the “Don’t change” option can be selected. Note: The "Detail Limit" "Content" is required for extracting answers from text. |
OAuth authentication is only required if JWT authentication is disabled or not possible.
OAuth-Token URL | The URL of the Keycloak server from which the OAuth token can be obtained. |
OAuth Username | Username of your OAuth account. |
OAuth Password | Password of your OAuth account. |
OAuth Client-ID | OAuth Client-ID. |
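As an illustration, the settings above correspond to Keycloak’s standard OpenID Connect token endpoint using the resource owner password grant. The URL, realm name, and credential values below are placeholder assumptions; this sketch only builds the form payload and does not send the request.

```python
# Illustrative only: placeholder URL, realm, and credentials.
from urllib.parse import urlencode

# Keycloak's standard token endpoint (host and realm are assumptions):
token_url = "https://keycloak.example.com/realms/myrealm/protocol/openid-connect/token"

payload = urlencode({
    "grant_type": "password",        # resource owner password grant
    "client_id": "nlqa-client",      # OAuth Client-ID
    "username": "service-user",      # OAuth Username
    "password": "secret",            # OAuth Password
})

# POSTing this payload to token_url with Content-Type
# application/x-www-form-urlencoded returns a JSON body whose
# "access_token" field holds the OAuth token.
print(payload)
```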
Note: All these client service settings are mandatory for the NLQA plugin.
The NLQA Plugin uses two pipelines for text processing. A “Query Result Transformation Pipeline” is used for processing the answer returned by the index, and a “Query Expression Transformation Pipeline” is used for processing the natural language query and transforming it into a lower-level search query.
When executing sub-queries with values from aggregations, this processor can be used to define the required aggregated metadata for a specific query template. The template file must be located under <query_templates_dir>/<template_name>.txt. The aggregated values can then be used inside these query templates, which are processed by the Jinja templating engine (https://jinja.palletsprojects.com/en/3.0.x/).
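For illustration, such a template might look as follows. The file name, the aggregated variable name (department), and the query syntax are assumptions made for the sake of the example, not the plugin’s actual template contents:

```jinja
{# Hypothetical file: <query_templates_dir>/department_query.txt #}
{# "department" is an assumed aggregated metadata value injected by the processor #}
department:"{{ department }}"
```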
Normalize Text:
This processor is responsible for normalizing the text and defining the importance of each word’s base form by its part of speech (PoS).
Example: Verbs are usually important, but the verb "to be" does not contain any information. By adding "be" to the ignored patterns under the "Verb" entry, all variations of the verb (is, am, were, etc.) are ignored.
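A minimal sketch of this behavior, assuming a simplified lemma table and PoS tags in place of the plugin’s internal NLP:

```python
# Illustrative sketch: ignoring all inflections of a verb (e.g. "be")
# once its base form is listed under the "Verb" ignored patterns.
IGNORED_PATTERNS = {"Verb": {"be"}}  # base forms to ignore, keyed by PoS

# Simplified lemma table standing in for real linguistic normalization:
LEMMAS = {"is": "be", "am": "be", "are": "be", "was": "be", "were": "be"}

def keep_token(token, pos):
    """Return False if the token's base form is ignored for its part of speech."""
    lemma = LEMMAS.get(token.lower(), token.lower())
    return lemma not in IGNORED_PATTERNS.get(pos, set())

tokens = [("Who", "Pronoun"), ("is", "Verb"), ("the", "Determiner"), ("head", "Noun")]
kept = [t for t, pos in tokens if keep_token(t, pos)]
print(kept)  # ['Who', 'the', 'head'] -- "is" is dropped via its base form "be"
```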
Find Answers in Search Result: This processor is responsible for extracting an answer from the search result text.
Recognize Query Keywords: A processor similar to “Normalize Text”, with more options for configuring word importance:
These extracted keywords are then used for the actual search.
Recognize Query Question Type: This processor determines which named entities (e.g. Person, Location) to search for, based on a question phrase.
Example: Question Phrase=Who -> named entity=Person or Organization
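This mapping can be sketched as a simple lookup table. Only the “Who” row comes from the example above; the other rows are assumptions added for illustration, and the actual mapping is configured in the processor.

```python
# Illustrative mapping from question phrases to named-entity types.
QUESTION_TYPE_ENTITIES = {
    "who": ["Person", "Organization"],  # documented example
    "where": ["Location"],              # assumption
    "when": ["Date"],                   # assumption
}

def entities_for(question):
    """Return the named-entity types to search for, based on the question phrase."""
    first_word = question.strip().split()[0].lower()
    return QUESTION_TYPE_ENTITIES.get(first_word, [])

print(entities_for("Who is the head of mindbreeze academy?"))
# ['Person', 'Organization']
```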
Entity Catalogs: This functionality is provided by two processors, “Recognize Entities from Catalogs” and “Recognize Entities with Patterns and Keywords”. These processors can be used to define more detailed rules for recognizing entities.
Recognize Question Intent:
In this processor, all intents are configured together with their requirements and a priority. If multiple intents match, the one with the highest priority wins. The configurable criteria consist of:
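The priority rule can be sketched as follows. Modeling an intent’s requirements as a set of required keywords is an assumption made for this illustration:

```python
# Minimal sketch of intent resolution: each intent has requirements and a
# priority; among all matching intents, the highest priority wins.
INTENTS = [
    {"name": "find_person", "requires": {"who"}, "priority": 10},
    {"name": "generic_search", "requires": set(), "priority": 1},
]

def resolve_intent(keywords):
    """Return the name of the highest-priority intent whose requirements are met."""
    matching = [i for i in INTENTS if i["requires"] <= keywords]
    if not matching:
        return None
    return max(matching, key=lambda i: i["priority"])["name"]

print(resolve_intent({"who", "head"}))  # both intents match; "find_person" wins
```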
This processor can be used to make data originating from spans available in the query template.
The “Label Type” option defines the type of the labels to use. There is also the option to filter for a specific “Label Value”. The “New Label Type” option can be used to override the existing type. To add each value only once, the “Unique Values” option can be used. With “Use Label Data instead of Label Value”, the label data of the original label is used as the label value of the new one.
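A sketch of how these options interact, assuming a simplified label representation (the field names type, value, and data are illustrative, not the plugin’s internal schema):

```python
# Illustrative label transformation: filter labels by type (and optionally
# value), re-emit them under a new label type, optionally keep each value
# only once, and optionally use the label data as the new value.
def transform_labels(labels, label_type, new_label_type,
                     label_value=None, unique_values=False, use_label_data=False):
    seen, result = set(), []
    for label in labels:
        if label["type"] != label_type:
            continue  # "Label Type" filter
        if label_value is not None and label["value"] != label_value:
            continue  # optional "Label Value" filter
        value = label["data"] if use_label_data else label["value"]
        if unique_values and value in seen:
            continue  # "Unique Values": add each value only once
        seen.add(value)
        result.append({"type": new_label_type, "value": value})
    return result

labels = [
    {"type": "department", "value": "Academy", "data": "dep-01"},
    {"type": "department", "value": "Academy", "data": "dep-01"},
    {"type": "person", "value": "Jane", "data": "p-07"},
]
print(transform_labels(labels, "department", "dep_filter", unique_values=True))
# [{'type': 'dep_filter', 'value': 'Academy'}]
```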