Copyright © Mindbreeze GmbH, A-4020 Linz, 2024.
All rights reserved. All hardware and software names used are brand names and/or trademarks of their respective manufacturers.
These documents are strictly confidential. The submission and presentation of these documents does not confer any rights to our software, our services and service outcomes, or any other protected rights. The dissemination, publication, or reproduction hereof is prohibited.
For ease of readability, gender differentiation has been waived. Corresponding terms and definitions apply within the meaning and intent of the equal treatment principle for both sexes.
In this documentation, you will learn how to index into a Mindbreeze InSpire Appliance using a Mindbreeze Proxy Environment.
A Mindbreeze Proxy Environment is useful if your Mindbreeze InSpire Appliance cannot reach the data sources to be indexed because of the network infrastructure. This is the case if the data sources in your LAN are not accessible from the outside (Internet or VPN) and your Mindbreeze InSpire Appliance is located at another site (different LAN), for example because the appliance is hosted in the cloud (SaaS).
In such a case, you can run a Mindbreeze Proxy Environment as a Virtual Machine (VM) on your LAN, where it crawls the documents from the local data sources. These documents are then sent to the Mindbreeze InSpire Appliance and indexed there. The Semantic Pipeline runs on the appliance, and search queries are also processed by the appliance. The Mindbreeze Proxy Environment is only responsible for crawling.
To set up the Mindbreeze Proxy Environment, see Initial Operation (for VMs, the sections on hardware and iDRAC can be skipped). In addition, a special license for the product "Mindbreeze InSpire Remote Connector" must be installed on the Mindbreeze Proxy Environment.
You need an index to which the Mindbreeze Proxy Environment sends the documents that should be indexed. In addition, you need backend credentials with which the Mindbreeze Proxy Environment authenticates itself on the Mindbreeze InSpire Appliance.
Create a new index on the Mindbreeze InSpire Appliance. To do so, navigate to "Configuration" and the "Indices" tab in the Mindbreeze Management Center. Then click on "Add Index". Note the "Index Port (HTTP)" that was automatically assigned. You can also change the port.
Then disable the data source by clicking "Disable" under "Data Source". The data source will be configured later on the Mindbreeze Proxy Environment. Do not delete the data source; disable it instead. Otherwise, the search will not work correctly.
Now create a new Caching Principal Resolution Service on the Mindbreeze InSpire Appliance. To do this, navigate to "Configuration" and the "Indices" tab in the Mindbreeze Management Center. Then click on "Add Service".
Under "Service", select the service type that matches your data source. Then activate the "Readonly" checkbox. No further configuration is necessary on the "Caching Principal Resolution Service".
Finally, select the newly configured "Caching Principal Resolution Service" in the data source (under "Data Sources" of the configured index).
Now switch to the "Filters" tab and activate "Advanced Settings". Scroll down to the "Base Configuration" section and configure the following options:
Destination Pattern | https://mycompany\.mindbreeze\.com:8443/realm/master/api/v1/index/(\d+)(.*) Replace "mycompany\.mindbreeze\.com" with the hostname of your appliance (see also the "Remote Base URL" option in the Mindbreeze Proxy Environment configuration). Because the pattern is a regular expression, keep the backslash in front of each dot. Also replace the realm "master" with your realm (see the "Realm" option in the Mindbreeze Proxy Environment configuration). |
Destination Replacement | http://localhost:\1/\2 |
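The effect of these two options can be illustrated with a short Python sketch. It assumes that the pattern is applied to the whole incoming URL and that "\1" and "\2" refer to the two capture groups (the index port and the rest of the path); how the filter service evaluates the pattern internally is not described here. The function name and the port 23101 are hypothetical and only serve the illustration.

```python
import re

# Values copied from the table above; hostname and realm are the
# documentation's placeholders and must match your own appliance.
DESTINATION_PATTERN = (
    r"https://mycompany\.mindbreeze\.com:8443/realm/master/api/v1/index/(\d+)(.*)"
)
DESTINATION_REPLACEMENT = r"http://localhost:\1/\2"


def rewrite_destination(url: str) -> str:
    """Rewrite a remote index URL to the local index port (illustration only)."""
    # Assumption: the whole URL has to match the pattern.
    match = re.fullmatch(DESTINATION_PATTERN, url)
    if match is None:
        # URLs that do not match are returned unchanged in this sketch.
        return url
    # Group 1 is the index port, group 2 is whatever follows it.
    return match.expand(DESTINATION_REPLACEMENT)
```

For example, `rewrite_destination("https://mycompany.mindbreeze.com:8443/realm/master/api/v1/index/23101")` returns `http://localhost:23101/`. If the rewritten URL does not look as expected, an unescaped dot or a realm that was not replaced in the pattern is a likely cause.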
For the Mindbreeze Proxy Environment to access the filter and index services via OAuth 2, a user with at least the "InSpire Index Writer" role is required. If you do not have a suitable user, please create a new one. For more information, see Configuration Backend Credentials.
Go to the Mindbreeze Management Center of the Mindbreeze Proxy Environment. Here you can configure the information required to use the services of the Mindbreeze Proxy Environment. You can also configure the crawlers for your data sources here.
In the "Configuration" menu, switch to the "Indices" tab, open the "Add Index" drop-down menu, and select "Add Remote Index".
In the dialog that opens, select the node ID of the proxy environment under "Remote Index Node". Then select the desired data source under "Data Source".
Only data sources that have a caching principal resolution service are suitable for remote connectors.
For details on setting up a specific data source, see the documentation for that data source.
Configure the following fields for the newly added remote index:
Remote Base URL | The URL to the Mindbreeze Management Center of the Mindbreeze InSpire appliance. If hosted in the cloud, usually https://mycompany.mindbreeze.com:8443 |
Realm | For on-prem appliances "master" by default, in the cloud this value must be adjusted |
Index Port | The Index Port on the Mindbreeze InSpire Appliance |
Filter Service ID | The Filter Service ID on the Mindbreeze InSpire Appliance |
Filter Port | The Filter Port on the Mindbreeze InSpire Appliance. A Filter Port only needs to be configured if the Filter Service ID is not configured. |
The configuration of the Filter Service ID:
If the selected data source is to be used with ACLs and is not public, there are certain limitations.
Remote indexes require a Caching Principal Resolution Service to handle ACLs, and such a service is not available for every data source.
The following data sources cannot be used as a remote index with ACLs:
Now create a new Caching Principal Resolution Service on the Mindbreeze Proxy Environment. To do this, navigate to "Configuration" and the "Indices" tab in the Mindbreeze Management Center. Then click on "Add Service".
Configure the service according to the Configuration of Caching Principal Resolution Service. Then click on "Add Property" in the "Consumer Caching Principal Resolution Services" section and configure the following fields:
Readonly on Consumer | This checkbox should be selected only on producer nodes of Mindbreeze InSpire environments. |
Base URL | The URL to the Mindbreeze Management Center of the Mindbreeze InSpire appliance. If hosted in the cloud, usually https://mycompany.mindbreeze.com:8443 |
Realm | For on-prem appliances "master" by default, in the cloud this value must be adjusted |
Service Port | The Caching Principal Resolution Service Port on the Mindbreeze InSpire Appliance |
Disable | Disables updating the remote cache |
Switch to the "Network" tab and click on "Add Credential". Configure the following fields:
Name | Assign an arbitrary, but meaningful name |
Type | OAuth 2 |
Access Token URL | The URL from which OAuth 2 access tokens can be requested. If hosted in the cloud, usually https://mycompany.mindbreeze.com:8443/auth/realms/master/protocol/openid-connect/token. Please note that the realm ("master") must be customized for cloud environments. |
Client ID | OAuth 2 Client ID. The default client "mindbreeze-inspire-public" is recommended |
Username | Username of a user who has the role "InSpire Index Writer". See also Configuration Backend Credentials |
Password | Password of this user |
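To make clear how these fields belong together, the following Python sketch builds the kind of token request that such a credential describes. It is a minimal illustration under assumptions: the OAuth 2 grant type "password" is inferred from the combination of client ID, username and password and is not stated in this documentation, and the function name is hypothetical. The request is only constructed, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_token_request(
    token_url: str, client_id: str, username: str, password: str
) -> Request:
    """Build (but do not send) an OAuth 2 token request."""
    # Assumption: resource owner password credentials grant.
    form = urlencode(
        {
            "grant_type": "password",
            "client_id": client_id,
            "username": username,
            "password": password,
        }
    ).encode("ascii")
    return Request(
        token_url,
        data=form,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )
```

Passing the returned object to `urllib.request.urlopen` would send the request. If the server answers with an error, check the realm in the Access Token URL and the user's role first.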
Click "Add Endpoint" and configure the following fields:
Location | https://mycompany.mindbreeze.com:8443/realm/master (Please note that the realm ("master") must be customized for cloud environments). |
Credential | The credential that you have previously created |
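The realm appears both in the endpoint location and in the Access Token URL, so both values have to be adjusted together. The following Python sketch derives both from a base URL and a realm. The URL shapes are taken from the examples in this documentation ("usually" for cloud-hosted appliances) and may differ in your environment; the function name is hypothetical.

```python
def realm_urls(base_url: str, realm: str) -> dict:
    """Derive the realm-dependent URLs used in the network configuration."""
    base = base_url.rstrip("/")  # tolerate a trailing slash in the base URL
    return {
        # Value for the "Location" field of the endpoint
        "endpoint_location": f"{base}/realm/{realm}",
        # Value for the "Access Token URL" field of the credential
        "access_token_url": (
            f"{base}/auth/realms/{realm}/protocol/openid-connect/token"
        ),
    }
```

With the base URL `https://mycompany.mindbreeze.com:8443` and the realm `master`, the sketch returns the two example URLs shown in the tables above.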
If you have problems, here is a list of possible solutions:
On the Mindbreeze Proxy Environment, open the current crawler log file (in /data/logs/log-mescrawler_launchedservice-<service>/current/log-mescrawler_launchedservice.log). If you find an error message there, it will probably indicate incorrect or missing configuration parameters. Depending on the error message, please check the following parts of the configuration:
If you find no errors in the crawler logs on the Mindbreeze Proxy Environment, or if the errors you find point to filter or index problems, please check the filter or index logs on the Mindbreeze InSpire Appliance.
You can check whether a connection to the remote index is possible by executing the following command on your appliance in the inspire container. This example calls the Remote Base URL https://mycompany.mindbreeze.com:8443/:
curl -kv https://mycompany.mindbreeze.com:8443/
If the connection is successful, you receive an HTTP status code as a response. Any status code is sufficient, because no login is performed here; this is only a connection test:
…
< HTTP/1.1 401 Unauthorized
…
However, if you get a response like this, you may need to configure a proxy, or the connection may have to be allowed in the firewall:
* Connection refused
* Failed connect to mycompany.mindbreeze.com:8443; Connection refused
* Closing connection 0
curl: (7) Failed connect to mycompany.mindbreeze.com:8443; Connection refused
If a proxy is necessary, you must configure it in the Management Center as described here.
To run the curl connection test through the proxy, you must also set the following environment variable:
export https_proxy=myproxy.mycompany.com:8080
Ideally, you should now be able to establish a successful connection with the curl connection test mentioned above.
However, if you get an answer like this:
< HTTP/1.1 403 Forbidden
< Server: squid/...
...
< X-Squid-Error: ERR_ACCESS_DENIED 0
...
this means that you have successfully configured a proxy, but the proxy does not allow the connection. In this case, the connection must be allowed on the proxy.
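The three outcomes of the connection test described above can be summarized in a small Python sketch. The function and its return values are hypothetical and only encode the cases from this section; the function does not perform a connection test itself.

```python
from typing import Mapping, Optional


def diagnose(
    status: Optional[int], headers: Mapping[str, str], refused: bool = False
) -> str:
    """Classify the result of the curl connection test (illustration only)."""
    if refused:
        # "Connection refused": a proxy may be needed, or the firewall
        # does not allow the connection.
        return "blocked"
    names = {name.lower() for name in headers}
    if status == 403 and "x-squid-error" in names:
        # The proxy was reached but denies the connection.
        return "proxy-denied"
    if status is not None:
        # Any other HTTP status code (for example 401) means the
        # appliance answered, so the connection itself works.
        return "reachable"
    return "unknown"
```

For example, `diagnose(401, {})` returns `"reachable"`, while `diagnose(403, {"X-Squid-Error": "ERR_ACCESS_DENIED 0"})` returns `"proxy-denied"`.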