Copyright © Mindbreeze GmbH, A-4020 Linz, 2017.

All rights reserved. All hardware and software names are trade names and/or trademarks of their respective owners.

These documents are confidential. The delivery and presentation of these documents alone does not justify any rights whatsoever to our software, our services and service performance results or other protected rights. The disclosure, publication or reproduction is not permitted.

For reasons of easier legibility, gender differentiation has been dispensed with. In terms of equal treatment, appropriate terms apply to both sexes.

Introduction

Mindbreeze InSpire® can be operated with dedicated producer and consumer nodes.

One or more servers act as Producers. Initial indexing and delta indexing are carried out on these nodes according to the currently valid configuration.

In addition, these servers run all Mindbreeze indices as well as one Mindbreeze Filter Service each. Because the indices are built and delta-indexed only on these servers, the Producer nodes are pure producers of indices.

The indices generated or updated in this way are automatically copied to the consumer nodes.

The following are running on the consumer node (also spread out over several producers):

  • all Mindbreeze indices in read mode,
  • the relevant sandbox processes for contextualization and authorization, as well as
  • client services.

These Consumer servers are responsible for answering search queries and providing client services. In order to ensure an efficient distribution of newly created or updated indices, the use of an elastic index is necessary. This feature enables automatic sizing of indices and allows only the actual changes in the index to be transferred from the producer to the consumer.


An elastic index automatically scales with the number of indexed objects. An index is therefore limited only by the hardware on which it runs.

With an elastic index, delta indexing modifies only those index files in which the affected data resides. When updated indices are distributed, it is therefore sufficient to copy only the files that have changed. This reduces the amount of data transferred and the time needed to update the data between producer and consumer to a minimum.
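The effect of transferring only changed files can be illustrated with a small shell sketch. It does not show how Mindbreeze InSpire synchronizes indices internally; the helper name changed_files, the directory arguments, and the byte-wise comparison with cmp are assumptions chosen purely for illustration.

```shell
#!/usr/bin/env bash
# Illustrative sketch only: list the files below SRC that are missing from
# DST or whose content differs. Only these files would need to be copied.
# changed_files is a hypothetical helper, not part of Mindbreeze InSpire.
changed_files() {
    local src="$1" dst="$2" rel
    # Walk all regular files in the source tree, sorted for stable output.
    (cd "$src" && find . -type f | sort) | while IFS= read -r rel; do
        rel="${rel#./}"
        # cmp -s returns non-zero if the files differ or DST lacks the file.
        if ! cmp -s "$src/$rel" "$dst/$rel" 2>/dev/null; then
            printf '%s\n' "$rel"
        fi
    done
}
```

Called as changed_files /path/to/new /path/to/old, the function prints one relative path per changed or added file, which makes the saving compared to a full copy easy to see.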

Each configuration is generated on the Producer server(s) for all involved nodes. The generated configuration is then distributed to all other servers (Consumers). This concept also makes a fail-safe configuration of Mindbreeze InSpire possible; load distributors are required for this.

Key Benefits

  • Indexing, including delta crawl runs, does not negatively affect search performance.
  • The following have no effect on search performance:
    • Initial indexing during ongoing operations
    • Delta indexing during ongoing operations
    • High-frequency delta indexing (e.g. updates every 15 minutes)
  • No loss of data sources during re-indexing (search on the Consumers with existing indices, re-indexing on the Producers)
  • Easy migration to new versions of the product, even if re-indexing is recommended or necessary
  • Periodic re-indexing is possible without affecting operation
  • Flexibility, e.g. in configuration updates

Pre-Requisites

  • Additional hardware for Producer and Consumer nodes.
  • For increased failsafe performance, additional load distributors are also required.

Preparation

SSH & SCP Configuration (Linux only)

First, determine the configured service user in /etc/mindbreeze/runtime.conf (normally mes).

Then make sure that files can be copied from the Producer to the Consumer via SCP without user interaction.

Example:

An SSH public key without a passphrase must be present on the Producer, and it must be entered in $HOME/.ssh/authorized_keys on the Consumer (see ssh-keygen and ssh-copy-id).

The known-hosts entries must also be created, for example by manually opening an SSH connection in each direction:

Example:

Known Hosts:

su mes (execute on Consumer and Producer)

ssh -vt mes@Producer (execute on Consumer)

ssh -vt mes@Consumer (execute on Producer)

Create RSA Key:

This must be done on both the Consumer and the Producer.

ssh-keygen -t rsa

    Generating public/private rsa key pair.
    Enter file in which to save the key (/home/demo/.ssh/id_rsa):
    Enter passphrase (empty for no passphrase):
    Enter same passphrase again:
    Your identification has been saved in /home/demo/.ssh/id_rsa.
    Your public key has been saved in /home/demo/.ssh/id_rsa.pub.
    The key fingerprint is:
    4a:dd:0a:c6:35:4e:3f:ed:27:38:8c:74:44:4d:93:67 demo@a
    The key's randomart image is:
    +--[ RSA 2048]----+
    |          .oo.   |
    |         .  o.E  |
    |        + .  o   |
    |     . = = .     |
    |      = S = .    |
    |     o + = +     |
    |      . o + o .  |
    |           . o   |
    |                 |
    +-----------------+

For the service user mes, the public key is stored in /home/mes/.ssh/id_rsa.pub. The private key (identification) is stored in /home/mes/.ssh/id_rsa.

su mes (executed on Consumer and Producer)
ssh-copy-id mes@consumervm (executed on Producer)
ssh-copy-id mes@producervm (executed on Consumer)

Test the configuration:

  • ssh mes@localhost
  • ssh mes@examplehost (not fully qualified)
  • ssh mes@examplehost.exampledomain (also fully qualified!)
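The manual login tests above can also be scripted. The following is a minimal sketch, assuming the OpenSSH client is used: the BatchMode option makes ssh fail instead of prompting for a password, a passphrase, or an unknown host key, so a successful run indicates that non-interactive SCP should work as well. The function name check_noninteractive_ssh and the SSH_BIN override are hypothetical and exist only for this example.

```shell
#!/usr/bin/env bash
# Sketch: verify that an SSH login works without any interactive prompt.
# SSH_BIN is a hypothetical override that makes the function easy to test.
SSH_BIN="${SSH_BIN:-ssh}"

check_noninteractive_ssh() {
    local target="$1"   # for example mes@consumervm
    # BatchMode=yes disables all prompts; ConnectTimeout bounds the wait
    # for an unreachable host. The remote command "true" does nothing.
    if "$SSH_BIN" -o BatchMode=yes -o ConnectTimeout=5 "$target" true; then
        echo "OK: $target"
    else
        echo "FAILED: $target" >&2
        return 1
    fi
}
```

Run as the service user, for example check_noninteractive_ssh mes@consumervm on the Producer and check_noninteractive_ssh mes@producervm on the Consumer.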

Incoming Directory

Create a directory on the consumer node that has enough space for an index, and to which the producer node can copy the indexing deltas.

The default on Linux is /data/incoming. The service user should be both the owner and the group of the directory:

  • mkdir /data/incoming
  • chown mes:mes /data/incoming
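Whether the directory has "enough space for an index" can be checked before the first synchronization. The sketch below is one possible way to do this with df; the function name has_free_space is hypothetical, and the required size has to be determined separately (for example from the size of the index directory on the Producer).

```shell
#!/usr/bin/env bash
# Sketch: succeed if the filesystem holding DIR has at least REQUIRED_KB
# kibibytes available. has_free_space is a hypothetical helper.
has_free_space() {
    local dir="$1" required_kb="$2" avail_kb
    # df -Pk prints one POSIX-format line per filesystem in 1024-byte
    # units; column 4 of the second line is the available space.
    avail_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
    [ -n "$avail_kb" ] && [ "$avail_kb" -ge "$required_kb" ]
}
```

For example, has_free_space /data/incoming 52428800 succeeds only if at least 50 GiB are available; the number is an arbitrary illustration, not a recommendation.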

Configuration

Service Configuration in the Manager UI

Create a Producer index in the Manager UI and apply the following settings (under "Advanced Settings"):

  • On the Consumer, change the mesmasteruri so that it points to the Producer instead of to the server itself.
  • Open port 23000, the port of the index, and port 5432 on the Producer for the Consumer and vice versa (firewall configuration).
  • Then restart the services on the Producer and the Consumer.
  • Connect to the Producer with an SCP client and copy the index from the Producer to the Consumer.
  • Then apply the following settings on the Producer:
    • Elastic Index: already enabled by default.
    • Set the external URL on the Producer to https://localhost:Queryport
    • Check the desired Consumer nodes under "Query Services" and activate the "Sync" setting. The following is necessary for this:
      • Create an "Incoming" directory that is accessible to the consumer host and to which the newly produced index components can be copied.
  • If the Consumer Query Service is to run on the same node as the Producer, additional settings are required to avoid resource conflicts:
    • Consumer index directory
    • Consumer index port (must not already be in use on this node)
    • Consumer data port (must not already be in use on this node)
    • Disable Producer Query Service: only the Consumer provides a query service.

Example:

  • Set the mesmasteruri on the Consumer in /etc/mindbreeze/mesmasteruri.conf:
    http://producervm:23000
  • Open port 23000, the index port of the Producer index, and port 5432 on both the Consumer and the Producer.
  • Open /etc/sysconfig/iptables with an editor of your choice, add the three rules for ports 23000, <INDEXPORT>, and 5432 shown below after the default rules, and save the file:

    /etc/sysconfig/iptables:

    -A INPUT -p tcp -m tcp --dport 443 -j ACCEPT
    -A INPUT -p tcp -m tcp --dport 8443 -j ACCEPT
    -A INPUT -p tcp -m tcp --dport 8444 -j ACCEPT
    -A INPUT -p tcp -m tcp --dport 23000 -s <producervm/consumervm> -j ACCEPT
    -A INPUT -p tcp -m tcp --dport <INDEXPORT> -s <producervm/consumervm> -j ACCEPT
    -A INPUT -p tcp -m tcp --dport 5432 -s <producervm/consumervm> -j ACCEPT
    -A INPUT -m state --state RELATED,ESTABLISHED -j ACCEPT
    -A INPUT -p icmp -j ACCEPT
    -A INPUT -i lo -j ACCEPT
    -A INPUT -p tcp -m state --state NEW -m tcp --dport 22 -j ACCEPT
    -A INPUT -j REJECT --reject-with icmp-host-prohibited
    -A FORWARD -j REJECT --reject-with icmp-host-prohibited
    COMMIT

  • Restart the services on the Producer and the Consumer by running the following commands:
    • /etc/init.d/iptables restart
    • /etc/init.d/mesmaster restart
    • /etc/init.d/mesnode restart
  • If everything is correct, the Consumer registers with the Producer.
  • Enable Search for the consumer (optional) to search content on the consumer node.
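The three peer-specific rules in the iptables listing above differ only in the port and the peer host, so they can be generated instead of typed by hand. The following sketch prints them for one peer; the helper name peer_rules is hypothetical, and the index port must be taken from your own index configuration.

```shell
#!/usr/bin/env bash
# Sketch: print the three peer-specific iptables rules for one peer host.
# peer_rules is a hypothetical helper; the rule format follows the listing.
peer_rules() {
    local peer="$1" index_port="$2" port
    # Port 23000, the index port, and port 5432, in the order of the listing.
    for port in 23000 "$index_port" 5432; do
        printf -- '-A INPUT -p tcp -m tcp --dport %s -s %s -j ACCEPT\n' "$port" "$peer"
    done
}
```

On the Producer, peer_rules consumervm 23101 prints the rules that admit the Consumer (23101 is a made-up index port used only for illustration); on the Consumer, the peer is producervm. The output still has to be inserted into /etc/sysconfig/iptables before the REJECT rules.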

The checked consumer nodes are automatically configured as query services and coupled to the Producer as synchronization targets.

If a synchronization is initiated manually or via script, the produced index is distributed to all consumer query services for which "Sync" has been selected.

In "Query engines" under "Client Services", make sure that the consumer "Query Services" configured above are checked.

Restart the services on the consumer node:

  • /etc/init.d/mesmaster restart
  • /etc/init.d/mesnode restart

If there is no index folder (for instance /data/indices/producer) after the services have fully started, follow the steps below:

  • Copy the index from the Producer to the Consumer using SCP and make mes the owner of the copied directory on the Consumer (the paths must be the same on both nodes):
    • [root@producervm ~]# scp -r /data/indices/producer root@consumervm:/data/indices
    • [root@consumervm ~]# chown -R mes:mes /data/indices/producer

Manual Synchronization via mescontrol

Once the crawling is complete, the synchronization can be started manually via mescontrol:

mescontrol >:> syncdelta

If this does not work the first time and you receive an error message, the synchronization must be restarted.

If everything worked correctly, no output is given.
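Because a first synchronization attempt may have to be repeated, the call can be wrapped in a small retry helper. This is a generic shell sketch, not a Mindbreeze tool: the function name retry is hypothetical, and it assumes that mescontrol returns a non-zero exit status when syncdelta fails, which the text above does not confirm.

```shell
#!/usr/bin/env bash
# Sketch: run a command up to MAX times, waiting DELAY seconds between
# attempts, and stop at the first success. retry is a hypothetical helper.
retry() {
    local max="$1" delay="$2" attempt=1
    shift 2
    while true; do
        # Run the remaining arguments as the command; success ends the loop.
        "$@" && return 0
        if [ "$attempt" -ge "$max" ]; then
            echo "giving up after $attempt attempts: $*" >&2
            return 1
        fi
        attempt=$((attempt + 1))
        sleep "$delay"
    done
}
```

A call would then look like retry 3 30 followed by the same mescontrol command line that is used for the manual synchronization.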

Additional Topics

Changing of Filterable Properties (Docinfo-Reinvertion)

Filterable properties and regex-matchable properties can be defined by:

  • CategoryDescriptor (regexmatchable, aggregatable)
  • Added Metadata (Entity Recognition, CSV Transformation, Precomputed Synthesized Metadata)
  • Manually configured "Aggregated Metadata Keys"

Changing them requires a reinvertion of these properties. The producer does this automatically after the configuration has been changed. On the consumer, this leads to the following error message:

Failed to start IndexQueryService: Readonly index: Failed to convert index (needsCompleteReinvertion=0,needsDocInfoReinvertion=1).  Please convert in readwrite mode

To prevent this, follow these instructions instead of saving and automatically applying the configuration:

  • Stop the Mindbreeze services (mesnode) on all nodes
  • Move the consumer indices to another location (or delete them)
  • Verify your changes again and save them
  • Start the nodes
  • Run syncdelta after the producer index has finished reinverting to distribute the index
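If the configuration has already been applied, it can be useful to confirm whether a consumer is affected before moving its indices. The sketch below searches a log file for the flag from the error message quoted above; the function name is hypothetical, and the location of the relevant log file is not specified here and must be supplied by you.

```shell
#!/usr/bin/env bash
# Sketch: succeed if LOGFILE contains the read-only conversion error that
# indicates a pending docinfo reinvertion. The helper name is hypothetical.
needs_docinfo_reinvertion() {
    local logfile="$1"
    # The flag text is taken verbatim from the error message.
    grep -q 'needsDocInfoReinvertion=1' "$logfile"
}
```

The function produces no output; its exit status can be used in a condition, for example to decide whether the steps above are needed on a given consumer.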