Mindbreeze GmbH, A-4020 Linz, 2019.
All rights reserved. All hardware and software names used are brand names and/or trademarks of their respective manufacturers.
These documents are strictly confidential. The submission and presentation of these documents does not confer any rights to our software, our services and service outcomes, or any other protected rights. The dissemination, publication, or reproduction hereof is prohibited.
For ease of readability, gender differentiation has been waived. Corresponding terms and definitions apply within the meaning and intent of the equal treatment principle for both sexes.
This document describes the HANLP text tokenizer plugin, which enables Mindbreeze InSpire to crawl and understand Chinese content. The plugin splits sentences into individual, interrelated parts (tokens) in order to provide an optimized search experience. The tokenizer plugin requires a tokenizer service, which is not included.
A tokenizer service must be configured before the plugin can be used.
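To illustrate what splitting a sentence into tokens means for Chinese text, which has no spaces between words, the following minimal Python sketch may help. It uses a simple dictionary-based forward maximum matching approach. This is not the algorithm used by the HANLP tokenizer service, and both the vocabulary and the function name segment are hypothetical and chosen for illustration only.

```python
# Toy forward-maximum-matching segmenter for illustration only.
# NOT the HANLP algorithm; VOCAB is a hypothetical example vocabulary.
VOCAB = {"我", "喜欢", "搜索", "引擎", "搜索引擎"}


def segment(text, vocab=VOCAB):
    """Split text into tokens, preferring the longest vocabulary match."""
    max_len = max(len(word) for word in vocab)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first; fall back to a single character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in vocab:
                tokens.append(candidate)
                i += size
                break
    return tokens


print(segment("我喜欢搜索引擎"))  # ['我', '喜欢', '搜索引擎']
```

A production tokenizer service typically relies on much larger vocabularies and statistical models, so its output for the same sentence may differ.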
To activate the HANLP tokenizer, the following steps must be carried out:
In the tokenizer, the post filter is used to tokenize (split) the contents during crawling and before they are stored in the index.
In the tokenizer, the query transformation service ensures that the text the end user enters in the search field is also tokenized before the query is executed. Otherwise, the tokenization of the index does not match that of the search query, which has the same effect as having no tokenizer configured at all.
If documents already exist in your index, they must be re-indexed because the existing documents have not yet been tokenized.
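The effect of mismatched tokenization, and the reason existing documents must be re-indexed, can be sketched with a toy inverted index in Python. This is a simplified model under stated assumptions, not the Mindbreeze InSpire index. The names tokenize, whitespace_split, build_index and search, as well as the vocabulary and the sample documents, are hypothetical.

```python
from collections import defaultdict

VOCAB = {"中文", "搜索", "文档"}  # hypothetical vocabulary


def tokenize(text):
    """Toy segmenter: two-character vocabulary match, else one character."""
    tokens, i = [], 0
    while i < len(text):
        for size in (2, 1):
            piece = text[i:i + size]
            if size == 1 or piece in VOCAB:
                tokens.append(piece)
                i += len(piece)
                break
    return tokens


def whitespace_split(text):
    """Stand-in for 'no tokenizer configured': Chinese text has no spaces."""
    return text.split()


def build_index(docs, analyzer):
    """Map each token produced by the analyzer to the ids of its documents."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in analyzer(text):
            index[token].add(doc_id)
    return index


def search(index, query, analyzer):
    """Return the ids of documents that contain every query token."""
    hits = [index.get(token, set()) for token in analyzer(query)]
    return set.intersection(*hits) if hits else set()


DOCS = {1: "中文搜索", 2: "中文文档"}

old_index = build_index(DOCS, whitespace_split)  # crawled without a tokenizer
new_index = build_index(DOCS, tokenize)          # re-indexed with the tokenizer

print(search(new_index, "搜索", tokenize))              # {1}
print(search(old_index, "搜索", tokenize))              # set()
print(search(new_index, "中文搜索", whitespace_split))  # set()
```

Only the first call finds the document, because both the index and the query were tokenized in the same way. The second call corresponds to documents that were crawled before the tokenizer was configured, and the third to a missing query transformation service.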