Copyright © Mindbreeze GmbH, A-4020 Linz, 2024.
All rights reserved. All hardware and software names used are brand names and/or trademarks of their respective manufacturers.
These documents are strictly confidential. The submission and presentation of these documents do not confer any rights to our software, our services and service outcomes, or any other protected rights. The dissemination, publication, or reproduction hereof is prohibited.
For ease of readability, gender-specific wording has been omitted. All terms and definitions apply to both sexes within the meaning and intent of the principle of equal treatment.
This document describes the Japanese tokenizer, which enables Mindbreeze InSpire to crawl and understand Japanese content. Essentially, this technology splits sentences into individual, interrelated parts (tokens) in order to provide an optimized search experience. The tokenizer is based on the Kuromoji framework.
WARNING: The Japanese (Kuromoji) Tokenizer is deprecated and will be removed in future releases. Please use the CJK Tokenizer instead.
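For illustration, the following minimal Java sketch uses the open-source Kuromoji library directly. It is a hypothetical, stand-alone example that does not use the Mindbreeze plugin API, assumes the kuromoji-ipadic artifact is on the classpath, and uses an arbitrary sample sentence ("I want to eat sushi."):

import com.atilika.kuromoji.ipadic.Token;
import com.atilika.kuromoji.ipadic.Tokenizer;

public class KuromojiExample {
    public static void main(String[] args) {
        // Build a tokenizer with the default IPADIC dictionary.
        Tokenizer tokenizer = new Tokenizer();

        // Tokenize a Japanese sentence into its individual tokens.
        for (Token token : tokenizer.tokenize("お寿司が食べたい。")) {
            // Print each token's surface form and its coarse part of speech.
            System.out.println(token.getSurface() + "\t" + token.getPartOfSpeechLevel1());
        }
    }
}

This kind of splitting matters because Japanese text contains no spaces between words; without tokenization, a sentence would only be searchable as one opaque string.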
Before using the tokenizer, make sure that the Mindbreeze server is installed. To use the plugin, an index with an active crawler for the Japanese content must be configured on the Mindbreeze InSpire appliance.
To activate the Japanese tokenizer, the following steps must be carried out:
The tokenizer is available as a ZIP file. This file must be installed on the Mindbreeze InSpire appliance using the Management Center, as follows:
To uninstall the tokenizer, first remove all uses of the tokenizer from the configuration, then delete the plugin from the Mindbreeze InSpire appliance as described in the following steps:
In the tokenizer, the post filter tokenizes (splits) the content during crawling, before it is stored in the index.
In the tokenizer, the query transformation service ensures that the text entered by the end user in the search field is also tokenized before the query is executed. Otherwise, the tokenization in the index would not match that of the search query, which would have the same effect as not having configured a tokenizer at all.
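As a rough sketch of why both sides must use the same tokenization, the following hypothetical Java example (again using the open-source Kuromoji library directly rather than the Mindbreeze plugin API; the document text and query string are made up) applies one tokenizer to the document content, as the post filter does at indexing time, and to the query string, as the query transformation service does at search time, and prints the tokens they have in common:

import com.atilika.kuromoji.ipadic.Token;
import com.atilika.kuromoji.ipadic.Tokenizer;

import java.util.HashSet;
import java.util.Set;
import java.util.stream.Collectors;

public class TokenMatchExample {
    // Reduce a text to the set of its token surface forms.
    private static Set<String> tokenSet(Tokenizer tokenizer, String text) {
        return tokenizer.tokenize(text).stream()
                .map(Token::getSurface)
                .collect(Collectors.toCollection(HashSet::new));
    }

    public static void main(String[] args) {
        Tokenizer tokenizer = new Tokenizer();

        // Index side: document content, tokenized during crawling (post filter).
        Set<String> indexed = tokenSet(tokenizer, "東京で美味しいラーメンを食べました。");

        // Query side: the user's input, tokenized by the query transformation service.
        Set<String> query = tokenSet(tokenizer, "東京のラーメン");

        // The tokens shared by both sides are what make the document findable.
        query.retainAll(indexed);
        System.out.println("Matching tokens: " + query); // e.g. [東京, ラーメン]
    }
}

Because both sides are reduced to the same token inventory, the query tokens 東京 and ラーメン line up with the indexed tokens; an untokenized query string would be compared as a whole and would find no match.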
If documents already exist in your index, they must be re-indexed because the existing documents have not yet been tokenized.