20.5 Release


    Japanese Language Tokenizer

    Kuromoji

    Copyright ©

    Mindbreeze GmbH, A-4020 Linz, 2020.

    All rights reserved. All hardware and software names used are brand names and/or trademarks of their respective manufacturers.

    These documents are strictly confidential. The submission and presentation of these documents does not confer any rights to our software, our services and service outcomes, or any other protected rights. The dissemination, publication, or reproduction hereof is prohibited.

    For ease of readability, gender differentiation has been waived. Corresponding terms and definitions apply within the meaning and intent of the equal treatment principle for both sexes.

    Introduction

    This document describes the Japanese tokenizer, which enables Mindbreeze InSpire to crawl and understand Japanese content. The tokenizer splits sentences into individual, interrelated parts (tokens) in order to provide an optimized search experience. It is based on the Kuromoji framework.
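
    To make the idea of splitting text into tokens concrete, the following minimal Python sketch segments a Japanese phrase by greedy longest-match lookup in a tiny dictionary. It is an illustration only: the dictionary and the function name `tokenize` are hypothetical, and Kuromoji itself relies on a full morphological dictionary and a more sophisticated segmentation algorithm than this simple approach.

```python
# Toy dictionary (hypothetical); a real tokenizer ships a large morphological dictionary.
DICTIONARY = {"東京", "東京都", "京都", "に", "住む"}


def tokenize(text, dictionary=DICTIONARY):
    """Split text into tokens using greedy longest-match dictionary lookup."""
    max_len = max(len(word) for word in dictionary)
    tokens = []
    pos = 0
    while pos < len(text):
        # Try the longest candidate first, then progressively shorter ones.
        for length in range(min(max_len, len(text) - pos), 0, -1):
            candidate = text[pos:pos + length]
            if candidate in dictionary:
                tokens.append(candidate)
                pos += length
                break
        else:
            # Unknown character: emit it as a single-character token.
            tokens.append(text[pos])
            pos += 1
    return tokens


print(tokenize("東京都に住む"))  # ['東京都', 'に', '住む']
```

    Japanese text contains no spaces between words, so a segmentation step of this kind is what makes individual words searchable at all.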

    Requirements

    Before using the tokenizer, make sure that the Mindbreeze server is installed. In addition, an index with an active crawler that covers the Japanese content must be configured on the Mindbreeze InSpire appliance.

    Set-up

    To activate the Japanese tokenizer, the following steps must be carried out:

    • Installing the plugin
    • Setting up the post filter
    • Setting up query transformation services
    • Reindexing contents that were already indexed before the tokenizer was installed

    Installing the plugin

    The tokenizer is available as a ZIP file. This file must be installed as follows on the Mindbreeze InSpire Appliance using the Management Center:

    • Navigate to the Management Center
    • Then select the Plugins tab and upload the tokenizer plugin.zip.

    Uninstalling

    To uninstall the tokenizer, you must first delete all uses of the tokenizer in the configuration and then delete the plugin from the Mindbreeze InSpire Appliance as described in the following steps:

    • Navigate to the Management Center
    • Select the “Plugins” tab and locate the installed tokenizer plugin.
    • Delete the plugin.

    Setting up the post filter

    The post filter tokenizes (splits) the contents during crawling, before they are stored in the index.

    • Navigate to the Management Center
    • Select the “Filter” tab, activate “Advanced Settings” and open the filter that you want to use to tokenize Japanese content.

    • Then search for the “Post Filter Transformation Services” option and add the tokenizer post filter plugin (TextPlugin.Kuromoji).


      To modify the tokenizer settings, expand them by clicking the “plus sign” icon.


      Tokenizer mode: This allows you to switch between the tokenizer modes of the Kuromoji framework, which defines the modes Normal, Search and Extended.


      Regex metadata name: In this field, you can use a regular expression to define which metadata names should be excluded from tokenizing. Several regular expressions can be combined using the “|” symbol. In most cases, this field can be left empty.
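
      As a rough illustration of how several expressions combined with “|” select the metadata names to exclude, consider the following Python sketch. The metadata names and the pattern are hypothetical, and the sketch assumes that an expression must match the whole metadata name; the exact matching behavior of the plugin is not specified here, so treat this as an assumption and verify it against your configuration.

```python
import re

# Hypothetical exclusion pattern: three expressions joined with "|".
EXCLUDE_PATTERN = re.compile(r"author|.*_id|date.*")


def should_tokenize(metadata_name):
    """Return True if the metadata name is NOT excluded from tokenizing.

    Assumption: an expression must match the whole metadata name.
    """
    return EXCLUDE_PATTERN.fullmatch(metadata_name) is None


for name in ["title", "author", "customer_id", "datemodified", "authors"]:
    print(name, should_tokenize(name))
```

      Under these assumptions, `author`, `customer_id` and `datemodified` are excluded, while `title` and `authors` are still tokenized.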

    Setting up the query transformation service

    The query transformation service ensures that the text entered by the end user in the search field is also tokenized before the query is executed. Otherwise, the tokenization of the index does not match that of the search query, which has the same effect as if you had not configured a tokenizer at all.
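
    The following Python sketch illustrates why both sides need the same tokenization. It builds a tiny inverted index from tokenized content and then runs the same query once tokenized and once untokenized. The toy segmenter, the sample document, and all function names are hypothetical and stand in for the real tokenizer and index.

```python
import re

# Toy segmenter (hypothetical): longest alternatives first, any other character alone.
TOKEN_PATTERN = re.compile("東京都|住む|に|.")


def tokenized(text):
    """Split text into tokens, as the post filter would do for indexed content."""
    return TOKEN_PATTERN.findall(text)


def untokenized(text):
    """Treat the whole text as one token, as if no query transformation ran."""
    return [text]


def build_index(documents, tokenizer):
    """Map every token to the set of document ids that contain it."""
    index = {}
    for doc_id, text in documents.items():
        for token in tokenizer(text):
            index.setdefault(token, set()).add(doc_id)
    return index


def search(index, query, tokenizer):
    """Return the documents that contain all tokens of the query."""
    result = None
    for token in tokenizer(query):
        matches = index.get(token, set())
        result = matches if result is None else result & matches
    return result or set()


DOCUMENTS = {"doc1": "東京都に住む"}
INDEX = build_index(DOCUMENTS, tokenized)

print(search(INDEX, "東京都に", tokenized))    # {'doc1'}
print(search(INDEX, "東京都に", untokenized))  # set()
```

    The untokenized query looks up the string “東京都に” as a single term, which never occurs in the index because the content was stored as separate tokens.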

    • Navigate to the Management Center
    • Choose the “Indices” tab
    • Activate “Advanced Settings” and open the index containing the Japanese content. Select the filter on which you have configured the post filter.

    • Look for the setting “Query Transformation Services” and add the tokenizer service.

    • Then open the settings of the Query Transformation Service by clicking the “plus sign” icon and configure them identically to the post filter.

    Content re-indexing

    If documents already exist in your index, they must be re-indexed because the existing documents have not yet been tokenized.
