    Language Detection
    LanguageDetector Plug-In

    Introduction

    Mindbreeze provides language detection for documents using the LanguageDetector ItemTransformer plugin.

    LanguageDetector Plug-In

    To use language detection, add the LanguageDetector to your Mindbreeze installation by loading the corresponding plugin (the Item Transformation Services are included in the package “Mindbreeze Item Transformation Plugins”). Install the plugin using the manager UI.

    The plugin also has to be included in your Mindbreeze license.

    Configuration

    • Activate the plugin for each index that needs it using the manager UI:
      • Select the “Indices” tab and activate “Advanced Settings”.
      • Scroll to the “Item Transformation Services” section.
      • Select the “TextPlugin.LanguageDetector” plugin and click add.

    • Language Probability Threshold: Specifies the probability threshold that a language has to reach in order to be included.
    • Source Property Pattern: Specifies the property used for language detection.
    • Language Target Property: Specifies the new property for the detected languages. To be able to filter by this metadata, it must be aggregatable. To do this, activate the Advanced Settings and add the metadata in the Aggregated Metadata Keys option in the index configuration.
    • Language Property: Defines a property that already contains the language. If it is set, language detection is skipped and the target property is set from this property.
    • Language Property Pattern: Defines which languages from the “Language Property” are taken into account.
    • Included Languages: Defines the languages for language detection (use the language abbreviations from the table in the appendix, separated by commas, e.g. en,de). If no languages are specified, the plugin tries to detect all supported languages. It is recommended to specify the required languages, as this significantly improves the quality of the language detection.
    • Force Included Languages: If enabled, the probabilities are only calculated on the basis of the “Included Languages” (and not on the basis of all supported languages). If only a few languages are configured in “Included Languages”, it is advisable to disable this option.
    • Short Text Algorithm Text Length: For short texts, the quality of language detection can be improved by using the “Short Text Algorithm”. This setting determines the maximum length of the text (in characters) for which the “Short Text Algorithm” is used. Longer texts are analyzed with the “normal” algorithm.
    • Max Text Length (characters): Determines the maximum length of the text (in number of characters) to be used for the analysis. For performance reasons, only the first characters of longer texts are analyzed; the rest is skipped. The length of the text is the sum of the contents of all metadata found with the Source Property Pattern. Default value: 100000
    • No Language found set property key and No Language found property value: If language detection could not determine a language, a metadata property can be set with a name (key) and a value. This can be useful to explicitly mark documents with no recognized language.
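The interplay of these options can be pictured with a small Python sketch. It is not the plugin's implementation and not a Mindbreeze API: the class `DetectorSettings`, all function names, the example threshold of 0.5 and the example short text length of 50 are assumptions made for illustration. In particular, the rescaling used for “Force Included Languages” and the `<=` comparison for the short text length are one possible reading of the descriptions above.

```python
from dataclasses import dataclass


@dataclass
class DetectorSettings:
    """Hypothetical mirror of the options above; not a Mindbreeze API."""
    probability_threshold: float = 0.5   # assumed example value
    included_languages: tuple = ()       # empty means: all supported languages
    force_included_languages: bool = False
    short_text_length: int = 50          # assumed example value
    max_text_length: int = 100000        # default value stated in the documentation
    no_language_key: str = ""            # "No Language found set property key"
    no_language_value: str = ""          # "No Language found property value"


def prepare_text(property_values, settings):
    """Join all matched source property values and keep only the first characters."""
    return "".join(property_values)[: settings.max_text_length]


def choose_algorithm(text, settings):
    """Pick the short text algorithm up to the configured length, else the normal one."""
    return "short" if len(text) <= settings.short_text_length else "normal"


def select_languages(probabilities, settings):
    """Return the languages whose probability reaches the threshold, best first."""
    candidates = dict(probabilities)
    if settings.included_languages:
        candidates = {lang: p for lang, p in candidates.items()
                      if lang in settings.included_languages}
        if settings.force_included_languages:
            # Assumption: probabilities are rescaled so the included languages sum to 1.
            total = sum(candidates.values())
            if total > 0:
                candidates = {lang: p / total for lang, p in candidates.items()}
    hits = [lang for lang, p in candidates.items()
            if p >= settings.probability_threshold]
    return sorted(hits, key=lambda lang: candidates[lang], reverse=True)


def target_metadata(languages, target_property, settings):
    """Build the metadata to set: detected languages, or the fallback marker."""
    if languages:
        return {target_property: languages}
    if settings.no_language_key:
        return {settings.no_language_key: settings.no_language_value}
    return {}


if __name__ == "__main__":
    raw = {"en": 0.5, "de": 0.3, "fr": 0.2}   # made-up probabilities
    settings = DetectorSettings(probability_threshold=0.6,
                                included_languages=("en", "de"))
    print(select_languages(raw, settings))    # prints []
    settings.force_included_languages = True
    print(select_languages(raw, settings))    # prints ['en']
```

In this model, restricting the calculation to the included languages raises the share of each remaining language, which is why the same threshold can be passed with the option enabled and missed with it disabled.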

    Run the LanguageDetector as a Separate Service

    The LanguageDetector plugin can be used not only as an item transformation service, but also as a separate service. This can provide performance advantages for large installations with multiple indices, since only one single LanguageDetector service is operated for all indices and not one instance per index.

    To run the LanguageDetector plugin as a standalone service, install the MetadataTransformationService-<version>.zip plugin. Add a new service in the "Indices" tab in the "Services" section and select "ItemTransformationServicePlugin.LanguageDetector". In the settings of the new service set a "Display Name" and the "Bind port" to a free TCP port. The remaining settings are to be set according to the section "Configuration". Finally, switch to the "Indices" section in the "Indices" tab and add an Item Transformation Service to the respective index and reference the created service.
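Before entering a “Bind port”, it can help to check that the port is really free. The following Python sketch is a generic operating system check, not a Mindbreeze tool; the function name `is_port_free` and the port number in the usage note are hypothetical. It only gives a meaningful answer when run on the host where the service will be started.

```python
import socket


def is_port_free(port, host=""):
    """Return True if a TCP socket can currently be bound to host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            # An empty host binds to all IPv4 interfaces.
            probe.bind((host, port))
        except OSError:
            # Typically "address already in use" or missing permission.
            return False
    return True
```

For example, `is_port_free(23800)` returns True only if nothing else is bound to that port at the moment of the call. The check is a snapshot: another process can still take the port before the service starts.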

    Appendix

    Language profiles

    The following table lists the language profiles supported by the LanguageDetector. The listed abbreviations can be used in the configuration of the LanguageDetector (“Included Languages” option). The option “Short Text Algorithm Text Length” defines for which text lengths the long or the short text profile of the respective language is selected. For more information on configuration, see the section “Configuration”.

    Abbreviation             Language              Long text profile   Short text profile
    (“Included Languages”)                         available           available

    af                       Afrikaans             X
    an                       Aragonese             X
    ar                       Arabic                X
    ast                      Asturian              X
    be                       Belarusian            X
    br                       Breton                X
    ca                       Catalan               X
    bg                       Bulgarian             X
    bn                       Bengali               X
    cs                       Czech                 X                   X
    cy                       Welsh                 X
    da                       Danish                X                   X
    de                       German                X                   X
    el                       Greek                 X
    en                       English               X                   X
    es                       Spanish               X                   X
    et                       Estonian              X
    eu                       Basque                X
    fa                       Persian               X
    fi                       Finnish               X                   X
    fr                       French                X                   X
    ga                       Irish                 X
    gl                       Galician              X
    gu                       Gujarati              X
    he                       Hebrew                X
    hi                       Hindi                 X
    hr                       Croatian              X
    ht                       Haitian               X
    hu                       Hungarian             X
    id                       Indonesian            X                   X
    is                       Icelandic             X
    it                       Italian               X                   X
    ja                       Japanese              X
    km                       Khmer                 X
    kn                       Kannada               X
    ko                       Korean                X
    lt                       Lithuanian            X
    lv                       Latvian               X
    mk                       Macedonian            X
    ml                       Malayalam             X
    mr                       Marathi               X
    ms                       Malay                 X
    mt                       Maltese               X
    ne                       Nepali                X
    nl                       Dutch                 X                   X
    no                       Norwegian             X                   X
    oc                       Occitan               X
    pa                       Punjabi               X
    pl                       Polish                X                   X
    pt                       Portuguese            X                   X
    ro                       Romanian              X                   X
    ru                       Russian               X
    sk                       Slovak                X
    sl                       Slovene               X
    so                       Somali                X
    sq                       Albanian              X
    sr                       Serbian               X
    sv                       Swedish               X                   X
    sw                       Swahili               X
    ta                       Tamil                 X
    te                       Telugu                X
    th                       Thai                  X
    tl                       Tagalog               X
    tr                       Turkish               X                   X
    uk                       Ukrainian             X
    ur                       Urdu                  X
    vi                       Vietnamese            X                   X
    yi                       Yiddish               X
    zh-cn                    Simplified Chinese    X
    zh-tw                    Traditional Chinese   X
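A value for “Included Languages” can be checked against this list before it is entered in the configuration. The Python sketch below is a hypothetical helper, not part of Mindbreeze: the names `SUPPORTED_LANGUAGES`, `BOTH_PROFILES`, `parse_included_languages` and `languages_with_both_profiles` are made up, and the two sets are copied from the table (`BOTH_PROFILES` holds the abbreviations marked in both profile columns). Whether the plugin itself tolerates spaces or upper case is not stated; the helper normalizes both as an assumption.

```python
# All abbreviations listed in the table above.
SUPPORTED_LANGUAGES = frozenset(
    "af an ar ast be br ca bg bn cs cy da de el en es et eu fa fi "
    "fr ga gl gu he hi hr ht hu id is it ja km kn ko lt lv mk ml "
    "mr ms mt ne nl no oc pa pl pt ro ru sk sl so sq sr sv sw ta "
    "te th tl tr uk ur vi yi zh-cn zh-tw".split()
)

# Abbreviations marked in both profile columns of the table.
BOTH_PROFILES = frozenset(
    "cs da de en es fi fr id it nl no pl pt ro sv tr vi".split()
)


def parse_included_languages(value):
    """Split a comma-separated value such as 'en,de' and validate each entry.

    An empty value yields an empty list, which corresponds to
    "try to detect all supported languages".
    """
    codes = [part.strip().lower() for part in value.split(",") if part.strip()]
    unknown = [code for code in codes if code not in SUPPORTED_LANGUAGES]
    if unknown:
        raise ValueError("unsupported language abbreviation(s): " + ", ".join(unknown))
    return codes


def languages_with_both_profiles(codes):
    """Return the subset of codes that is marked in both profile columns."""
    return [code for code in codes if code in BOTH_PROFILES]
```

For example, `parse_included_languages("en, de, ja")` returns `["en", "de", "ja"]`, and `languages_with_both_profiles` applied to that result returns `["en", "de"]`.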
