    Index Operating Concepts

    Introduction

    This document describes the index operating concepts of Mindbreeze InSpire. These concepts apply both to standalone operation (a single appliance) and to distributed operation (several appliances).

    Glossary

    Aggregatable

    If a metadatum is aggregatable, it is automatically also regexmatchable, with the added property that the metadatum is available as a facet (filter). A distinction is made between:

    • Static Aggregatable: globally defined per metadatum for the whole index in the index schema. An index schema change requires a re-inversion of the index.
    • Dynamic Aggregatable: defined per metadatum and per document. Since this is not defined in the index schema, no re-inversion is necessary. Metadata can therefore be made "aggregatable" for individual documents in a very flexible way.
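
    The difference between the two variants can be illustrated with a minimal sketch. It is not the Mindbreeze implementation: the document structure, the `dynamic_aggregatable` field and the function name are assumptions made purely for illustration. A value only contributes to a facet if its key is aggregatable, either index-wide (static) or on that particular document (dynamic).

```python
from collections import Counter

# Hypothetical model: keys marked aggregatable for the whole index (static).
STATIC_AGGREGATABLE = {"extension"}


def facet_counts(documents, key):
    """Count the values of `key` over all documents where the key is aggregatable."""
    counts = Counter()
    for doc in documents:
        value = doc["metadata"].get(key)
        if value is None:
            continue
        # Static: defined once for the index. Dynamic: flagged on the document itself.
        dynamic_keys = doc.get("dynamic_aggregatable", set())
        if key in STATIC_AGGREGATABLE or key in dynamic_keys:
            counts[value] += 1
    return dict(counts)


# Hypothetical sample documents.
docs = [
    {"metadata": {"extension": "pdf", "department": "Sales"},
     "dynamic_aggregatable": {"department"}},
    {"metadata": {"extension": "pdf", "department": "HR"}},
    {"metadata": {"extension": "docx"}},
]

extension_facet = facet_counts(docs, "extension")
department_facet = facet_counts(docs, "department")
```

    In this sketch, `extension` is counted for every document, while `department` is counted only for the document that flags it as aggregatable.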

    Aggregated Metadata Keys

    The "Aggregated Metadata Keys" can be configured per index. The "Advanced Settings" must be activated for this option to be visible. The option makes it possible to mark metadata as "aggregatable". Changing this option entails a change in the index schema.

    Built-In Metadata Keys

    The following metadata keys are reserved for Built-In metadata:

    | Name                  | Type    |
    |-----------------------|---------|
    | mes:docid             | Integer |
    | mes:key               | String  |
    | mes:size              | Integer |
    | category              | String  |
    | fqcategory            | String  |
    | categoryclass         | String  |
    | categoryscope         | String  |
    | mes:date              | String  |
    | title                 | String  |
    | datasource/mes:key    | String  |
    | datasource/category   | String  |
    | datasource/fqcategory | String  |
    | extension             | String  |
    | mes:boost             | Float   |
    | mes:uniformdocid      | Integer |

    Regexmatchable

    • Regexmatchable metadata can be searched with RegEx (relevant for custom search clients, see api.v2.search).

    Category / Category Instance / Fully Qualified Category

    The "Category", "Category Instance" and "Fully Qualified Category" are described in the table below:

    | Name | Metadata | Description |
    |------|----------|-------------|
    | Category | datasource/category | Documents that are indexed by a particular crawler always have the same category. This is therefore not configurable. |
    | Category Instance | datasource/categoryinstance | The Category Instance can be configured for most crawlers so that they set the Category Instance for their crawled documents. |
    | Fully Qualified Category | datasource/fqcategory | The Fully Qualified Category is generated by combining the Category and Category Instance (with a colon in the middle, e.g. Web:Default). This must be unique for each crawler if each crawler in the search client is to receive its own filter value for the Source filter. |
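
    The combination rule and the uniqueness requirement can be sketched as follows. The function names and the list of crawlers are hypothetical and serve only as an illustration; the sketch simply joins the two values with a colon and reports values that more than one crawler would share.

```python
def fully_qualified_category(category: str, category_instance: str) -> str:
    """Join Category and Category Instance with a colon, e.g. Web:Default."""
    return f"{category}:{category_instance}"


def duplicate_fqcategories(crawlers):
    """Return the fully qualified categories used by more than one crawler."""
    seen, duplicates = set(), set()
    for category, instance in crawlers:
        fq = fully_qualified_category(category, instance)
        if fq in seen:
            duplicates.add(fq)
        seen.add(fq)
    return sorted(duplicates)


# Hypothetical crawler setup: two Web crawlers share the instance "Default".
crawlers = [("Web", "Default"), ("Web", "Intranet"), ("Web", "Default")]
shared = duplicate_fqcategories(crawlers)
```

    In this example, the two crawlers configured as Web:Default would share a single filter value in the Source filter.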

    Index Document Info

    The part of the index that is available in memory for analysis is called the Document Info. The Document Info Zones (properties) can be controlled using the Category Descriptor, the Semantic Pipeline or the Aggregated Metadata Keys.

    Index Document Info Schema (Index Schema)

    The definition of which properties are available via the Document Info is called the Document Info Schema.

    Index Configuration

    The index configuration includes everything that configures the index. The index configuration is stored in the index file system.

    Index schema change

    A schema change results in a re-inversion of the document info. The following are examples of changes that cause a schema change:

    • Changes in Aggregated Metadata Keys
    • Changes in Category Descriptor (related to aggregatable and regexmatchable)
    • Precomputed Synthesized Metadata (if aggregatable)
    • Entity recognition

    Index Inversion / Re-Inversion

    After a filtered document is stored in the index, it is inverted so that it becomes searchable ("index inversion"). In addition, documents are enriched with metadata during inversion (described in the Semantic Pipeline).

    When the schema is changed, the index is automatically re-inverted with regard to the document info.

    Full Re-Inversion

    A Full Re-Inversion not only re-inverts the document info but rebuilds the whole inverted index.

    This can be triggered using the script /opt/mindbreeze/scripts/move_inverted_index.sh.
    It moves the inverted index to a specified backup directory. The inverted index will be rebuilt on the next index startup.
    The index has to be stopped when using this script. After the index is started, it becomes available only once the re-inversion has finished.

    ./move_inverted_index.sh
        --basedir INDEX_DIRECTORY
        --destdir BACKUP_DIRECTORY
        [--category CATEGORY]
        [--bucket BUCKET_NR]
        [--overwrite]
        | --help | -h

    If neither category nor bucket is specified, the inverted index of all categories of all buckets is moved.

    The parameter category restricts this to a specific category.

    The parameter bucket restricts this to a specific bucket.
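
    When the script is called from automation, the argument list can be assembled as in the following sketch. Only the script path and the option names are taken from the usage above; the helper function, the directories and the category and bucket values are hypothetical placeholders. The sketch only builds and prints the command and does not execute it.

```python
import shlex

SCRIPT = "/opt/mindbreeze/scripts/move_inverted_index.sh"


def build_move_command(basedir, destdir, category=None, bucket=None, overwrite=False):
    """Assemble the argument list for move_inverted_index.sh."""
    cmd = [SCRIPT, "--basedir", basedir, "--destdir", destdir]
    if category is not None:
        cmd += ["--category", category]      # restrict to one category
    if bucket is not None:
        cmd += ["--bucket", str(bucket)]     # restrict to one bucket
    if overwrite:
        cmd.append("--overwrite")
    return cmd


# Hypothetical directories and values, for illustration only.
cmd = build_move_command("/data/index", "/data/index-backup", category="Web", bucket=0)
print(shlex.join(cmd))
```

    Passing the resulting list to `subprocess.run(cmd, check=True)` would run the script. As noted above, the index has to be stopped first.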

    Multi Index Layout

    The Multi Index Layout is a special form of index structure. By default, it is used for all indexes, which is especially important for distributed operation with multiple Mindbreeze InSpire appliances. See also Handbook - Distributed Operation (G7) - Index Layout.

    Semantic Pipeline

    Documents are processed by the crawler or pusher in the semantic pipeline and then indexed. The following steps are performed:

    Filter / Content Filter

    Depending on the file type, the filter forwards documents to the respective content filters. The filtered documents are returned to the filter so that, if necessary, they can be forwarded to further content filters. An example of this is ZIP documents, which must first be unpacked with one content filter and then processed with other content filters. Filters can be configured in the Mindbreeze Management Center under "Configuration" in the tab "Filter" and selected in the tab "Indices" for the respective indices.
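
    The round trip between filter and content filters can be pictured with the following sketch. It is not the Mindbreeze filter service: the function names and the extension-based dispatch are assumptions for illustration. A ZIP archive is unpacked by one content filter, and each unpacked part is passed through the filter again.

```python
import io
import zipfile


def unzip_filter(name, data):
    """Content filter for ZIP files: returns the contained files for re-dispatch."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        return [(member, archive.read(member))
                for member in archive.namelist()
                if not member.endswith("/")]  # skip directory entries


def text_filter(name, data):
    """Content filter for plain text: returns the extracted text."""
    return data.decode("utf-8", errors="replace")


def filter_document(name, data):
    """Dispatch by file extension; unpacked parts go through the filter again."""
    extension = name.rsplit(".", 1)[-1].lower()
    if extension == "zip":
        results = []
        for part_name, part_data in unzip_filter(name, data):
            results.extend(filter_document(part_name, part_data))
        return results
    if extension == "txt":
        return [(name, text_filter(name, data))]
    return []  # no content filter available for this type in the sketch


# Build a small ZIP archive in memory as sample input.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("notes.txt", "hello index")
    archive.writestr("image.bin", b"\x00\x01")

filtered = filter_document("bundle.zip", buffer.getvalue())
```

    Here only `notes.txt` produces filtered text, because the sketch has no content filter for the `.bin` file.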

    Post Filter

    Using a Post Filter, the content of already filtered documents can be processed and modified before the document is sent to the index.

    Precomputed Synthesized Metadata

    Precomputed Synthesized Metadata can be used to generate new metadata based on other metadata. The point in the semantic pipeline at which this metadata is generated can be determined using the "Transformation Pipeline Slot" option. Detailed documentation can be found here.

    Entity Recognition

    Entity Recognition can be used to generate metadata by recognizing certain patterns in a text (using Regex). For example, dates and UNC paths can be recognized. Detailed documentation can be found here.
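
    The idea of pattern-based recognition can be sketched with Python's `re` module. The two patterns, the metadata names and the function below are assumptions for illustration and are not the rules or the syntax used by Mindbreeze InSpire.

```python
import re

# Illustrative patterns: ISO-style dates and UNC paths such as \\server\share\dir.
DATE_PATTERN = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")
UNC_PATTERN = re.compile(r"\\\\[A-Za-z0-9._-]+(?:\\[A-Za-z0-9._$-]+)+")


def recognize_entities(text):
    """Return the recognized entities as new (hypothetical) metadata values."""
    return {
        "dates": DATE_PATTERN.findall(text),
        "unc_paths": UNC_PATTERN.findall(text),
    }


text = r"Report from 2024-03-15 is stored at \\fileserver\reports\q1 for review."
entities = recognize_entities(text)
```

    For the sample text, the sketch yields one date and one UNC path, which could then be stored as metadata of the document.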

    CSV Transformation

    The "CSV Transformation" can also be used to generate metadata. The value of a metadatum is compared with the values of a certain column in the CSV. If the metadatum value matches the value in that column, the value of another column from the same row can be written into a new metadatum and added to the document. More information can be found in the CSV-Transformation documentation.
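
    The lookup described above can be sketched as follows. The CSV content, the metadata names and the function signature are hypothetical; the sketch only shows the matching principle and is not the configuration format of the actual service.

```python
import csv
import io


def csv_transform(metadata, csv_text, source_key, match_column, value_column, target_key):
    """Copy `value_column` of the first row whose `match_column` equals the metadatum."""
    enriched = dict(metadata)
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row[match_column] == metadata.get(source_key):
            enriched[target_key] = row[value_column]
            break
    return enriched


# Hypothetical mapping table and document metadata.
MAPPING = "code,department\nHR,Human Resources\nRD,Research and Development\n"
doc = {"title": "Onboarding", "dept_code": "HR"}

result = csv_transform(doc, MAPPING, "dept_code", "code", "department", "department_name")
```

    In this example, the document receives the new metadatum `department_name` with the value taken from the matching CSV row. A document without a matching row is returned unchanged.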

    Item Transformation

    Item transformers are another way to enrich documents with metadata. Mindbreeze InSpire offers various item transformers, such as the LanguageDetector Plugin.

    Language Detection & Named Entity Recognition

    With the help of the "Language Detection" integrated in the index, the language of a document can be recognized without an additional plugin.

    The subsequent "Named Entity Recognition (NER)" can identify and classify named entities both in the content and in the metadata of a document. Detailed documentation can be found here.
