    Index Operating Concepts

    Copyright ©

    Mindbreeze GmbH, A-4020 Linz, 2022.

    All rights reserved. All hardware and software names used are brand names and/or trademarks of their respective manufacturers.

    These documents are strictly confidential. The submission and presentation of these documents does not confer any rights to our software, our services, and service outcomes, or any other protected rights. The dissemination, publication, or reproduction hereof is prohibited.

    For ease of readability, gender differentiation has been waived. Corresponding terms and definitions apply within the meaning and intent of the equal treatment principle for both sexes.

    Introduction

    This document describes the index operating concepts of Mindbreeze InSpire. These concepts apply both to standalone operation (a single appliance) and to distributed operation (several appliances).

    Glossary

    Aggregatable

    If a metadata item is aggregatable, it is automatically also regexmatchable and is additionally available as a facet (filter). A distinction is made between:

    • Static Aggregatable: defined globally per metadata item for the whole index in the index schema. An index schema change requires a re-inversion of the index.
    • Dynamic Aggregatable: defined per metadata item and per document. Since this is not defined in the index schema, no re-inversion is necessary. Metadata for individual documents can therefore be made "aggregatable" very flexibly.

    Aggregated Metadata Keys

    The "Aggregated Metadata Keys" can be configured per index. The "Advanced Settings" must be activated for this option to be visible. The option makes it possible to mark metadata as "aggregatable". Changes to this option entail a change in the index schema.

    Regexmatchable

    • Regexmatchable metadata can be searched with RegEx (relevant for custom search clients, see api.v2.search).

    Category / Category Instance / Fully Qualified Category

    The "Category", "Category Instance" and "Fully Qualified Category" are described in the table below:

    | Name | Metadata | Description |
    | --- | --- | --- |
    | Category | datasource/category | Documents that are indexed by a particular crawler always have the same category. This is therefore not configurable. |
    | Category Instance | datasource/categoryinstance | The Category Instance can be configured for most crawlers so that they set the Category Instance for their crawled documents. |
    | Fully Qualified Category | datasource/fqcategory | The Fully Qualified Category is generated by combining the Category and Category Instance (with a colon in the middle, e.g. Web:Default). This must be unique for each crawler if each crawler in the search client is to receive its own filter value for the Source filter. |
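
    The way the Fully Qualified Category is composed, and why it should be unique per crawler, can be illustrated with a small sketch. This is not Mindbreeze code: the function names and the crawler list are hypothetical, and only the colon-joined format (e.g. Web:Default) comes from the description above.

```python
def fully_qualified_category(category: str, category_instance: str) -> str:
    """Join Category and Category Instance with a colon, e.g. "Web:Default"."""
    return f"{category}:{category_instance}"


def find_duplicate_fq_categories(crawlers: list[tuple[str, str]]) -> set[str]:
    """Return Fully Qualified Categories shared by more than one crawler."""
    seen: set[str] = set()
    duplicates: set[str] = set()
    for category, instance in crawlers:
        fq = fully_qualified_category(category, instance)
        if fq in seen:
            duplicates.add(fq)
        seen.add(fq)
    return duplicates


# Hypothetical crawler setup: two Web crawlers share the instance "Default".
crawlers = [("Web", "Default"), ("Web", "Default"), ("Web", "Intranet")]
print(fully_qualified_category("Web", "Default"))
print(find_duplicate_fq_categories(crawlers))
```

    In this hypothetical setup the two crawlers with the instance "Default" would share one value in the Source filter. Giving one of them a different Category Instance would make each Fully Qualified Category unique.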

    Index Document Info

    The part of the index that is available in memory for analysis is called the Document Info. The Document Info zones (properties) can be controlled using the Category Descriptor, the Semantic Pipeline or the Aggregated Metadata Keys.

    Index Document Info Schema (Index Schema)

    The specification of which properties are available via the Document Info is called the Document Info Schema.

    Index Configuration

    The index configuration includes everything that configures the index. The index configuration is stored in the index file system.

    Index schema change

    A schema change results in a re-inversion of the Document Info. The following are examples of changes that cause a schema change:

    • Changes in Aggregated Metadata Keys
    • Changes in the Category Descriptor (related to aggregatable and regexmatchable)
    • Precomputed Synthesized Metadata (if aggregatable)
    • Entity Recognition

    Index Inversion / Re-Inversion

    After a filtered document is stored in the index, it is inverted so that it becomes searchable ("index inversion"). In addition, documents are enriched with metadata during inversion (described in the Semantic Pipeline).

    When the schema changes, the index is automatically re-inverted with regard to the Document Info.

    Full Re-Inversion

    A Full Re-Inversion not only re-inverts the Document Info but rebuilds the whole inverted index.

    It can be triggered using the script /opt/mindbreeze/scripts/move_inverted_index.sh, which moves the inverted index to a specified backup directory. The inverted index is then rebuilt on the next index startup.
    The index must be stopped before this script is run. After the index is started again, it becomes available only once the re-inversion has finished.

    ./move_inverted_index.sh
        --basedir INDEX_DIRECTORY
        --destdir BACKUP_DIRECTORY
        [--category CATEGORY]
        [--bucket BUCKET_NR]
        [--overwrite]
        | --help | -h

    If neither category nor bucket is specified, the inverted index of all categories in all buckets is moved.

    The parameter category restricts the operation to a specific category.

    The parameter bucket restricts the operation to a specific bucket.
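
    As a rough illustration of how the options combine, the following sketch assembles the argument list for the script without running it. The helper name and the example paths are hypothetical. Only the script path and the flags shown in the synopsis above are taken from this document.

```python
from typing import Optional

SCRIPT = "/opt/mindbreeze/scripts/move_inverted_index.sh"


def build_move_command(
    basedir: str,
    destdir: str,
    category: Optional[str] = None,
    bucket: Optional[int] = None,
    overwrite: bool = False,
) -> list[str]:
    """Assemble the argument list; only flags from the synopsis are used."""
    if not basedir or not destdir:
        raise ValueError("--basedir and --destdir are both required")
    command = [SCRIPT, "--basedir", basedir, "--destdir", destdir]
    if category is not None:
        command += ["--category", category]
    if bucket is not None:
        command += ["--bucket", str(bucket)]
    if overwrite:
        command.append("--overwrite")
    return command


# Hypothetical paths; substitute the real index and backup directories.
print(build_move_command("/data/index", "/data/backup", category="Web", bucket=0))
```

    The sketch deliberately stops at building the command. Running it (for example with subprocess.run) would only make sense on an appliance where the index has been stopped first.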

    Multi Index Layout

    The "Multi Index Layout" is a special form of index structure. By default, it is used for all indexes, which is especially important for distributed operation with multiple Mindbreeze InSpire appliances. See also Handbook - Distributed Operation (G7) - Index Layout.

    Semantic Pipeline

    Documents are processed by the crawler or pusher in the semantic pipeline and then indexed. The following steps are performed:

    Filter / Content Filter

    Depending on the file type, the filter forwards documents to the respective content filters. The filtered output is returned to the filter, which may in turn forward it to further content filters. An example of this is ZIP documents, which must first be unpacked with a content filter and then processed with other content filters. Filters can be configured in the Mindbreeze Management Center under "Configuration" in the tab "Filter" and selected in the tab "Indices" for the respective indices.
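
    The round trip between filter and content filters can be pictured with a small sketch. It is a conceptual model only and does not reflect the actual Mindbreeze filter implementation: the dispatch function, the text content filter and the file names are hypothetical.

```python
import io
import zipfile


def filter_text(name: str, data: bytes) -> list[tuple[str, str]]:
    """Hypothetical content filter for plain text: decode the bytes."""
    return [(name, data.decode("utf-8", errors="replace"))]


def dispatch(name: str, data: bytes) -> list[tuple[str, str]]:
    """Route a document by file type; container output re-enters dispatch."""
    if name.lower().endswith(".zip"):
        results = []
        with zipfile.ZipFile(io.BytesIO(data)) as archive:
            for member in archive.namelist():
                if member.endswith("/"):
                    continue  # skip directory entries
                # Unpacked members are sent back to the dispatcher.
                results += dispatch(f"{name}/{member}", archive.read(member))
        return results
    if name.lower().endswith(".txt"):
        return filter_text(name, data)
    return []  # no content filter registered for this type in the sketch


# Build a small in-memory ZIP that contains a nested ZIP.
inner = io.BytesIO()
with zipfile.ZipFile(inner, "w") as z:
    z.writestr("b.txt", "nested")
outer = io.BytesIO()
with zipfile.ZipFile(outer, "w") as z:
    z.writestr("a.txt", "hello")
    z.writestr("inner.zip", inner.getvalue())

print(dispatch("docs.zip", outer.getvalue()))
```

    Because unpacked members go back through the same dispatcher, nested archives are handled without special cases, which mirrors the ZIP example described above.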

    Post Filter

    With a Post Filter, the content of already filtered documents can be processed and modified before the document is sent to the index.

    Precomputed Synthesized Metadata

    Precomputed Synthesized Metadata can be used to generate new metadata based on other metadata. The point in the semantic pipeline at which this metadata is generated can be set using the "Transformation Pipeline Slot" option. Detailed documentation is available separately.

    Entity Recognition

    Entity Recognition can be used to generate metadata by recognizing certain patterns in a text (using regular expressions). For example, dates, UNC paths and similar patterns can be recognized. Detailed documentation can be found in "Configuration - Entity Recognition".
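
    The idea of pattern-based recognition can be sketched with ordinary regular expressions. The two patterns below are illustrative assumptions (an ISO-style date and a simple UNC path) and are not the syntax or the rules used by Mindbreeze Entity Recognition. The metadata names are hypothetical as well.

```python
import re

# Illustrative patterns only; not Mindbreeze's Entity Recognition syntax.
PATTERNS = {
    "date": re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),
    "unc_path": re.compile(r"\\\\[\w.-]+(?:\\[\w.$-]+)+"),
}


def recognize_entities(text: str) -> dict[str, list[str]]:
    """Return every match of each pattern, keyed by a metadata name."""
    return {name: pattern.findall(text) for name, pattern in PATTERNS.items()}


sample = r"The report stored at \\fileserver\share\reports was created on 2022-03-15."
print(recognize_entities(sample))
```

    Each pattern name plays the role of a generated metadata key, and the matched strings are its values.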

    CSV Transformation

    The "CSV Transformation" can also be used to generate metadata. It compares the value of a metadata item with the values in a certain column of the CSV. If the metadata value matches the value in that column, the value of another column from the same row can be written into a new metadata item and appended to the result. More information can be found in the CSV-Transformation documentation.
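
    The lookup described above can be sketched as follows. This is a conceptual illustration, not the plugin's implementation: the CSV content, the column names and the metadata names are all hypothetical.

```python
import csv
import io
from typing import Optional

# Hypothetical mapping table; column names are assumptions.
MAPPING_CSV = """department_code,department_name
HR,Human Resources
IT,Information Technology
"""


def csv_lookup(value: str, match_column: str, output_column: str) -> Optional[str]:
    """Return output_column from the first row whose match_column equals value."""
    for row in csv.DictReader(io.StringIO(MAPPING_CSV)):
        if row[match_column] == value:
            return row[output_column]
    return None


def enrich(document: dict[str, str]) -> dict[str, str]:
    """Add "department_name" when "department_code" has a match in the CSV."""
    enriched = dict(document)
    match = csv_lookup(
        document.get("department_code", ""), "department_code", "department_name"
    )
    if match is not None:
        enriched["department_name"] = match
    return enriched


print(enrich({"title": "Onboarding guide", "department_code": "HR"}))
```

    A document whose value has no matching row is returned unchanged, so the enrichment never overwrites or removes existing metadata in this sketch.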

    Item Transformation

    Item transformers are another way to enrich documents with metadata. Mindbreeze InSpire offers various item transformers, such as the LanguageDetector Plugin.
