    Whitepaper
    Text Classification Insight Service

    Motivation and Overview

    Text classification with Mindbreeze InSpire has never been easier. Tag a portion of your documents with predefined labels. With the help of Mindbreeze Insight Services and Machine Learning, Mindbreeze InSpire is able to expand your knowledge and store it for future use cases. Based on this knowledge, all other documents can subsequently be classified fully automatically.

    Labeling can be done easily and directly via the Insight app - even without a predefined data set.

    The main steps to perform this use case are:

    1. Preparing the training dataset
      1. Define the possible labels for the documents.
      2. Manually label documents via the Insight app and create the dataset used to train the classification model.
    2. Training a classification model
    3. Labeling documents from an index using the classification model in the Semantic Pipeline (Item Transformation).

    Preparation

    Overview of the required Services

    In order to use text classification, certain configuration steps are necessary. Configure the following services:

    • Prediction Service
    • Text Classification Insight Service

    In addition, you still need to make configuration adjustments in the Client Service and Index Services.

    Details can be found in the next sections.

    Configuring the Prediction Service

    In Mindbreeze Management Center, navigate to the "Configuration" menu and switch to the "Indices" tab, then add a new service.

    For a minimal configuration, fill in the following fields:

    Base Path

    This parameter specifies the path from which the Prediction Service reads the training/test data and in which it stores the models it has learned. The base path can be chosen freely.

    Bind Port

    Specifies the TCP port on which the Prediction Service will be accessible. It is important that the port is not already in use by another service (e.g. principal resolution, index or client service).


    If you have more specific use cases, see the Detailed Configuration of Prediction Service section for more information.
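
    Before entering a "Bind Port", it can help to check that nothing is already listening on it. The following is a minimal sketch, not a Mindbreeze tool: the function name port_is_free is hypothetical, and the check only covers the local host at the moment it runs.

```python
import socket


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if no service currently accepts connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        probe.settimeout(1.0)
        # connect_ex returns 0 only when a listener accepted the connection.
        return probe.connect_ex((host, port)) != 0


if __name__ == "__main__":
    candidate = 23910  # example port used later in this document
    state = "free" if port_is_free(candidate) else "already in use"
    print(f"Port {candidate} is {state}")
```

    Run the check on the appliance host itself, since a port that is free on your workstation says nothing about the appliance.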

    Configuring the Text Classification Insight Service

    Now add the "Text Classification Insight Service". Assign a "Display Name" again and select the "TextClassificationInsightService" under "Service".

    For a minimal configuration, fill in the following fields in the following configuration sections:

    • Prediction Service
      • URL: the URL of the prediction service. E.g. http://localhost:23910 if you have selected 23910 as "Bind Port" for the Prediction Service.
      • Project ID: the name of the classification project
      • Tenant ID: e.g. the company name or the organizational unit
    • Dataset Index Ports: The ports of the indexes in which the documents to be classified are located.
    • Persisted Resources Feedback Processing
      • JDBC URL, Database Credentials, Database Table Prefix: configure the same values as in "Resource Persistence Settings" in the Client Service.

    If you have more specific use cases, see the Detailed Configuration of Text Classification Insight Service section for more information.

    Other required Configuration Changes

    In addition to the Prediction Service and the Text Classification Insight Service, you also need to change the configuration of the Client Service and the Index Services.

    Client Service

    To enable users to label documents in the standard Insight app, you still need to make configuration changes in the client service.

    Activate the Advanced Settings and configure the Resource Persistence Settings. Then enable Document Labeling by enabling the following option:

    Enable Document Labeling

    Enables labeling in the Insight app. Enable this option (default: disabled).

    You only need to change the other options in the "Document Labeling" configuration section if you have changed certain default values in the Text Classification Insight service:

    Label Property

    The same value as for "Label Property Name" in the Text Classification Insight Service.

    Labeling Feedback Collection

    The same value as for "Feedback Collection" in the Text Classification Insight Service.

    Available Labels Collection

    The same value as for "Label Collection" in the Text Classification Insight Service.
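
    The three option pairs above must hold identical values in both services. The sketch below illustrates that consistency check. It is not an official tool: the dictionaries are hypothetical snapshots typed in by hand, and the values shown are placeholders, not documented defaults.

```python
# Hypothetical snapshots of the relevant options (placeholder values).
insight_service = {
    "Label Property Name": "label",
    "Feedback Collection": "feedback",
    "Label Collection": "labels",
}
client_service = {
    "Label Property": "label",
    "Labeling Feedback Collection": "feedback",
    "Available Labels Collection": "labels",
}

# Client Service option -> matching Text Classification Insight Service option
PAIRS = {
    "Label Property": "Label Property Name",
    "Labeling Feedback Collection": "Feedback Collection",
    "Available Labels Collection": "Label Collection",
}


def find_mismatches(client: dict, insight: dict) -> list:
    """Return the option pairs whose values differ between the two services."""
    return [
        (client_key, insight_key)
        for client_key, insight_key in PAIRS.items()
        if client.get(client_key) != insight.get(insight_key)
    ]


print(find_mismatches(client_service, insight_service))
```

    An empty list means the two configurations agree on all three options.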

    Index Service(s)

    Documents are classified as they pass through the Semantic Pipeline - more specifically, in the "Item Transformation" step.

    Add the previously created "Text Classification Insight Service" to the index at the "Item Transformation Services". If you are using multiple indexes, repeat this step on all index services.

    Definition of Labels and Manual Labeling

    When the configuration is complete, you can define labels. These labels can be used by users to identify documents in the Insight app.

    Defining the Labels

    In the Mindbreeze Management Center, navigate to the "Insight Services" > "Text Classification" menu. Then click on "Edit" at "Label Definitions".

    Now define the labels by which you want to classify your documents.

    Define translations for the languages you want to support in your Insight app. If no translation exists for a language, the label ID is displayed in the Insight app instead. Confirm your entries with the "Save" button.

    More details:

    • If "Ignored" is checked, documents with this label will be ignored when training the model.
    • You can delete labels with the trash icon.
    • Do not change the ID of a label that is already assigned to documents. The label IDs already assigned to documents remain unchanged, which leads to negative effects.

    Manual Labeling

    Now users have the option to label documents with the labels they have just defined. After searching in the Insight app, the found documents can be labeled by selecting the desired label from the drop-down menu.

    Logged in users can read and assign labels. Anonymous users who are not logged in to the Insight app can read (automatically assigned) labels. (Manually assigned labels are not visible to anonymous users).

    If multiple users assign labels for the same document, all assignments are saved, but effectively only the label of the last assignment is used.

    Users also have the option to remove their own feedback again (trash icon). If the document was previously labeled by another user, the previous label is now effective.
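
    The "last assignment wins" behavior and the fallback after removing feedback can be pictured with a small model. This is an illustrative sketch only. The Assignment type and both function names are hypothetical and do not reflect how Mindbreeze InSpire stores feedback internally.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Assignment:
    user: str
    label: str


def effective_label(history: List[Assignment]) -> Optional[str]:
    """history is ordered oldest to newest; the most recent assignment is effective."""
    return history[-1].label if history else None


def remove_feedback(history: List[Assignment], user: str) -> List[Assignment]:
    """A user removes their own feedback; earlier assignments by others remain."""
    return [entry for entry in history if entry.user != user]


history = [Assignment("alice", "invoice"), Assignment("bob", "contract")]
print(effective_label(history))                          # bob labeled last
print(effective_label(remove_feedback(history, "bob")))  # falls back to alice
```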

    Creating / Updating the Training Dataset

    Once the required documents have been labeled, you can create the training dataset that will later serve as the data basis for the model. To do this, navigate to the "Insight Services" > "Text Classification" menu in the Mindbreeze Management Center. Then click on "Edit" at "Labeled Data".

    You can now check whether the users have manually labeled the documents correctly. If labels were assigned incorrectly, these assignments can be changed here or even ignored. Then click on "Create or Update Dataset" to save your changes and create the training dataset.

    Preparing Models for Text Classification

    In the next steps, a model can now be created and tested from the training data set.

    Train the model

    In the Mindbreeze Management Center, navigate to the "Insight Services" > "Text Classification" menu. Then click on "Train" under "Models".

    Now click on "Train Model" to train a model. The default parameters are sufficient for most use cases. However, you can also fine-tune them if your use case requires it. The following parameters can be adjusted:

    Label Property

    Only needs to be changed if the "Dataset Label Property Name" option has been changed in the Text Classification Insight Service configuration. The value specified here must match the one in the configuration.

    Training query

    A search query to filter documents in the training dataset, which are then used to train the model. If empty, all documents that have content are used for training.

    Train/Test Split

    The division of the data set into training and testing data. E.g.: "0.8" means that 80% of the data is used for training, 20% for testing.

    Token Pattern

    See MindbreezePredictionService - Train and validate a model "token_pattern".

    If "Custom Regex" is selected, the "Custom Pattern" field appears, in which a custom regex can be specified.

    Word Ngram Length

    See MindbreezePredictionService - Train and validate a model "ngram_length".
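
    To make the training parameters above more concrete, the following sketch shows what a train/test split, a token pattern and word n-grams generally mean. It is a generic illustration, not the Prediction Service implementation. All function names are hypothetical, the default pattern \w+ is only an example, and treating the n-gram length as an upper bound is an assumption.

```python
import random
import re


def train_test_split(items, ratio=0.8, seed=0):
    """Shuffle the items and put the first `ratio` share into the training set."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * ratio)
    return shuffled[:cut], shuffled[cut:]


def tokenize(text, token_pattern=r"\w+"):
    """Split lower-cased text into tokens that match the given regex."""
    return re.findall(token_pattern, text.lower())


def word_ngrams(tokens, max_n=2):
    """Build all word n-grams from length 1 up to max_n (assumed semantics)."""
    grams = []
    for n in range(1, max_n + 1):
        for start in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[start:start + n]))
    return grams


train, test = train_test_split(range(10), ratio=0.8)
print(len(train), len(test))
print(word_ngrams(tokenize("Annual Report 2024")))
```

    With a ratio of 0.8 and ten documents, eight end up in the training set and two in the test set.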

    Test and use the model

    In the next step, you can now test the model to get information about the quality of the model you just trained. Scroll to "Test Model". The model you just trained should already be selected. If you now click on "Test Model", the model will be tested with the test data and you will receive key figures that give you information about the quality of the model, such as "Accuracy".
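
    As background for the "Accuracy" key figure: in its common definition, accuracy is the share of test documents whose predicted label equals the expected label. The sketch below shows that definition. Whether the Management Center computes its figure in exactly this way is an assumption.

```python
def accuracy(expected, predicted):
    """Share of positions where the predicted label equals the expected label."""
    if len(expected) != len(predicted):
        raise ValueError("expected and predicted must have the same length")
    if not expected:
        raise ValueError("at least one test document is required")
    correct = sum(e == p for e, p in zip(expected, predicted))
    return correct / len(expected)


print(accuracy(["a", "b", "a", "c"], ["a", "b", "c", "c"]))
```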

    Then click on "Set Default" so that this model is used for classification.

    Automated Document Labeling

    As already mentioned, documents are classified automatically when they pass through the Semantic Pipeline, more precisely in the "Item Transformation" step. Unless explicitly configured otherwise in the service configuration, the default model that you set in the previous step with "Set Default" is used for classification.

    Since the Semantic Pipeline is only run through completely for new or changed documents, only new or changed documents are classified. However, to ensure that documents that have already been indexed are also classified, you have two options, which are described in more detail in the next sections:

    1. Reindex: recommended for small indexes where complete indexing is very fast (e.g. on test systems)
    2. Reinvert: recommended for large indexes, where a complete indexing takes a long time

    Reindex

    If the index is small and a full indexing can be performed very quickly, a re-indexing is recommended to trigger a classification of all documents. To do this, navigate to "Services" in the Mindbreeze Management Center. Then click on the gear icon for the index you want to re-index and then click on "Reindex". As soon as the re-indexing is successfully completed, your documents are classified.

    Reinvert

    If the index is large and a complete indexing takes a long time, a re-inversion is recommended to trigger a classification of all documents. To do this, navigate to "Configuration" in the Management Center and switch to the "Indices" tab. Activate the "Advanced Settings" and change the "Aggregated Metadata Keys". Changing this option will automatically re-invert the index. For example, you can specify "label" which will result in filtering by label in the Insight app. However, you can also specify a non-existent metadatum key, such as "V1". Save the configuration afterwards.

    Once the re-inversion is successfully completed, your documents are classified.

    Appendix

    Iterative Improvement of the Model

    Once your documents are classified, users can also provide feedback in the Insight app and change the label of a document, for example if the automatic classification was inaccurate or incorrect (see also Manual Labeling).

    This feedback can then be used to update the training dataset (see Creating / Updating the Training Dataset).

    Afterwards, a new model can be trained (see Train the model), tested and used (see Test and use the model).

    If a document now changes or a new document is indexed, the new, just trained model is already used for the classification. If you want to classify all documents, including the already indexed documents, with the new, improved model, you must trigger a reindex or reinvert.

    You can perform these steps to iteratively improve the model as many times as you like until you are satisfied with the quality of your classification model.

    Detailed Configuration of the Text Classification Insight Service (Advanced Use Cases)

    This section describes all the options available in the Text Classification Insight service. This section is relevant to you only if you have special use cases that require special configuration.

    Base Configuration

    Bind port

    The TCP port of the service

    Max Request Handling Threads

    Maximum number of threads used to process the HTTP server requests.

    Max Feedback Processing Threads (advanced)

    Number of threads used to process the user feedback ("Labeled Data").

    Prediction Service

    URL

    The URL of the Prediction Service. E.g. http://localhost:23910 if you have selected 23910 as "Bind Port" for the Prediction Service.

    Project ID

    The project ID used to structure records in the Prediction Service. Stored in: <PredictionService-Data-Directory>/tenants/<TenantID>/projects/<ProjectID>.

    Tenant ID

    The tenant ID used to structure records in the Prediction Service. Stored in: <PredictionService-Data-Directory>/tenants/<TenantID>/projects/<ProjectID>.

    Label Property Name

    The name of the metadatum used for the label property on the document.

    Dataset Label Property Name

    The name of the property in the dataset

    Default Label Value

    Documents that are excluded from classification for certain reasons (e.g. because the "Minimum Content Length" has not been reached) are assigned a default value as a label. This default value can be defined here.

    Model ID (optional)

    If empty, the "Default Model" is used (can be set in the Management Center under "Text Classification" "Models"). However, a model ID can also be explicitly specified here, which will then be used for the classification.

    Additional Labeling Models (optional)

    Here you can specify additional models that will be used in the classification.

    Model ID

    As above, but mandatory here.

    Label Property Name

    See above

    Dataset Label Property Name

    See above

    Default Label Value

    See above

    For more details on the Prediction Service, see the Mindbreeze Prediction Service documentation with example text classification.
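
    The storage layout given above for "Project ID" and "Tenant ID" can be expressed as a small helper. This is a sketch for orientation only: the function name is hypothetical, and the data directory, tenant and project values in the example are placeholders.

```python
from pathlib import PurePosixPath


def project_directory(data_directory: str, tenant_id: str, project_id: str) -> PurePosixPath:
    """Build <data directory>/tenants/<TenantID>/projects/<ProjectID>."""
    return PurePosixPath(data_directory) / "tenants" / tenant_id / "projects" / project_id


# Placeholder values; use your own Base Path, Tenant ID and Project ID.
print(project_directory("/data/prediction", "acme", "doc-types"))
```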

    Text Classification Sources

    Content Length Limit (Characters)

    The maximum number of characters of the document content that will be used for classification. If the number of characters exceeds this configured value, the characters beyond it are not used during classification for performance reasons. The value "0" or an empty value disables the character limit.

    Minimum Content Length (Characters) (optional)

    The minimum number of characters of the document content that is required for the document to be classified. Documents that do not meet this requirement are classified with the configured "Default Label Value". The value "0" or an empty value disables this filter.

    Source Metadata Keys (optional)

    By default, only the document content is classified. Additional metadata can be specified here, which will be included in the classification.

    Add annotations

    Should always be enabled

    Training Link Extraction (optional)

    Links in documents (HTML anchor tags) are not included in training and classification by default. In order to include certain links that are meaningful for labeling, rules can be defined here.

    Name

    An arbitrary, unique name that describes the type of links.

    Regex

    A regex pattern for selecting certain links.

    Unique

    If active, this rule is applied only once per document.
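
    The interplay of "Content Length Limit", "Minimum Content Length" and "Default Label Value" described above can be sketched as follows. This is an illustration under assumptions, not the service's code: the function name is hypothetical, and checking the minimum length before truncating is an assumed order.

```python
from typing import Optional, Tuple


def prepare_content(
    content: str,
    content_length_limit: int = 0,
    minimum_content_length: int = 0,
    default_label: str = "unknown",
) -> Tuple[Optional[str], Optional[str]]:
    """Return (text_to_classify, fixed_label); exactly one of the two is None."""
    # A value of 0 disables the respective option, as described above.
    if minimum_content_length and len(content) < minimum_content_length:
        return None, default_label  # too short: skip prediction, use default label
    if content_length_limit and len(content) > content_length_limit:
        content = content[:content_length_limit]  # ignore characters beyond the limit
    return content, None


print(prepare_content("short", minimum_content_length=10, default_label="n/a"))
print(prepare_content("abcdefghij", content_length_limit=4))
```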

    Rule Based Labels (optional)

    Here you can define rules to label certain documents without calling the Prediction Service. For example, you can use it to classify all documents as "Documentation" that contain "Doc" or "Documentation" in the title.

    The first rule that matches a document is always applied. If no rule matches, then the prediction service is used to set the label.

    Property Name

    Selects the documents to which the rule is applied. A document is selected if the "Value Pattern" matches the value of the metadatum with the key given in "Property Name".

    Value Pattern (Regex)

    See above. The pattern is a Java regex and is case-sensitive unless it starts with (?i).

    Action

    Which action is to be performed:

    • "Predict Label": Prediction Service is used for labeling (default, even if no rule matches).
    • "Set Label": Sets the label to the value configured in "Label Value".

    Label Value

    Only relevant if "Action" is set to "Set Label" (see above)
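
    The rule evaluation described in this section (the first matching rule wins, otherwise the Prediction Service decides) can be sketched as follows. This is not the service's implementation. The Rule type and label_document are hypothetical, Python's re module stands in for Java regex, and using a "contains" match (re.search) is an assumption based on the title example above.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Rule:
    property_name: str
    value_pattern: str
    action: str  # "Predict Label" or "Set Label"
    label_value: Optional[str] = None


def label_document(metadata: dict, rules: List[Rule], predict: Callable[[dict], str]) -> str:
    """Apply the first matching rule; fall back to prediction if none matches."""
    for rule in rules:
        value = metadata.get(rule.property_name)
        if value is not None and re.search(rule.value_pattern, value):
            if rule.action == "Set Label":
                return rule.label_value
            return predict(metadata)  # "Predict Label"
    return predict(metadata)


rules = [Rule("title", r"Doc(umentation)?", "Set Label", "Documentation")]
fake_predict = lambda metadata: "predicted"  # stand-in for the Prediction Service
print(label_document({"title": "API Documentation"}, rules, fake_predict))
print(label_document({"title": "Invoice 4711"}, rules, fake_predict))
```

    Because matching is case-sensitive by default, a title such as "doc overview" would not match this rule unless the pattern starts with (?i).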

    Dataset Index Ports

    Dataset Index Port

    The ports of the indices in which the documents to be classified are located

    Persisted Resources Feedback Processing

    Configure the same values for the following options as for "Resource Persistence Settings" in the Client Service: "JDBC URL", "Database Credentials", "Database Table Prefix".

    See also Resource Persistence Settings.

    JDBC URL

    see Client Service

    Database Credentials

    see Client Service

    Database Table Prefix

    see Client Service

    Owner Encryption Credential

    If you use Identity Encryption in the Client Service, you must select a credential here. In this case, please select the same credential as in the client service option "Identity Encryption Credential".

    Feedback Collection

    The name of the collection in the "itemdata" persisted resources where user label feedback is stored.

    Label Collection

    The name of the collection in the "labeldefinition" persisted resources where the label definitions are stored.

    CSV Feedback Processing (optional)

    In addition to user feedback (via the Insight app), a CSV file can be used to set labels for documents. These labels are not displayed in the Insight app, but can be used to train the classification model.

    Example:

    Fqcategory;Key;LabelValue;IgnoreFeedback
    Web:helpmindbreeze;http://help.mindbreeze.com/de/index.php?topic=doc/Konfiguration---Microsoft-File-Connector/index.htm;performancetest;false
    Web:helpmindbreeze;http://help.mindbreeze.com/de/index.php?topic=doc/Installation--Konfiguration---Caching-Principal-Resolution-Service/index.htm;performancetest;false

    Enable CSV Processing

    Activates CSV feedback processing.

    CSV File Path

    The path to the CSV file (write permissions required)
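
    To validate a feedback file before configuring it, the semicolon-separated format from the example above can be read with Python's csv module. This is a minimal sketch: the function name and the lower-case result keys are hypothetical, and interpreting "IgnoreFeedback" as a true/false flag is an assumption based on the sample rows.

```python
import csv
import io

SAMPLE = (
    "Fqcategory;Key;LabelValue;IgnoreFeedback\n"
    "Web:helpmindbreeze;http://help.mindbreeze.com/de/index.php?topic=doc/"
    "Konfiguration---Microsoft-File-Connector/index.htm;performancetest;false\n"
)


def read_feedback(handle):
    """Parse feedback rows from a semicolon-separated file with a header line."""
    rows = []
    for row in csv.DictReader(handle, delimiter=";"):
        rows.append({
            "fqcategory": row["Fqcategory"],
            "key": row["Key"],
            "label": row["LabelValue"],
            "ignore": row["IgnoreFeedback"].strip().lower() == "true",
        })
    return rows


rows = read_feedback(io.StringIO(SAMPLE))
print(rows[0]["label"], rows[0]["ignore"])
```

    A KeyError while reading indicates a missing or misspelled column in the header line.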

    Detailed Configuration of the Prediction Service (Advanced Use Cases)

    This section describes the options that are available in the Prediction Service in addition to the mandatory fields. It is relevant only if you have special use cases that require special configuration. Options in this section that are not marked with "(Mandatory)" or "(Advanced)" are also considered advanced.

    Prediction Service Parameter

    Base Path (Mandatory)

    This parameter specifies the path from which the Prediction Service reads the training/test data and in which it stores the models it has learned. The base path can be chosen freely.

    Bind Port (Mandatory)

    Specifies the TCP port on which the Prediction Service will be accessible. It is important that the port is not already in use by another service (e.g. cache, index or client service).

    Dump Request/Responses (Advanced)

    Specifies under which circumstances the requests and responses of the Prediction Service are written to the dump path. The following options can be selected:

    • "Never": never
    • "Always": always
    • "On Error": only in case of an error

    Dump Path (Advanced)

    Defines the path to which the dumps are written. Note that this data is located in the "/data/" partition. The subfolders can be chosen freely.
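
    The three "Dump Request/Responses" options amount to a simple decision per request. The following sketch models it. The enum and function names are hypothetical and only illustrate the documented behavior.

```python
from enum import Enum


class DumpMode(Enum):
    NEVER = "Never"
    ALWAYS = "Always"
    ON_ERROR = "On Error"


def should_dump(mode: DumpMode, had_error: bool) -> bool:
    """Decide whether a request/response pair is written to the dump path."""
    if mode is DumpMode.ALWAYS:
        return True
    if mode is DumpMode.ON_ERROR:
        return had_error
    return False


print(should_dump(DumpMode.ON_ERROR, had_error=True))
```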

    Dataset Settings

    Dataset Source Query

    This can be used to restrict the training set with a query (e.g.: PDFs only). If the Text Classification Insight Service is used, this setting should be left empty.

    Dataset Source Property

    Currently only “UNIFORM_ITEM_ID” can be selected.

    Train Dataset Source Ratio

    Defines the percentage of all documents that is used for training. If the Text Classification Insight Service is used, this setting should be left empty.

    Label Alias CSV (optional)

    With this extension you can translate the label values if the dataset contains a different value than needed for the classification.

    Tenant ID

    The company name or organizational unit.

    Project ID

    The name of the classification project.

    CSV Path

    Here you can specify the path of the CSV file to rewrite the "SourceLabel" to the desired "DestinationLabel".
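
    Conceptually, a label alias maps a "SourceLabel" found in the dataset to the "DestinationLabel" used for classification. The sketch below shows only that mapping step. The function name and the example labels are hypothetical, and it makes no statement about the layout of the alias CSV file, which is not described here.

```python
def apply_alias(label: str, aliases: dict) -> str:
    """Return the destination label for a source label; unknown labels pass through."""
    return aliases.get(label, label)


# Hypothetical aliases: source label -> destination label
aliases = {"Rechnung": "invoice", "Vertrag": "contract"}
print(apply_alias("Rechnung", aliases))
print(apply_alias("memo", aliases))
```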
