Index Servlets
Configuration
Copyright ©
Mindbreeze GmbH, A-4020 Linz, 2024.
All rights reserved. All hardware and software names used are brand names and/or trademarks of their respective manufacturers.
These documents are strictly confidential. The submission and presentation of these documents does not confer any rights to our software, our services and service outcomes, or any other protected rights. The dissemination, publication, or reproduction hereof is prohibited.
For ease of readability, gender differentiation has been waived. Corresponding terms and definitions apply within the meaning and intent of the equal treatment principle for both sexes.
Introduction
Before index servlets can be called, the index has to be started and the checkbox “Disable Unrestricted Privileged Servlets” has to be unticked for this index. To do this, go to the desired index in the Indices tab of your Mindbreeze InSpire configuration. Make sure that the “Advanced Settings” are activated first.
Now you can access the index servlets at:
https://<Appliance>:8443/index/<IndexPortNr>
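As a minimal sketch, the base URL shown above could be assembled in Python as follows. The function name, the host name and the port number are illustrative assumptions; only the URL pattern itself comes from this document.

```python
def index_servlet_base_url(appliance: str, index_port: int) -> str:
    """Build the base URL https://<Appliance>:8443/index/<IndexPortNr>.

    'appliance' is the host name of the appliance and 'index_port' is the
    port number of the index (both values below are hypothetical).
    """
    return f"https://{appliance}:8443/index/{index_port}"


# Hypothetical appliance host name and index port, for illustration only.
base_url = index_servlet_base_url("inspire.example.com", 23101)
print(base_url)
```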
Index Servlets
Aggregate
You can use this index servlet to obtain aggregated values for documents, such as the number of different document titles. The following options can be configured for this:
- Column name: Specifies the column names of the index that are used for the aggregation.
- Query constraint: Specifies various constraints for the query. For example, filtering by document date. The search is restricted to system metadata (category, categoryinstance, fqcategory, mes:key, datasource/mes:key, url, extension, mes:uniformdocid, mes:date, mes:size, mes:lang, mes:nonfilterable, mes:filteredbymetadataonly, store:modificationdate, store:creationdate).
- Aggregation operator: The following four operators are available:
- COUNT
- SUM
- AVG
- CONCAT
- Concatenation max value count: Specifies the maximum number of values to be concatenated.
- Concatenation value order: Specifies how the values for the “CONCAT” function are sorted. The following sorting options are available:
- UNORDERED
- ORDERED_DESCENDING
- ORDERED_ASCENDING
- Output format: Specifies the output format. The following formats are available: “csv” and “protobuf_textual”.
- Expand Query: In case Expand Query is set to false (or not provided), the internal query expression transformers are not used. This can be helpful for the following use cases:
- Optional Terms is only active when the internal Term Series to Terms Transformer is active. Setting Expand Query to false therefore disables Optional Terms.
- If Natural Language Question Answering (NLQA) is enabled, all query expressions will be automatically transformed to similarity expressions by default. Setting Expand Query to false therefore disables similarity search and falls back to an ordinary keyword search.
Documents
You can use this index servlet to search individual documents by their docID, key, or UniformItemID.
- By docID: Here you can enter the docID of the desired document.
- By key: Here you can enter the key of the desired document.
- By UniformItemID: Here you can enter the UniformItemID of the document.
- Output format: You can select the format in which the document is to be output and what content is to be displayed.
- Deleted documents: Here you can specify whether deleted documents should also be included in the search.
Find
You can use this index servlet to send search queries and find documents.
- Query: Specifies a search query like in the standard Mindbreeze search window. The search is restricted to system metadata (category, categoryinstance, fqcategory, mes:key, datasource/mes:key, url, extension, mes:uniformdocid, mes:date, mes:size, mes:lang, mes:nonfilterable, mes:filteredbymetadataonly, store:modificationdate, store:creationdate).
- Order by: Specifies the criteria by which the documents are sorted.
- Order direction: Specifies the direction in which the documents are sorted. Documents can be sorted in ascending or descending order.
- Group by: Specifies if the documents should be grouped and according to which criteria.
- Group by parent reference: Enables the grouping by references.
- Group by parent reference mode: Defines how far to reference.
- Summarize by property: Specifies the property by which documents are combined, such as a name or file extension.
- Order summarized by: Specifies the sort order of the summary.
- Order direction: Specifies the direction of the sort order for the summary. The summary can be sorted in ascending or descending order.
- Output format: Specifies the output format in which the results are displayed.
- Expand Query: In case Expand Query is set to false (or not provided), the internal query expression transformers are not used. This can be helpful for the following use cases:
- Optional Terms is only active when the internal Term Series to Terms Transformer is active. Setting Expand Query to false disables Optional Terms.
- If Natural Language Question Answering (NLQA) is enabled, all query expressions will be automatically transformed to similarity expressions by default. Setting Expand Query to false disables similarity search and falls back to an ordinary keyword search.
- Diacritic similar terms: Specifies whether diacritical entries should also be included in a search. For example, a search for “possibel” will also include “possible.”
- Requested Properties (CSV): Specifies which specific document properties are to be searched. If multiple properties are specified, separate them with semicolons, as in a CSV list.
Statistics
Here you can call up statistics for the current index.
- Detail level: You can specify the level of detail in which the statistics will be output.
- Output format: Here you can specify the format in which you want the statistics to be output.
ProcessItems
A detailed instruction manual for the index servlet “processitems” can be found in Configuration - Metadata Enrichment - Privileged Servlets.
Wait
The servlet is used to check the status of the index. It is especially useful for scripts, because the connection to the servlet is kept open until the index is ready. The index configuration option "Wait for Event Servlet Update Status Interval (seconds)" determines the interval at which updates are sent. The content type of the response is "text/event-stream" and is set in the header. The data part of each message is JSON-formatted plain text; its "finished" field is true in the last message.
The field "totalBucketCount" contains the number of buckets in the index. This number can increase while new documents are being indexed. The fields "invertingCompleteFraction", "mergingCompleteFraction" and "totalCompleteFraction" indicate the completeness as fractions; multiply them by 100 to obtain percentages.
Examples:
data: {"event":"all_finished","finished":false,"invertingCompleteFraction":0.5,"mergingCompleteFraction":0.5,"totalBucketCount":2,"totalCompleteFraction":0.5}
data: {"event":"all_finished","finished":false,"invertingCompleteFraction":1.0,"mergingCompleteFraction":0.5,"totalBucketCount":2,"totalCompleteFraction":0.5}
data: {"event":"all_finished","finished":true,"invertingCompleteFraction":1.0,"mergingCompleteFraction":1.0,"totalBucketCount":2,"totalCompleteFraction":1.0}
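The example messages above can be processed with a few lines of Python. The following is a minimal sketch that only assumes the "data: " prefix and the JSON field names visible in the examples; the function names are illustrative, and the lines are hard-coded here instead of being read from a live connection.

```python
import json
from typing import Optional


def parse_wait_event(line: str) -> Optional[dict]:
    """Parse one 'data: {...}' line of the event stream.

    Returns the decoded JSON object, or None for lines that do not
    carry a data part (for example blank separator lines).
    """
    if not line.startswith("data:"):
        return None
    return json.loads(line[len("data:"):].strip())


def progress_percent(event: dict) -> float:
    """Convert totalCompleteFraction to a percentage (fraction * 100)."""
    return event["totalCompleteFraction"] * 100


# The three example messages from this section.
lines = [
    'data: {"event":"all_finished","finished":false,"invertingCompleteFraction":0.5,"mergingCompleteFraction":0.5,"totalBucketCount":2,"totalCompleteFraction":0.5}',
    'data: {"event":"all_finished","finished":false,"invertingCompleteFraction":1.0,"mergingCompleteFraction":0.5,"totalBucketCount":2,"totalCompleteFraction":0.5}',
    'data: {"event":"all_finished","finished":true,"invertingCompleteFraction":1.0,"mergingCompleteFraction":1.0,"totalBucketCount":2,"totalCompleteFraction":1.0}',
]

for line in lines:
    event = parse_wait_event(line)
    if event is None:
        continue
    print(f"{progress_percent(event):.0f}% complete, finished={event['finished']}")
    if event["finished"]:
        break  # the last message has been received; the index is ready
```

A script that waits for the index would apply the same parsing to each line it reads from the open connection and stop as soon as "finished" is true.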
The following URL parameters can be used:
- event: Restricts what the index should wait for. The values all_finished and inverting_finished are valid:
- inverting_finished: Waits for inverting only.
- all_finished: Waits for inverting and merging.
- update_interval: Sets a timeout after which an update is written. If this parameter is not set, the "Wait for Event Servlet Update Status Interval (seconds)" from the index configuration is used. The minimum interval is 5 seconds.
- Note: The query is not runtime intensive, but the threads must be synchronised (lock mutex). It is therefore recommended to set this option as high as possible.
See also “Section: Inverter Settings” in Documentation - Mindbreeze InSpire - "Indices" Tab.
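The URL parameters described above can be checked and encoded before a request is sent. The following Python sketch is illustrative only: the servlet path "wait" appended to the base URL is an assumption that this document does not confirm, and the host name and port are hypothetical. Only the parameter names, the two valid event values and the minimum interval of 5 seconds are taken from the text above.

```python
from urllib.parse import urlencode

VALID_EVENTS = {"all_finished", "inverting_finished"}
MIN_UPDATE_INTERVAL_SECONDS = 5  # minimum interval stated in this section


def build_wait_url(base_url, event=None, update_interval=None, servlet_path="wait"):
    """Build a URL for the Wait servlet with optional URL parameters.

    'servlet_path' is an assumed path segment; adjust it to the path
    that your installation actually uses.
    """
    params = {}
    if event is not None:
        if event not in VALID_EVENTS:
            raise ValueError(f"event must be one of {sorted(VALID_EVENTS)}")
        params["event"] = event
    if update_interval is not None:
        if update_interval < MIN_UPDATE_INTERVAL_SECONDS:
            raise ValueError("update_interval must be at least 5 seconds")
        params["update_interval"] = update_interval
    url = f"{base_url.rstrip('/')}/{servlet_path}"
    if params:
        url += "?" + urlencode(params)
    return url


# Hypothetical appliance and index port, for illustration only.
print(build_wait_url("https://inspire.example.com:8443/index/23101",
                     event="inverting_finished", update_interval=30))
```

Validating the values on the client side avoids sending a request with an interval below the minimum or an unsupported event name.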
Indexingstatus
The servlet regularly sends information about the indexing status of the documents in the index. The content type of the response is "text/event-stream" and is set in the header. The data part is JSON-formatted plain text and comes in different types of messages:
- The first message upon connecting is a summary of the status of the index. It contains a field named “index”, whose value is a list of buckets, with information about their stored documents.
- When new documents are inverted, a message is emitted. It contains a field named “item”, whose value contains a field named “itemHeaders”, whose value is a list of newly indexed documents (given with key, category, category instance and document ID).
The messages also report the status of the items; currently only “searchable” is supported.
In addition, because the connection to the servlet needs to remain open to receive the messages, a message is sent automatically after a period of inactivity. The default timeout for this is 20 seconds; it can be configured with the URL parameter idle_event_timeout_ms.
Example:
data: {"statusLevel":"LEVEL_SEARCHABLE","index":{"bucketStatus":[{"bucketId":"0","firstSequenceNr":"0","lastSequenceNr":"50","itemCount":"50","isCurrentBucket":true}]},"statusCode":"OK"}
data: {"statusLevel":"LEVEL_SEARCHABLE","item":{"itemHeaders":[{"category":"Web","categoryInstance":"webtest","key":"https://www.mindbreeze.com/omicron-webinar-3","sequenceNr":"52"}]}}
data: {"statusLevel":"LEVEL_SEARCHABLE","item":{"itemHeaders":[{"category":"Web","categoryInstance":"webtest","key":"https://www.mindbreeze.com/reference-csc.html","sequenceNr":"54"},{"category":"Web","categoryInstance":"webtest","key":"https://www.mindbreeze.com/egovernment.html","sequenceNr":"56"}]}}
data: {"statusLevel":"LEVEL_SEARCHABLE","idle":{"waitForEventTimeoutEllapsedMs":"20000"}}
The servlet needs to be activated in the index settings. See also the table “Section: Inverter Settings” in Documentation - Mindbreeze InSpire - Index Service Settings.
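The message types of the Indexingstatus servlet can be told apart by the top-level field they contain. The following Python sketch is a hedged illustration that relies only on the field names visible in the example messages ("index", "bucketStatus", "itemCount", "item", "itemHeaders", "key", "idle"); the function name and the returned tuples are assumptions made for this example.

```python
import json


def classify_status_message(line):
    """Classify one 'data: {...}' line of the Indexingstatus stream.

    Returns a (kind, detail) tuple:
      ("index", total item count over all buckets)  for the initial summary
      ("item", list of document keys)               for newly inverted documents
      ("idle", None)                                for inactivity messages
      ("unknown", None)                             for anything else
    Returns None for lines without a data part.
    """
    if not line.startswith("data:"):
        return None
    message = json.loads(line[len("data:"):].strip())
    if "index" in message:
        buckets = message["index"]["bucketStatus"]
        # Counts are transmitted as strings in the example, so convert them.
        return ("index", sum(int(b["itemCount"]) for b in buckets))
    if "item" in message:
        return ("item", [h["key"] for h in message["item"]["itemHeaders"]])
    if "idle" in message:
        return ("idle", None)
    return ("unknown", None)


# Example messages taken from this section.
summary = 'data: {"statusLevel":"LEVEL_SEARCHABLE","index":{"bucketStatus":[{"bucketId":"0","firstSequenceNr":"0","lastSequenceNr":"50","itemCount":"50","isCurrentBucket":true}]},"statusCode":"OK"}'
new_item = 'data: {"statusLevel":"LEVEL_SEARCHABLE","item":{"itemHeaders":[{"category":"Web","categoryInstance":"webtest","key":"https://www.mindbreeze.com/omicron-webinar-3","sequenceNr":"52"}]}}'

for line in (summary, new_item):
    print(classify_status_message(line))
```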